[ 
https://issues.apache.org/jira/browse/LUCENE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006730#comment-17006730
 ] 

Adrien Grand commented on LUCENE-9113:
--------------------------------------

I indexed the names of all locations from GeoNames' allCountries.txt file into a 
SORTED field, enabled the info stream, force-merged at the end, and repeated the 
run 3 times. Without the patch:
{noformat}
$ grep "to merge doc values" infostream.txt
SM 0 [2020-01-02T10:48:35.245544Z; Lucene Merge Thread #0]: 5271 msec to merge doc values [6940171 docs]
SM 0 [2020-01-02T10:48:40.080066Z; Lucene Merge Thread #1]: 4802 msec to merge doc values [8537845 docs]
SM 1 [2020-01-02T10:48:58.827231Z; Lucene Merge Thread #0]: 5186 msec to merge doc values [6940171 docs]
SM 1 [2020-01-02T10:49:03.463976Z; Lucene Merge Thread #1]: 4614 msec to merge doc values [8537845 docs]
SM 2 [2020-01-02T10:49:22.077466Z; Lucene Merge Thread #0]: 5191 msec to merge doc values [6940171 docs]
SM 2 [2020-01-02T10:49:26.684538Z; Lucene Merge Thread #1]: 4589 msec to merge doc values [8537845 docs]
{noformat}

With the patch:
{noformat}
$ grep "to merge doc values" infostream.txt
SM 0 [2020-01-02T10:46:54.743489Z; Lucene Merge Thread #0]: 4314 msec to merge doc values [6940171 docs]
SM 0 [2020-01-02T10:46:56.988413Z; Lucene Merge Thread #1]: 2208 msec to merge doc values [8537845 docs]
SM 1 [2020-01-02T10:47:14.433368Z; Lucene Merge Thread #0]: 4206 msec to merge doc values [6940171 docs]
SM 1 [2020-01-02T10:47:16.589024Z; Lucene Merge Thread #1]: 2136 msec to merge doc values [8537845 docs]
SM 2 [2020-01-02T10:47:33.942020Z; Lucene Merge Thread #0]: 4134 msec to merge doc values [6940171 docs]
SM 2 [2020-01-02T10:47:36.134355Z; Lucene Merge Thread #1]: 2174 msec to merge doc values [8537845 docs]
{noformat}
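
To make the two runs easier to compare, here is a small sketch that averages the timings above per merge and computes the relative reduction. The class and method names are made up for illustration; only the msec values come from the info stream output.

```java
import java.util.Locale;
import java.util.stream.IntStream;

public class MergeTimingSummary {

    /** Arithmetic mean of the given timings, in msec. */
    static double mean(int... millis) {
        return IntStream.of(millis).average().getAsDouble();
    }

    /** Percentage by which {@code after} is lower than {@code before}. */
    static double reductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Timings in msec copied from the info stream output above.
        double smallBefore = mean(5271, 5186, 5191); // 6940171 docs, without the patch
        double smallAfter = mean(4314, 4206, 4134);  // 6940171 docs, with the patch
        double largeBefore = mean(4802, 4614, 4589); // 8537845 docs, without the patch
        double largeAfter = mean(2208, 2136, 2174);  // 8537845 docs, with the patch

        System.out.println(String.format(Locale.ROOT,
            "6940171 docs: %.0f -> %.0f msec (%.1f%% less)",
            smallBefore, smallAfter, reductionPercent(smallBefore, smallAfter)));
        System.out.println(String.format(Locale.ROOT,
            "8537845 docs: %.0f -> %.0f msec (%.1f%% less)",
            largeBefore, largeAfter, reductionPercent(largeBefore, largeAfter)));
    }
}
```

On these three runs, that works out to roughly 19% less time for the 6940171-doc merge and roughly 53% less for the 8537845-doc merge.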

> Speed up merging doc values terms dictionaries
> ----------------------------------------------
>
>                 Key: LUCENE-9113
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9113
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The default {{DocValuesConsumer#mergeSortedField}} and 
> {{DocValuesConsumer#mergeSortedSetField}} implementations create a merged 
> view of the doc values producers to merge. Unfortunately, this merged view 
> doesn't override {{termsEnum()}}, whose default implementation of {{next()}} 
> increments the ordinal and calls {{lookupOrd()}} to retrieve the term. 
> Currently, {{lookupOrd()}} doesn't take advantage of its current position: it 
> seeks to the block start and then calls {{next()}} up to 16 times to reach 
> the desired term. While there are discussions to optimize lookups to take 
> advantage of the current ord (LUCENE-8836), this shouldn't be required for 
> merging to be efficient. Instead, we should make the merged {{next()}} call 
> {{next()}} on its sub enums.
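
The idea in the description can be sketched without any Lucene dependency. This is a simplified model, not the actual patch: {{SubEnum}} and {{mergeTerms}} are hypothetical names, terms are plain strings compared with {{String#compareTo}} rather than byte order, and ordinal maps are left out. It only shows that a merged enumeration can be produced by calling next() on each sorted sub enum, never looking a term up by ordinal.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedTermsSketch {

    /** Hypothetical stand-in for a per-segment terms enum over sorted, distinct terms. */
    static final class SubEnum {
        private final Iterator<String> it;
        String current;

        SubEnum(List<String> sortedTerms) {
            this.it = sortedTerms.iterator();
        }

        /** Advances sequentially; returns false once the enum is exhausted. */
        boolean next() {
            if (!it.hasNext()) {
                current = null;
                return false;
            }
            current = it.next();
            return true;
        }
    }

    /** Merges the sorted sub enums by only ever calling next() on them. */
    static List<String> mergeTerms(List<List<String>> segments) {
        PriorityQueue<SubEnum> queue =
            new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (List<String> segment : segments) {
            SubEnum sub = new SubEnum(segment);
            if (sub.next()) {
                queue.add(sub);
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            SubEnum top = queue.poll();
            String term = top.current;
            // Emit each distinct term once, even if several segments contain it.
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(term)) {
                merged.add(term);
            }
            // Sequential advance: no seek to a block start, no lookup by ordinal.
            if (top.next()) {
                queue.add(top);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(
            List.of("berlin", "paris", "tokyo"),
            List.of("lima", "paris", "zurich"));
        System.out.println(mergeTerms(segments)); // [berlin, lima, paris, tokyo, zurich]
    }
}
```

Each term costs one next() call on a sub enum plus a priority-queue update, whereas the ord-based default described above may re-seek to a block start and step forward up to 16 times for every term.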



--
This message was sent by Atlassian Jira
(v8.3.4#803005)