[ https://issues.apache.org/jira/browse/LUCENE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014081#comment-15014081 ]

Michael McCandless commented on LUCENE-6901:
--------------------------------------------

OK, for the 2D case this patch brings indexing time down from 737.1 sec (trunk)
to 441.5 sec (this patch), roughly a 40% reduction, which is nice :)

Note that the test is entirely single-threaded: one indexing thread and
{{SerialMergeScheduler}}.

Trying {{TimSorter}} next ...

> Optimize 1D dimensional value indexing
> --------------------------------------
>
>                 Key: LUCENE-6901
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6901
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk
>
>         Attachments: LUCENE-6901.patch
>
>
> Dimensional values give a smaller index and faster search times than our
> existing approaches for indexing ordered byte[] values across one or more
> dimensions, but indexing is substantially slower.
> Since the 1D case is so important/common (numeric fields, range queries), I
> think it's worth optimizing its indexing time.  It should be possible to
> optimize the N > 1 dimensions case as well, but that's more complex ... we
> can postpone it.
> So for the 1D case, I changed the merge method to do a merge sort (like
> postings) of the already-sorted segments' dimensional values, instead of
> simply re-indexing all values from the incoming segments, and this was a big
> speedup.
> I also switched from {{InPlaceMergeSorter}} to {{IntroSorter}} (this is what
> postings use; it's faster but still safe), and this was another good
> speedup, which should also help the > 1D cases.
> Finally, I added a {{BKDReader.verify}} method (currently it's dark: NOT
> called) that walks the index and then checks that every value in each leaf
> block does in fact fall within the bounds the index expected/claimed.  This
> is useful for finding bugs!  Maybe we can cleanly fold it into {{CheckIndex}}
> somehow later.
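
The description talks about indexing ordered byte[] values. As a minimal sketch, assuming a simple big-endian encoding with the sign bit flipped (Lucene ships its own encoding helpers; this is not their API), here is how a signed int can become 4 bytes whose unsigned lexicographic order matches numeric order, which is what lets numeric fields and range queries work on plain byte[] comparisons. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: encode ints so unsigned byte[] order equals numeric order. */
public class SortableIntBytes {

    /** Flip the sign bit (negatives now sort first), then write big-endian. */
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Inverse of encode: read big-endian, then flip the sign bit back. */
    static int decode(byte[] b) {
        int flipped = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
            | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    public static void main(String[] args) {
        int[] values = {42, -7, 0, Integer.MIN_VALUE, Integer.MAX_VALUE, -1};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        // Sort purely by unsigned byte comparison, never looking at the ints.
        Arrays.sort(encoded, Arrays::compareUnsigned);
        for (byte[] b : encoded) {
            // prints -2147483648, -7, -1, 0, 42, 2147483647 (one per line)
            System.out.println(decode(b));
        }
    }
}
```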

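The 1D merge change replaces re-indexing with a merge sort of values that are already sorted within each segment. A minimal sketch of that k-way merge, using java.util.PriorityQueue keyed on each segment's current value; Entry, Cursor and the maxDocs parameter are hypothetical, deleted documents are ignored, and this is not Lucene's actual merge code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of per-segment (value, docID) lists that are already sorted. */
public class SortedSegmentMerge {

    /** A (value, docID) pair; docID is segment-local until merged. */
    static final class Entry {
        final byte[] value;
        final int docID;
        Entry(byte[] value, int docID) { this.value = value; this.docID = docID; }
    }

    /** Read position inside one segment's sorted entries. */
    static final class Cursor {
        final List<Entry> entries;
        final int segment;
        final int docBase; // added to local docIDs to get merged docIDs
        int pos;
        Cursor(List<Entry> entries, int segment, int docBase) {
            this.entries = entries; this.segment = segment; this.docBase = docBase;
        }
        Entry current() { return entries.get(pos); }
    }

    /** Each segment must be sorted by (value, docID); maxDocs[i] is segment i's doc count. */
    static List<Entry> merge(List<List<Entry>> segments, int[] maxDocs) {
        // Smallest current value first; ties go to the earlier segment (global docID order).
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> {
            int cmp = Arrays.compareUnsigned(a.current().value, b.current().value);
            return cmp != 0 ? cmp : Integer.compare(a.segment, b.segment);
        });
        int docBase = 0;
        for (int i = 0; i < segments.size(); i++) {
            if (!segments.get(i).isEmpty()) {
                queue.add(new Cursor(segments.get(i), i, docBase));
            }
            docBase += maxDocs[i];
        }
        List<Entry> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll(); // remove before advancing: its key is about to change
            Entry e = top.current();
            merged.add(new Entry(e.value, top.docBase + e.docID));
            top.pos++;
            if (top.pos < top.entries.size()) {
                queue.add(top); // re-insert keyed on its next value
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Entry> seg0 = List.of(new Entry(new byte[] {1}, 0), new Entry(new byte[] {5}, 1));
        List<Entry> seg1 = List.of(new Entry(new byte[] {3}, 0), new Entry(new byte[] {5}, 1));
        // prints [1] doc=0, [3] doc=2, [5] doc=1, [5] doc=3 (one per line)
        for (Entry e : merge(List.of(seg0, seg1), new int[] {2, 2})) {
            System.out.println(Arrays.toString(e.value) + " doc=" + e.docID);
        }
    }
}
```

Each pop and re-insert costs O(log k) for k segments, and no value is ever re-sorted from scratch, which is the intuition behind the speedup over re-indexing.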

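The "faster but still safe" remark about {{IntroSorter}} most likely refers to introsort's worst-case guarantee: it runs quicksort, but once recursion goes deeper than about 2*log2(n) levels it falls back to heapsort, so adversarial inputs cannot push it to quadratic time. A simplified sketch on a plain int[], assuming a median-of-three pivot and an insertion-sort cutoff of 16; Lucene's {{IntroSorter}} instead sorts by index through abstract comparison/swap hooks, and its details differ.

```java
import java.util.Arrays;
import java.util.Random;

/** Simplified introsort sketch (not Lucene's IntroSorter). */
public class IntroSortSketch {
    private static final int INSERTION_THRESHOLD = 16; // assumed small-range cutoff

    public static void sort(int[] a) {
        if (a.length < 2) return;
        // Depth budget of about 2 * floor(log2(n)) before switching to heapsort.
        introSort(a, 0, a.length - 1, 2 * (31 - Integer.numberOfLeadingZeros(a.length)));
    }

    private static void introSort(int[] a, int lo, int hi, int depth) {
        while (hi - lo + 1 > INSERTION_THRESHOLD) {
            if (depth-- == 0) {
                heapSort(a, lo, hi); // too deep: guarantee O(n log n)
                return;
            }
            int p = partition(a, lo, hi);
            // Recurse on the smaller side, loop on the larger one to bound stack depth.
            if (p - lo < hi - p) {
                introSort(a, lo, p - 1, depth);
                lo = p + 1;
            } else {
                introSort(a, p + 1, hi, depth);
                hi = p - 1;
            }
        }
        insertionSort(a, lo, hi);
    }

    /** Median-of-three pivot moved to a[hi], then Lomuto partition; returns the pivot's index. */
    private static int partition(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, mid, lo);
        if (a[hi] < a[lo]) swap(a, hi, lo);
        if (a[hi] < a[mid]) swap(a, hi, mid);
        swap(a, mid, hi);
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end); // move the current max to the end of the range
            siftDown(a, lo, 0, end);
        }
    }

    private static void siftDown(int[] a, int lo, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && a[lo + child + 1] > a[lo + child]) child++;
            if (a[lo + i] >= a[lo + child]) return;
            swap(a, lo + i, lo + child);
            i = child;
        }
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1_000, -500, 500).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println(Arrays.equals(data, expected)); // prints true
    }
}
```

Unlike a merge sort, introsort is not stable: equal values may be reordered.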

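The idea behind {{BKDReader.verify}} can be sketched on a hypothetical in-memory tree: walk from the root, narrow the cell bounds at each split, and check that every point in a leaf lies inside the cell its path implies. Node, splitDim, splitValue and the inclusive-on-both-sides split convention are assumptions for illustration; the real {{BKDReader}} works on its on-disk byte[] encoding, and this is not its code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a BKD-style leaf-bounds check. */
public class BkdVerifySketch {

    /** Inner nodes split on one dimension; leaves hold points. */
    static final class Node {
        final int splitDim;
        final int splitValue;
        final Node left, right;
        final List<int[]> points; // non-null only for leaves

        private Node(int splitDim, int splitValue, Node left, Node right, List<int[]> points) {
            this.splitDim = splitDim; this.splitValue = splitValue;
            this.left = left; this.right = right; this.points = points;
        }

        static Node inner(int splitDim, int splitValue, Node left, Node right) {
            return new Node(splitDim, splitValue, left, right, null);
        }

        static Node leaf(List<int[]> points) {
            return new Node(-1, 0, null, null, points);
        }
    }

    /** Returns the number of points checked; throws on the first out-of-cell point. */
    static int verify(Node root, int numDims) {
        int[] min = new int[numDims];
        int[] max = new int[numDims];
        Arrays.fill(min, Integer.MIN_VALUE);
        Arrays.fill(max, Integer.MAX_VALUE);
        return verify(root, min, max);
    }

    private static int verify(Node node, int[] min, int[] max) {
        if (node.points != null) {
            for (int[] p : node.points) {
                for (int d = 0; d < min.length; d++) {
                    if (p[d] < min[d] || p[d] > max[d]) {
                        throw new IllegalStateException("point " + Arrays.toString(p)
                            + " is outside cell min=" + Arrays.toString(min) + " max=" + Arrays.toString(max));
                    }
                }
            }
            return node.points.size();
        }
        // Assumed convention: left cell is <= splitValue, right cell is >= splitValue.
        int[] leftMax = max.clone();
        leftMax[node.splitDim] = node.splitValue;
        int[] rightMin = min.clone();
        rightMin[node.splitDim] = node.splitValue;
        return verify(node.left, min, leftMax) + verify(node.right, rightMin, max);
    }

    public static void main(String[] args) {
        Node good = Node.inner(0, 10,
            Node.leaf(List.of(new int[] {3, 7}, new int[] {10, 1})),
            Node.leaf(List.of(new int[] {12, 5})));
        System.out.println("checked " + verify(good, 2) + " points"); // prints checked 3 points

        Node bad = Node.inner(0, 10,
            Node.leaf(List.of(new int[] {11, 0})), // left cell claims x <= 10
            Node.leaf(List.of(new int[] {12, 5})));
        try {
            verify(bad, 2);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```
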
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
