[jira] [Created] (LUCENE-6901) Optimize 1D dimensional value indexing

Michael McCandless (JIRA) Thu, 19 Nov 2015 07:50:57 -0800

Michael McCandless created LUCENE-6901:
------------------------------------------


             Summary: Optimize 1D dimensional value indexing
                 Key: LUCENE-6901
                 URL: https://issues.apache.org/jira/browse/LUCENE-6901
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: Trunk


Dimensional values give a smaller index and faster search times than our 
existing approaches for indexing ordered byte[] values across one or more 
dimensions, but the indexing time is substantially slower.

Since the 1D case is so important/common (numeric fields, range queries), I 
think it's worth optimizing its indexing time.  It should be possible to 
optimize the N > 1 dimensions case too, but it's more complex ... we can 
postpone that.

So for the 1D case, I changed the merge method to do a merge sort (like 
postings) of the already sorted segments' dimensional values, instead of 
simply re-indexing all values from the incoming segments, and this was a big 
speedup.
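
For illustration only, here is a minimal sketch of the merge-sort idea, not 
the actual patch.  It assumes each incoming segment exposes its 1D values as 
an already sorted list of byte[] compared as unsigned bytes; the class and 
method names ({{SortedSegmentMergeSketch}}, {{Cursor}}, {{merge}}) are 
hypothetical and do not exist in Lucene.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of per-segment value lists that are each already sorted. */
class SortedSegmentMergeSketch {

  /** Compares two byte[] values as unsigned bytes, left to right. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  }

  /** Cursor over one segment's sorted values (assumed layout, for illustration). */
  static final class Cursor {
    final Iterator<byte[]> it;
    byte[] current;

    Cursor(Iterator<byte[]> it) {
      this.it = it;
    }

    /** Moves to the next value; returns false once the segment is exhausted. */
    boolean advance() {
      if (it.hasNext()) {
        current = it.next();
        return true;
      }
      return false;
    }
  }

  /** Merges the sorted segments into one sorted list without re-sorting every value. */
  static List<byte[]> merge(List<List<byte[]>> segments) {
    PriorityQueue<Cursor> queue =
        new PriorityQueue<>((x, y) -> compareUnsigned(x.current, y.current));
    for (List<byte[]> segment : segments) {
      Cursor cursor = new Cursor(segment.iterator());
      if (cursor.advance()) {
        queue.add(cursor);
      }
    }
    List<byte[]> merged = new ArrayList<>();
    while (!queue.isEmpty()) {
      // The queue head always holds the smallest not-yet-emitted value.
      Cursor top = queue.poll();
      merged.add(top.current);
      if (top.advance()) {
        queue.add(top);
      }
    }
    return merged;
  }
}
```

With k segments and n total values this does O(n log k) comparisons, versus 
sorting all n values again from scratch.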

I also changed from {{InPlaceMergeSorter}} to {{IntroSorter}} (this is what 
postings use, and it's faster but still safe) and this was another good 
speedup, which should also help the > 1D cases.

Finally, I added a {{BKDReader.verify}} method (currently it's dark: NOT 
called) that walks the index and then checks that every value in each leaf 
block does in fact fall within what the index expected/claimed.  This is 
useful for finding bugs!  Maybe we can cleanly fold it into {{CheckIndex}} 
somehow later.
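
As a rough illustration of what such a check can look like, the sketch below 
walks a simplified in-memory 1D tree and confirms that every leaf value lies 
inside the range implied by the split values above it.  This is not the 
{{BKDReader.verify}} code: the {{Node}}, {{Inner}} and {{Leaf}} types are 
hypothetical, and it assumes values equal to a split value may appear on 
either side of that split.

```java
/** Hypothetical sketch of a verify pass over a simplified 1D tree; not Lucene's BKDReader. */
class OneDimVerifySketch {

  interface Node {}

  /** Inner node: left subtree holds values <= split, right subtree holds values >= split. */
  static final class Inner implements Node {
    final byte[] split;
    final Node left;
    final Node right;

    Inner(byte[] split, Node left, Node right) {
      this.split = split;
      this.left = left;
      this.right = right;
    }
  }

  /** Leaf block holding the actual values. */
  static final class Leaf implements Node {
    final byte[][] values;

    Leaf(byte[]... values) {
      this.values = values;
    }
  }

  /** Compares two byte[] values as unsigned bytes, left to right. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  }

  /**
   * Walks the tree, narrowing [min, max] at each split, and throws if a leaf value
   * falls outside the range the index claims for it. A null bound means unbounded.
   */
  static void verify(Node node, byte[] min, byte[] max) {
    if (node instanceof Leaf) {
      for (byte[] value : ((Leaf) node).values) {
        if (min != null && compareUnsigned(value, min) < 0) {
          throw new IllegalStateException("leaf value is below the range claimed by the index");
        }
        if (max != null && compareUnsigned(value, max) > 0) {
          throw new IllegalStateException("leaf value is above the range claimed by the index");
        }
      }
    } else {
      Inner inner = (Inner) node;
      verify(inner.left, min, inner.split);
      verify(inner.right, inner.split, max);
    }
  }
}
```

Calling {{verify(root, null, null)}} on a well-formed tree returns normally; a 
value that ended up in the wrong leaf block makes it throw.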




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

