[ 
https://issues.apache.org/jira/browse/LUCENE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707419#comment-13707419
 ] 

Paul Elschot commented on LUCENE-5109:
--------------------------------------

I don't expect to have time to continue this until mid-September, so I'd 
rather leave this here; maybe someone can pick it up.

The index is a value index on the upper bit numbers.
I have called the index a value index because it would also be possible to add 
an index on the Elias-Fano index of the unary upper bit numbers. Basically a 
value index indexes the zero upper bits, and an index index would index the one 
upper 
bits. (A unary number is a series of zero bits followed by a single one bit, 
and the number of zero bits determines the value.)
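
To make the zero bits / one bits distinction concrete, here is a minimal 
sketch. It is not code from the patch: the class and method names are 
hypothetical, and it uses java.util.BitSet for readability instead of the 
long[] arrays of EliasFanoEncoder. Element i with high part h sets bit (h + i), 
so counting zero bits gives the high value and counting one bits gives the 
element index.

```java
import java.util.BitSet;

// Hypothetical illustration only; not part of LUCENE-5109.patch.
final class UnaryUpperBitsSketch {

  /**
   * Encodes the high parts of a non-decreasing sequence in unary.
   * Element i with high part h = value >>> numLowBits sets bit (h + i).
   */
  static BitSet encodeUpperBits(long[] sortedValues, int numLowBits) {
    BitSet upper = new BitSet();
    for (int i = 0; i < sortedValues.length; i++) {
      long high = sortedValues[i] >>> numLowBits;
      upper.set((int) (high + i));
    }
    return upper;
  }

  /**
   * Position just after the k-th zero bit (k >= 1). This is the kind of
   * entry a value index could store for high value k.
   */
  static int positionAfterZero(BitSet upper, int k) {
    int pos = -1;
    for (int seen = 0; seen < k; seen++) {
      pos = upper.nextClearBit(pos + 1);
    }
    return pos + 1;
  }

  /**
   * Position of the one bit of element k (k >= 0). This is the kind of
   * entry an index index could store. The high part of element k is then
   * (position - k).
   */
  static int positionOfOne(BitSet upper, int k) {
    int pos = -1;
    for (int seen = 0; seen <= k; seen++) {
      pos = upper.nextSetBit(pos + 1);
    }
    return pos;
  }
}
```

For the values 2, 3, 5, 7, 11, 13, 24 with two low bits, the high parts are 
0, 0, 1, 1, 2, 3, 6 and the one bits land at positions 0, 1, 3, 4, 6, 8, 12.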

The patch adds ...ValueIndexed.java files that extend EliasFanoEncoder and 
EliasFanoDecoder.
(EliasFanoEncoder is also changed to use longHex from ToStringUtils, just as in 
LUCENE-5098.)

EliasFanoEncoderValueIndexed creates an index by value on the upper bits,
and EliasFanoDecoderValueIndexed uses this index in its advanceToValue method.

Both EliasFanoEncoder and EliasFanoDecoder have attributes changed from private 
to protected, so they can be used in their subclasses.

The EliasFanoDocIdSet is changed to use the above value indexed versions.
Its testAgainstBitSet has been overridden to be empty (nocommit) because that 
test currently fails.
There are no other tests yet.

The main function of the patch is overriding the method advanceToHighValue in 
EliasFanoDecoderValueIndexed.
The idea is to advance to the high value just before the actual target from the 
index, and then continue as usual.
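
A minimal sketch of that idea follows. It is not the code in the patch: the 
class name, the interval parameter and the zeroIndex array are assumptions made 
for illustration, and java.util.BitSet stands in for the encoder's long[] 
arrays. The sketch stores, for every interval-th high value, the position just 
after the corresponding zero bit, jumps there, and then scans linearly.

```java
import java.util.BitSet;

// Hypothetical illustration only; not part of LUCENE-5109.patch.
final class ValueIndexedUpperBitsSketch {
  private final BitSet upper = new BitSet();
  private final int numValues;
  private final int interval; // assumed: index every interval-th high value, >= 1
  // zeroIndex[j] = position just after zero bit number (j + 1) * interval
  private final int[] zeroIndex;

  ValueIndexedUpperBitsSketch(long[] sortedValues, int numLowBits, int interval) {
    this.numValues = sortedValues.length;
    this.interval = interval;
    for (int i = 0; i < numValues; i++) {
      upper.set((int) ((sortedValues[i] >>> numLowBits) + i));
    }
    int maxHigh = numValues == 0 ? 0 : (int) (sortedValues[numValues - 1] >>> numLowBits);
    zeroIndex = new int[maxHigh / interval];
    int zerosSeen = 0;
    int pos = -1;
    for (int j = 0; j < zeroIndex.length; j++) {
      while (zerosSeen < (j + 1) * interval) {
        pos = upper.nextClearBit(pos + 1);
        zerosSeen++;
      }
      zeroIndex[j] = pos + 1;
    }
  }

  /**
   * Returns the index of the first element whose high part is at least
   * targetHigh (assumed non-negative), or -1 if there is no such element.
   */
  int advanceToHighValue(int targetHigh) {
    // Jump: use the last index entry that does not overshoot the target.
    int slot = Math.min(targetHigh / interval, zeroIndex.length);
    int pos = slot == 0 ? 0 : zeroIndex[slot - 1];
    int high = slot * interval; // number of zero bits before pos
    // Continue as usual: (pos - high) is the number of one bits before pos.
    while (pos - high < numValues) {
      if (upper.get(pos)) {
        if (high >= targetHigh) {
          return pos - high;
        }
      } else {
        high++;
      }
      pos++;
    }
    return -1;
  }
}
```

With the values 2, 3, 5, 7, 11, 13, 24, two low bits and an interval of 2, 
advanceToHighValue(3) starts scanning at position 6 instead of position 0 and 
returns element index 5 (the value 13, whose high part is 3).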

Unfortunately the value index does not work yet: testAgainstBitSet fails.
Some tests for the Elias-Fano index itself clearly need to be added first.


Once this index works it is probably better to merge it into EliasFanoEncoder 
and EliasFanoDecoder because of the speedup the index is expected to provide.

I prefer to start like this because without the index things are working nicely 
now, even with the changes from private to protected.





                
> EliasFano value index
> ---------------------
>
>                 Key: LUCENE-5109
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5109
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5109.patch
>
>
> Index upper bits of Elias-Fano sequence.

