[ 
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698319#comment-13698319
 ] 

Paul Elschot commented on LUCENE-5084:
--------------------------------------

bq.  maybe we should have a static utility method to check that so that 
consumers of this API can opt for a FixedBitSet if their doc set is going to be 
dense?

We could, but in which class? For example, in CachingWrapperFilter it might be 
good to save memory, so it could be there.
Also, would the expected size be the only thing to check for? When decoding 
speed is also important, other DocIdSets might be preferable.
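
For illustration only, a rough sketch of what such a size check could look 
like. The class and method names (DocIdSetSizing, eliasFanoBits, 
preferFixedBitSet) are hypothetical and not part of the patch. The estimate 
assumes the usual Elias-Fano layout (floor(log2(upperBound/numValues)) low bits 
per value, plus unary-coded high bits) and compares memory only; it says 
nothing about decoding speed.

```java
// Hypothetical helper, not part of the LUCENE-5084 patch: compares estimated sizes only.
final class DocIdSetSizing {
  private DocIdSetSizing() {}

  /** Approximate Elias-Fano size in bits for numValues sorted values in [0, upperBound]. */
  static long eliasFanoBits(long numValues, long upperBound) {
    if (numValues <= 0) {
      return 0;
    }
    long ratio = upperBound / numValues;
    // floor(log2(ratio)), treating ratios of 0 or 1 as needing no low bits.
    int numLowBits = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    long lowBits = numValues * numLowBits;
    // Unary part: one set bit per value plus one zero bit per high-bits bucket.
    long highBits = numValues + (upperBound >>> numLowBits);
    return lowBits + highBits;
  }

  /** A plain bit set needs one bit per document, rounded up to whole 64-bit words. */
  static long fixedBitSetBits(long maxDoc) {
    return ((maxDoc + 63) >>> 6) << 6;
  }

  /** True when the Elias-Fano estimate is no smaller than a plain bit set. */
  static boolean preferFixedBitSet(long numValues, long maxDoc) {
    return eliasFanoBits(numValues, maxDoc - 1) >= fixedBitSetBits(maxDoc);
  }
}
```

With this estimate, 1000 docs out of 1000000 clearly favour Elias-Fano, while 
500000 out of 1000000 favour the bit set.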


bq.  the ceil of the log in base 2 is computed through a loop
numberOfLeadingZeros is indeed better than a loop. We need the Long variant 
here.
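
A minimal sketch of the loop-free variant using Long.numberOfLeadingZeros. The 
class name Log2 and method name ceilLog2 are made up for this example; the 
patch may well want the floor rather than the ceiling at some call sites.

```java
// Hypothetical helper for illustration; only Long.numberOfLeadingZeros is a real JDK call.
final class Log2 {
  private Log2() {}

  /** Returns ceil(log2(x)) for x >= 1 without looping over the bits. */
  static int ceilLog2(long x) {
    if (x < 1) {
      throw new IllegalArgumentException("x must be >= 1, got " + x);
    }
    // For x == 1, x - 1 is 0 and numberOfLeadingZeros(0) is 64, giving 0.
    return 64 - Long.numberOfLeadingZeros(x - 1);
  }
}
```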

bq. use PackedInts.getMutable to store the low-order bits instead of a raw 
long[]
Can PackedInts.getMutable also be used in a codec? Longs are needed for the 
high bits, see below, and the high and low bits can be conveniently stored next 
to each other in an index.

bq.  shouldn't the iterator's getCost method return efDecoder.numValues instead 
of efEncoder.numValues?
Yes.

bq. Maybe we could just support the encoding of monotonically increasing 
sequences of ints to make things simpler?

I considered a decoder that returns ints, but that would require a lot more 
casting in the decoder.
Decoding the unary encoded high bits is best done on longs, so mixing longs and 
ints in the encoder is not really an option.
We could pass the actual NO_MORE_VALUES to be used as an argument to the 
decoder, would that help?

As to why decoding the unary encoded high bits is best done on longs, see 
Algorithm 2 in "Broadword Implementation of Rank/Select Queries", Sebastiano 
Vigna, January 30, 2012, http://vigna.di.unimi.it/ftp/papers/Broadword.pdf .
I also have an initial Java implementation of that, but it is not used here 
yet; there are only a few comments in the code here noting where it might be 
used. I'll open another issue for broadword bit selection later.
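
To illustrate why longs are convenient for the unary encoded high bits, here is 
a small sketch that stores the i-th high value h at bit position h + i in a 
long[] and reads the values back one 64-bit word at a time. This is not the 
broadword select from the paper and not the code in the patch: it uses plain 
Long.numberOfTrailingZeros, and the class and method names are hypothetical.

```java
// Illustration only: unary-coded high bits in a long[], decoded one word at a time.
final class UnaryHighBits {
  private UnaryHighBits() {}

  /** Encodes non-decreasing, non-negative high values: value i sets bit (highValues[i] + i). */
  static long[] encode(long[] highValues) {
    int n = highValues.length;
    long numBits = n == 0 ? 0 : highValues[n - 1] + n;
    long[] words = new long[(int) ((numBits + 63) >>> 6)];
    for (int i = 0; i < n; i++) {
      long bitPos = highValues[i] + i;
      words[(int) (bitPos >>> 6)] |= 1L << (int) (bitPos & 63);
    }
    return words;
  }

  /** Decodes numValues high values by scanning the set bits of each long. */
  static long[] decode(long[] words, int numValues) {
    long[] result = new long[numValues];
    int index = 0;
    for (int w = 0; w < words.length && index < numValues; w++) {
      long word = words[w];
      while (word != 0 && index < numValues) {
        int bit = Long.numberOfTrailingZeros(word);
        long bitPos = ((long) w << 6) + bit;
        // The index-th set bit sits at position high + index.
        result[index] = bitPos - index;
        index++;
        word &= word - 1; // clear the lowest set bit
      }
    }
    return result;
  }
}
```

Each loop iteration handles a whole 64-bit word, which is where an int-based 
decoder would need the extra casts.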




                
> EliasFanoDocIdSet
> -----------------
>
>                 Key: LUCENE-5084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5084
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
