[ 
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698122#comment-13698122
 ] 

Adrien Grand edited comment on LUCENE-5084 at 7/2/13 7:29 PM:
--------------------------------------------------------------

I had a deeper look at the patch, here are some thoughts about it:
 - the patch looks great, really, I especially like all the integrity checks!
 - after having read more of it, I think moving it to oal.util.packed makes 
perfect sense
 - the documentation mentions the fact that a FixedBitSet will be more storage 
efficient if {{numValues >= (upperBound/3)}}, maybe we should have a static 
utility method to check that so that consumers of this API can opt for a 
FixedBitSet if their doc set is going to be dense?
 - in EliasFanoEncoder constructor, the ceil of the log in base 2 is computed 
through a loop, can we use Integer.numberOfLeadingZeros instead (as explained 
in 
http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#numberOfLeadingZeros(int)
 )
 - I think it would make sense to use PackedInts.getMutable to store the 
low-order bits instead of a raw long[] (this abstraction will hide the 
complexity of the bit packing or even use a more appropriate structure, eg. if 
the number of bits required is 8, it will use a byte[])
 - I'm not sure but shouldn't the iterator's getCost method return 
efDecoder.numValues instead of efEncoder.numValues?

Even though storing longs is more general, it forces us to use a different 
value to encode NO_MORE_DOCS which adds a little complexity to the DocIdSet 
implementation. Maybe we could just support the encoding of monotonically 
increasing sequences of ints to make things simpler? (just thinking out loud).
                
      was (Author: jpountz):
    I had a more deeper look at the patch, here are some thoughts about it:
 - the patch looks great, really, I especially like all the integrity checks!
 - after having read more of it, I think moving it to oal.util.packed makes 
perfect sense
 - the documentation mentions the fact that a FixedBitSet will be more storage 
efficient if {{numValues >= (upperBound/3)}}, maybe we should have a static 
utility method to check that so that consumers of this API can opt for a 
FixedBitSet if their doc set is going to be dense?
 - in EliasFanoEncoder constructor, the ceil of the log in base 2 is computed 
through a loop, can we use Integer.numberOfLeadingZeros instead (as explained 
in 
http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#numberOfLeadingZeros(int)
 )
 - I think it would make sense to use PackedInts.getMutable to store the 
low-order bits instead of a raw long[] (this abstraction will hide the 
complexity of the bit packing or even use a more appropriate structure, eg. if 
the number of bits required is 8, it will use a byte[])
 - I'm not sure but shouldn't the iterator's getCost method return 
efDecoder.numValues instead of efEncoder.numValues?

Even though storing longs is more general, it forces us to use a different 
value to encode NO_MORE_DOCS which adds a little complexity to the DocIdSet 
implementation. Maybe we could just support the encoding of monotonically 
increasing sequences of ints to make things simpler? (just thinking out loud).
                  
> EliasFanoDocIdSet
> -----------------
>
>                 Key: LUCENE-5084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5084
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to