[
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698122#comment-13698122
]
Adrien Grand commented on LUCENE-5084:
--------------------------------------
I had a more deeper look at the patch, here are some thoughts about it:
- the patch looks great, really, I especially like all the integrity checks!
- after having read more of it, I think moving it to oal.util.packed makes
perfect sense
- the documentation mentions the fact that a FixedBitSet will be more storage
efficient if {{numValues >= (upperBound/3)}}, maybe we should have a static
utility method to check that so that consumers of this API can opt for a
FixedBitSet if their doc set is going to be dense?
- in EliasFanoEncoder constructor, the ceil of the log in base 2 is computed
through a loop, can we use Integer.numberOfLeadingZeros instead (as explained
in
http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#numberOfLeadingZeros(int)
)
- I think it would make sense to use PackedInts.getMutable to store the
low-order bits instead of a raw long[] (this abstraction will hide the
complexity of the bit packing or even use a more appropriate structure, eg. if
the number of bits required is 8, it will use a byte[])
- I'm not sure but shouldn't the iterator's getCost method return
efDecoder.numValues instead of efEncoder.numValues?
Even though storing longs is more general, it forces us to use a different
value to encode NO_MORE_DOCS which adds a little complexity to the DocIdSet
implementation. Maybe we could just support the encoding of monotonically
increasing sequences of ints to make things simpler? (just thinking out loud).
> EliasFanoDocIdSet
> -----------------
>
> Key: LUCENE-5084
> URL: https://issues.apache.org/jira/browse/LUCENE-5084
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Paul Elschot
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]