[
https://issues.apache.org/jira/browse/LUCENE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704798#comment-13704798
]
Adrien Grand commented on LUCENE-5081:
--------------------------------------
New patch:
- renamed implementation to WAH8DocIdSet
- added an index so that advance() runs in logarithmic time. It works much like
the old terms index implementation: the position and doc ID of every n-th
sequence are stored, and advance() binary-searches them to find the closest
indexed sequence at or before the target, then decodes forward from there,
- even with the index, WAH8DocIdSet is never more than 2% larger than
FixedBitSet, including with an index interval of 8, the lowest value the
current implementation accepts,
- factored some code out of BitVector and OpenBitSetIterator into BitUtil.
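To make the index idea concrete, here is a minimal Java sketch under stated assumptions. It is NOT the encoding used by WAH8DocIdSet in the patch. The class name Wah8Sketch, the "varint clean-run length, varint dirty-run length, dirty bytes" sequence layout and the stateless advance(target) method are all hypothetical. It only illustrates the general approach: 8-bit words grouped into sequences, with the stream position and first doc ID recorded for every n-th sequence, and a binary search over those entries followed by a forward decode.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical WAH-style set with 8-bit words plus a sparse index (illustration only). */
public final class Wah8Sketch {
  public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final byte[] data;          // encoded sequences
  private final int[] indexPositions; // offset in data of every n-th sequence
  private final int[] indexDocs;      // first doc ID covered by that sequence

  /** Builds the set from sorted, non-negative doc IDs. */
  public Wah8Sketch(int[] sortedDocs, int indexInterval) {
    if (indexInterval < 1) throw new IllegalArgumentException("indexInterval must be >= 1");
    int numWords = sortedDocs.length == 0 ? 0 : (sortedDocs[sortedDocs.length - 1] >>> 3) + 1;
    byte[] words = new byte[numWords]; // plain bitmap, one byte per 8 docs
    for (int doc : sortedDocs) words[doc >>> 3] |= (byte) (1 << (doc & 7));

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    List<int[]> index = new ArrayList<>();
    int w = 0, seq = 0;
    while (w < numWords) {
      int start = w;
      while (w < numWords && words[w] == 0) w++; // run of clean (all-zero) words
      int clean = w - start;
      int dirtyStart = w;
      while (w < numWords && words[w] != 0) w++; // run of dirty words, stored verbatim
      int dirty = w - dirtyStart;
      if (seq++ % indexInterval == 0) index.add(new int[] {out.size(), start << 3});
      writeVInt(out, clean);
      writeVInt(out, dirty);
      out.write(words, dirtyStart, dirty);
    }
    data = out.toByteArray();
    indexPositions = new int[index.size()];
    indexDocs = new int[index.size()];
    for (int i = 0; i < index.size(); i++) {
      indexPositions[i] = index.get(i)[0];
      indexDocs[i] = index.get(i)[1];
    }
  }

  /** Returns the first doc >= target, or NO_MORE_DOCS (stateless, unlike a real iterator). */
  public int advance(int target) {
    if (indexDocs.length == 0) return NO_MORE_DOCS;
    target = Math.max(target, 0);
    int r = Arrays.binarySearch(indexDocs, target); // O(log #index entries)
    int i = r >= 0 ? r : -r - 2; // last indexed sequence starting at or before target
    int[] pos = {indexPositions[i]};
    int wordNum = indexDocs[i] >>> 3;
    while (pos[0] < data.length) { // decode forward from the index entry
      wordNum += readVInt(pos); // skip the clean words
      int dirty = readVInt(pos);
      for (int k = 0; k < dirty; k++, wordNum++) {
        int word = data[pos[0]++] & 0xFF;
        for (int bit = 0; bit < 8; bit++) {
          int doc = (wordNum << 3) + bit;
          if (doc >= target && (word & (1 << bit)) != 0) return doc;
        }
      }
    }
    return NO_MORE_DOCS;
  }

  private static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  private int readVInt(int[] pos) {
    int v = 0;
    for (int shift = 0; ; shift += 7) {
      int b = data[pos[0]++] & 0xFF;
      v |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) return v;
    }
  }
}
```

For example, new Wah8Sketch(new int[] {3, 100, 101, 5000}, 2).advance(102) binary-searches to the first indexed sequence, skips the long clean run in one step and returns 5000. The cost is one binary search plus a forward decode of at most indexInterval sequences, which is the trade-off the index interval controls: smaller intervals mean faster advance() but a larger index.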
I haven't wired this set implementation in anywhere yet, but never being more
than 2% larger than FixedBitSet and being able to advance in logarithmic time
are nice properties, so I expect some people will want to use it for their
caches. I'm waiting for the other implementations to land or improve (e.g. once
EliasFanoDocIdSet has an index) before writing more detailed benchmarks that
compare the speed and memory efficiency of the implementations we have for our
caches (Elias-Fano, WAH8 and FixedBitSet so far, and maybe something based on
PFOR-delta soon too).
Please let me know if you would like to review this patch. Otherwise I will
commit it soon.
> Compress doc ID sets
> --------------------
>
> Key: LUCENE-5081
> URL: https://issues.apache.org/jira/browse/LUCENE-5081
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-5081.patch
>
>
> Our filters use bit sets a lot to store document IDs. However, it is likely
> that most of them are sparse and hence easily compressible. Having efficient
> compressed sets would allow for caching more data.