[ https://issues.apache.org/jira/browse/LUCENE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696339#comment-13696339 ]
Adrien Grand commented on LUCENE-5081:
--------------------------------------
bq. Maybe we should anticipate more implementations in the future?
Good point. I'll rename to WAH8DocIdSet (word-aligned hybrid compression on
words of 8 bits).
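
As a rough illustration of the idea behind the name, here is a minimal sketch of a word-aligned scheme on 8-bit words. It is not the format used in the attached patch: the class name Wah8Sketch, the method names, and the token layout (a count of all-zero words, then a count of literal words, then the literal words themselves) are assumptions made for this example only.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's WAH8DocIdSet.
class Wah8Sketch {

  // Encodes a bit set (8 doc IDs per byte) as a sequence of tokens:
  // [zeroRun][literalCount][literal bytes...], each count in 0..255.
  static byte[] compress(byte[] words) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int i = 0;
    while (i < words.length) {
      // Count a run of all-zero words (capped so it fits in one byte).
      int zeros = 0;
      while (i < words.length && words[i] == 0 && zeros < 255) {
        i++;
        zeros++;
      }
      // Collect the non-zero words that follow the run, stored verbatim.
      int start = i;
      while (i < words.length && words[i] != 0 && i - start < 255) {
        i++;
      }
      out.write(zeros);
      out.write(i - start);
      out.write(words, start, i - start);
    }
    return out.toByteArray();
  }

  // Decodes the doc IDs in increasing order by walking the tokens front to back.
  static List<Integer> docIds(byte[] compressed) {
    List<Integer> ids = new ArrayList<>();
    int wordIndex = 0;
    int pos = 0;
    while (pos < compressed.length) {
      wordIndex += compressed[pos++] & 0xFF;       // skip the all-zero words
      int literals = compressed[pos++] & 0xFF;
      for (int k = 0; k < literals; k++, wordIndex++) {
        int word = compressed[pos++] & 0xFF;
        for (int bit = 0; bit < 8; bit++) {
          if ((word & (1 << bit)) != 0) {
            ids.add(wordIndex * 8 + bit);
          }
        }
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    byte[] words = new byte[1000];                 // room for doc IDs 0..7999
    for (int doc : new int[] {3, 4000, 7999}) {
      words[doc >>> 3] |= 1 << (doc & 7);
    }
    byte[] compressed = compress(words);
    System.out.println(words.length + " bytes -> " + compressed.length + " bytes");
    System.out.println(docIds(compressed));
  }
}
```

In this sketch a sparse set shrinks because long stretches of empty words collapse into a single count byte, while dense regions cost at most two extra bytes per 255 literal words.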
bq. LUCENE-1969 is another effort to add compressed bit sets to Lucene ...
Thanks for the pointer, I'll look into it.
bq. Is it possible to make it random access at all?
Unfortunately it is not possible. It is easy to save space when a set only has
to support either random access or in-order iteration, but requiring both makes
compression much harder, especially if the bit set is not very sparse.
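
To make the trade-off concrete, here is a small sketch of one iteration-friendly encoding: sorted doc IDs stored as variable-length gaps. It is an illustration only, not the encoding from the patch, and the class and method names (GapEncodingSketch, encode, contains) are made up for this example. In-order decoding is cheap, but a membership test cannot jump to a position and has to decode every gap before the target.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a sequential-only encoding.
class GapEncodingSketch {

  // Writes each sorted doc ID as the gap from its predecessor, using
  // 7 payload bits per byte; a set high bit means "more bytes follow".
  static byte[] encode(int[] sortedDocIds) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int previous = 0;
    for (int doc : sortedDocIds) {
      int gap = doc - previous;
      while ((gap & ~0x7F) != 0) {
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);
      previous = doc;
    }
    return out.toByteArray();
  }

  // Membership test by linear scan: gaps are only meaningful relative to
  // the previous doc ID, so decoding always starts from the first byte.
  static boolean contains(byte[] encoded, int target) {
    int doc = 0;
    int pos = 0;
    while (pos < encoded.length) {
      int gap = 0;
      int shift = 0;
      byte b;
      do {
        b = encoded[pos++];
        gap |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      doc += gap;
      if (doc == target) {
        return true;
      }
      if (doc > target) {
        return false;                              // passed it: not present
      }
    }
    return false;
  }
}
```

By contrast, a plain bit set answers the same membership question with a single word lookup, at the cost of one bit per document regardless of how sparse the set is.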
bq. I'm (slowly) working on an implementation of Elias Fano compression,
basically as described in sections 3 and 4 of this article. [...] I'll open
an issue for this soon.
This looks very interesting! I'm looking forward to seeing how this DocIdSet
would behave!
> Compress doc ID sets
> --------------------
>
> Key: LUCENE-5081
> URL: https://issues.apache.org/jira/browse/LUCENE-5081
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-5081.patch
>
>
> Our filters use bit sets a lot to store document IDs. However, it is likely
> that most of them are sparse and hence easily compressible. Having efficient
> compressed sets would allow for caching more data.