[jira] [Commented] (LUCENE-5983) RoaringDocIdSet

Adrien Grand (JIRA) Mon, 06 Oct 2014 02:01:43 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160115#comment-14160115 ]


Adrien Grand commented on LUCENE-5983:
--------------------------------------

bq. Why remove WAH8 & PFOR but not Elias-Fano? Because EF compresses the most 
and doesn't perform as badly as those two in most advance() scenarios?

I have to admit I don't know this set as well as the PFOR and WAH8 ones, but 
indeed it seems to compress sparse sets very efficiently. It also has a nice 
property that WAH8 and PFOR lack: it is naturally indexed, i.e. it doesn't 
need a side-car data structure in order to skip efficiently.
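To make the "naturally indexed" property concrete, here is a minimal Elias-Fano sketch in Java. The class `EliasFanoSketch` and its methods are hypothetical names, and this is not Lucene's Elias-Fano implementation. Each value is split into `L = floor(log2(universe / n))` low bits, stored verbatim, and a high part, stored in unary in an upper-bits array where element `i` sets bit `high + i`. To advance to a target whose high part is `h`, the first candidate sits right after the `h`-th zero bit of that array, so the upper bits double as the skip index. For clarity this sketch assumes unpacked low bits and finds the `h`-th zero with a linear scan; real encoders bit-pack the low bits and use much faster select operations.

```java
import java.util.BitSet;

// Hypothetical minimal Elias-Fano sketch; not Lucene's implementation.
class EliasFanoSketch {
    private final int lowBits;   // L = floor(log2(universe / n)) low bits per value
    private final int[] low;     // low bits, unpacked for readability (real encoders bit-pack them)
    private final BitSet upper;  // high parts in unary: element i sets bit (high_i + i)
    private final int size;

    /** values must be sorted ascending, non-negative and < universe. */
    EliasFanoSketch(int[] values, int universe) {
        size = values.length;
        int ratio = size == 0 ? 0 : universe / size;
        lowBits = ratio > 1 ? 31 - Integer.numberOfLeadingZeros(ratio) : 0;
        low = new int[size];
        upper = new BitSet();
        int mask = (1 << lowBits) - 1;
        for (int i = 0; i < size; i++) {
            low[i] = values[i] & mask;
            upper.set((values[i] >>> lowBits) + i);
        }
    }

    /** Value at index i: its high part is (position of the i-th one bit) - i. */
    int get(int i) {
        int pos = upper.nextSetBit(0);
        for (int k = 0; k < i; k++) pos = upper.nextSetBit(pos + 1);
        return ((pos - i) << lowBits) | low[i];
    }

    /** Index of the first value >= target, or size() if there is none. */
    int advance(int target) {
        int h = target >>> lowBits;
        // Bucket h starts right after the h-th zero bit of `upper`: no side-car skip list.
        // (Linear scan for clarity; real implementations use a fast select.)
        int pos = 0;
        for (int z = 0; z < h; z++) pos = upper.nextClearBit(pos) + 1;
        int index = pos - h; // one bits before pos = values whose high part is < h
        for (int p = upper.nextSetBit(pos); p >= 0; p = upper.nextSetBit(p + 1), index++) {
            int value = ((p - index) << lowBits) | low[index];
            if (value >= target) return index;
        }
        return size;
    }

    int size() { return size; }
}
```

For example, with values {3, 5, 9, 20, 21, 100} and a universe of 128, `L = 4`, and `advance(21)` jumps straight to bucket 1 and returns index 4 after a single comparison.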

> RoaringDocIdSet
> ---------------
>
>                 Key: LUCENE-5983
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5983
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5983.patch
>
>
> Robert pointed me to this paper, http://arxiv.org/pdf/1402.6407v4, which 
> describes an interesting way to build doc id sets: the bit space is divided 
> into blocks of 2^16 bits, and the bits that are set in each block are stored 
> either in a short[] (2 bytes per doc ID) or in a FixedBitSet. The choice is 
> easy: a FixedBitSet over 2^16 bits always takes 8 KB, the same as a short[] 
> of 2^12 entries, so if fewer than 2^12 bits are set, the short[] 
> representation is more compact; otherwise the FixedBitSet is. It's quite 
> similar to the way that Solr builds DocSets in 
> {{SolrIndexSearcher.getDocSet(DocsEnumState)}}.
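The block layout described in the issue can be sketched in Java as follows. `RoaringSketch`, `contains` and `blockKind` are hypothetical names; this is neither the attached patch nor Lucene's `RoaringDocIdSet` API. It assumes doc IDs arrive sorted and without duplicates. Sparse blocks keep the low 16 bits of each doc ID in a `short[]` (read back as unsigned, since Java's `short` is signed), dense blocks use a plain `long[]` bitset standing in for `FixedBitSet`, and empty blocks are `null`. The sparse/dense switch happens at 2^12 docs per block.

```java
// Hypothetical sketch of the block layout; not Lucene's RoaringDocIdSet.
class RoaringSketch {
    private static final int BLOCK_BITS = 16;      // each block covers 2^16 doc IDs
    private static final int MAX_SPARSE = 1 << 12; // fewer docs than this: short[] is smaller

    // Per block: short[] (sparse), long[] bitset (dense), or null (empty).
    private final Object[] blocks;

    /** docs must be sorted ascending, without duplicates, each in [0, maxDoc). */
    RoaringSketch(int[] docs, int maxDoc) {
        blocks = new Object[(maxDoc + (1 << BLOCK_BITS) - 1) >>> BLOCK_BITS];
        int i = 0;
        while (i < docs.length) {
            int block = docs[i] >>> BLOCK_BITS;
            int end = i;
            while (end < docs.length && (docs[end] >>> BLOCK_BITS) == block) end++;
            int count = end - i;
            if (count < MAX_SPARSE) {
                short[] sparse = new short[count];
                for (int j = 0; j < count; j++) sparse[j] = (short) docs[i + j]; // low 16 bits
                blocks[block] = sparse;
            } else {
                long[] dense = new long[(1 << BLOCK_BITS) / 64]; // 1024 longs = 8 KB
                for (int j = i; j < end; j++) {
                    int low = docs[j] & 0xFFFF;
                    dense[low >>> 6] |= 1L << low; // long shift distance is taken mod 64
                }
                blocks[block] = dense;
            }
            i = end;
        }
    }

    boolean contains(int doc) {
        int block = doc >>> BLOCK_BITS;
        if (block >= blocks.length || blocks[block] == null) return false;
        int low = doc & 0xFFFF;
        if (blocks[block] instanceof short[]) {
            // Binary search comparing as unsigned: values >= 2^15 wrap negative in a short.
            short[] sparse = (short[]) blocks[block];
            int lo = 0, hi = sparse.length - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                int v = Short.toUnsignedInt(sparse[mid]);
                if (v < low) lo = mid + 1;
                else if (v > low) hi = mid - 1;
                else return true;
            }
            return false;
        }
        long[] dense = (long[]) blocks[block];
        return (dense[low >>> 6] & (1L << low)) != 0;
    }

    String blockKind(int block) {
        Object b = blocks[block];
        return b == null ? "empty" : (b instanceof short[] ? "sparse" : "dense");
    }
}
```

Because the block is selected with `doc >>> 16`, a lookup touches a single block: a binary search in the sparse case, a single bit test in the dense case.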



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

