[ https://issues.apache.org/jira/browse/LUCENE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5150:
---------------------------------

    Attachment: LUCENE-5150.patch

Here is a patch. It reserves an additional bit in the header to indicate 
whether the encoding is "inverted" (meaning that clean words are 0xFF instead 
of 0x00).
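As a rough illustration of what the header bit changes, here is a hypothetical 
Java sketch. It is not the actual WAH8DocIdSet layout from the patch: the flag 
position INVERSE_FLAG and the class and method names are made up for this 
example. It only shows how flipping the clean-word value from 0x00 to 0xFF 
turns a dense bitset into mostly clean words.

```java
/**
 * Hypothetical sketch, NOT Lucene's actual WAH8DocIdSet format.
 * One header bit decides whether a "clean" word is 0x00 or 0xFF.
 */
final class InverseCleanWordSketch {

    // Assumed bit position for the inversion flag, for illustration only.
    static final int INVERSE_FLAG = 0x80;

    /** Returns the byte value that counts as a clean word under the given header. */
    static int cleanWord(int header) {
        return (header & INVERSE_FLAG) != 0 ? 0xFF : 0x00;
    }

    /** Counts the bytes of the bitset that are clean words under the given header. */
    static int countCleanWords(byte[] bits, int header) {
        int clean = cleanWord(header);
        int count = 0;
        for (byte b : bits) {
            // Mask to treat the signed Java byte as an unsigned value 0..255.
            if ((b & 0xFF) == clean) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A dense bitset: every bit set except one.
        byte[] dense = {(byte) 0xFF, (byte) 0xFF, (byte) 0xEF, (byte) 0xFF};
        System.out.println(countCleanWords(dense, 0));            // prints 0
        System.out.println(countCleanWords(dense, INVERSE_FLAG)); // prints 3
    }
}
```

On this four-byte dense example no byte equals 0x00, but three of the four 
equal 0xFF, so only the inverted encoding finds clean words to compress.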

It should reduce the amount of memory required to build and store dense sets, 
while compression ratios for sparse sets remain unchanged.
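The comment does not say how the encoder decides whether to invert, so the 
following Java sketch assumes a simple rule: set the inverse bit only when 
0xFF bytes outnumber 0x00 bytes. The class InversionChoiceSketch and the method 
shouldInvert are hypothetical, and the patch's real criterion may differ (for 
example, it could compare the cardinality against half the bits). In a sparse 
set 0x00 bytes dominate, so under this rule the flag stays clear and the set is 
encoded as before apart from the reserved bit.

```java
/**
 * Hypothetical sketch of choosing whether to set the inversion bit.
 * The rule (compare counts of 0x00 and 0xFF bytes) is an assumption,
 * not necessarily what LUCENE-5150 implements.
 */
final class InversionChoiceSketch {

    static boolean shouldInvert(byte[] bits) {
        int zeroBytes = 0; // clean words under the normal encoding
        int oneBytes = 0;  // clean words under the inverted encoding
        for (byte b : bits) {
            int v = b & 0xFF; // unsigned view of the byte
            if (v == 0x00) {
                zeroBytes++;
            } else if (v == 0xFF) {
                oneBytes++;
            }
        }
        // Invert only when it strictly yields more clean words; ties keep
        // the non-inverted encoding.
        return oneBytes > zeroBytes;
    }

    public static void main(String[] args) {
        byte[] sparse = {0x00, 0x00, 0x04, 0x00};
        byte[] dense = {(byte) 0xFF, (byte) 0xFF, (byte) 0xEF, (byte) 0xFF};
        System.out.println(shouldInvert(sparse)); // prints false
        System.out.println(shouldInvert(dense));  // prints true
    }
}
```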

For random dense sets, I observed compression ratios of 87% at a load factor 
of 90% and 20% at a load factor of 99% (vs. 100% before).
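A back-of-the-envelope reading of these numbers, assuming each document belongs 
to the random set independently with probability f equal to the load factor (an 
assumption about how the test sets were generated): under the inverted encoding 
an 8-bit word is clean only when all 8 of its bits are set.

```latex
P_{\text{clean}}(f) = f^{8}, \qquad
P_{\text{clean}}(0.90) = 0.90^{8} \approx 0.430, \qquad
P_{\text{clean}}(0.99) = 0.99^{8} \approx 0.923
```

So at a 99% load factor over 90% of words are clean, versus fewer than half at 
90%, which fits the much better ratio observed at 99%. This estimate ignores 
header and dirty-word overhead, so it does not predict the exact 87% and 20% 
figures.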
                
> WAH8DocIdSet: dense sets compression
> ------------------------------------
>
>                 Key: LUCENE-5150
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5150
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Trivial
>         Attachments: LUCENE-5150.patch
>
>
> In LUCENE-5101, Paul Elschot mentioned that it would be interesting to be 
> able to encode the inverse set to also compress very dense sets.

