[ https://issues.apache.org/jira/browse/LUCENE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-5150:
---------------------------------
    Attachment: LUCENE-5150.patch

Here is a patch. It reserves an additional bit in the header to say whether the encoding should be "inversed" (meaning clean words are actually 0xFF instead of 0x00). It should reduce the amount of memory required to build and store dense sets.

In spite of this change, compression ratios remain the same for sparse sets. For random dense sets, I observed compression ratios of 87% when the load factor is 90% and 20% when the load factor is 99% (vs. 100% before).

> WAH8DocIdSet: dense sets compression
> ------------------------------------
>
>                 Key: LUCENE-5150
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5150
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Trivial
>         Attachments: LUCENE-5150.patch
>
>
> In LUCENE-5101, Paul Elschot mentioned that it would be interesting to be
> able to encode the inverse set to also compress very dense sets.
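
To illustrate the idea of an "inverse" header bit, here is a minimal sketch in Java. It is not the code from LUCENE-5150.patch and does not reproduce the WAH8DocIdSet format: the class name, the INVERSE_BIT constant, and the simplified byte-level run-length scheme (pairs of clean-run length and one literal byte) are all assumptions made for illustration. The only point it shows is that when 0xFF bytes outnumber 0x00 bytes, flipping every byte and recording that in one header bit lets the same clean-run encoding compress a dense set.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not Lucene's WAH8DocIdSet.
public final class InverseFlagSketch {
    // Assumed header flag: set when the payload stores the complemented bitmap.
    public static final int INVERSE_BIT = 0x01;

    public static byte[] encode(byte[] bitmap) {
        int zeros = 0, ones = 0;
        for (byte b : bitmap) {
            if (b == 0) zeros++;
            else if (b == (byte) 0xFF) ones++;
        }
        // Dense input: more all-ones bytes than all-zero bytes.
        boolean inverse = ones > zeros;

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(inverse ? INVERSE_BIT : 0);
        int run = 0; // number of clean (0x00) bytes seen since the last literal
        for (byte b : bitmap) {
            byte w = inverse ? (byte) ~b : b;
            if (w == 0 && run < 255) {
                run++;
            } else {
                out.write(run); // clean-run length, 0..255
                out.write(w);   // one literal byte (may be 0x00 if the run overflowed)
                run = 0;
            }
        }
        // A trailing clean run is not written; the decoder infers it from the length.
        return out.toByteArray();
    }

    public static byte[] decode(byte[] encoded, int length) {
        boolean inverse = (encoded[0] & INVERSE_BIT) != 0;
        byte[] bitmap = new byte[length]; // zero-filled by default
        int pos = 0;
        for (int i = 1; i + 1 < encoded.length; i += 2) {
            pos += encoded[i] & 0xFF;       // skip the clean run
            bitmap[pos++] = encoded[i + 1]; // restore the literal byte
        }
        if (inverse) {
            for (int i = 0; i < length; i++) {
                bitmap[i] = (byte) ~bitmap[i]; // undo the complement
            }
        }
        return bitmap;
    }
}
```

In this sketch a 1000-byte bitmap that is almost entirely 0xFF encodes to a handful of bytes with the flag set, while a sparse bitmap is encoded exactly as it would be without the flag, which mirrors the claim that sparse-set compression is unaffected.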