[ https://issues.apache.org/jira/browse/LUCENE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796034#action_12796034 ]
Paul Elschot commented on LUCENE-2189:
--------------------------------------

I've started some work on Simple9, also known as S9. It works like this:

A compressed 32-bit word contains 4 status bits and 28 data bits. There are nine different ways of dividing up the 28 data bits:

- 28 1-bit numbers,
- 14 2-bit numbers,
- 9 3-bit numbers (one bit unused),
- 7 4-bit numbers,
- 5 5-bit numbers (three bits unused),
- 4 7-bit numbers,
- 3 9-bit numbers (one bit unused),
- 2 14-bit numbers, or
- 1 28-bit number.

The four status bits store which of the 9 cases is used. Decompression can be done by doing a switch operation on the status bits, where each of the 9 cases applies a fixed bit mask to extract the integers.

I've used this paper:
Jiangong Zhang, Xiaohui Long, Torsten Suel, "Performance of Compressed Inverted List Caching in Search Engines", WWW 2008 / Refereed Track: Search - Corpus Characterization & Search Performance, Beijing, China

A decoder is working decently. The encoder still needs a bit of work, mostly because I initially assumed it would be ok to encode more numbers than given. I'll post a patch later this week, hopefully with a better encoder.

From the paper one can conclude that, when compared to VInt, Simple9 is useful for encoding:

- frequencies and positions (smaller compressed size, higher decompression speed),
- exceptions for PFOR, as in LUCENE-1410, which is blocked by this issue.

No performance measurements yet, but comparing the code to the PFOR code supports the view of the paper that Simple9 improves on VInt, but will never be as good as PFOR.

> Simple9 (de)compression
> -----------------------
>
>                 Key: LUCENE-2189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2189
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Paul Elschot
>            Priority: Minor
>
> Simple9 is an alternative for VInt.
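
For readers who want to see the word layout from the comment in code, here is a minimal sketch. It is not the patch announced in the comment. Several details are assumptions: the 4 status bits are placed in the top of the word, the nine cases are numbered 0 to 8 in the order listed in the comment, and values are packed starting at the lowest data bits. The class and method names (Simple9Sketch, chooseSelector, encodeWord, decodeWord) are hypothetical. For brevity the decoder uses a table lookup in place of the switch with fixed masks that the comment describes.

```java
// Illustrative Simple9 sketch; bit placement and case numbering are assumptions.
class Simple9Sketch {
    // Bits per value for each of the nine cases, in the order listed in the comment.
    static final int[] BITS = {1, 2, 3, 4, 5, 7, 9, 14, 28};
    // Number of values stored in one 32-bit word for each case.
    static final int[] COUNT = {28, 14, 9, 7, 5, 4, 3, 2, 1};

    /**
     * Picks the case that packs the most values starting at in[offset],
     * never using a case that needs more values than remain in the input.
     */
    static int chooseSelector(int[] in, int offset) {
        int remaining = in.length - offset;
        for (int s = 0; s < BITS.length; s++) {
            if (COUNT[s] > remaining) {
                continue; // would have to encode more numbers than given
            }
            boolean fits = true;
            for (int i = 0; i < COUNT[s]; i++) {
                // Any bit above BITS[s] (including the sign bit) means no fit.
                if ((in[offset + i] >>> BITS[s]) != 0) {
                    fits = false;
                    break;
                }
            }
            if (fits) {
                return s;
            }
        }
        throw new IllegalArgumentException(
            "no input left, or a value needs more than 28 bits");
    }

    /** Packs COUNT[selector] values from in[offset..] into one 32-bit word. */
    static int encodeWord(int selector, int[] in, int offset) {
        int bits = BITS[selector];
        int word = selector << 28; // status bits in the top 4 bits (assumption)
        for (int i = 0; i < COUNT[selector]; i++) {
            word |= in[offset + i] << (i * bits);
        }
        return word;
    }

    /** Unpacks one word into out[offset..]; returns the number of values written. */
    static int decodeWord(int word, int[] out, int offset) {
        int selector = word >>> 28;
        int bits = BITS[selector];
        int mask = (1 << bits) - 1;
        int n = COUNT[selector];
        for (int i = 0; i < n; i++) {
            out[offset + i] = (word >>> (i * bits)) & mask;
        }
        return n;
    }
}
```

A caller would loop over the input, calling chooseSelector and encodeWord and advancing by COUNT[selector] each time. Skipping cases that need more values than remain is one simple way to avoid encoding more numbers than given; whether the eventual patch handles this the same way is not stated in the comment.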