[ https://issues.apache.org/jira/browse/LUCENE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796034#action_12796034 ]

Paul Elschot commented on LUCENE-2189:
--------------------------------------

I've started some work on Simple9, also known as S9. It works like this:

A compressed 32-bit word contains 4 status bits and 28 data bits.
There are nine different ways of dividing up the 28
data bits: 28 1-bit numbers, 14 2-bit numbers, 9 3-bit numbers (one
bit unused), 7 4-bit numbers, 5 5-bit numbers (three bits unused), 4
7-bit numbers, 3 9-bit numbers (one bit unused), 2 14-bit numbers,
or 1 28-bit number. The four status bits store which of the
9 cases is used. Decompression can be done with a switch
on the status bits, where each of the 9 cases uses fixed shifts and a
fixed bit mask to extract the integers.
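A minimal Java sketch of that switch-based decoding, assuming the 4 status
bits are the top bits of the word and the first integer sits in the lowest
data bits (the paper and the patch may order the bits differently).
Simple9Decoder and decode are hypothetical names, not Lucene API.

```java
/**
 * Hypothetical Simple9 word decoder sketch (not Lucene API).
 * Assumed layout: 4 status bits on top, 28 data bits below,
 * first value in the least significant data bits.
 */
public final class Simple9Decoder {

  /** Decodes one word into out[offset..]; returns the number of values written. */
  public static int decode(int word, int[] out, int offset) {
    int status = word >>> 28;     // the 4 status bits
    int data = word & 0x0FFFFFFF; // the 28 data bits
    int count;
    int bits;
    switch (status) {
      case 0: count = 28; bits = 1; break;
      case 1: count = 14; bits = 2; break;
      case 2: count = 9; bits = 3; break;  // one bit unused
      case 3: count = 7; bits = 4; break;
      case 4: count = 5; bits = 5; break;  // three bits unused
      case 5: count = 4; bits = 7; break;
      case 6: count = 3; bits = 9; break;  // one bit unused
      case 7: count = 2; bits = 14; break;
      case 8: count = 1; bits = 28; break;
      default: throw new IllegalArgumentException("invalid status bits: " + status);
    }
    int mask = (1 << bits) - 1; // fixed mask for this case
    for (int i = 0; i < count; i++) {
      // shift the i-th bit field down, then mask it out
      out[offset + i] = (data >>> (i * bits)) & mask;
    }
    return count;
  }
}
```

With this layout, status 6 with data bits 1 | 2<<9 | 3<<18 decodes to
1, 2, 3. In a real decoder each case would likely be unrolled into
constant shifts instead of the shared loop.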

I've used this paper:
Jiangong Zhang, Xiaohui Long, Torsten Suel,
"Performance of Compressed Inverted List Caching in Search Engines",
WWW 2008 / Refereed Track: Search - Corpus Characterization & Search
Performance, Beijing, China

A decoder is working decently. The encoder still needs a bit of work,
mostly because I initially assumed it would be ok to encode more numbers
than given. I'll post a patch later this week, hopefully with a better encoder.
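For illustration, a greedy encoder that never encodes more numbers than
given could look like the sketch below, under the same assumed bit layout.
For each word it tries the nine cases from densest to sparsest and takes
the first one whose count does not exceed the remaining values and whose
width fits all of them. Simple9Encoder, encodeWord and encode are
hypothetical names; this is not the code from the patch.

```java
import java.util.Arrays;

/** Hypothetical greedy Simple9 encoder sketch; not the code from the patch. */
public final class Simple9Encoder {
  // Number of values and bits per value for status values 0..8, densest first.
  private static final int[] COUNTS = {28, 14, 9, 7, 5, 4, 3, 2, 1};
  private static final int[] BITS = {1, 2, 3, 4, 5, 7, 9, 14, 28};

  /**
   * Packs as many of the length values starting at in[offset] as fit into
   * words[wordIndex]. Returns the number of values consumed, which never
   * exceeds length, so no extra numbers are encoded.
   */
  public static int encodeWord(int[] in, int offset, int length, int[] words, int wordIndex) {
    for (int status = 0; status < COUNTS.length; status++) {
      int count = COUNTS[status];
      int bits = BITS[status];
      if (count > length) {
        continue; // this case would need more numbers than are left
      }
      boolean fits = true;
      for (int i = 0; i < count; i++) {
        // Negative values or values >= 2^bits do not fit.
        if ((in[offset + i] >>> bits) != 0) {
          fits = false;
          break;
        }
      }
      if (!fits) {
        continue;
      }
      int word = status << 28; // 4 status bits on top
      for (int i = 0; i < count; i++) {
        word |= in[offset + i] << (i * bits); // i-th value in the i-th bit field
      }
      words[wordIndex] = word;
      return count;
    }
    throw new IllegalArgumentException("value does not fit in 28 bits: " + in[offset]);
  }

  /** Encodes all values; each value must be in [0, 2^28). */
  public static int[] encode(int[] values) {
    int[] words = new int[values.length]; // worst case: one value per word
    int numWords = 0;
    int pos = 0;
    while (pos < values.length) {
      pos += encodeWord(values, pos, values.length - pos, words, numWords++);
    }
    return Arrays.copyOf(words, numWords);
  }
}
```

For example, 29 values of 1 become two words: one with 28 1-bit numbers
and one with a single 28-bit number, rather than a second 28 x 1-bit word
padded with 27 numbers that were never given.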

From the paper one can conclude that, compared to VInt, Simple9 is useful
for encoding:
- frequencies and positions (smaller compressed size, higher decompression speed),
- exceptions for PFOR, as in LUCENE-1410, which is blocked by this issue.
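To make the size part of that concrete, a small hypothetical comparison
(class and method names are mine, not Lucene's). The VInt count assumes
the usual 7 data bits per byte with the high bit as continuation flag; the
Simple9 count uses greedy packing at 4 bytes per word. It only counts
bytes and says nothing about decompression speed.

```java
/** Hypothetical size comparison: VInt bytes vs. greedy Simple9 bytes. */
public final class VIntVsSimple9 {
  private static final int[] COUNTS = {28, 14, 9, 7, 5, 4, 3, 2, 1};
  private static final int[] BITS = {1, 2, 3, 4, 5, 7, 9, 14, 28};

  /** Bytes for one VInt: 7 data bits per byte, high bit set on all but the last byte. */
  public static int vIntBytes(int v) {
    int n = 1;
    while ((v & ~0x7F) != 0) {
      v >>>= 7;
      n++;
    }
    return n;
  }

  public static int totalVIntBytes(int[] values) {
    int total = 0;
    for (int v : values) {
      total += vIntBytes(v);
    }
    return total;
  }

  /** Bytes for a greedy Simple9 encoding; values must be in [0, 2^28). */
  public static int simple9Bytes(int[] values) {
    int words = 0;
    int pos = 0;
    while (pos < values.length) {
      int remaining = values.length - pos;
      int consumed = 0;
      // densest case first; stop at the first case that fits
      for (int status = 0; status < COUNTS.length && consumed == 0; status++) {
        if (COUNTS[status] > remaining) {
          continue;
        }
        boolean fits = true;
        for (int i = 0; i < COUNTS[status]; i++) {
          if ((values[pos + i] >>> BITS[status]) != 0) {
            fits = false;
            break;
          }
        }
        if (fits) {
          consumed = COUNTS[status];
        }
      }
      if (consumed == 0) {
        throw new IllegalArgumentException("value does not fit in 28 bits");
      }
      pos += consumed;
      words++;
    }
    return 4 * words;
  }
}
```

For 28 gaps of 1, VInt needs 28 bytes while Simple9 needs one 4-byte word.
For values that need the full 7 bits (64..127) both come out at one byte
per value, and for 8- or 9-bit values VInt needs 2 bytes per value against
4/3 byte per value for Simple9.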

No performance measurements yet, but comparing the code to the PFOR code
supports the paper's view that Simple9 improves on VInt but will never be
as good as PFOR.



> Simple9 (de)compression
> -----------------------
>
>                 Key: LUCENE-2189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2189
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Paul Elschot
>            Priority: Minor
>
> Simple9 is an alternative for VInt.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

