[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802335#action_12802335 ]
Paul Elschot commented on LUCENE-1410:
--------------------------------------

The only reason why the number of compressed integers is encoded in the block header here is that when I coded it I did not know that this was not necessary in Lucene indexes.

That also means that the header can be used for different compression methods, for example in the following way:

cases encoded in 1st byte:
32 FrameOfRef cases (#frameBits), followed by 3 bytes for #exceptions (0 for BITS, > 0 for PFOR)
16-64 cases for a SimpleNN variant
1-8 cases for run length encoding (for example followed by 3 bytes for length and value)

Total #cases is 49-104, or 6-7 bits.

Run length encoding is good for terms that occur in every document and for the frequencies of primary keys.

The only concern I have is that the instruction cache might get filled up with the code for all these decoding cases. At the moment I don't know how to deal with that other than by adding such cases slowly while doing performance tests all the time.

> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, LUCENE-1410-codecs.tar.bz2,
> LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch,
> LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java,
> TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.
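
For illustration only, here is a rough Java sketch of how a decoder might dispatch on the first header byte proposed in the comment above. It is not taken from any of the attached patches. The class name, the case numbering (FrameOfRef cases first, then SimpleNN, then run length), the choice of 16 SimpleNN and 8 run length cases, the mapping of case number to #frameBits, and the big-endian order of the 3-byte #exceptions field are all assumptions made for the example; the comment only gives the number of cases per method.

```java
// Hypothetical sketch: classify a compressed block by its first header byte.
class BlockHeaderSketch {
    enum Method { BITS, PFOR, SIMPLE_NN, RUN_LENGTH }

    // Assumed layout of the case numbers within the first byte.
    static final int FOR_CASES = 32;    // 32 FrameOfRef cases, one per #frameBits
    static final int SIMPLE_CASES = 16; // assumed: low end of the 16-64 range
    static final int RLE_CASES = 8;     // assumed: high end of the 1-8 range

    // Assumed mapping: case 0 means 1 frame bit, case 31 means 32 frame bits.
    static int frameBits(byte[] block) {
        return (block[0] & 0xFF) + 1;
    }

    // Reads the 3 bytes after the case byte as #exceptions (assumed big-endian).
    static int exceptionCount(byte[] block) {
        return ((block[1] & 0xFF) << 16) | ((block[2] & 0xFF) << 8) | (block[3] & 0xFF);
    }

    static Method classify(byte[] block) {
        int c = block[0] & 0xFF;
        if (c < FOR_CASES) {
            // 0 exceptions: plain bit packing (BITS); otherwise patched (PFOR).
            return exceptionCount(block) == 0 ? Method.BITS : Method.PFOR;
        } else if (c < FOR_CASES + SIMPLE_CASES) {
            return Method.SIMPLE_NN;
        } else if (c < FOR_CASES + SIMPLE_CASES + RLE_CASES) {
            return Method.RUN_LENGTH;
        }
        throw new IllegalArgumentException("unknown header case: " + c);
    }

    public static void main(String[] args) {
        byte[] pforBlock = {4, 0, 0, 3}; // case 4, 3 exceptions
        System.out.println(classify(pforBlock) + " frameBits=" + frameBits(pforBlock)
                + " exceptions=" + exceptionCount(pforBlock));
    }
}
```

With these assumed counts there are 32 + 16 + 8 = 56 cases, which fits in 6 bits. Every extra case adds another decoding branch, which is where the instruction cache concern comes from.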