[ 
https://issues.apache.org/jira/browse/LUCENE-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545995#comment-13545995
 ] 

Adrien Grand commented on LUCENE-4643:
--------------------------------------

I ran some tests with my compressed TermVectorsFormat, and the problem with the 
second patch is that it sometimes wastes space. For example, if all values in a 
block are between -1 and 6, the first patch would require 3 bits per value, 
whereas the second one with zig-zag encoding applied a level above would 
require 4 bits per value. So I think I should commit the first patch instead?
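
To make the arithmetic concrete, here is a minimal sketch, not code from either 
patch. It assumes the first patch subtracts the per-block minimum before 
packing (the comment implies this but does not state it) and that zig-zag 
encoding is the usual (v >> 63) ^ (v << 1) mapping. The class and method names 
are hypothetical.

```java
class BitsPerValueSketch {

    // Bits needed to store v when v is treated as an unsigned 64-bit value.
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // Usual zig-zag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZagEncode(long v) {
        return (v >> 63) ^ (v << 1);
    }

    // Assumed behaviour of the first patch: store (value - min) per block.
    // Assumes a non-empty block whose (max - min) does not overflow a long.
    static int bitsWithMinOffset(long[] block) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : block) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return bitsRequired(max - min);
    }

    // Zig-zag encode each value, then size the block for the widest result.
    static int bitsWithZigZag(long[] block) {
        long or = 0;
        for (long v : block) {
            or |= zigZagEncode(v); // OR keeps the highest set bit of any value
        }
        return bitsRequired(or);
    }

    public static void main(String[] args) {
        long[] block = {-1, 0, 3, 6};
        System.out.println("min offset: " + bitsWithMinOffset(block) + " bits per value");
        System.out.println("zig-zag:    " + bitsWithZigZag(block) + " bits per value");
    }
}
```

With values between -1 and 6, the offset range is 0..7, which fits in 3 bits, 
while zig-zag maps 6 to 12, which needs 4 bits.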


                
> PackedInts: convenience classes to write blocks of packed ints
> --------------------------------------------------------------
>
>                 Key: LUCENE-4643
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4643
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-4643.patch, LUCENE-4643.patch
>
>
> It is often useful to divide a packed stream into fixed-size blocks, each of 
> which is compressed independently:
>  * if your sequence of ints is very large, you won't have to buffer 
> everything in memory to compute the required number of bits per value,
>  * the compression ratio will be better when extreme values are rare.
> The only drawback compared to the original PackedInts API is that the stream 
> cannot be directly used to deserialize a random-access PackedInts.Reader (but 
> for sequential access, this is just fine).
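
The effect of rare extreme values described above can be illustrated with a 
small sketch. It is not the patch's code: the class name, the block size of 64 
and the one-byte per-block header are assumptions made for the example, and 
values are assumed to be non-negative.

```java
class BlockPackingSketch {

    // Bits needed to store a non-negative value.
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // One bit width for the whole stream, driven by the global maximum.
    static long globalBits(long[] values) {
        long max = 0;
        for (long v : values) {
            max = Math.max(max, v);
        }
        return (long) values.length * bitsRequired(max);
    }

    // One bit width per block, plus an assumed 8-bit header per block
    // recording that width.
    static long blockedBits(long[] values, int blockSize) {
        long total = 0;
        for (int start = 0; start < values.length; start += blockSize) {
            int end = Math.min(start + blockSize, values.length);
            long max = 0;
            for (int i = start; i < end; i++) {
                max = Math.max(max, values[i]);
            }
            total += 8 + (long) (end - start) * bitsRequired(max);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] values = new long[256];
        java.util.Arrays.fill(values, 3L);
        values[10] = 1_000_000L; // a single rare extreme value
        System.out.println("global:  " + globalBits(values) + " bits");
        System.out.println("blocked: " + blockedBits(values, 64) + " bits");
    }
}
```

Only the block containing the outlier pays for its width; the other blocks are 
sized for their own maximum, and each block can be sized as soon as it is 
full, without buffering the whole sequence.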

