[jira] [Updated] (LUCENE-7351) BKDWriter should compress doc ids when all values in a block are the same

Adrien Grand (JIRA) Mon, 27 Jun 2016 07:17:48 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Adrien Grand updated LUCENE-7351:
---------------------------------
    Attachment: LUCENE-7351.patch

I have been experimenting with the attached patch, which compresses doc ids 
based on the number of bytes required to store them (it only specializes 8, 16, 
24 and 32 bits per doc id) and also adds delta-compression for blocks whose 
values are all the same. The IndexAndSearchOpenStreetMaps benchmark reported a 
slowdown of 1.7% for the box benchmark (72.3 QPS -> 71.1 QPS), but storage 
requirements decreased by 9.1% (635MB -> 577MB). The storage savings are even 
larger for types that require fewer bytes per value (LatLonPoint requires 8 
bytes per value). For instance, indexing 10M random half floats requires 28MB 
with the patch vs 43MB on master (-35%).
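
To make the two ideas above concrete, here is a minimal sketch in Java. It is 
not the code from LUCENE-7351.patch: the class name, the method names and the 
one-byte width header are all assumptions made for illustration, and the 
variable-length delta format is just one plausible way to delta-compress 
sorted doc ids.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: names and on-disk layout are hypothetical, not Lucene's.
final class DocIdEncodingSketch {

    // Smallest width (1, 2, 3 or 4 bytes) that can hold every doc id in the block.
    static int bytesPerDocId(int[] docIds) {
        int bits = 0;
        for (int id : docIds) {
            if (id < 0) {
                throw new IllegalArgumentException("doc ids must be non-negative");
            }
            bits |= id; // OR-ing keeps the highest set bit of any id
        }
        if (bits <= 0xFF) return 1;
        if (bits <= 0xFFFF) return 2;
        if (bits <= 0xFFFFFF) return 3;
        return 4;
    }

    // Writes a one-byte width header, then each doc id big-endian in that width.
    static byte[] encodeFixedWidth(int[] docIds) {
        int width = bytesPerDocId(docIds);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(width);
        for (int id : docIds) {
            for (int shift = 8 * (width - 1); shift >= 0; shift -= 8) {
                out.write((id >>> shift) & 0xFF);
            }
        }
        return out.toByteArray();
    }

    // Assumes the ids are sorted (plausible when all values in a block are equal):
    // writes each gap to the previous id as a variable-length integer,
    // 7 payload bits per byte, high bit set on every byte except the last.
    static byte[] encodeSortedDeltas(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int id : sortedDocIds) {
            int delta = id - previous;
            if (delta < 0) {
                throw new IllegalArgumentException("doc ids must be sorted");
            }
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            previous = id;
        }
        return out.toByteArray();
    }
}
```

With this sketch, a block whose largest doc id fits in 16 bits costs 2 bytes 
per doc id instead of 4, and a sorted block with small gaps between doc ids 
costs roughly 1 byte per doc id.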

> BKDWriter should compress doc ids when all values in a block are the same
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-7351
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7351
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7351.patch
>
>
> BKDWriter writes doc ids using 4 bytes per document. I think it should 
> compress them similarly to postings when all docs in a block have the same 
> packed value. This can happen either when a field has a default value that 
> is common across documents or when quantization makes the number of unique 
> values so small that a large index will necessarily have blocks that all 
> contain the same value (e.g. there are only 63490 unique half-float values).



