[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7371:
---------------------------------
    Attachment: LUCENE-7371.patch

I first thought the issue was with inlining, since the methods have many 
arguments and I had made them bigger, but it turned out that the main issue was 
branch misprediction due to the use of vints for encoding the run length: run 
lengths alternate almost every time between less than and greater than 127 (the 
boundary between 1 and 2 bytes with vints). So I capped the run length to 256 
in order to be able to use one byte for run lengths all the time, and things 
are now faster with compression (about 75.1 QPS on master and 78.2 QPS with the 
patch, a 4% improvement). Disk savings are similar to the previous iteration of 
the patch: the index is now 522MB on disk vs. 521MB with the previous 
iteration.
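
To make the trade-off concrete, here is a minimal sketch of run-length encoding with a fixed one-byte run length, next to a helper that shows where the vint size changes. It is not the code from the patch: the class and method names are hypothetical, and storing `runLength - 1` so that a cap of 256 fits in one byte is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not taken from LUCENE-7371.patch.
final class RunLengthSketch {
    // Cap mentioned in the comment above. Assumption: the length is stored
    // as (runLength - 1), so 1..256 maps onto a single unsigned byte.
    static final int MAX_RUN = 256;

    /** Encodes values as (value, runLength - 1) pairs, one byte each. */
    static byte[] encodeFixed(byte[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < values.length) {
            int run = 1;
            while (i + run < values.length
                    && run < MAX_RUN
                    && values[i + run] == values[i]) {
                run++;
            }
            out.write(values[i]);
            out.write(run - 1); // always exactly one byte, no size branch
            i += run;
        }
        return out.toByteArray();
    }

    /** Decodes the output of encodeFixed back into count values. */
    static byte[] decodeFixed(byte[] encoded, int count) {
        byte[] values = new byte[count];
        int pos = 0;
        for (int i = 0; i < encoded.length; i += 2) {
            int run = (encoded[i + 1] & 0xFF) + 1;
            Arrays.fill(values, pos, pos + run, encoded[i]);
            pos += run;
        }
        return values;
    }

    /** Bytes a vint (7 payload bits per byte) needs for a non-negative value. */
    static int vintLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }
}
```

With vints, a reader has to branch on whether each length took one or two bytes (`vintLength(127)` is 1, `vintLength(128)` is 2). With the fixed-width layout the decoder always reads exactly two bytes per run, at the cost of splitting runs longer than the cap.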

> BKDReader could compress values better
> --------------------------------------
>
>                 Key: LUCENE-7371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7371
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.
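
One possible reading of the 6-bit estimate, sketched as arithmetic: a block of 1024 values sorted on a dimension has at most 256 runs in that dimension's first byte, so storing that byte once per run costs at most 256 * 8 bits for the whole block. The helper below is hypothetical, and ignoring the cost of the run lengths is an assumption of this reading, not something the description states.

```java
// Hypothetical back-of-the-envelope helper; not part of Lucene.
final class RleSavingsEstimate {
    /** Average bits spent per value when each run costs bitsPerRun bits. */
    static double bitsPerValue(int blockSize, int runs, int bitsPerRun) {
        return (double) runs * bitsPerRun / blockSize;
    }

    public static void main(String[] args) {
        // Worst case of 256 runs in a 1024-value block, value byte only.
        double valueOnly = bitsPerValue(1024, 256, 8);
        // Same block when each run also carries a one-byte run length.
        double withLength = bitsPerValue(1024, 256, 16);
        System.out.println("value byte only: " + valueOnly + " bits/value");
        System.out.println("value + 1-byte length: " + withLength + " bits/value");
    }
}
```

Under these assumptions the first byte drops from 8 bits per value to 2, which matches the 6 bits quoted above. Counting a one-byte run length per run brings the worst case to 4 bits per value.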


