[jira] [Updated] (LUCENE-7563) BKD index should compress unused leading bytes

Michael McCandless (JIRA) Sat, 26 Nov 2016 05:02:13 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless updated LUCENE-7563:
---------------------------------------
    Attachment: LUCENE-7563.patch

Another iteration on the patch; I think it's ready.

I tested on the 20M sparse taxis data set, and this change gives a
sizable reduction in heap usage (roughly 56% to 59%):

  * sparse-sorted: 6.14 MB -> 2.49 MB

  * sparse: 4.93 MB -> 2.17 MB

  * dense: 4.88 MB -> 2.09 MB
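
As a quick check of the percentages quoted above, here is a minimal sketch that recomputes the reduction from the before/after sizes. The class and method names are hypothetical and exist only for this illustration:

```java
/** Hypothetical helper for checking the reported numbers; not part of the patch. */
final class HeapReductionSketch {

  /** Returns the relative reduction, in percent, going from beforeMb to afterMb. */
  static double reductionPercent(double beforeMb, double afterMb) {
    return 100.0 * (beforeMb - afterMb) / beforeMb;
  }
}
```

Calling reductionPercent(6.14, 2.49), reductionPercent(4.93, 2.17) and reductionPercent(4.88, 2.09) gives roughly 59%, 56% and 57% for sparse-sorted, sparse and dense respectively.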



> BKD index should compress unused leading bytes
> ----------------------------------------------
>
>                 Key: LUCENE-7563
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7563
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7563.patch, LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} bytes
> per dimension, but if, e.g., you are indexing {{LongPoint}} yet only use the
> bottom two bytes in a given segment, we shouldn't store all those leading 0s
> in the index.
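
To illustrate the idea in the description, here is a minimal sketch of stripping leading bytes that every value shares. It is not the code in the attached patch: the class and method names are hypothetical, and it assumes plain big-endian encoding of non-negative longs, whereas Lucene's own point encoding differs in detail.

```java
import java.util.Arrays;

/** Hypothetical illustration of shared-leading-byte stripping; not Lucene's BKD code. */
final class LeadingBytePrefixSketch {

  /** Returns how many leading bytes are identical across all values (assumed equal length). */
  static int commonPrefixLength(byte[][] values) {
    if (values.length == 0) {
      return 0;
    }
    int prefix = values[0].length;
    for (int i = 1; i < values.length && prefix > 0; i++) {
      int j = 0;
      while (j < prefix && values[i][j] == values[0][j]) {
        j++;
      }
      prefix = j; // the shared prefix can only shrink as more values are seen
    }
    return prefix;
  }

  /** Drops the first `prefix` bytes of each value, keeping only the bytes that can differ. */
  static byte[][] stripPrefix(byte[][] values, int prefix) {
    byte[][] out = new byte[values.length][];
    for (int i = 0; i < values.length; i++) {
      out[i] = Arrays.copyOfRange(values[i], prefix, values[i].length);
    }
    return out;
  }

  /** Plain big-endian encoding of a long (an assumption made for this sketch). */
  static byte[] toBytes(long v) {
    byte[] b = new byte[Long.BYTES];
    for (int i = Long.BYTES - 1; i >= 0; i--) {
      b[i] = (byte) v;
      v >>>= 8;
    }
    return b;
  }
}
```

For the values 1, 258 and 65535, each encoded with toBytes, commonPrefixLength reports 6 shared leading bytes, so stripPrefix leaves 2 bytes per value instead of 8. The shared prefix only needs to be recorded once.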



