[ https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-7563:
---------------------------------------
Attachment: LUCENE-7563.patch
Another iteration on the patch; I think it's ready.
I tested on the 20M sparse taxis data set, and this change gives a
sizable (~56%-59%) reduction in heap usage:
* sparse-sorted: 6.14 MB -> 2.49 MB
* sparse: 4.93 MB -> 2.17 MB
* dense: 4.88 MB -> 2.09 MB
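As a quick sanity check of the quoted range, the three before/after sizes can be turned into percent reductions with a few lines of Java. The class and method names are only illustrative; the numbers are the ones reported above.

```java
// Recomputes the heap-usage reductions from the before/after sizes reported above (MB).
class HeapReduction {

    /** Percent reduction when going from beforeMB to afterMB. */
    static double reductionPercent(double beforeMB, double afterMB) {
        return 100.0 * (beforeMB - afterMB) / beforeMB;
    }

    public static void main(String[] args) {
        String[] names = {"sparse-sorted", "sparse", "dense"};
        double[] before = {6.14, 4.93, 4.88};
        double[] after = {2.49, 2.17, 2.09};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: %.1f%%%n", names[i], reductionPercent(before[i], after[i]));
        }
    }
}
```

This prints 59.4% for sparse-sorted, 56.0% for sparse and 57.2% for dense, which matches the ~56%-59% range.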
> BKD index should compress unused leading bytes
> ----------------------------------------------
>
> Key: LUCENE-7563
> URL: https://issues.apache.org/jira/browse/LUCENE-7563
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7563.patch, LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom
> two bytes in a given segment, we shouldn't store all those leading 0s in the
> index.
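The idea in the description can be illustrated with a minimal sketch. It is not Lucene's BKD code: it assumes each dimension is a fixed-width, big-endian byte array (here a non-negative long encoded without any sign-bit flip, which is an assumption of the sketch) and all class and method names are hypothetical. It finds how many leading bytes every value in a segment shares, so that prefix could be stored once and each value kept as just its suffix.

```java
import java.util.Arrays;

// Hypothetical sketch of per-dimension leading-byte (common prefix) stripping.
// Not Lucene's implementation; names are illustrative only.
class LeadingBytesSketch {

    /** Big-endian encoding of a non-negative long into 8 bytes (assumption: no sign-bit flip). */
    static byte[] encode(long v) {
        byte[] b = new byte[Long.BYTES];
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            b[i] = (byte) v; // keep the lowest byte
            v >>>= 8;        // shift the next byte into place
        }
        return b;
    }

    /** Number of leading bytes that are identical across all values. */
    static int commonPrefixLength(byte[][] values) {
        int prefix = values[0].length;
        for (byte[] v : values) {
            int i = 0;
            while (i < prefix && v[i] == values[0][i]) {
                i++;
            }
            prefix = i; // the shared prefix can only shrink
        }
        return prefix;
    }

    /** Copies only the bytes after the shared prefix; the prefix itself would be stored once. */
    static byte[][] stripPrefix(byte[][] values, int prefix) {
        byte[][] out = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = Arrays.copyOfRange(values[i], prefix, values[i].length);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] docs = {17, 300, 65_000, 42}; // all fit in the bottom two bytes
        byte[][] encoded = new byte[docs.length][];
        for (int i = 0; i < docs.length; i++) {
            encoded[i] = encode(docs[i]);
        }
        int prefix = commonPrefixLength(encoded);
        byte[][] suffixes = stripPrefix(encoded, prefix);
        System.out.println("shared leading bytes: " + prefix);
        System.out.println("bytes per value: " + suffixes[0].length);
    }
}
```

For these sample values the sketch reports 6 shared leading bytes and 2 bytes per value, the "bottom two bytes" case from the description. Scanning every value is the simplest way to find the prefix; when values are in unsigned lexicographic order, comparing only the segment's minimum and maximum gives the same answer, since every value between them shares their common prefix.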
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)