[ https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702891#comment-15702891 ]
Adrien Grand commented on LUCENE-7563:
--------------------------------------
bq. Hmm I think I am already doing that?
You are right, I had not read the code correctly.
bq. Oooh that's a great idea! Saves 1 byte per inner node. We need 5 bits for
the prefix I think since it can range 0 .. 16 inclusive, and 3 bits for the
splitDim since it's 0 .. 7 inclusive.
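As an illustration of the 5-bit/3-bit split described in the quote, here is a minimal sketch that packs {{prefix}} (0 .. 16) and {{splitDim}} (0 .. 7) into a single byte. The class and method names are hypothetical and this is not the code from the attached patch. It only shows that 17 prefix values and 8 split dimensions fit in 8 bits.

```java
// Hypothetical helper illustrating the 5-bit/3-bit packing; not Lucene's actual code.
public final class PrefixSplitDimByte {
  private PrefixSplitDimByte() {}

  /** Packs prefix (0..16, needs 5 bits) and splitDim (0..7, needs 3 bits) into one byte. */
  public static byte pack(int prefix, int splitDim) {
    if (prefix < 0 || prefix > 16) {
      throw new IllegalArgumentException("prefix out of range: " + prefix);
    }
    if (splitDim < 0 || splitDim > 7) {
      throw new IllegalArgumentException("splitDim out of range: " + splitDim);
    }
    // High 5 bits hold the prefix, low 3 bits hold the split dimension.
    return (byte) ((prefix << 3) | splitDim);
  }

  /** Recovers the prefix from the high 5 bits (mask to undo sign extension first). */
  public static int prefix(byte packed) {
    return (packed & 0xFF) >>> 3;
  }

  /** Recovers the split dimension from the low 3 bits. */
  public static int splitDim(byte packed) {
    return packed & 0x07;
  }

  public static void main(String[] args) {
    byte b = pack(16, 7);
    System.out.println(prefix(b) + " " + splitDim(b)); // prints "16 7"
  }
}
```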
I have been thinking about it more and I think we can make it more general. The
first two bytes that differ are likely close to each other, so if we call their
difference {{firstByteDelta}}, we could pack {{firstByteDelta}}, {{splitDim}}
and {{prefix}} into a single vint (e.g. {{(firstByteDelta * (1 + bytesPerDim) +
prefix) * numDims + splitDim}}) that would sometimes take only one byte, i.e.
whenever the packed value is below 128 (quite often when {{numDims}} and
{{bytesPerDim}} are small, rarely in the opposite case).
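A minimal sketch of the single-vint packing, assuming {{firstByteDelta}} is non-negative, 0 <= {{prefix}} <= {{bytesPerDim}} and 0 <= {{splitDim}} < {{numDims}}. {{PackedIndexCode}}, {{encode}}, {{decode}} and {{vIntLength}} are hypothetical names, not Lucene APIs; {{vIntLength}} assumes the usual vint layout of 7 payload bits per byte with a continuation bit, which is what Lucene's {{DataOutput.writeVInt}} writes.

```java
// Hypothetical sketch of the mixed-radix packing:
// code = (firstByteDelta * (1 + bytesPerDim) + prefix) * numDims + splitDim
public final class PackedIndexCode {
  private PackedIndexCode() {}

  public static int encode(int firstByteDelta, int prefix, int splitDim,
                           int bytesPerDim, int numDims) {
    if (firstByteDelta < 0) throw new IllegalArgumentException("negative delta");
    if (prefix < 0 || prefix > bytesPerDim) throw new IllegalArgumentException("bad prefix");
    if (splitDim < 0 || splitDim >= numDims) throw new IllegalArgumentException("bad splitDim");
    return (firstByteDelta * (1 + bytesPerDim) + prefix) * numDims + splitDim;
  }

  /** Returns {firstByteDelta, prefix, splitDim}, peeling fields off in reverse order. */
  public static int[] decode(int code, int bytesPerDim, int numDims) {
    int splitDim = code % numDims;
    code /= numDims;
    int prefix = code % (1 + bytesPerDim);
    int firstByteDelta = code / (1 + bytesPerDim);
    return new int[] {firstByteDelta, prefix, splitDim};
  }

  /** Bytes a vint needs for a non-negative value: 7 payload bits per byte. */
  public static int vIntLength(int value) {
    int len = 1;
    while ((value & ~0x7F) != 0) {
      len++;
      value >>>= 7;
    }
    return len;
  }

  public static void main(String[] args) {
    // 2D case with 8 bytes per dimension: (3 * 9 + 5) * 2 + 1 = 65, fits in one vint byte.
    int code = encode(3, 5, 1, 8, 2);
    System.out.println(code + " " + vIntLength(code)); // prints "65 1"
  }
}
```

Decoding uses {{%}} and {{/}} in the reverse order of encoding, which is why the multipliers must be the exact value counts ({{1 + bytesPerDim}} for the prefix, {{numDims}} for the split dimension).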
bq. but it felt wrong to just pass these packed bytes to the simple text format
...
Agreed. Maybe we should duplicate the current BKDReader/BKDWriter into a new
impl that would be specific to SimpleText and would not need all those
optimizations so that both impls can evolve separately.
> BKD index should compress unused leading bytes
> ----------------------------------------------
>
> Key: LUCENE-7563
> URL: https://issues.apache.org/jira/browse/LUCENE-7563
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7563.patch, LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom
> two bytes in a given segment, we shouldn't store all those leading 0s in the
> index.
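To make the description concrete, here is a minimal sketch that measures how many leading bytes of one dimension are identical across a segment, assuming longs are encoded as 8-byte big-endian values with the sign bit flipped so that unsigned byte order matches numeric order. The class and methods are hypothetical, not Lucene's. Because every value between the segment's min and max shares their common prefix, comparing the encoded min and max is enough: for values in 0 .. 65535 the first 6 bytes are identical, so only 2 bytes per dimension actually vary.

```java
// Hypothetical sketch: count leading bytes shared by every value in a segment's dimension.
public final class SharedLeadingBytes {
  private SharedLeadingBytes() {}

  /** Assumed encoding: 8-byte big-endian with the sign bit flipped (sortable as unsigned bytes). */
  public static byte[] encode(long value) {
    long sortable = value ^ Long.MIN_VALUE; // flip sign bit
    byte[] out = new byte[Long.BYTES];
    for (int i = Long.BYTES - 1; i >= 0; i--) {
      out[i] = (byte) sortable;
      sortable >>>= 8;
    }
    return out;
  }

  /** Common prefix length of encoded min and max; all values in [min, max] share it. */
  public static int sharedPrefix(long min, long max) {
    byte[] a = encode(min);
    byte[] b = encode(max);
    int i = 0;
    while (i < a.length && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    // Only the bottom two bytes are used, so 6 leading bytes need not be stored per value.
    System.out.println(sharedPrefix(0L, 65535L)); // prints 6
  }
}
```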
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)