[
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868354#comment-16868354
]
Ignacio Vera commented on LUCENE-8868:
--------------------------------------
It is small and I think the 1D case will benefit quite a lot. Just wanted to
raise the issue so we think over it.
> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently if a leaf on the BKD tree contains only few values, then the leaf
> is treated the same way as it all values are different. It many cases it can
> be much more efficient to store the distinct values with the cardinality.
> The strategy is the following:
> 1. When writing a leaf block the cardinality is computed.
> 2. Perform some naive calculation to compute if it is better to store the
> leaf as a low cardinality leaf. The storage cost are calculated as follows:
> * low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2)
> where two is the estimated size of storing the cardinality. This is an
> overestimation as in some cases you will only need one byte to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking into account the runlen compression.
> 3. If the tree has low cardinality then we set the compressed dim to -2. Note
> that -1 is when all values are equal.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]