[ https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873130#comment-16873130 ]
ASF subversion and git services commented on LUCENE-8868:
---------------------------------------------------------
Commit 1f4de51f8b937a48afb25b59c1cd05ea8b30a8fa in lucene-solr's branch
refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1f4de51 ]
LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality
(#743)
When a leaf has only a few distinct values, we store the distinct values together
with their cardinalities.
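The commit message above can be illustrated with a minimal Java sketch that groups a leaf's packed values into (distinct value, cardinality) pairs. The class `LowCardinalityLeaf`, the `Run` type and `toRuns` are hypothetical names, not Lucene's API, and the sketch assumes the leaf's values are already sorted so that equal values are adjacent. It only shows the grouping step, not the on-disk encoding Lucene actually writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LowCardinalityLeaf {

    /** One distinct packed value and the number of times it occurs in the leaf (hypothetical type). */
    static final class Run {
        final byte[] value;
        final int cardinality;

        Run(byte[] value, int cardinality) {
            this.value = value;
            this.cardinality = cardinality;
        }
    }

    /**
     * Groups a leaf's packed values into (distinct value, cardinality) runs.
     * Assumes the input is sorted so that equal values are adjacent.
     */
    static List<Run> toRuns(byte[][] sortedPackedValues) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedPackedValues.length) {
            byte[] current = sortedPackedValues[i];
            int j = i + 1;
            // extend the run while the next value is byte-for-byte equal
            while (j < sortedPackedValues.length && Arrays.equals(current, sortedPackedValues[j])) {
                j++;
            }
            runs.add(new Run(current, j - i));
            i = j;
        }
        return runs;
    }

    public static void main(String[] args) {
        byte[][] leaf = {{0, 1}, {0, 1}, {0, 1}, {0, 7}, {0, 7}, {3, 2}};
        List<Run> runs = toRuns(leaf);
        // the number of runs is the number of distinct values in the leaf
        System.out.println("leafCardinality = " + runs.size()); // prints "leafCardinality = 3"
        for (Run r : runs) {
            // prints "[0, 1] x 3", then "[0, 7] x 2", then "[3, 2] x 1"
            System.out.println(Arrays.toString(r.value) + " x " + r.cardinality);
        }
    }
}
```

Six stored values collapse to three entries here, which is the saving the new strategy targets.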
> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values together with their
> cardinalities.
> The strategy is the following:
> 1. When writing a leaf block, its cardinality (the number of distinct values) is
> computed.
> 2. Perform a naive calculation to decide whether it is cheaper to store the
> leaf as a low cardinality leaf. The storage costs are estimated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2),
> where 2 is the estimated size of storing each cardinality. This is an
> overestimation, as in some cases only one byte is needed to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). This estimate
> does not take run-length compression into account.
> 3. If the leaf has low cardinality, the compressed dim is set to -2. Note that
> -1 is already used to mark a leaf in which all values are equal.
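The cost comparison and marker choice in steps 2 and 3 can be sketched in Java as below. This is a minimal illustration, not Lucene's `BKDWriter`: `LeafEncodingChoice`, `chooseCompressedDim` and the `runLenDim` parameter (standing in for the dimension the regular layout would run-length compress) are hypothetical. The sketch assumes a leaf is "all equal" when the common prefix covers every byte, and it switches to the low cardinality layout only when that estimate is strictly smaller.

```java
public class LeafEncodingChoice {

    /** Marker for a leaf in which every value is identical. */
    static final int ALL_EQUAL = -1;
    /** Marker for a leaf stored as distinct values plus cardinalities. */
    static final int LOW_CARDINALITY = -2;

    /** Low cardinality estimate; the +2 is the assumed cost of each stored cardinality. */
    static long lowCardinalityCost(int leafCardinality, int packedBytesLength, int prefixLenSum) {
        return (long) leafCardinality * (packedBytesLength - prefixLenSum + 2);
    }

    /** High cardinality estimate, ignoring run-length compression. */
    static long highCardinalityCost(int count, int packedBytesLength, int prefixLenSum) {
        return (long) count * (packedBytesLength - prefixLenSum);
    }

    /** Returns -1, -2, or runLenDim (hypothetical parameter) as the compressed dim to write. */
    static int chooseCompressedDim(int count, int leafCardinality, int packedBytesLength,
                                   int prefixLenSum, int runLenDim) {
        if (prefixLenSum == packedBytesLength) {
            // the common prefix spans every byte, so all values in the leaf are equal
            return ALL_EQUAL;
        }
        long low = lowCardinalityCost(leafCardinality, packedBytesLength, prefixLenSum);
        long high = highCardinalityCost(count, packedBytesLength, prefixLenSum);
        return low < high ? LOW_CARDINALITY : runLenDim;
    }

    public static void main(String[] args) {
        // 512 points, 2 dims x 4 bytes, 2 bytes of common prefix
        System.out.println(chooseCompressedDim(512, 10, 8, 2, 0));  // low = 80, high = 3072: prints -2
        System.out.println(chooseCompressedDim(512, 500, 8, 2, 0)); // low = 4000, high = 3072: prints 0
    }
}
```

With 10 distinct values among 512 points the low cardinality layout is estimated at 80 bytes against 3072, so it wins by a wide margin; with 500 distinct values the extra 2 bytes per entry make the regular layout cheaper.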
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)