iverase opened a new pull request #730: LUCENE-8868: New storing strategy for 
BKD tree leaves with low cardinality
URL: https://github.com/apache/lucene-solr/pull/730
 
 
   # Description
   
   Currently if a leaf on the BKD tree contains only few values, then the leaf 
is treated the same way as it all values are different. It many cases it can be 
much more efficient to store the distinct values with the cardinality.
   
   # Solution
   
   The strategy is the following:
   1. When writing a leaf block the cardinality is computed.
   2. Perform some naive calculation to compute if it is better to store the 
leaf as a low cardinality leaf. The storage cost are calculated as follows:
   low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2) 
where two is the estimated size of storing the cardinality. This is an 
overestimation as in some cases you will only need one byte to store the 
cardinality.
   High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
taking into account the runlen compression.
   3. If the tree has low cardinality then we set the compressed dim to -2. 
Note that -1 is when all values are equal.
   
   # Tests
   
   BKD tree is extensively tested already so there is no need to add new tests.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to