jpountz commented on a change in pull request #730: LUCENE-8868: New storing 
strategy for BKD tree leaves with low cardinality
URL: https://github.com/apache/lucene-solr/pull/730#discussion_r295688457
 
 

 ##########
 File path: lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java
 ##########
 @@ -1028,17 +1035,43 @@ private void writeLeafBlockDocs(DataOutput out, int[] docIDs, int start, int cou
     DocIdsWriter.writeDocIds(docIDs, start, count, out);
   }
 
-  private void writeLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues) throws IOException {
+  private void writeLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int leafCardinality) throws IOException {
     int prefixLenSum = Arrays.stream(commonPrefixLengths).sum();
     if (prefixLenSum == packedBytesLength) {
       // all values in this block are equal
       out.writeByte((byte) -1);
-    } else {
+    } else if (leafCardinality * (packedBytesLength - prefixLenSum + 2)  <= count * (packedBytesLength - prefixLenSum)) {
 
 Review comment:
   Am I reading it right that you are counting 2 bytes for the vint? I think you 
could make it 1 instead. The reasoning is that if your vints take 2 bytes on 
average, then your runs are very long (a vint only starts using 2 bytes once 
its value is greater than 127), so the sparse encoding is an obvious win.
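
To make the arithmetic concrete, here is a minimal sketch of the size comparison from the condition above, with the per-run vint overhead pulled out as a parameter. The class `LeafEncodingCost` and its methods are hypothetical names used only for illustration and are not part of `BKDWriter`; the sketch models only the terms that appear in the condition and assumes the usual vint layout of 7 payload bits per byte.

```java
public final class LeafEncodingCost {
  private LeafEncodingCost() {}

  /** Number of bytes a vint needs, assuming 7 payload bits per byte. */
  static int vIntBytes(int value) {
    int bytes = 1;
    while ((value & ~0x7F) != 0) {
      value >>>= 7;
      bytes++;
    }
    return bytes;
  }

  /**
   * Mirrors the condition in the diff: the sparse encoding stores one suffix
   * plus a run length per distinct value, the dense encoding stores one
   * suffix per document. runLengthBytes is the assumed vint cost per run.
   */
  static boolean sparseIsSmaller(int leafCardinality, int count,
                                 int packedBytesLength, int prefixLenSum,
                                 int runLengthBytes) {
    int suffixBytes = packedBytesLength - prefixLenSum;
    long sparseCost = (long) leafCardinality * (suffixBytes + runLengthBytes);
    long denseCost = (long) count * suffixBytes;
    return sparseCost <= denseCost;
  }

  public static void main(String[] args) {
    // A run length of 127 fits in one byte; 128 is the first to need two.
    System.out.println(vIntBytes(127) + " " + vIntBytes(128));
    // 512 docs, 200 distinct values, 1 suffix byte per value.
    System.out.println(sparseIsSmaller(200, 512, 4, 3, 2)); // 600 <= 512
    System.out.println(sparseIsSmaller(200, 512, 4, 3, 1)); // 400 <= 512
  }
}
```

With these made-up numbers, counting 2 bytes per run rejects the sparse encoding (600 bytes against 512), while counting 1 byte accepts it (400 bytes against 512). The average run here is under 3 documents, so every run length really does fit in a single byte.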
