jpountz commented on a change in pull request #730: LUCENE-8868: New storing
strategy for BKD tree leaves with low cardinality
URL: https://github.com/apache/lucene-solr/pull/730#discussion_r295688457
##########
File path: lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java
##########
@@ -1028,17 +1035,43 @@ private void writeLeafBlockDocs(DataOutput out, int[] docIDs, int start, int cou
     DocIdsWriter.writeDocIds(docIDs, start, count, out);
   }
-  private void writeLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues) throws IOException {
+  private void writeLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int leafCardinality) throws IOException {
     int prefixLenSum = Arrays.stream(commonPrefixLengths).sum();
     if (prefixLenSum == packedBytesLength) {
       // all values in this block are equal
       out.writeByte((byte) -1);
-    } else {
+    } else if (leafCardinality * (packedBytesLength - prefixLenSum + 2) <= count * (packedBytesLength - prefixLenSum)) {
Review comment:
Am I reading it right that you are counting 2 for the vint? I think you
could make it 1 instead. The reasoning is that if your vints are 2 bytes on
average, then your runs are very long (vints start using 2 bytes when they
are greater than 127), and so the sparse encoding is an obvious win.
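
To make the arithmetic concrete, here is a rough standalone sketch, not code from the PR. The class `LeafEncodingCost`, the methods `vIntBytes` and `preferSparse`, and the parameter `perRunOverhead` are hypothetical names chosen for illustration; `suffixBytes` stands for `packedBytesLength - prefixLenSum` from the diff, and the vint size assumes the usual 7-bits-per-byte variable-length layout.

```java
public final class LeafEncodingCost {

  /**
   * Bytes needed to store a non-negative int as a variable-length int,
   * assuming 7 payload bits per byte (hypothetical helper, not a Lucene API).
   */
  public static int vIntBytes(int value) {
    int bytes = 1;
    while ((value & ~0x7F) != 0) {
      value >>>= 7;
      bytes++;
    }
    return bytes;
  }

  /**
   * Same shape as the condition in the diff, with the per-run overhead
   * (the bytes budgeted for each run-length vint) made a parameter.
   * suffixBytes corresponds to packedBytesLength - prefixLenSum.
   */
  public static boolean preferSparse(int leafCardinality, int count,
                                     int suffixBytes, int perRunOverhead) {
    // Use long arithmetic so the products cannot overflow.
    long sparseCost = (long) leafCardinality * (suffixBytes + perRunOverhead);
    long denseCost = (long) count * suffixBytes;
    return sparseCost <= denseCost;
  }

  public static void main(String[] args) {
    // A run length only needs a second byte once it exceeds 127.
    System.out.println("vIntBytes(127) = " + vIntBytes(127));
    System.out.println("vIntBytes(128) = " + vIntBytes(128));

    // Illustrative leaf: 512 values, 400 distinct, 4 suffix bytes each.
    // Runs average about 1.3 values, so each run length fits in one byte.
    System.out.println("overhead 2: " + preferSparse(400, 512, 4, 2));
    System.out.println("overhead 1: " + preferSparse(400, 512, 4, 1));
  }
}
```

With these made-up numbers, budgeting 2 bytes per run rejects the sparse encoding (2400 > 2048) while budgeting 1 byte accepts it (2000 <= 2048), even though every run length in such a leaf would actually fit in a single byte.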