jpountz commented on code in PR #15165:
URL: https://github.com/apache/lucene/pull/15165#discussion_r2331204338


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/PForUtil.java:
##########
@@ -46,34 +45,39 @@ static boolean allEqual(int[] l) {
 
   /** Encode 128 integers from {@code ints} into {@code out}. */
   void encode(int[] ints, DataOutput out) throws IOException {
-    // Determine the top MAX_EXCEPTIONS + 1 values
-    final LongHeap top = new LongHeap(MAX_EXCEPTIONS + 1);
-    for (int i = 0; i <= MAX_EXCEPTIONS; ++i) {
-      top.push(ints[i]);
-    }
-    long topValue = top.top();
-    for (int i = MAX_EXCEPTIONS + 1; i < ForUtil.BLOCK_SIZE; ++i) {
-      if (ints[i] > topValue) {
-        topValue = top.updateTop(ints[i]);
-      }
-    }
-
-    long max = 0L;
-    for (int i = 1; i <= top.size(); ++i) {
-      max = Math.max(max, top.get(i));
+    // histogram of bit widths
+    final int[] histogram = new int[32];
+    int maxBitsRequired = 0;
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
+      final int v = ints[i];
+      final int bits = PackedInts.bitsRequired(v);
+      histogram[bits]++;
+      maxBitsRequired = Math.max(maxBitsRequired, bits);
     }
 
-    final int maxBitsRequired = PackedInts.bitsRequired(max);
-    // We store the patch on a byte, so we can't decrease the number of bits 
required by more than 8
-    final int patchedBitsRequired =
-        Math.max(PackedInts.bitsRequired(topValue), maxBitsRequired - 8);
+    int bestCost = Integer.MAX_VALUE;
+    int patchedBitsRequired = maxBitsRequired;
     int numExceptions = 0;
-    final long maxUnpatchedValue = (1L << patchedBitsRequired) - 1;
-    for (int i = 2; i <= top.size(); ++i) {
-      if (top.get(i) > maxUnpatchedValue) {
-        numExceptions++;
+
+    // We store patch on a byte, so we can't decrease bits by more than 8
+    final int minBits = Math.max(0, maxBitsRequired - 8);
+    int cumulativeExceptions = 0;
+    for (int b = maxBitsRequired; b >= minBits; --b) {
+      // Early termination if too many exceptions
+      if (cumulativeExceptions > MAX_EXCEPTIONS) break;
+
+      // storage cost
+      int cost = (ForUtil.BLOCK_SIZE * b) + (cumulativeExceptions << 4);
+      if (cost < bestCost) {

Review Comment:
   This condition is always true, right? Because going to the previous number 
of bits per value saves 128 bits (1 bit per integer)? And in the worst-case 
scenario, you go from 0 to MAX_EXCEPTIONS exceptions, which increases storage 
for exceptions by 7*16=112 bits?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to