gianm commented on a change in pull request #11004:
URL: https://github.com/apache/druid/pull/11004#discussion_r596269223



##########
File path: 
processing/src/main/java/org/apache/druid/segment/data/VSizeLongSerde.java
##########
@@ -330,7 +329,7 @@ public void write(long value) throws IOException
         curByte = (byte) value;
         first = false;
       } else {
-        curByte = (byte) ((curByte << 4) | ((value >> (numBytes << 3)) & 0xF));
+        curByte = (byte) ((curByte << 4) | ((value >>> (numBytes << 3)) & 0xF));

Review comment:
       Was this a bug fix? If so, it is on the write side: does that mean there 
might be bad segments out there, or is there some reason this line wouldn't 
have affected data that people have already written? (Maybe negative numbers 
were never fed to this method.)
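
One way to reason about the question above is to compare the two shift operators directly. The sketch below is not Druid code; `ShiftMaskDemo` and its methods are hypothetical names. It relies only on Java's shift semantics: once the result is masked with `0xF`, `>>` and `>>>` select the same four bits whenever the shift amount is at most 60, so the two expressions can only differ for larger shifts on negative values. Whether `numBytes << 3` always stays in that range for this serializer is an assumption that would need checking against the actual `numBytes` values.

```java
class ShiftMaskDemo
{
  // Nibble extracted with the arithmetic (sign-extending) shift, as in the old line.
  static long nibbleSigned(long value, int shift)
  {
    return (value >> shift) & 0xF;
  }

  // Nibble extracted with the logical (zero-filling) shift, as in the new line.
  static long nibbleUnsigned(long value, int shift)
  {
    return (value >>> shift) & 0xF;
  }

  public static void main(String[] args)
  {
    long[] samples = {-1L, Long.MIN_VALUE, Long.MIN_VALUE | 0x12345L, -0x7FFFFL, 42L};
    for (long v : samples) {
      for (int shift = 0; shift <= 63; shift++) {
        // Report every (value, shift) pair where the two operators disagree after masking.
        if (nibbleSigned(v, shift) != nibbleUnsigned(v, shift)) {
          System.out.println("differs: value=0x" + Long.toHexString(v) + " shift=" + shift);
        }
      }
    }
  }
}
```

If the shift in the write path never exceeds 60, already-written data would be unaffected by the old operator; that conclusion holds only under the stated assumption.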

##########
File path: 
processing/src/test/java/org/apache/druid/segment/data/VSizeLongSerdeTest.java
##########
@@ -20,132 +20,352 @@
 package org.apache.druid.segment.data;
 
 
+import com.google.common.primitives.Ints;
+import org.apache.druid.java.util.common.StringUtils;
 import org.junit.Assert;
-import org.junit.Before;
 import org.junit.Test;
+import org.junit.experimental.runners.Enclosed;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
 
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
 
+@RunWith(Enclosed.class)
 public class VSizeLongSerdeTest

Review comment:
       The change in Mult4Ser suggests that we care about handling negative 
numbers, but this test class doesn't exercise negative numbers very much. (I 
think it only tests Long.MIN_VALUE, in testEveryPowerOfTwo.)
   
   If negative numbers matter, we should extend the test cases in this file to 
cover them better. I'd suggest adding tests to EveryLittleBitTest that are 
similar to testEveryPowerOfTwo and testEveryPowerOfTwoMinusOne, but with the 
sign bit set (i.e. bitwise OR with `Long.MIN_VALUE`).
   
   If negative numbers aren't important, I'd suggest blocking them on the write 
side, i.e. have all the LongSerializers throw errors if they are fed negative 
numbers.
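
The inputs suggested above could be generated along these lines. This is only a sketch: `SignBitValues` and its methods are hypothetical helpers, not part of the existing test class, and wiring them into `EveryLittleBitTest` is left out. Any value with the sign bit set occupies all 64 bits as a bit pattern, so these cases would presumably apply only to the 64-bit serializer unless narrower sizes are expected to reject them.

```java
class SignBitValues
{
  // (1L << i) | Long.MIN_VALUE for i in [0, 62]: each lower power of two with the sign bit also set.
  static long[] powersOfTwoWithSignBit()
  {
    long[] values = new long[63];
    for (int i = 0; i < 63; i++) {
      values[i] = (1L << i) | Long.MIN_VALUE;
    }
    return values;
  }

  // ((1L << i) - 1) | Long.MIN_VALUE for i in [0, 63]: the low i bits set, plus the sign bit.
  static long[] powersOfTwoMinusOneWithSignBit()
  {
    long[] values = new long[64];
    for (int i = 0; i < 64; i++) {
      values[i] = ((1L << i) - 1) | Long.MIN_VALUE;
    }
    return values;
  }

  public static void main(String[] args)
  {
    for (long v : powersOfTwoWithSignBit()) {
      System.out.println(Long.toBinaryString(v));
    }
  }
}
```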

##########
File path: 
processing/src/main/java/org/apache/druid/segment/data/VSizeLongSerde.java
##########
@@ -413,9 +412,81 @@ public void close() throws IOException
     }
   }
 
+  /**
+   * Unpack bitpacked long values from an underlying contiguous memory block
+   */
   public interface LongDeserializer
   {
+    /**
+     * Unpack long value at the specified row index
+     */
     long get(int index);
+
+    /**
+     * Unpack a contiguous vector of long values at the specified start index of length and adjust them by the supplied
+     * delta base value.
+     */
+    default void getDelta(long[] out, int outPosition, int startIndex, int length, long base)

Review comment:
       Are the default implementations ever used? If not, we could remove them.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/data/VSizeLongSerde.java
##########
@@ -413,9 +412,81 @@ public void close() throws IOException
     }
   }
 
+  /**
+   * Unpack bitpacked long values from an underlying contiguous memory block
+   */
   public interface LongDeserializer
   {
+    /**
+     * Unpack long value at the specified row index
+     */
     long get(int index);
+
+    /**
+     * Unpack a contiguous vector of long values at the specified start index of length and adjust them by the supplied
+     * delta base value.
+     */
+    default void getDelta(long[] out, int outPosition, int startIndex, int length, long base)
+    {
+      for (int i = 0; i < length; i++) {
+        out[outPosition + i] = base + get(startIndex + i);
+      }
+    }
+
+    /**
+     * Unpack a non-contiguous vector of long values at the specified indexes and adjust them by the supplied delta base
+     * value.
+     */
+    default int getDelta(long[] out, int outPosition, int[] indexes, int length, int indexOffset, int limit, long base)

Review comment:
       Do you have evidence that the `getDelta` and `getTable` methods are 
helpful? (vs. the alternative: first calling a regular bulk `get` method, then 
applying the delta or table adjustment in a loop over the returned arrays)
   
   They complicate the code quite a bit, so we should only include them if they 
are meaningfully better performance-wise.
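
For comparison, the alternative described above might look roughly like the following. The names here (`TwoPassSketch`, `BulkLongReader`, `readWithDelta`, `readWithTable`) are hypothetical and do not come from the PR. The sketch only shows the shape of a two-pass approach; it says nothing about which version is faster, which would need a benchmark.

```java
import java.util.Arrays;

class TwoPassSketch
{
  // Hypothetical stand-in for a deserializer that exposes only a plain bulk read.
  interface BulkLongReader
  {
    void get(long[] out, int outPosition, int startIndex, int length);
  }

  // Pass 1: unpack the raw values. Pass 2: add the delta base in a separate loop.
  static void readWithDelta(BulkLongReader reader, long[] out, int outPosition, int startIndex, int length, long base)
  {
    reader.get(out, outPosition, startIndex, length);
    for (int i = 0; i < length; i++) {
      out[outPosition + i] += base;
    }
  }

  // Pass 1: unpack the raw values (table ids). Pass 2: replace each id with its table entry.
  static void readWithTable(BulkLongReader reader, long[] out, int outPosition, int startIndex, int length, long[] table)
  {
    reader.get(out, outPosition, startIndex, length);
    for (int i = 0; i < length; i++) {
      out[outPosition + i] = table[(int) out[outPosition + i]];
    }
  }

  public static void main(String[] args)
  {
    // An uncompressed array stands in for the bit-packed buffer.
    long[] packed = {0, 1, 2, 3, 4};
    BulkLongReader reader = (out, outPos, start, len) -> System.arraycopy(packed, start, out, outPos, len);
    long[] out = new long[3];
    readWithDelta(reader, out, 0, 1, 3, 100L);
    System.out.println(Arrays.toString(out));
  }
}
```

The trade-off is a second pass over `out` in exchange for a smaller deserializer interface.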






