imply-cheddar commented on code in PR #15162:
URL: https://github.com/apache/druid/pull/15162#discussion_r1360088821


##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -67,4 +74,32 @@ public HllSketchHolder fromByteBufferSafe(ByteBuffer buffer, 
int numBytes)
         )
     );
   }
+
+  private boolean isSafeToConvertToNullSketch(ByteBuffer buf)
+  {
+    // TODO: Might need a sanity check here, to ensure that position and 
offset makes sense.
+
+    // Get org.apache.datasketches.hll.CurMode. This indicates the type of 
data structure.
+    final int position = buf.position();
+    final int preInts = buf.get(position) & 0X3F; // get(PREAMBLE_INTS_BYTE) & 
0X3F
+    final int curMode = buf.get(position + 7) & 3;    // get(MODE_BYTE) & 
CUR_MODE_MASK
+
+    switch (curMode) {
+      case 0:          // LIST
+        Preconditions.checkArgument(preInts == 2);  // preInts == 
LIST_PREINTS, Sanity

Review Comment:
   `Preconditions.checkArgument` doesn't generate a nice exception for an end 
user and, this usage of `Preconditions` from Guava is one of the sources of 
code-level incompatibility between versions of Guava.  In general, it's best to 
avoid dependence on Guava wherever possible.  For checking an argument like 
this, it might make sense to add a static method on `DruidException` or have 
some other utility type place that will do the check and throw a 
`DruidException` instead of the generic `IllegalArgumentException`.
   
   That said, throwing an exception in this case seems overly zealous.  It 
seems better to just return false if our assumption is broken here and let the 
underlying HLL library do what it will with it.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -48,12 +49,18 @@ public int compare(final HllSketchHolder sketch1, final 
HllSketchHolder sketch2)
   @Override
   public HllSketchHolder fromByteBuffer(final ByteBuffer buf, final int size)
   {
+    if (size == 0 || isSafeToConvertToNullSketch(buf)) {
+      return HllSketchHolder.of((HllSketch) null);
+    }
     return HllSketchHolder.of(HllSketch.wrap(Memory.wrap(buf, 
ByteOrder.LITTLE_ENDIAN).region(buf.position(), size)));
   }
 
   @Override
   public byte[] toBytes(final HllSketchHolder sketch)
   {
+    if (sketch.getSketch() == null || sketch.getSketch().isEmpty()) {

Review Comment:
   Nit: let's rename the variable `sketch` to `holder`.  The code reads funny 
right now.
   
   Not nit: In the case that this if resolves to false and you return from the 
final line in this method, the code will have called `sketch.getSketch()` 3 
times in this method.  That's 2 more times than it has to and there's no 
guarantee that `sketch.getSketch()` isn't doing a bunch of work on each 
independent call.  Yes, you can look at the implementation and see that it's 
going to be returning a materialized copy on each call after the first, but 
this code would be significantly improved if it just called `getSketch()` once 
and then did checks on that return value.  The only time a method should call 
the exact same getter multiple times in the same execution is when it expects 
to get different objects on each call.  If it always expects to get the same 
object, then the code should just get the object once and reuse that reference. 
 The impact here is minimal, but it is incredibly important that all code be 
written thinking about how many times methods are called and what references
  should be getting reused, because there are times when it actually does 
matter.  It's best to have all code follow good practices always than to pick 
and choose for this one.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -67,4 +74,32 @@ public HllSketchHolder fromByteBufferSafe(ByteBuffer buffer, 
int numBytes)
         )
     );
   }
+
+  private boolean isSafeToConvertToNullSketch(ByteBuffer buf)
+  {
+    // TODO: Might need a sanity check here, to ensure that position and 
offset makes sense.

Review Comment:
   The contract is that it should be safe to read `size` bytes from the 
`ByteBuffer`, so just pass size in here and trust that you can read that many 
bytes.



##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -48,12 +49,18 @@ public int compare(final HllSketchHolder sketch1, final 
HllSketchHolder sketch2)
   @Override
   public HllSketchHolder fromByteBuffer(final ByteBuffer buf, final int size)
   {
+    if (size == 0 || isSafeToConvertToNullSketch(buf)) {

Review Comment:
   If the `size` parameter is less than 8, then I believe that the 
`isSafeToConvertToNullSketch` method could end up throwing an exception when it 
reads out the value for `curMode`.  It would likely make sense to pass the size 
parameter into the `isSafe...` method and check the size there as well, 
returning false if it's too small. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to