imply-cheddar commented on code in PR #15162:
URL: https://github.com/apache/druid/pull/15162#discussion_r1360088821
##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -67,4 +74,32 @@ public HllSketchHolder fromByteBufferSafe(ByteBuffer buffer,
int numBytes)
)
);
}
+
+ private boolean isSafeToConvertToNullSketch(ByteBuffer buf)
+ {
+ // TODO: Might need a sanity check here, to ensure that position and
offset makes sense.
+
+ // Get org.apache.datasketches.hll.CurMode. This indicates the type of
data structure.
+ final int position = buf.position();
+ final int preInts = buf.get(position) & 0X3F; // get(PREAMBLE_INTS_BYTE) &
0X3F
+ final int curMode = buf.get(position + 7) & 3; // get(MODE_BYTE) &
CUR_MODE_MASK
+
+ switch (curMode) {
+ case 0: // LIST
+ Preconditions.checkArgument(preInts == 2); // preInts ==
LIST_PREINTS, Sanity
Review Comment:
`Preconditions.checkArgument` doesn't generate a nice exception for an end
user and, this usage of `Preconditions` from Guava is one of the sources of
code-level incompatibility between versions of Guava. In general, it's best to
avoid dependence on Guava wherever possible. For checking an argument like
this, it might make sense to add a static method on `DruidException` or have
some other utility type place that will do the check and throw a
`DruidException` instead of the generic `IllegalArgumentException`.
That said, throwing an exception in this case seems overly zealous. It
seems better to just return false if our assumption is broken here and let the
underlying HLL library do what it will with it.
##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -48,12 +49,18 @@ public int compare(final HllSketchHolder sketch1, final
HllSketchHolder sketch2)
@Override
public HllSketchHolder fromByteBuffer(final ByteBuffer buf, final int size)
{
+ if (size == 0 || isSafeToConvertToNullSketch(buf)) {
+ return HllSketchHolder.of((HllSketch) null);
+ }
return HllSketchHolder.of(HllSketch.wrap(Memory.wrap(buf,
ByteOrder.LITTLE_ENDIAN).region(buf.position(), size)));
}
@Override
public byte[] toBytes(final HllSketchHolder sketch)
{
+ if (sketch.getSketch() == null || sketch.getSketch().isEmpty()) {
Review Comment:
Nit: let's rename the variable `sketch` to `holder`. The code reads funny
right now.
Not nit: In the case that this if resolves to false and you return from the
final line in this method, the code will have called `sketch.getSketch()` 3
times in this method. That's 2 more times than it has to and there's no
guarantee that `sketch.getSketch()` isn't doing a bunch of work on each
independent call. Yes, you can look at the implementation and see that it's
going to be returning a materialized copy on each call after the first, but
this code would be significantly improved if it just called `getSketch()` once
and then did checks on that return value. The only time a method should call
the exact same getter multiple times in the same execution is when it expects
to get different objects on each call. If it always expects to get the same
object, then the code should just get the object once and reuse that reference.
The impact here is minimal, but it is incredibly important that all code be
written thinking about how many times methods are called and what references
should be getting reused, because there are times when it actually does
matter. It's best to have all code follow good practices always than to pick
and choose for this one.
##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -67,4 +74,32 @@ public HllSketchHolder fromByteBufferSafe(ByteBuffer buffer,
int numBytes)
)
);
}
+
+ private boolean isSafeToConvertToNullSketch(ByteBuffer buf)
+ {
+ // TODO: Might need a sanity check here, to ensure that position and
offset makes sense.
Review Comment:
The contract is that it should be safe to read `size` bytes from the
`ByteBuffer`, so just pass size in here and trust that you can read that many
bytes.
##########
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchHolderObjectStrategy.java:
##########
@@ -48,12 +49,18 @@ public int compare(final HllSketchHolder sketch1, final
HllSketchHolder sketch2)
@Override
public HllSketchHolder fromByteBuffer(final ByteBuffer buf, final int size)
{
+ if (size == 0 || isSafeToConvertToNullSketch(buf)) {
Review Comment:
If the `size` parameter is less than 8, then I believe that the
`isSafeToConvertToNullSketch` method could end up throwing an exception when it
reads out the value for `curMode`. It would likely make sense to pass the size
parameter into the `isSafe...` method and check the size there as well,
returning false if it's too small.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]