Copilot commented on code in PR #18170:
URL: https://github.com/apache/pinot/pull/18170#discussion_r3067864111


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -238,14 +263,7 @@ public void seal() {
   }
 
   private void updateHllPlus(Object value) {
-    if (value instanceof BigDecimal) {
-      // Canonicalize BigDecimal as string to avoid scale-related equality 
issues:
-      // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal 
by default,
-      // but their string representations normalize the value for cardinality 
tracking.
-      _hllPlus.offer(((BigDecimal) value).toString());
-    } else {
-      _hllPlus.offer(value);
-    }
+    _hllPlus.offer(value);
   }

Review Comment:
   NoDictColumnStatisticsCollector.updateHllPlus() currently offers the raw 
value object to HLL++. For byte[] this hashes by identity (not content) and for 
BigDecimal it hashes with scale sensitivity, so once exact tracking is disabled 
(threshold exceeded) the approximate cardinality can become incorrect. Consider 
canonicalizing the offered value the same way as trackExactUnique() (e.g., wrap 
byte[] in ByteArray and offer BigDecimal.toString()/toPlainString()).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to