Copilot commented on code in PR #18170:
URL: https://github.com/apache/pinot/pull/18170#discussion_r3067864111
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -238,14 +263,7 @@ public void seal() {
}
private void updateHllPlus(Object value) {
- if (value instanceof BigDecimal) {
- // Canonicalize BigDecimal as string to avoid scale-related equality
issues:
- // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal
by default,
- // but their string representations normalize the value for cardinality
tracking.
- _hllPlus.offer(((BigDecimal) value).toString());
- } else {
- _hllPlus.offer(value);
- }
+ _hllPlus.offer(value);
}
Review Comment:
NoDictColumnStatisticsCollector.updateHllPlus() currently offers the raw
value object to HLL++. For byte[] this hashes by identity (not content) and for
BigDecimal it hashes with scale sensitivity, so once exact tracking is disabled
(threshold exceeded) the approximate cardinality can become incorrect. Consider
canonicalizing the offered value the same way as trackExactUnique() (e.g., wrap
byte[] in ByteArray and offer BigDecimal.toString()/toPlainString()).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]