github-actions[bot] commented on code in PR #63430:
URL: https://github.com/apache/doris/pull/63430#discussion_r3272533556


##########
be/src/exprs/aggregate/aggregate_function_linear_histogram.h:
##########
@@ -174,6 +187,61 @@ struct AggregateFunctionLinearHistogramData {
         auto& column = assert_cast<ColumnString&>(to);
         column.insert_data(buffer.GetString(), buffer.GetSize());
     }
+
+private:
+    double bucket_key(double value) const {
+        double key = (value - offset) / interval;
+        double rounded_key = std::round(key);
+        if (boundary_decimal_scale != NO_BOUNDARY_SCALE) {
+            double raw_boundary = rounded_key * interval + offset;
+            double boundary = normalize_boundary(raw_boundary, 
boundary_decimal_scale);
+            if (value == boundary) {

Review Comment:
   This can still move valid values across the true bucket boundary when 
`decimal_scale()` classifies an interval that is only near a decimal value. For 
example, with `interval = 0.10000000000000002` and `offset = 0`, 
`decimal_scale(interval)` returns scale 1 because `interval * 10` is within the 
epsilon tolerance of `1`. Then `bucket_key(0.1)` normalizes the rounded raw 
boundary back to `0.1`, this equality succeeds, and the value is assigned to 
key `1`, even though `(0.1 - 0) / 0.10000000000000002` is below `1` and should 
floor to bucket `0`. This is distinct from the already-raised key-space 
tolerance issue because the current code removed that tolerance but still gets 
the same kind of mis-bucketing through tolerant scale inference plus 
normalized-boundary equality. Please only enable decimal boundary snapping when 
the interval/offset are exactly represented by the inferred scale, or otherwise 
ensure snapping cannot cross a raw boundary that is greater than the input 
value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to