Re: [PR] fix: percentile_cont interpolation causes NaN for f16 input [datafusion]

via GitHub Wed, 11 Feb 2026 18:25:11 -0800


Jefffrey commented on code in PR #20208:
URL: https://github.com/apache/datafusion/pull/20208#discussion_r2796462050



##########
datafusion/functions-aggregate/src/percentile_cont.rs:
##########
@@ -772,22 +787,49 @@ fn calculate_percentile<T: ArrowNumericType>(
             let (_, upper_value, _) = 
values.select_nth_unstable_by(upper_index, cmp);
             let upper_value = *upper_value;
 
-            // Linear interpolation using wrapping arithmetic
-            // We use wrapping operations here (matching the approach in 
median.rs) because:
-            // 1. Both values come from the input data, so diff is bounded by 
the value range
-            // 2. fraction is between 0 and 1, and INTERPOLATION_PRECISION is 
small enough
-            //    to prevent overflow when combined with typical numeric ranges
-            // 3. The result is guaranteed to be between lower_value and 
upper_value
-            // 4. For floating-point types, wrapping ops behave the same as 
standard ops
+            // Linear interpolation.
+            //
+            // We quantize the fractional component (via 
`INTERPOLATION_PRECISION`) to
+            // minimize output changes for Float32/Float64 compared to the 
previous

Review Comment:
   What previous implementation are we referring to here?



##########
datafusion/functions-aggregate/src/percentile_cont.rs:
##########
@@ -772,22 +787,49 @@ fn calculate_percentile<T: ArrowNumericType>(
             let (_, upper_value, _) = 
values.select_nth_unstable_by(upper_index, cmp);
             let upper_value = *upper_value;
 
-            // Linear interpolation using wrapping arithmetic
-            // We use wrapping operations here (matching the approach in 
median.rs) because:
-            // 1. Both values come from the input data, so diff is bounded by 
the value range
-            // 2. fraction is between 0 and 1, and INTERPOLATION_PRECISION is 
small enough
-            //    to prevent overflow when combined with typical numeric ranges
-            // 3. The result is guaranteed to be between lower_value and 
upper_value
-            // 4. For floating-point types, wrapping ops behave the same as 
standard ops
+            // Linear interpolation.
+            //
+            // We quantize the fractional component (via 
`INTERPOLATION_PRECISION`) to
+            // minimize output changes for Float32/Float64 compared to the 
previous
+            // implementation.
+            //
+            // We perform the arithmetic in f64 and then cast back to the 
native type to
+            // avoid overflowing Float16 intermediates.
             let fraction = index - (lower_index as f64);
-            let diff = upper_value.sub_wrapping(lower_value);
-            let interpolated = lower_value.add_wrapping(
-                diff.mul_wrapping(T::Native::usize_as(
-                    (fraction * INTERPOLATION_PRECISION as f64) as usize,
-                ))
-                .div_wrapping(T::Native::usize_as(INTERPOLATION_PRECISION)),
-            );
-            Some(interpolated)
+            let scaled = (fraction * INTERPOLATION_PRECISION as f64) as usize;

Review Comment:
   If we always cast `INTERPOLATION_PRECISION` to `f64` at all its usages we 
should just define it as `f64`



##########
datafusion/functions-aggregate/src/percentile_cont.rs:
##########
@@ -58,16 +58,14 @@ use datafusion_macros::user_doc;
 
 use crate::utils::validate_percentile_expr;
 
-/// Precision multiplier for linear interpolation calculations.
+/// Precision multiplier used to quantize the fractional component of the
+/// interpolation weight.
 ///
-/// This value of 1,000,000 was chosen to balance precision with overflow 
safety:
-/// - Provides 6 decimal places of precision for the fractional component
-/// - Small enough to avoid overflow when multiplied with typical numeric 
values
-/// - Sufficient precision for most statistical applications
+/// We keep this quantization to minimize output changes for Float32/Float64
+/// compared to the previous implementation.
 ///
-/// The interpolation formula: `lower + (upper - lower) * fraction`
-/// is computed as: `lower + ((upper - lower) * (fraction * PRECISION)) / 
PRECISION`
-/// to avoid floating-point operations on integer types while maintaining 
precision.
+/// Interpolation is performed in f64 and then cast back to the native type to
+/// avoid overflowing Float16 intermediates.

Review Comment:
   I'd still prefer we keep the old documentation as it was and just add on any 
amendments; I don't like how it reads about a "previous implementation" 
considering there is no information about such implementation (without looking 
at the git history)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: percentile_cont interpolation causes NaN for f16 input [datafusion]

Reply via email to