gemini-code-assist[bot] commented on code in PR #38871:
URL: https://github.com/apache/beam/pull/38871#discussion_r3405410730


##########
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java:
##########
@@ -133,7 +147,19 @@ public Coder<VarianceAccumulator> getAccumulatorCoder(
 
   @Override
   public T extractOutput(VarianceAccumulator accumulator) {
-    return decimalConverter.apply(getVariance(accumulator));
+    BigDecimal result = getVariance(accumulator);
+    if (result != null && isStddev) {
+      double doubleVal = result.doubleValue();
+      if (doubleVal < 0.0) {
+        doubleVal = 0.0; // Clamp negative variance due to numerical 
instability
+      }
+      double sqrtVal = Math.sqrt(doubleVal);
+      if (Double.isInfinite(sqrtVal)) {
+        throw new ArithmeticException("Standard deviation overflow: result is 
infinity");
+      }
+      result = BigDecimal.valueOf(sqrtVal);
+    }
+    return decimalConverter.apply(result);
   }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Converting the `BigDecimal` variance to a `double` before taking the square 
root can lead to intermediate overflow or underflow:
   
   1. **Intermediate Overflow**: If the variance is extremely large (e.g., $> 
1.79 \times 10^{308}$), `result.doubleValue()` overflows to 
`Double.POSITIVE_INFINITY`, causing `Math.sqrt` to return 
`Double.POSITIVE_INFINITY` and throwing an `ArithmeticException`. However, the 
actual standard deviation (the square root of the variance) could be well 
within the representable range of `double` (up to $\approx 3.2 \times 10^{616}$ 
variance).
   2. **Intermediate Underflow**: If the variance is extremely small (e.g., $< 
4.9 \times 10^{-324}$), `result.doubleValue()` underflows to `0.0`, resulting 
in a standard deviation of `0.0`, even though the actual standard deviation 
(e.g., $10^{-160}$ for a variance of $10^{-320}$) is perfectly representable as 
a normal double.
   
   We can avoid these intermediate overflow/underflow issues by scaling the 
`BigDecimal` to a safe range before converting to `double` for the square root 
operation, and then scaling the result back.
   
   ```java
     @Override
     public T extractOutput(VarianceAccumulator accumulator) {
       BigDecimal result = getVariance(accumulator);
       if (result != null && isStddev) {
         if (result.compareTo(BigDecimal.ZERO) <= 0) {
           result = BigDecimal.ZERO;
         } else {
           double sqrtVal;
           BigDecimal upperLimit = new BigDecimal("1e300");
           BigDecimal lowerLimit = new BigDecimal("1e-300");
           if (result.compareTo(upperLimit) > 0) {
             sqrtVal = Math.sqrt(result.movePointLeft(300).doubleValue()) * 
1e150;
           } else if (result.compareTo(lowerLimit) < 0) {
             sqrtVal = Math.sqrt(result.movePointRight(300).doubleValue()) * 
1e-150;
           } else {
             sqrtVal = Math.sqrt(result.doubleValue());
           }
           if (Double.isInfinite(sqrtVal)) {
             throw new ArithmeticException("Standard deviation overflow: result 
is infinity");
           }
           result = BigDecimal.valueOf(sqrtVal);
         }
       }
       return decimalConverter.apply(result);
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to