gemini-code-assist[bot] commented on code in PR #38871:
URL: https://github.com/apache/beam/pull/38871#discussion_r3405410730
##########
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java:
##########
@@ -133,7 +147,19 @@ public Coder<VarianceAccumulator> getAccumulatorCoder(
@Override
public T extractOutput(VarianceAccumulator accumulator) {
- return decimalConverter.apply(getVariance(accumulator));
+ BigDecimal result = getVariance(accumulator);
+ if (result != null && isStddev) {
+ double doubleVal = result.doubleValue();
+ if (doubleVal < 0.0) {
+ doubleVal = 0.0; // Clamp negative variance due to numerical
instability
+ }
+ double sqrtVal = Math.sqrt(doubleVal);
+ if (Double.isInfinite(sqrtVal)) {
+ throw new ArithmeticException("Standard deviation overflow: result is
infinity");
+ }
+ result = BigDecimal.valueOf(sqrtVal);
+ }
+ return decimalConverter.apply(result);
}
Review Comment:

Converting the `BigDecimal` variance to a `double` before taking the square
root can lead to intermediate overflow or underflow:
1. **Intermediate Overflow**: If the variance is extremely large (e.g., $>
1.79 \times 10^{308}$), `result.doubleValue()` overflows to
`Double.POSITIVE_INFINITY`, causing `Math.sqrt` to return
`Double.POSITIVE_INFINITY` and throwing an `ArithmeticException`. However, the
actual standard deviation (the square root of the variance) could be well
within the representable range of `double` (up to $\approx 3.2 \times 10^{616}$
variance).
2. **Intermediate Underflow**: If the variance is extremely small (e.g., $<
4.9 \times 10^{-324}$), `result.doubleValue()` underflows to `0.0`, resulting
in a standard deviation of `0.0`, even though the actual standard deviation
(e.g., $10^{-160}$ for a variance of $10^{-320}$) is perfectly representable as
a normal double.
We can avoid these intermediate overflow/underflow issues by scaling the
`BigDecimal` to a safe range before converting to `double` for the square root
operation, and then scaling the result back.
```java
@Override
public T extractOutput(VarianceAccumulator accumulator) {
BigDecimal result = getVariance(accumulator);
if (result != null && isStddev) {
if (result.compareTo(BigDecimal.ZERO) <= 0) {
result = BigDecimal.ZERO;
} else {
double sqrtVal;
BigDecimal upperLimit = new BigDecimal("1e300");
BigDecimal lowerLimit = new BigDecimal("1e-300");
if (result.compareTo(upperLimit) > 0) {
sqrtVal = Math.sqrt(result.movePointLeft(300).doubleValue()) *
1e150;
} else if (result.compareTo(lowerLimit) < 0) {
sqrtVal = Math.sqrt(result.movePointRight(300).doubleValue()) *
1e-150;
} else {
sqrtVal = Math.sqrt(result.doubleValue());
}
if (Double.isInfinite(sqrtVal)) {
throw new ArithmeticException("Standard deviation overflow: result
is infinity");
}
result = BigDecimal.valueOf(sqrtVal);
}
}
return decimalConverter.apply(result);
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]