Abacn commented on code in PR #38871:
URL: https://github.com/apache/beam/pull/38871#discussion_r3406322626
##########
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java:
##########
@@ -133,7 +147,19 @@ public Coder<VarianceAccumulator> getAccumulatorCoder(
@Override
public T extractOutput(VarianceAccumulator accumulator) {
- return decimalConverter.apply(getVariance(accumulator));
+ BigDecimal result = getVariance(accumulator);
+ if (result != null && isStddev) {
Review Comment:
Another side note, as also said in the desciption, stddev is simple sqrt of
variance. One line should do the work, and much of the code here are defensive
edge case handling.
##########
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java:
##########
@@ -133,7 +147,19 @@ public Coder<VarianceAccumulator> getAccumulatorCoder(
@Override
public T extractOutput(VarianceAccumulator accumulator) {
- return decimalConverter.apply(getVariance(accumulator));
+ BigDecimal result = getVariance(accumulator);
+ if (result != null && isStddev) {
+ double doubleVal = result.doubleValue();
+ if (doubleVal < 0.0) {
Review Comment:
Just a side note: If there is numerical instability it's likely due to
VarianceAccumulator calculating variance in an unstable form:
https://github.com/apache/beam/blob/6c083df5f8ed39b46aa02e2c90594cc3a7599a21/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceAccumulator.java#L69
A robust form more is `<x^2> - <x>^2` that is to saving sum, square sum,
count in the accumulator.
##########
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/agg/VarianceFn.java:
##########
@@ -133,7 +147,19 @@ public Coder<VarianceAccumulator> getAccumulatorCoder(
@Override
public T extractOutput(VarianceAccumulator accumulator) {
- return decimalConverter.apply(getVariance(accumulator));
+ BigDecimal result = getVariance(accumulator);
+ if (result != null && isStddev) {
+ double doubleVal = result.doubleValue();
+ if (doubleVal < 0.0) {
+ doubleVal = 0.0; // Clamp negative variance due to numerical
instability
+ }
+ double sqrtVal = Math.sqrt(doubleVal);
+ if (Double.isInfinite(sqrtVal)) {
+ throw new ArithmeticException("Standard deviation overflow: result is
infinity");
Review Comment:
Probably jus return Double.Infinite to keep the same behavior as variance
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]