[
https://issues.apache.org/jira/browse/MATH-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986115#comment-17986115
]
Alex Herbert commented on MATH-1678:
------------------------------------
The suggested fix will solve {{|a[i] - b[i]|}} overflow but not {{|a[i]| +
|b[i]|}} overflow, e.g. compute(Double.MAX_VALUE, Double.MAX_VALUE).
The logic in this class is odd:
{code:java}
final double num = JdkMath.abs(a[i] - b[i]);
final double denom = JdkMath.abs(a[i]) + JdkMath.abs(b[i]);
sum += num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
{code}
If num == 0 then the term is zero regardless of denom. The only way denom can
be zero is if both a[i] and b[i] are zero, in which case num is also zero. The
conditional is therefore redundant: it checks both num and denom where a check
of either one would suffice.
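As a rough illustration of that redundancy, the sketch below compares the quoted conditional with a check on the denominator alone. The class and method names are hypothetical, and Math.abs stands in for JdkMath.abs so that the snippet is self-contained.

```java
final class CanberraTermCheck {
    // The per-element term with the conditional as quoted from the class.
    static double original(double a, double b) {
        final double num = Math.abs(a - b);
        final double denom = Math.abs(a) + Math.abs(b);
        return num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
    }

    // The same term with a single check on the denominator.
    static double simplified(double a, double b) {
        final double num = Math.abs(a - b);
        final double denom = Math.abs(a) + Math.abs(b);
        return denom == 0.0 ? 0.0 : num / denom;
    }

    // Double.compare treats NaN as equal to NaN, so NaN results also count as agreement.
    static boolean agree(double a, double b) {
        return Double.compare(original(a, b), simplified(a, b)) == 0;
    }

    public static void main(String[] args) {
        final double[] samples = {0.0, -0.0, 1.0, -1.0, Double.MIN_VALUE,
                                  Double.MAX_VALUE, -Double.MAX_VALUE};
        for (double a : samples) {
            for (double b : samples) {
                System.out.println(a + ", " + b + " -> agree: " + agree(a, b));
            }
        }
    }
}
```

The two forms agree because denom == 0 forces both inputs to be zero, which in turn forces num == 0.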
This issue can be easily solved by downscaling a and b by 0.5. But this may
lead to 1 ULP errors when a, b, or (a - b) / 2 is sub-normal. A possible
solution is:
{code:java}
public double compute(double[] a, double[] b) {
    MathArrays.checkEqualLength(a, b);
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
        if (Double.isFinite(a[i] - b[i])) {
            sum += distance(a[i], b[i]);
        } else {
            sum += distance(a[i] * 0.5, b[i] * 0.5);
        }
    }
    return sum;
}

private static double distance(double a, double b) {
    final double num = JdkMath.abs(a - b);
    final double denom = JdkMath.abs(a) + JdkMath.abs(b);
    return denom == 0.0 ? 0.0 : num / denom;
}
{code}
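A variant worth considering is sketched below; it is not the code proposed above. For same-sign inputs such as (Double.MAX_VALUE, Double.MAX_VALUE / 2) the difference stays finite while |a| + |b| overflows, so this sketch triggers the downscaling on the denominator instead of the difference. Whenever the denominator is finite the numerator is finite too, so one test may cover both cases. The class name is hypothetical, and Math.abs and a plain length check stand in for JdkMath.abs and MathArrays.checkEqualLength to keep the snippet self-contained.

```java
final class CanberraOverflowSketch {
    static double compute(double[] a, double[] b) {
        // Stand-in for MathArrays.checkEqualLength(a, b).
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            // Assumption of this sketch: rescale when |a| + |b| is not finite.
            if (Double.isFinite(Math.abs(a[i]) + Math.abs(b[i]))) {
                sum += distance(a[i], b[i]);
            } else {
                // Halving both inputs leaves the ratio unchanged and avoids the overflow.
                sum += distance(a[i] * 0.5, b[i] * 0.5);
            }
        }
        return sum;
    }

    private static double distance(double a, double b) {
        final double num = Math.abs(a - b);
        final double denom = Math.abs(a) + Math.abs(b);
        return denom == 0.0 ? 0.0 : num / denom;
    }

    public static void main(String[] args) {
        final double max = Double.MAX_VALUE;
        System.out.println(compute(new double[] {max}, new double[] {-max}));
        System.out.println(compute(new double[] {max}, new double[] {max / 2}));
        System.out.println(compute(new double[] {1.0}, new double[] {3.0}));
    }
}
```

Infinite or NaN inputs still take the scaled branch and still produce NaN, as they do in the solution above.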
> CanberraDistance returns NaN for extreme opposite values due to overflow
> ------------------------------------------------------------------------
>
> Key: MATH-1678
> URL: https://issues.apache.org/jira/browse/MATH-1678
> Project: Commons Math
> Issue Type: Bug
> Components: legacy
> Affects Versions: 3.6.1
> Reporter: Ruiqi Dong
> Priority: Minor
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The {{CanberraDistance.compute()}} method returns {{NaN}} when computing the
> distance between extreme opposite values (e.g., {{Double.MAX_VALUE}} and
> {{-Double.MAX_VALUE}}). This occurs due to floating-point overflow during
> the calculation.
> When computing the Canberra distance for opposite extreme values:
> * The numerator {{|a[i] - b[i]|}} overflows to {{Infinity}}
> * The denominator {{|a[i]| + |b[i]|}} also overflows to {{Infinity}}
> * The division {{Infinity / Infinity}} results in {{NaN}}
> Mathematically, the Canberra distance between {{MAX_VALUE}} and {{-MAX_VALUE}}
> should be 1.0, not {{NaN}}.
> Test Case:
> @Test
> void testComputeWithDenominatorOverflow() {
> CanberraDistance canberraDistance = new CanberraDistance();
> double[] a = {Double.MAX_VALUE};
> double[] b = {-Double.MAX_VALUE};
> assertEquals(1.0, canberraDistance.compute(a, b), 1e-12,
> "Should handle extreme opposite values correctly");
> }
> Test Result:
> [*ERROR*]
> org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow
> -- Time elapsed: 0.043 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: Should handle denominator overflow
> correctly ==> expected: <1.0> but was: <NaN>
> at
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> at
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:86)
> at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1024)
> at
> org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow(CanberraDistanceTest.java:123)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> Suggested Fix:
> @Override
> public double compute(double[] a, double[] b) {
> MathArrays.checkEqualLength(a, b);
> double sum = 0;
> for (int i = 0; i < a.length; i++) {
> // Check for potential overflow with opposite extreme values
> if (Math.abs(a[i]) > Double.MAX_VALUE / 2 &&
> Math.abs(b[i]) > Double.MAX_VALUE / 2 &&
> Math.signum(a[i]) != Math.signum(b[i])) {
> // Return mathematically correct result for opposite extreme values
> sum += 1.0;
> } else {
> final double num = JdkMath.abs(a[i] - b[i]);
> final double denom = JdkMath.abs(a[i]) + JdkMath.abs(b[i]);
> sum += num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
> }
> }
> return sum;
> }
>
> *Impact:* While extreme values like {{Double.MAX_VALUE}} are rare in
> practice, a robust mathematical library should handle all valid inputs
> correctly. The current behavior violates the mathematical definition of
> Canberra distance and could cause issues in downstream applications that
> don't expect {{NaN}} values from distance calculations.