[ 
https://issues.apache.org/jira/browse/MATH-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986125#comment-17986125
 ] 

Ruiqi Dong commented on MATH-1678:
----------------------------------

Thank you for the detailed analysis. I agree that overflow with opposite 
extreme values is indeed a common edge case issue. Between the two approaches 
you mentioned, I think returning a mathematically reasonable value through 
scaling would be more user-friendly than throwing an exception, as it maintains 
the continuity of the distance function and allows algorithms using this 
distance measure to continue working without special error handling. However, I 
understand there are trade-offs with both approaches, and I appreciate your 
thorough consideration of this issue.
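
To make the scaling idea concrete, here is a minimal sketch of what I have in 
mind. It is not the library's implementation: the class name 
CanberraScalingSketch and the helpers naiveTerm and scaledTerm are hypothetical, 
and it uses plain java.lang.Math rather than JdkMath so that it runs on its own. 
The assumption is that halving both operands before subtracting and adding 
leaves the ratio unchanged while keeping the intermediate values finite.

```java
// Hypothetical sketch, not the Commons Math implementation.
final class CanberraScalingSketch {

    /** Unguarded term: same expression as the else-branch of the suggested fix. */
    static double naiveTerm(double x, double y) {
        final double num = Math.abs(x - y);
        final double denom = Math.abs(x) + Math.abs(y);
        return num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
    }

    /** Term that rescales by 0.5 only when the denominator overflowed. */
    static double scaledTerm(double x, double y) {
        double num = Math.abs(x - y);
        double denom = Math.abs(x) + Math.abs(y);
        if (Double.isInfinite(denom) && Double.isFinite(x) && Double.isFinite(y)) {
            // Both inputs are finite, so the infinity came from overflow.
            // Halving each operand first keeps num and denom representable,
            // and the common factor cancels in the ratio.
            num = Math.abs(0.5 * x - 0.5 * y);
            denom = 0.5 * Math.abs(x) + 0.5 * Math.abs(y);
        }
        return num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
    }

    /** Sum of the per-coordinate terms. */
    static double compute(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += scaledTerm(a[i], b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {Double.MAX_VALUE};
        double[] b = {-Double.MAX_VALUE};
        System.out.println(naiveTerm(a[0], b[0])); // prints NaN
        System.out.println(compute(a, b));         // prints 1.0
    }
}
```

Rescaling only after an overflow is detected leaves ordinary inputs on the 
existing code path, so results for non-extreme values should be unaffected.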

> CanberraDistance returns NaN for extreme opposite values due to overflow
> ------------------------------------------------------------------------
>
>                 Key: MATH-1678
>                 URL: https://issues.apache.org/jira/browse/MATH-1678
>             Project: Commons Math
>          Issue Type: Bug
>          Components: legacy
>    Affects Versions: 3.6.1
>            Reporter: Ruiqi Dong
>            Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The {{CanberraDistance.compute()}} method returns {{NaN}} when computing the 
> distance between extreme opposite values (e.g., {{Double.MAX_VALUE}} and 
> {{-Double.MAX_VALUE}}). This occurs due to floating-point overflow during 
> the calculation.
> When computing the Canberra distance for opposite extreme values:
>  * The numerator {{|a[i] - b[i]|}} overflows to {{Infinity}}
>  * The denominator {{|a[i]| + |b[i]|}} also overflows to {{Infinity}}
>  * The division {{Infinity / Infinity}} results in {{NaN}}
> Mathematically, the Canberra distance between {{MAX_VALUE}} and 
> {{-MAX_VALUE}} should be 1.0, not {{NaN}}.
> Test Case:
> @Test
> void testComputeWithDenominatorOverflow() {
>     CanberraDistance canberraDistance = new CanberraDistance();
>     double[] a = {Double.MAX_VALUE};
>     double[] b = {-Double.MAX_VALUE};
>     assertEquals(1.0, canberraDistance.compute(a, b), 1e-12,
>         "Should handle extreme opposite values correctly");
> }
> Test Result:
> [ERROR] 
> org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow
>  -- Time elapsed: 0.043 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: Should handle denominator overflow 
> correctly ==> expected: <1.0> but was: <NaN>
>  at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>  at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>  at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>  at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:86)
>  at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1024)
>  at 
> org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow(CanberraDistanceTest.java:123)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.util.ArrayList.forEach(ArrayList.java:1259)
>  at java.util.ArrayList.forEach(ArrayList.java:1259)
> Suggested Fix:
> @Override
> public double compute(double[] a, double[] b) {
>     MathArrays.checkEqualLength(a, b);
>     double sum = 0;
>     for (int i = 0; i < a.length; i++) {
>         // Check for potential overflow with opposite extreme values
>         if (Math.abs(a[i]) > Double.MAX_VALUE / 2 && 
>             Math.abs(b[i]) > Double.MAX_VALUE / 2 && 
>             Math.signum(a[i]) != Math.signum(b[i])) {
>             // Return mathematically correct result for opposite extreme values
>             sum += 1.0;
>         } else {
>             final double num = JdkMath.abs(a[i] - b[i]);
>             final double denom = JdkMath.abs(a[i]) + JdkMath.abs(b[i]);
>             sum += num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
>         }
>     }
>     return sum;
> }
>  
> *Impact:* While extreme values like {{Double.MAX_VALUE}} are rare in 
> practice, a robust mathematical library should handle all valid inputs 
> correctly. The current behavior violates the mathematical definition of 
> Canberra distance and could cause issues in downstream applications that 
> don't expect {{NaN}} values from distance calculations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
