[ 
https://issues.apache.org/jira/browse/MATH-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986116#comment-17986116
 ] 

Alex Herbert commented on MATH-1678:
------------------------------------

Note that overflow when taking the difference of two large values of opposite 
sign is a common edge case that is frequently left unsupported. For example, 
the JDK's RandomGenerator implementations do not support ranges whose width 
(bound - origin) exceeds Double.MAX_VALUE:
{noformat}
jshell
|  Welcome to JShell -- Version 17.0.15
|  For an introduction type: /help intro
jshell> var rng = new SplittableRandom()
rng ==> java.util.SplittableRandom@7530d0a

jshell> var d = Double.MAX_VALUE
d ==> 1.7976931348623157E308

jshell> rng.nextDouble(-d/2, d/2)
$7 ==> 8.707465608097585E307

jshell> rng.nextDouble(-d, d)
|  Exception java.lang.IllegalArgumentException: bound must be greater than origin
|        at RandomSupport.checkRange (RandomSupport.java:218)
|        at RandomGenerator.nextDouble (RandomGenerator.java:615)
|        at (#8:1)
{noformat}

An alternative would be to throw an exception.
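
To make the two options concrete, here is a minimal sketch of a single Canberra term. It is not the Commons Math implementation: the class {{CanberraTermSketch}} and the methods {{term}} and {{strictTerm}} are hypothetical names used only for illustration. {{term}} assumes that halving both operands is an acceptable way to avoid the overflow (the ratio is unchanged by a common scale factor), and {{strictTerm}} shows the exception alternative.

```java
final class CanberraTermSketch {
    private CanberraTermSketch() {}

    /** One Canberra term |x - y| / (|x| + |y|), rescaled if finite inputs overflow. */
    static double term(double x, double y) {
        double num = Math.abs(x - y);
        double denom = Math.abs(x) + Math.abs(y);
        if (num == 0.0 && denom == 0.0) {
            // Same 0/0 convention as the existing code: identical zeros contribute 0.
            return 0.0;
        }
        if (Double.isInfinite(denom) && Double.isFinite(x) && Double.isFinite(y)) {
            // Finite inputs whose sum overflowed. Halving is exact for normal
            // (non-subnormal) values and leaves the ratio unchanged, while
            // |hx| + |hy| can no longer exceed Double.MAX_VALUE.
            final double hx = 0.5 * x;
            final double hy = 0.5 * y;
            num = Math.abs(hx - hy);
            denom = Math.abs(hx) + Math.abs(hy);
        }
        // Infinite or NaN inputs are not special-cased and propagate as usual.
        return num / denom;
    }

    /** Alternative: reject finite inputs whose denominator overflows. */
    static double strictTerm(double x, double y) {
        final double denom = Math.abs(x) + Math.abs(y);
        if (Double.isInfinite(denom) && Double.isFinite(x) && Double.isFinite(y)) {
            throw new ArithmeticException("overflow in |x| + |y|");
        }
        final double num = Math.abs(x - y);
        return num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
    }
}
```

With this sketch, {{term(Double.MAX_VALUE, -Double.MAX_VALUE)}} evaluates to 1.0, whereas {{strictTerm}} throws for the same arguments. Either behaviour would need to be documented, since callers currently receive NaN.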

> CanberraDistance returns NaN for extreme opposite values due to overflow
> ------------------------------------------------------------------------
>
>                 Key: MATH-1678
>                 URL: https://issues.apache.org/jira/browse/MATH-1678
>             Project: Commons Math
>          Issue Type: Bug
>          Components: legacy
>    Affects Versions: 3.6.1
>            Reporter: Ruiqi Dong
>            Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The {{CanberraDistance.compute()}} method returns {{NaN}} when computing the 
> distance between extreme opposite values (e.g., {{Double.MAX_VALUE}} and 
> {{-Double.MAX_VALUE}}). This occurs due to floating-point overflow during 
> the calculation.
> When computing the Canberra distance for opposite extreme values:
>  * The numerator {{|a[i] - b[i]|}} overflows to {{Infinity}}
>  * The denominator {{|a[i]| + |b[i]|}} also overflows to {{Infinity}}
>  * The division {{Infinity / Infinity}} results in {{NaN}}
> Mathematically, the Canberra distance between {{MAX_VALUE}} and {{-MAX_VALUE}} 
> should be 1.0, not {{NaN}}.
> Test Case:
> @Test
> void testComputeWithDenominatorOverflow() {
>     CanberraDistance canberraDistance = new CanberraDistance();
>     double[] a = {Double.MAX_VALUE};
>     double[] b = {-Double.MAX_VALUE};
>     assertEquals(1.0, canberraDistance.compute(a, b), 1e-12,
>         "Should handle extreme opposite values correctly");
> }
> Test Result:
> [ERROR] org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow -- Time elapsed: 0.043 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: Should handle denominator overflow correctly ==> expected: <1.0> but was: <NaN>
>  at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>  at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>  at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>  at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:86)
>  at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1024)
>  at org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow(CanberraDistanceTest.java:123)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.util.ArrayList.forEach(ArrayList.java:1259)
>  at java.util.ArrayList.forEach(ArrayList.java:1259)
> Suggested Fix:
> @Override
> public double compute(double[] a, double[] b) {
>     MathArrays.checkEqualLength(a, b);
>     double sum = 0;
>     for (int i = 0; i < a.length; i++) {
>         // Check for potential overflow with opposite extreme values
>         if (Math.abs(a[i]) > Double.MAX_VALUE / 2 && 
>             Math.abs(b[i]) > Double.MAX_VALUE / 2 && 
>             Math.signum(a[i]) != Math.signum(b[i])) {
>             // Return mathematically correct result for opposite extreme values
>             sum += 1.0;
>         } else {
>             final double num = JdkMath.abs(a[i] - b[i]);
>             final double denom = JdkMath.abs(a[i]) + JdkMath.abs(b[i]);
>             sum += num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
>         }
>     }
>     return sum;
> }
>  
> *Impact:* While extreme values like {{Double.MAX_VALUE}} are rare in 
> practice, a robust mathematical library should handle all valid inputs 
> correctly. The current behavior violates the mathematical definition of 
> Canberra distance and could cause issues in downstream applications that 
> don't expect {{NaN}} values from distance calculations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
