[
https://issues.apache.org/jira/browse/MATH-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986116#comment-17986116
]
Alex Herbert commented on MATH-1678:
------------------------------------
Note that overflow when subtracting two large values of opposite sign is a
common issue that is frequently unsupported as an edge case. For example, the
JDK's RandomGenerator implementations do not support ranges wider than
Double.MAX_VALUE:
{noformat}
jshell
| Welcome to JShell -- Version 17.0.15
| For an introduction type: /help intro
jshell> var rng = new SplittableRandom()
rng ==> java.util.SplittableRandom@7530d0a
jshell> var d = Double.MAX_VALUE
d ==> 1.7976931348623157E308
jshell> rng.nextDouble(-d/2, d/2)
$7 ==> 8.707465608097585E307
jshell> rng.nextDouble(-d, d)
| Exception java.lang.IllegalArgumentException: bound must be greater than origin
| at RandomSupport.checkRange (RandomSupport.java:218)
| at RandomGenerator.nextDouble (RandomGenerator.java:615)
| at (#8:1)
{noformat}
An alternative would be to throw an exception.
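The rejection in the jshell session above is consistent with a range check that
requires the width (bound - origin) to be finite. The sketch below illustrates
that kind of check; it is an assumption about the behaviour, not a copy of the
JDK source, and RangeOverflowDemo and isSupportedRange are hypothetical names.

```java
// Hypothetical illustration only: not the JDK's actual implementation.
class RangeOverflowDemo {

    /** Returns true if origin < bound and the width (bound - origin) is finite. */
    public static boolean isSupportedRange(double origin, double bound) {
        return origin < bound && (bound - origin) < Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        final double d = Double.MAX_VALUE;
        // Width is d/2 + d/2 = d, which is still finite.
        System.out.println(isSupportedRange(-d / 2, d / 2));
        // Width is d + d, which overflows to Infinity.
        System.out.println(isSupportedRange(-d, d));
    }
}
```

Under this assumption the first range is accepted and the second is rejected,
matching the two calls shown in the session.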
> CanberraDistance returns NaN for extreme opposite values due to overflow
> ------------------------------------------------------------------------
>
> Key: MATH-1678
> URL: https://issues.apache.org/jira/browse/MATH-1678
> Project: Commons Math
> Issue Type: Bug
> Components: legacy
> Affects Versions: 3.6.1
> Reporter: Ruiqi Dong
> Priority: Minor
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The {{CanberraDistance.compute()}} method returns {{NaN}} when computing the
> distance between extreme opposite values (e.g., {{Double.MAX_VALUE}} and
> {{-Double.MAX_VALUE}}). This occurs due to floating-point overflow during
> the calculation.
> When computing the Canberra distance for opposite extreme values:
> * The numerator {{|a[i] - b[i]|}} overflows to {{Infinity}}
> * The denominator {{|a[i]| + |b[i]|}} also overflows to {{Infinity}}
> * The division {{Infinity / Infinity}} results in {{NaN}}
> Mathematically, the Canberra distance between {{MAX_VALUE}} and
> {{-MAX_VALUE}} should be 1.0, not {{NaN}}, because {{|a - b|}} equals
> {{|a| + |b|}} whenever {{a}} and {{b}} have opposite signs.
> Test Case:
> @Test
> void testComputeWithDenominatorOverflow() {
>     CanberraDistance canberraDistance = new CanberraDistance();
>     double[] a = {Double.MAX_VALUE};
>     double[] b = {-Double.MAX_VALUE};
>     assertEquals(1.0, canberraDistance.compute(a, b), 1e-12,
>         "Should handle extreme opposite values correctly");
> }
> Test Result:
> [ERROR] org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow -- Time elapsed: 0.043 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: Should handle denominator overflow correctly ==> expected: <1.0> but was: <NaN>
>     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>     at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>     at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>     at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:86)
>     at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1024)
>     at org.apache.commons.math4.legacy.ml.distance.CanberraDistanceTest.testComputeWithDenominatorOverflow(CanberraDistanceTest.java:123)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at java.util.ArrayList.forEach(ArrayList.java:1259)
>     at java.util.ArrayList.forEach(ArrayList.java:1259)
> Suggested Fix:
> @Override
> public double compute(double[] a, double[] b) {
>     MathArrays.checkEqualLength(a, b);
>     double sum = 0;
>     for (int i = 0; i < a.length; i++) {
>         // Check for potential overflow with opposite extreme values
>         if (Math.abs(a[i]) > Double.MAX_VALUE / 2 &&
>             Math.abs(b[i]) > Double.MAX_VALUE / 2 &&
>             Math.signum(a[i]) != Math.signum(b[i])) {
>             // Return mathematically correct result for opposite extreme values
>             sum += 1.0;
>         } else {
>             final double num = JdkMath.abs(a[i] - b[i]);
>             final double denom = JdkMath.abs(a[i]) + JdkMath.abs(b[i]);
>             sum += num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
>         }
>     }
>     return sum;
> }
>
> *Impact:* While extreme values like {{Double.MAX_VALUE}} are rare in
> practice, a robust mathematical library should handle all valid inputs
> correctly. The current behavior violates the mathematical definition of
> Canberra distance and could cause issues in downstream applications that
> don't expect {{NaN}} values from distance calculations.
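
One possible way to avoid the NaN described in the quoted report, without
special-casing the sign of the inputs, is to recompute a term at half scale
whenever the denominator overflows for finite inputs. This is only a sketch
under that assumption; CanberraTermSketch, naiveTerm, and term are hypothetical
names and are not part of Commons Math.

```java
// Hypothetical sketch only: not the Commons Math implementation.
class CanberraTermSketch {

    /** Direct evaluation of one term; yields NaN when both parts overflow. */
    public static double naiveTerm(double a, double b) {
        final double num = Math.abs(a - b);
        final double denom = Math.abs(a) + Math.abs(b);
        return num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
    }

    /** One term, recomputed at half scale if the denominator overflowed. */
    public static double term(double a, double b) {
        double num = Math.abs(a - b);
        double denom = Math.abs(a) + Math.abs(b);
        if (Double.isInfinite(denom) && Double.isFinite(a) && Double.isFinite(b)) {
            // Halving finite inputs keeps both the difference and the sum
            // within Double.MAX_VALUE, and the ratio is unchanged by the scaling.
            num = Math.abs(0.5 * a - 0.5 * b);
            denom = Math.abs(0.5 * a) + Math.abs(0.5 * b);
        }
        return num == 0.0 && denom == 0.0 ? 0.0 : num / denom;
    }

    /** Sums the per-coordinate terms; assumes a and b have equal length. */
    public static double compute(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += term(a[i], b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        final double d = Double.MAX_VALUE;
        System.out.println(naiveTerm(d, -d));
        System.out.println(term(d, -d));
    }
}
```

The half-scale path is taken only after an overflow is detected, so very small
(subnormal) inputs are not rounded by the scaling. Unlike a sign-based check,
it also covers large inputs of the same sign whose sum overflows.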
--
This message was sent by Atlassian Jira
(v8.20.10#820010)