[
https://issues.apache.org/jira/browse/NUMBERS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345769#comment-17345769
]
Alex Herbert edited comment on NUMBERS-156 at 5/16/21, 10:43 PM:
-----------------------------------------------------------------
Thanks for the great benchmark.
So extLinear has a max error of 1 ULP and the Kahan summation has a max error of 2 ULP.
The standard precision methods have a max ULP of 8 or 6. So the extended
precision has a measurable effect in this benchmark.
The enorm method has a worse max ULP and std dev ULP than enormMod and is also slower,
so this is a clear case for switching to enormMod.
If you compare the old enorm method to the enormModKahan method then the slowdown
is not so large at short lengths. I am puzzled by the slowdown at long lengths for
the enormModKahan method, but it may be that at longer lengths we see less of an
effect from the JMH overhead in the timings. The relative difference between the
methods at lengths 40 to 100 appears constant.
The mid data (exponents of -10 to 10) is the most important data here. In this
case you can gain a lot of accuracy by using the enormModKahan method, with a
performance loss that is noticeable although the speed remains within the same
order of magnitude.
The Kahan summation requires 3 extra additions per term. The method could be
changed to use the extended precision summation used in LinearCombination, at a
cost of 5 extra additions per term. I am not sure whether this would make the
method more robust. The Wikipedia page mentions alternatives to Kahan summation,
some of which use this two-sum algorithm from Knuth.
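For reference, the two per-term updates look something like this (a sketch; the
variable names are illustrative and not taken from the existing classes):
{code:java}
// Illustrative per-term updates of a running sum 's' with compensation 'c'.
// Both carry the previous round-off into the next term (one addition), then
// differ in how the round-off of the new sum is computed.

// Kahan: the round-off estimate (t - s) - y costs 2 additions but can lose
// accuracy when s and y differ greatly in magnitude.
static double[] kahanUpdate(double s, double c, double term) {
    final double y = term - c;   // carry the previous round-off
    final double t = s + y;      // main sum
    c = (t - s) - y;             // estimated round-off of s + y
    return new double[] {t, c};
}

// Knuth two-sum: the round-off of s + y is exact at the cost of 5 additions
// (this is the expression computed by ExtendedPrecision.twoSumLow).
static double[] twoSumUpdate(double s, double c, double term) {
    final double y = term + c;   // carry the previous round-off
    final double t = s + y;      // main sum
    final double yVirtual = t - s;
    c = (s - (t - yVirtual)) + (y - yVirtual);
    return new double[] {t, c};
}
{code}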
The code is essentially the same but uses the ExtendedPrecision class to
compute the compensation:
{code:java}
public static double value(double[] v) {
    // Sum of big, normal and small numbers with 2-fold extended precision summation
    double s1 = 0;
    double s2 = 0;
    double s3 = 0;
    double c1 = 0;
    double c2 = 0;
    double c3 = 0;
    for (int i = 0; i < v.length; i++) {
        final double x = Math.abs(v[i]);
        if (x > 0x1.0p500) {
            // Scale down big numbers
            final double y = square(x * 0x1.0p-600) + c1;
            final double t = s1 + y;
            c1 = ExtendedPrecision.twoSumLow(s1, y, t);
            s1 = t;
        } else if (x < 0x1.0p-500) {
            // Scale up small numbers
            final double y = square(x * 0x1.0p600) + c3;
            final double t = s3 + y;
            c3 = ExtendedPrecision.twoSumLow(s3, y, t);
            s3 = t;
        } else {
            // Unscaled
            final double y = square(x) + c2;
            final double t = s2 + y;
            c2 = ExtendedPrecision.twoSumLow(s2, y, t);
            s2 = t;
        }
    }
    // The highest sum is the significant component. Add the next significant.
    // Adapted from LinearCombination dot2s summation.
    if (s1 != 0) {
        s2 = s2 * 0x1.0p-600 * 0x1.0p-600;
        c2 = c2 * 0x1.0p-600 * 0x1.0p-600;
        final double sum = s1 + s2;
        // Add the round-off from the main sum to the other round-off components
        c1 += ExtendedPrecision.twoSumLow(s1, s2, sum) + c2;
        return Math.sqrt(sum + c1) * 0x1.0p600;
    } else if (s2 != 0) {
        s3 = s3 * 0x1.0p-600 * 0x1.0p-600;
        c3 = c3 * 0x1.0p-600 * 0x1.0p-600;
        final double sum = s2 + s3;
        // Add the round-off from the main sum to the other round-off components
        c2 += ExtendedPrecision.twoSumLow(s2, s3, sum) + c3;
        return Math.sqrt(sum + c2);
    }
    return Math.sqrt(s3) * 0x1.0p-600;
}
{code}
Could you try this method? It may not change the speed much, since the twoSumLow
method is a single line of code that can be efficiently inlined.
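For reference, twoSumLow is Knuth's two-sum round-off and can be written as a
single return expression along these lines (a sketch; the actual
ExtendedPrecision source may differ in detail):
{code:java}
/**
 * Round-off of {@code a + b} given that {@code sum == a + b} has already been
 * computed. Sketch of Knuth's two-sum low part; not necessarily the exact
 * ExtendedPrecision.twoSumLow source.
 */
static double twoSumLow(double a, double b, double sum) {
    final double bVirtual = sum - a;
    // (a - aVirtual) is the part of a lost in the rounded sum; likewise for b
    return (a - (sum - bVirtual)) + (b - bVirtual);
}
{code}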
I would be inclined to use an extended precision computation. Note that the
JDK's Math.hypot(x, y) method is computed in extended precision. However, it uses
split multiplication of the inputs to create the squares and so would be closer
to the LinearCombination variant here rather than just extended precision
summation.
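For comparison, the split multiplication referred to above is Dekker's
algorithm: each input is split into two halves so that the round-off of the
product can be recovered exactly. A sketch for a square term (illustrative
only; the JDK and LinearCombination code differ in detail):
{code:java}
// Dekker's splitter for IEEE doubles: 2^27 + 1.
private static final double SPLIT = 0x1.0p27 + 1;

/**
 * Round-off of {@code x * x} given that {@code xx == x * x} has already been
 * computed. Sketch of Dekker's split multiplication; ignores the scaling
 * needed to avoid overflow of SPLIT * x for very large x.
 */
static double squareLow(double x, double xx) {
    final double c = SPLIT * x;
    final double hi = c - (c - x);   // high part of x
    final double lo = x - hi;        // low part of x
    return ((hi * hi - xx) + 2 * hi * lo) + lo * lo;
}
{code}
On Java 9+ the same round-off can also be obtained with Math.fma(x, x, -xx).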
> SafeNorm 3D overload
> --------------------
>
> Key: NUMBERS-156
> URL: https://issues.apache.org/jira/browse/NUMBERS-156
> Project: Commons Numbers
> Issue Type: Improvement
> Reporter: Matt Juntunen
> Priority: Major
> Attachments: performance-all.png, performance-len-1-5.png
>
>
> We should create an overload of {{SafeNorm.value}} that accepts 3 arguments
> to potentially improve performance for 3D vectors.
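For context, a minimal sketch of what such a 3-argument overload could look like,
using simple scaling by the largest component rather than the threshold-based
scaling discussed above (hypothetical, not the proposed implementation):
{code:java}
/**
 * Hypothetical 3D overload (illustrative only). Scales by the largest
 * magnitude to avoid overflow/underflow of the squares.
 */
public static double value(double x, double y, double z) {
    final double a = Math.abs(x);
    final double b = Math.abs(y);
    final double c = Math.abs(z);
    final double max = Math.max(a, Math.max(b, c));
    if (max == 0 || max == Double.POSITIVE_INFINITY) {
        // All components zero, or at least one infinite component
        return max;
    }
    final double sa = a / max;
    final double sb = b / max;
    final double sc = c / max;
    return max * Math.sqrt(sa * sa + sb * sb + sc * sc);
}
{code}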