[
https://issues.apache.org/jira/browse/NUMBERS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345769#comment-17345769
]
Alex Herbert edited comment on NUMBERS-156 at 5/16/21, 10:43 PM:
-----------------------------------------------------------------
Thanks for the great benchmark.
So extLinear has a max error of 1 ULP and the Kahan summation has a max error of 2 ULP.
The standard precision methods have a max ULP of 8 or 6. So the extended
precision has a measurable effect in this benchmark.
The enorm method has a worse max ULP and std dev ULP than enormMod and is also slower,
so this is a clear case for switching to enormMod.
If you compare the old enorm method to the enormModKahan method then the slowdown
is not so large at short lengths. I am puzzled by the slowdown at long lengths for
the enormModKahan method, but it may be that at longer lengths we see less of an
effect from the JMH overhead in the timings. The relative difference between the
methods at lengths 40 to 100 appears constant.
The mid data (exponents of -10 to 10) is the most important data here. In this
case you can gain a lot of accuracy by using the enormModKahan method, with a
performance loss that is noticeable although the speed remains within the same
order of magnitude.
The Kahan summation requires 3 extra additions per term. The method could be
changed to use the extended precision summation used in LinearCombination, at a
cost of 5 extra additions per term. I am not sure whether this would make the
method more robust. The Wikipedia page mentions alternatives to Kahan summation,
some of which use this two-sum algorithm from Knuth.
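For reference, the two per-term updates look something like this (a sketch; the
variable names are illustrative and not taken from the existing classes):
{code:java}
// Illustrative per-term updates of a running sum 's' with compensation 'c'.
// Both carry the previous round-off into the next term (one addition), then
// differ in how the round-off of the new sum is computed.

// Kahan: the round-off estimate (t - s) - y costs 2 additions but can lose
// accuracy when s and y differ greatly in magnitude.
static double[] kahanUpdate(double s, double c, double term) {
    final double y = term - c;   // carry the previous round-off
    final double t = s + y;      // main sum
    c = (t - s) - y;             // estimated round-off of s + y
    return new double[] {t, c};
}

// Knuth two-sum: the round-off of s + y is exact at the cost of 5 additions
// (this is the expression computed by ExtendedPrecision.twoSumLow).
static double[] twoSumUpdate(double s, double c, double term) {
    final double y = term + c;   // carry the previous round-off
    final double t = s + y;      // main sum
    final double yVirtual = t - s;
    c = (s - (t - yVirtual)) + (y - yVirtual);
    return new double[] {t, c};
}
{code}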
The code is essentially the same but uses the ExtendedPrecision class to
compute the compensation:
{code:java}
public static double value(double[] v) {
    // Sum of big, normal and small numbers with 2-fold extended precision summation
    double s1 = 0;
    double s2 = 0;
    double s3 = 0;
    double c1 = 0;
    double c2 = 0;
    double c3 = 0;
    for (int i = 0; i < v.length; i++) {
        final double x = Math.abs(v[i]);
        if (x > 0x1.0p500) {
            // Scale down big numbers
            final double y = square(x * 0x1.0p-600) + c1;
            final double t = s1 + y;
            c1 = ExtendedPrecision.twoSumLow(s1, y, t);
            s1 = t;
        } else if (x < 0x1.0p-500) {
            // Scale up small numbers
            final double y = square(x * 0x1.0p600) + c3;
            final double t = s3 + y;
            c3 = ExtendedPrecision.twoSumLow(s3, y, t);
            s3 = t;
        } else {
            // Unscaled
            final double y = square(x) + c2;
            final double t = s2 + y;
            c2 = ExtendedPrecision.twoSumLow(s2, y, t);
            s2 = t;
        }
    }
    // The highest sum is the significant component. Add the next significant.
    // Adapted from LinearCombination dot2s summation.
    if (s1 != 0) {
        s2 = s2 * 0x1.0p-600 * 0x1.0p-600;
        c2 = c2 * 0x1.0p-600 * 0x1.0p-600;
        final double sum = s1 + s2;
        // Add the round-off from the main sum to the other round-off components
        c1 += ExtendedPrecision.twoSumLow(s1, s2, sum) + c2;
        return Math.sqrt(sum + c1) * 0x1.0p600;
    } else if (s2 != 0) {
        s3 = s3 * 0x1.0p-600 * 0x1.0p-600;
        c3 = c3 * 0x1.0p-600 * 0x1.0p-600;
        final double sum = s2 + s3;
        // Add the round-off from the main sum to the other round-off components
        c2 += ExtendedPrecision.twoSumLow(s2, s3, sum) + c3;
        return Math.sqrt(sum + c2);
    }
    return Math.sqrt(s3) * 0x1.0p-600;
}
{code}
Could you try this method? It may not change the speed much, since the twoSumLow
method is a single line of code that can be efficiently inlined.
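For reference, twoSumLow is Knuth's two-sum round-off and can be written as a
single return expression along these lines (a sketch; the actual
ExtendedPrecision source may differ in detail):
{code:java}
/**
 * Round-off of {@code a + b} given that {@code sum == a + b} has already been
 * computed. Sketch of Knuth's two-sum low part; not necessarily the exact
 * ExtendedPrecision.twoSumLow source.
 */
static double twoSumLow(double a, double b, double sum) {
    final double bVirtual = sum - a;
    // (a - aVirtual) is the part of a lost in the rounded sum; likewise for b
    return (a - (sum - bVirtual)) + (b - bVirtual);
}
{code}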
I would be inclined to use an extended precision computation. Note that the
JDK's Math.hypot(x, y) method is computed in extended precision. However, it uses
split multiplication of the inputs to create the squares and so would be closer
to the LinearCombination variant here rather than just extended precision
summation.
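For comparison, the split multiplication referred to above is Dekker's
algorithm: each input is split into two halves so that the round-off of the
product can be recovered exactly. A sketch for a square term (illustrative
only; the JDK and LinearCombination code differ in detail):
{code:java}
// Dekker's splitter for IEEE doubles: 2^27 + 1.
private static final double SPLIT = 0x1.0p27 + 1;

/**
 * Round-off of {@code x * x} given that {@code xx == x * x} has already been
 * computed. Sketch of Dekker's split multiplication; ignores the scaling
 * needed to avoid overflow of SPLIT * x for very large x.
 */
static double squareLow(double x, double xx) {
    final double c = SPLIT * x;
    final double hi = c - (c - x);   // high part of x
    final double lo = x - hi;        // low part of x
    return ((hi * hi - xx) + 2 * hi * lo) + lo * lo;
}
{code}
On Java 9+ the same round-off can also be obtained with Math.fma(x, x, -xx).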
> SafeNorm 3D overload
> --------------------
>
> Key: NUMBERS-156
> URL: https://issues.apache.org/jira/browse/NUMBERS-156
> Project: Commons Numbers
> Issue Type: Improvement
> Reporter: Matt Juntunen
> Priority: Major
> Attachments: performance-all.png, performance-len-1-5.png
>
>
> We should create an overload of {{SafeNorm.value}} that accepts 3 arguments
> to potentially improve performance for 3D vectors.
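For context, a minimal sketch of what such a 3-argument overload could look like,
using simple scaling by the largest component rather than the threshold-based
scaling discussed above (hypothetical, not the proposed implementation):
{code:java}
/**
 * Hypothetical 3D overload (illustrative only). Scales by the largest
 * magnitude to avoid overflow/underflow of the squares.
 */
public static double value(double x, double y, double z) {
    final double a = Math.abs(x);
    final double b = Math.abs(y);
    final double c = Math.abs(z);
    final double max = Math.max(a, Math.max(b, c));
    if (max == 0 || max == Double.POSITIVE_INFINITY) {
        // All components zero, or at least one infinite component
        return max;
    }
    final double sa = a / max;
    final double sb = b / max;
    final double sc = c / max;
    return max * Math.sqrt(sa * sa + sb * sb + sc * sc);
}
{code}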