[ https://issues.apache.org/jira/browse/NUMBERS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334854#comment-17334854 ]

Alex Herbert commented on NUMBERS-156:
--------------------------------------

SafeNorm has some issues.

I think the constants used to check for over/underflow in SafeNorm are for 32-bit
single precision floats. The original source implementation should be checked.
A lot of the time scaling is performed when a double would easily handle the
unscaled sum.

Second, the scaling of large or small values is done by dividing by an input
value. This is not exact, so as soon as scaling kicks in there is a loss of
precision compared with a standard unscaled sum, e.g.
{code:java}
double a = SafeNorm.value(IntStream.range(1, 100).mapToDouble(i -> i / Math.PI).toArray()) * 0x1.0p500;
double b = SafeNorm.value(IntStream.range(1, 100).mapToDouble(i -> i * 0x1.0p500 / Math.PI).toArray());
// Standard sum
double c = Math.sqrt(IntStream.range(1, 100).mapToDouble(i -> i * 0x1.0p500 / Math.PI).map(x -> x * x).sum());
double d = Math.sqrt(IntStream.range(1, 100).mapToDouble(i -> i / Math.PI).map(x -> x * x).sum()) * 0x1.0p500;
System.out.println(a);
System.out.println(b);
System.out.println(c);
System.out.println(d);
assert c == d;
assert a != b;
assert a != c;
{code}
Outputs:
{noformat}
5.970579281879817E152
5.970579281879819E152
5.970579281879818E152
5.970579281879818E152
{noformat}
The actual values differ by only a few ULP, but this shows that SafeNorm is not
stable under scaling and does not match the unscaled sum.

I built a safe norm for the RNG line sampler I am working on. Currently it
scales based on the maximum value. Ideally you would scale up based on the
minimum value (the minimum may underflow when squared even when the maximum
does not), but this requires more checks.
{code:java}
    /**
     * Compute the Euclidean length of the vector. The input is scaled up or down
     * based on the maximum value to avoid over/underflow.
     *
     * @param x Vector
     * @return the length
     */
    static double length(double[] x) {
        double max = Math.abs(x[0]);
        for (int i = 1; i < x.length; i++) {
            // Math.max will return NaN for any NaN input
            max = Math.max(max, Math.abs(x[i]));
        }
        if (!Double.isFinite(max) || max == 0) {
            // No scaling possible for zero, infinite or NaN input
            return max; 
        }

        // Perform scaling
        double forward;
        double reverse;
        if (max > 0x1.0p500) {
            // Scale down big values
            forward = 0x1.0p-600;
            reverse = 0x1.0p600;
        } else if (max < 0x1.0p-500) {
            // Scale up small values
            forward = 0x1.0p600;
            reverse = 0x1.0p-600;
        } else {
            forward = reverse = 1.0;
        }

        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += square(x[i] * forward);
        }
        return Math.sqrt(sum) * reverse;
    }

    /**
     * Return the value squared ({@code x^2}).
     *
     * @param x the value
     * @return the squared value
     */
    private static double square(double x) {
        return x * x;
    }
{code}
If you want a truly stable result without having to use extended precision in
the sum, you can sort the magnitudes and sum their squares in ascending order.
This minimises the loss of bits when summing as you always add increasingly
bigger values. The summation can additionally be done in extended precision to
carry the round-off term. Sorting also ensures that permutations of the input
vector have the same length. It requires storing and sorting an array of the
absolute magnitudes; after the sort you get the maximum and minimum needed for
the scaling check for free.

This method could be inlined for length 3 to avoid the array, using a few if
statements for an inline sort of the 3 magnitudes; a rough sketch of such a
version follows the array implementation below.
{code:java}
    /**
     * Compute the Euclidean length of the vector. The input is scaled up or down
     * based on the maximum/minimum value to avoid over/underflow. The result is
     * the same for permutations of the input.
     *
     * @param v Vector
     * @return the length
     */
    static double length(double[] v) {
        final double[] x = new double[v.length];
        for (int i = 0; i < x.length; i++) {
            x[i] = Math.abs(v[i]);
        }
        // NaN will move to the end of the array
        Arrays.sort(x);
        final double max = x[x.length - 1];
        if (!Double.isFinite(max) || max == 0) {
            // No scaling possible for zero, infinite or NaN input
            return max;
        }

        // Perform scaling
        double forward;
        double reverse;
        if (max > 0x1.0p500) {
            // Scale down big values
            forward = 0x1.0p-600;
            reverse = 0x1.0p600;
        } else if (x[0] < 0x1.0p-500) {
            // Scale up small values
            forward = 0x1.0p600;
            reverse = 0x1.0p-600;
        } else {
            forward = reverse = 1.0;
        }

        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            // This summation could be done in extended precision...
            sum += square(x[i] * forward);
        }
        return Math.sqrt(sum) * reverse;
    }
{code}
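To illustrate the inlining suggestion above, a rough (untested) sketch of a 3-argument
version might look like the following; the thresholds and scale factors are simply
copied from the array version, with the three magnitudes ordered by explicit swaps
instead of {{Arrays.sort}}.
{code:java}
    /**
     * Compute the Euclidean length of a 3D vector (sketch only). The input is
     * scaled up or down based on the maximum/minimum magnitude to avoid
     * over/underflow. The result is the same for permutations of the input.
     *
     * @param x first component
     * @param y second component
     * @param z third component
     * @return the length
     */
    static double length(double x, double y, double z) {
        double a = Math.abs(x);
        double b = Math.abs(y);
        double c = Math.abs(z);
        // Inline sort of the 3 magnitudes so that a <= b <= c.
        // (A NaN may not move to c, but it still propagates through the sum.)
        double t;
        if (a > b) { t = a; a = b; b = t; }
        if (b > c) { t = b; b = c; c = t; }
        if (a > b) { t = a; a = b; b = t; }
        if (!Double.isFinite(c) || c == 0) {
            // No scaling possible for zero, infinite or NaN input
            return c;
        }

        // Perform scaling
        double forward;
        double reverse;
        if (c > 0x1.0p500) {
            // Scale down big values
            forward = 0x1.0p-600;
            reverse = 0x1.0p600;
        } else if (a < 0x1.0p-500) {
            // Scale up small values
            forward = 0x1.0p600;
            reverse = 0x1.0p-600;
        } else {
            forward = reverse = 1.0;
        }

        // Sum the squares from smallest to largest
        a *= forward;
        b *= forward;
        c *= forward;
        return Math.sqrt(a * a + b * b + c * c) * reverse;
    }
{code}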
To check the exact result I would look into using Java 9 for the test code and 
BigDecimal sqrt. You will need a method to compute the ULP difference between 
doubles. Since all lengths should be positive this reduces to:
{code:java}
long ulpDiff = Double.doubleToLongBits(expected) - Double.doubleToLongBits(actual); // will be signed
{code}
Then you can check the mean and standard deviation of the ULP difference from 
the actual result over a large number of random vectors. I expect the mean to 
be 0 but the standard deviation will be higher for the less precise methods.
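For example, the reference value and signed ULP error could be computed along
these lines (a sketch only, assuming Java 9+ for {{BigDecimal.sqrt}} and finite
input; the {{ulpError}} helper name is just for illustration):
{code:java}
    // Requires java.math.BigDecimal and java.math.MathContext.
    static long ulpError(double[] v, double actual) {
        // Exact sum of squares: new BigDecimal(double) is an exact conversion
        BigDecimal sum = BigDecimal.ZERO;
        for (double x : v) {
            final BigDecimal b = new BigDecimal(x);
            sum = sum.add(b.multiply(b));
        }
        // ~34 significant decimal digits is well beyond double precision
        final double expected = sum.sqrt(MathContext.DECIMAL128).doubleValue();
        // Signed ULP difference (valid for positive finite lengths)
        return Double.doubleToLongBits(expected) - Double.doubleToLongBits(actual);
    }
{code}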

If you are building a test suite of different methods then you can create the 
same functions as above but using an extended precision sum for comparison, for 
example using [Kahan 
summation|https://en.wikipedia.org/wiki/Kahan_summation_algorithm].
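A compensated version of the scaled summation loop above might look like this
(again only a sketch, not what SafeNorm currently does; the
{{kahanSumOfSquares}} name is only for illustration):
{code:java}
    /**
     * Scaled sum of squares using Kahan (compensated) summation.
     *
     * @param x vector of values
     * @param forward scale factor applied to each value before squaring
     * @return the compensated sum of the squared scaled values
     */
    static double kahanSumOfSquares(double[] x, double forward) {
        double sum = 0;
        double comp = 0;
        for (int i = 0; i < x.length; i++) {
            final double v = x[i] * forward;
            // Add the square, carrying the round-off from the previous addition
            final double y = v * v - comp;
            final double t = sum + y;
            comp = (t - sum) - y;
            sum = t;
        }
        return sum;
    }
{code}
The length is then {{Math.sqrt(kahanSumOfSquares(x, forward)) * reverse}} as in
the methods above.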


> SafeNorm 3D overload
> --------------------
>
>                 Key: NUMBERS-156
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-156
>             Project: Commons Numbers
>          Issue Type: Improvement
>            Reporter: Matt Juntunen
>            Priority: Major
>
> We should create an overload of {{SafeNorm.value}} that accepts 3 arguments 
> to potentially improve performance for 3D vectors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
