[ 
https://issues.apache.org/jira/browse/NUMBERS-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751677#comment-17751677
 ] 

Alex Herbert commented on NUMBERS-193:
--------------------------------------

Here is the current API for the new DD class:
{code:java}
public final class DD extends java.lang.Number implements java.io.Serializable {
  public static final DD ONE;
  public static final DD ZERO;
  public static DD of(double);
  public static DD of(int);
  public static DD of(long);
  public static DD from(java.math.BigDecimal);
  public static DD ofSum(double, double);
  public static DD ofDifference(double, double);
  public static DD ofProduct(double, double);
  public static DD ofSquare(double);
  public static DD fromQuotient(double, double);
  public double x();
  public double xx();
  public boolean isFinite();
  public double doubleValue();
  public float floatValue();
  public int intValue();
  public long longValue();
  public java.math.BigDecimal bigDecimalValue();
  public DD negate();
  public DD add(double);
  public DD add(DD);
  public DD subtract(double);
  public DD subtract(DD);
  public DD multiply(double);
  public DD multiply(DD);
  public DD square();
  public DD divide(double);
  public DD divide(DD);
  public DD reciprocal();
  public DD sqrt();
  public DD ldexp(int);
  public DD frexp(int[]);
  public DD pow(int);
  public DD pow(int, long[]);
  public boolean equals(java.lang.Object);
  public int hashCode();
  public java.lang.String toString();
}
{code}
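For orientation, here is a minimal usage sketch based on the signatures above (the values are arbitrary; only methods listed in the API are used):
{code:java}
// exact constructors use the 'of' prefix
DD s = DD.ofSum(1e100, 1e-100);          // exact sum of two doubles
DD p = DD.ofProduct(3.0, 0.1);           // exact product of two doubles

// inexact (closest representation) constructors use the 'from' prefix
DD third = DD.fromQuotient(1, 3);
DD pi = DD.from(new java.math.BigDecimal("3.141592653589793238462643383279503"));

// arithmetic returns new DD instances
DD y = s.multiply(p).add(third).sqrt();

// output via BigDecimal; toString() is only the formatted (x, xx) tuple
System.out.println(y.bigDecimalValue());

// comparison is performed via the BigDecimal representation
int c = pi.bigDecimalValue().compareTo(third.bigDecimalValue());
{code}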
h2. API Notes
 * Factory methods that create a DD instance exactly use the {{of}} prefix. This includes all the arithmetic constructors (sum, difference, product, square).
 * The fromQuotient method cannot be exact if the quotient is a recurring or irrational number, but it returns the closest possible representation. It uses the {{from}} prefix to indicate that the conversion may be an inexact representation.
 * The other factory method that cannot be exact is the conversion from a BigDecimal. This also uses the {{from}} prefix.
 * Conversion to primitives is provided to support the Number interface. It is useful to be able to convert the number to a long with full signed 63-bit precision.
 * Conversion to int or long is performed as if casting a double. This clips any value outside the range of the target type to the largest (or smallest) representable value. This is a different conversion from BigDecimal, which converts to a BigInteger and returns the lowest 32 or 64 bits:
{code:java}
int i = DD.of(1L << 32).intValue(); // Integer.MAX_VALUE
int j = BigDecimal.valueOf(1L << 32).intValue(); // 0{code}

 * The method frexp scales the number to a fractional representation in [0.5, 1) and returns the base-2 exponent. It is named after the corresponding C function which does the same. It is useful to scale numbers before computation to avoid overflow and underflow (see the sketch after this list).
 * The corresponding rescaling function is ldexp, also named after the C function for scaling by a power of 2. This could be renamed to scalb, which is the JDK Math equivalent.
 * The pow(int, long[]) method returns the power result as a scaled fraction with a base-2 exponent (also shown in the sketch after this list). It can compute any integer power without underflow or overflow. It is a specialised method used in the original DD implementation in Statistics and could be left there, but moving it here consolidates all the DD code in one place, which should be easier to maintain.
 * The literature refers to a double-double number x using different notation. For general extended floating-point numbers a suffix for the part is useful: (x0, x1), (x0, x1, x2), (x0, x1, x2, x3). The alternative is to increase the character cardinality for each part: (x, xx), (x, xx, xxx), (x, xx, xxx, xxxx). Here I have used the cardinality method. This is less extensible for an arbitrary-depth number, but it has the advantage that the initial x part is the closest double representation of the entire value x. We could be Java-verbose and use getters: getPart0, getPart1, etc. Or, for the double-double number, use the alternative for x with high and low part suffixes as (x_hi, x_lo), or the shorter (hi, lo).
 * There is a toString to convert to a formatted tuple: (x, xx). Actual number 
representations should use bigDecimalValue().toString().
 * There is no parse(String) method. This prevents parsing being used to create an invalid number. All computations assume that x is the most significant part and xx is the round-off such that x == x + xx. For simplicity it is easier not to add a parse method than to have one that must either: (a) raise an exception for input that is not a normalised double-double number; (b) read the input and then normalise it; or (c) make some other choice.
 * Sorting/comparison is not supported. You can do this via the BigDecimal 
representation.
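To illustrate the scaling and scaled power methods, here is a sketch of their intended use. The convention that the exponent is written to element 0 of the supplied array is my assumption; it is not stated above.
{code:java}
// frexp: decompose into a fraction f in [0.5, 1) and a base-2 exponent
// such that the value is f * 2^exp[0]
int[] exp = new int[1];
DD f = DD.of(1.5e300).frexp(exp);

// ldexp (scalb) reverses the scaling
DD x = f.ldexp(exp[0]);              // recovers 1.5e300

// scaled pow: the result is a fraction with a base-2 exponent in e[0].
// 1.5^12345 overflows a double, but the scaled representation does not.
long[] e = new long[1];
DD frac = DD.of(1.5).pow(12345, e);
// rescale with ldexp only when the final value is representable
{code}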

h2. Accuracy & Performance of implementations

The implementations in the DD class are quite accurate. I tried to improve 
their accuracy using additional computations. These have a performance cost.

Here is the accuracy of the methods on 10 million computations using 100,000 random DD numbers in +/-[1, 2) scaled by a base-2 exponent in [0, 106). The scaling is important to generate different levels of overlap for the add operation. For simplicity I did not eliminate duplicates from the square and reciprocal stats. The accuracy is measured using the effective ULP of the DD class, which is a relative error epsilon of 2^-106 (a sketch of the measurement is shown after the notes below).
||Operation||Standard RMS||Standard Max||Accurate RMS||Accurate Max||
|add(DD)|0.2637|2.3910|0.0000|0.0000|
|add(double)|0.1888|1.9537|0.0000|0.0000|
|multiply(DD)|0.6003|4.3672|0.0000|0.0000|
|multiply(double)|0.3083|2.8018|0.0000|0.0000|
|divide(DD)|0.3320|2.9507|0.0829|0.9662|
|divide(double)|0.0000|0.0000| | |
|square|0.6864|3.8654|0.0000|0.0000|
|reciprocal|0.3371|1.8826|0.1391|0.9903|
|pow|16 * (n-1)| |~0|1|
|sqrt| |2.03| | |

Notes
 * The accurate methods are able to achieve perfect rounding (<0.5 epsilon) for 
all but the divide method. I tried some variants and could not improve it. Note 
that a standard divide by a double is accurate anyway. It is division by a DD 
number that is problematic.
 * The standard methods will provide at least 102-bit accuracy for all 
operations on typical input data.
 * The reference paper for the Statistics implementation which used DD provided error bounds of 4 eps^2 for add and 16 eps^2 for multiply, with eps being 2^-53. This test data has 2.4 eps^2 for add and 4.4 eps^2 for multiply. Using random input may not hit worst-case situations for intermediate rounding during the computations.
 * The main dataset does not include the pow or sqrt methods (their rows above summarise separate tests). BigDecimal pow is limited to n in +/-999999999; the DD pow has no limits. The accuracy of a DD pow using repeated squaring by multiplication is on average 16 * (n-1) epsilon. The accurate pow method uses triple-double multiplication and is 1 ulp accurate; only a few examples of final rounding error have been observed.
 * The sqrt method cannot be easily tested vs BigDecimal in the current JDK 8 
codebase as sqrt was added to BigDecimal in JDK 9. On random data computed 
using BigDecimal sqrt the max error is 2.03 epsilon. I do not have an accurate 
variant of this method.
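For reference, the error measurement is along these lines (a sketch; the MathContext precision used for the division is arbitrary):
{code:java}
// relative error of a DD result against a BigDecimal reference,
// expressed in units of the DD epsilon 2^-106
static double ulpError(DD actual, java.math.BigDecimal expected) {
    java.math.BigDecimal diff = actual.bigDecimalValue().subtract(expected).abs();
    double relativeError = diff.divide(expected.abs(), java.math.MathContext.DECIMAL128)
                               .doubleValue();
    return relativeError * 0x1.0p106;   // scale by 2^106
}
{code}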

Here are the approximate run times of the methods, benchmarked using JMH with 1000 samples for each operation (using JDK 17). These figures have had a baseline subtracted that simply created new DD objects, so the numbers can be compared for relative speed (a minimal sketch of the benchmark structure follows the table).
||Operation||Standard||Accurate||Relative||
|add(DD)|710.6|2921.6|4.11|
|add(double)|427.1|653.1|1.53|
|twoSum(double, double)|263.7| |0.62 (to add(double))|
|fastTwoSum(double, double)|211.5| |0.50 (to add(double))|
|multiply(DD)|804.7|4379.9|5.44|
|multiply(double)|713.4|2535.8|3.55|
|checkedMultiply(DD)|1609.0| |2.00 (to multiply(DD))|
|checkedMultiply(double)|1416.7| |1.99 (to multiply(double))|
|square|538.8|2448.9|4.55|
|divide(DD)|19848.5|34205.5|1.72|
|divide(double)|16497.6| | |
|reciprocal|15154.2|28342.2|1.87|
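The benchmark structure is roughly as follows (a minimal JMH sketch with imports omitted; the class, field names and operand generation are illustrative, not the actual benchmark code):
{code:java}
@State(Scope.Benchmark)
public class DDBenchmark {
    private DD[] x;
    private DD[] y;

    @Setup
    public void setup() {
        java.util.SplittableRandom rng = new java.util.SplittableRandom();
        x = new DD[1000];
        y = new DD[1000];
        for (int i = 0; i < x.length; i++) {
            // random values in [1, 2); the exponent scaling described above is omitted
            x[i] = DD.of(1 + rng.nextDouble());
            y[i] = DD.of(1 + rng.nextDouble());
        }
    }

    // baseline: only creates new DD objects; its score is subtracted
    // from the other benchmarks to obtain the figures in the table
    @Benchmark
    public void baseline(Blackhole bh) {
        for (int i = 0; i < x.length; i++) {
            bh.consume(DD.of(i));
        }
    }

    @Benchmark
    public void addDD(Blackhole bh) {
        for (int i = 0; i < x.length; i++) {
            bh.consume(x[i].add(y[i]));
        }
    }
}
{code}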

Notes
 * The checkedMultiply method adds additional scaling of the input arguments to prevent intermediate overflow. Since the input numbers here will never overflow, this costs a 2-fold performance reduction for additional branch points and checks that are always false. Given inputs that may overflow, the performance may be worse as multiple branches will be required to execute.
 * The twoSum (6 flops) and fastTwoSum (3 flops) methods add two double values, compute the round-off and return a DD (see the sketch after these notes). These are the building blocks of arithmetic with double-double numbers. They provide a reference for the speed of a Kahan summation (fast-two-sum), or summation using the Numbers Sum class (two-sum).
 * The square methods are 0.67x and 0.56x the run-time of a multiply of a DD 
with itself. This is a useful optimisation.
 * The reciprocal methods are 0.76x and 0.83x the run-time of a divide of 
DD.ONE by the argument. This is a useful optimisation given the slow speed of 
divide operations.
 * Divide is slow. It is best avoided if an alternative formulation of the 
computation is possible.
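For reference, these building blocks are the standard error-free transformations (a sketch; the class-internal implementations are assumed to be equivalent):
{code:java}
double a = 1e16;
double b = 3.0;

// two-sum (Knuth, 6 flops): s + r == a + b exactly, no ordering requirement
double s = a + b;
double z = s - a;
double r = (a - (s - z)) + (b - z);
// (s, r) is the double-double sum; DD.ofSum(a, b) is expected to return this pair

// fast-two-sum (Dekker, 3 flops): requires |a| >= |b|
double s2 = a + b;
double r2 = b - (s2 - a);
{code}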

A speed test of the pow method was performed on 500 random x values that did 
not under/overflow for n in [-1022, 1022]:
||implementation||Score||Relative||
|accuratePowScaled|201981.7|4.11|
|pow|42668.7|0.87|
|powScaled|49130.1| |

These values have not had a baseline subtracted, but the methods are slow enough that creation of a DD is insignificant. The unscaled power computation is faster than the scaled power, so it is useful to have both, despite the fact that the scaled pow result can be converted to unscaled using the ldexp (scalb) scaling method.

A comparison was made of the speed of the scaled pow computations versus arbitrary precision numbers (BigDecimal or Commons Math3 Dfp) with their precision limited to 34 decimal digits. All these methods can compute a finite output for any argument by using a scaled representation for the result. The benchmark used values of x and n for which the result does not, and does, overflow a double.
||implementation||n||x||Score||Relative||
|BigDecimal|1000|1.414651294|6340.7|73.80|
|Dfp|1000|1.414651294|19487.8|226.81|
|accuratePowScaled|1000|1.414651294|394.1|4.59|
|powScaled|1000|1.414651294|85.9| |
|BigDecimal|10000|1.414651294|7366.1|63.10|
|Dfp|10000|1.414651294|21136.5|181.05|
|accuratePowScaled|10000|1.414651294|486.3|4.17|
|powScaled|10000|1.414651294|116.7| |

The Dfp class is very slow. The DD pow methods are more than an order of magnitude faster than BigDecimal. This includes the 1 ULP accurate scaled pow.
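For reference, the 34-digit BigDecimal computation and the scaled DD power can be reproduced along these lines (the use of MathContext.DECIMAL128 and the exponent-in-element-0 convention are my assumptions):
{code:java}
// 34 significant decimal digits (MathContext.DECIMAL128)
java.math.BigDecimal bd = java.math.BigDecimal.valueOf(1.414651294)
    .pow(10000, java.math.MathContext.DECIMAL128);

// scaled DD power: finite even though 1.414651294^10000 overflows a double;
// the base-2 exponent is assumed to be returned in e[0]
long[] e = new long[1];
DD f = DD.of(1.414651294).pow(10000, e);
{code}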
h2. Standard vs Accurate

I have put all the accurate methods into a DDMath class with the following API:
{code:java}
public final class DDMath {
  public static DD add(DD, double);
  public static DD add(DD, DD);
  public static DD subtract(DD, double);
  public static DD subtract(DD, DD);
  public static DD multiply(DD, double);
  public static DD multiply(DD, DD);
  public static DD square(DD);
  // No public static DD divide(DD, double);
  // as the 'standard' method is ~exact
  public static DD divide(DD, DD);
  public static DD reciprocal(DD);
  public static DD pow(DD, int, long[]);
}
{code}
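Usage mirrors the instance methods on DD, for example (a sketch using the signatures above):
{code:java}
DD a = DD.fromQuotient(1, 3);
DD b = DD.ofProduct(0.1, 3.0);

DD std = a.divide(b);           // standard: at least ~102-bit accuracy
DD acc = DDMath.divide(a, b);   // accurate: max observed error < 1 epsilon (2^-106)
{code}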
Note that the "accurate" methods are 1.5 to 5.5 times slower than the 
"standard" methods for an increase of 2-3 bits accuracy in the round-off 
component. In practice this extra accuracy may never be needed if the ultimate 
goal of using a DD number is to achieve 1 ULP accuracy in a computation that 
outputs a double number. Even accumulating the max error on each operation 
would require hundreds, thousands or millions (in the case of only add 
operations) of chained operations to see a 2 ULP accuracy in the final double 
value.

Thus it may not be worthwhile exposing the accurate methods in a public API.

Note that accurateAdd is used internally in DD in the standard divide operations. This impacts performance by ~10% and noticeably improves accuracy. A full divide uses 2 multiply(DD), 1 accurateAdd(DD), 1 add(DD) and some extra accounting two-sum operations. This is why the divide method is slow and the cost of accurateAdd is small here. There are no internal uses for accurate multiply or divide.
h2. Issues

Placing the DD class in its own package means it would have to import 
numbers.core in order to implement the NativeOperators interface. This 
interface is for basic arithmetic operations such as add, multiply, divide, 
pow. The interface is used by numbers.field to allow an algorithm to be written 
for a generic field.

If the ext package imports core, then core cannot use the ext package to support the extended precision computations performed in the Sum class; otherwise there would be a circular dependency.

Options to use DD as a field:
 # Move all this to the core package.
 # Create a wrapper for DD in the field package so it can have a DDField class.

h2. Extended precision in the rest of Commons

The concept encapsulated in the DD class is used in several other places in Commons. Here are some locations where the use of DD multiplication or DD addition is known.
||Component||Support||
|Numbers Gamma|Computation in Erf method is fully supported.|
|Statistics Inference|Source of the DD implementation. Fully supported if the 
scaled power computation is included.|
|Statistics Distribution|Computes sqrt(2 * a * a). Fully supported.
Computes a * sqrt(2 * pi). This uses a precomputed sqrt(2pi) constant pre-split 
for double multiplication. Using DD is fully supported but would be slower. The 
computation is in a class constructor so is a negligible one-off cost.
Computes exp(-0.5*x*x). Fully supported by using DD square.|
|Numbers Core|Extended precision is used by Sum. An internal helper class 
implements the building blocks of DD arithmetic: two-sum; two-product; 
two-square. Dekker's split of a 53-bit double to two values. All these are used 
internally by DD but not public.
 
The raw building-block methods of DD arithmetic are not supported. Updating
Sum to use the DD class for multiplication and summation may have performance 
implications as the Sum class is very optimised for dot product computations. 
The building blocks for DD arithmetic could be exposed publicly to remove code 
duplication. This is also solved by moving the DD class to numbers core.|
|Numbers Complex|Uses a summation of a 5 component extended precision 
floating-point number. This is partially supported. The parts of the number can 
be computed using DD. The summation using DD would not be accurate enough. It 
requires exposure of the fast-two-sum and two-sum methods. These are trivial single-line functions, and code duplication to avoid an external dependency here seems justified.
Also uses splitting of numbers for effective DD multiplication. This is 
supported.
Note: A previous attempt using the Core Sum class to implement the computation 
failed to pass the unit tests for accuracy.|
|Math AccurateMath|Unknown.
 
Many computations use splitting of a 53-bit mantissa to extend the accuracy of the product. However the split is often not equal, e.g. 20 bits and 33 bits, and changing computations to use the DD class may introduce errors in the polynomial approximations.|
h2. Summary

I propose:
 * Adding the DD class in its current state to a new module
 * Exposing the accurate versions of arithmetic operators in a helper class. A 
use case may exist that I have not considered and these methods are still 
reasonably fast (compared to BigDecimal) and are mostly double-double exact (or 
within 1 rounding error).
 * Any use of DD as a field would require a wrapper class to implement the 
required NativeOperators interface
 * Updating downstream projects to use DD when possible
 * Possibly extending the current public API to expose more raw building block 
methods for extended floating point arithmetic (fast-two-sum; two-sum; 
two-product).
 * Possibly adding a triple-double and/or quad-double and/or arbitrary-length implementation. I currently have no use case for these so it would be an academic exercise. However, the possible existence of e.g. a quad-double may affect the package location and the names for the parts of the number, by allowing standardisation across implementations in a single package location.

Please discuss any of the above, the following, or anything else:
 * _*Putting DD in numbers core*_
 * Names for factory constructor methods, i.e. of and from
 * Names for the parts of the number: (x, xx) vs (x0, x1) vs (hi, lo) vs other
 * Hiding the accurate arithmetic methods
 * Hiding the scaled power method (but it is required for Statistics inference)
 * Hiding the accurate scaled power method (this would require it remaining in the Statistics inference package)

 

> Add support for extended precision floating-point numbers
> ---------------------------------------------------------
>
>                 Key: NUMBERS-193
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-193
>             Project: Commons Numbers
>          Issue Type: New Feature
>          Components: ext
>            Reporter: Alex Herbert
>            Assignee: Alex Herbert
>            Priority: Major
>              Labels: full-time, gsoc2023, part-time
>
> Add implementations of extended precision floating point numbers.
> An extended precision floating point number is a series of floating-point 
> numbers that are non-overlapping such that:
> {noformat}
> double-double (a, b):
> |a| > |b|
> a == a + b{noformat}
> Common representations are double-double and quad-double (see for example 
> David Bailey's paper on a quad-double library: 
> [QD|https://www.davidhbailey.com/dhbpapers/qd.pdf]).
> Many computations in the Commons Numbers and Statistics libraries use 
> extended precision computations where the accumulated error of a double would 
> lead to complete cancellation of all significant bits; or create intermediate 
> overflow of integer values.
> This project would formalise the code underlying these use cases with a 
> generic library applicable for use in the case where the result is expected 
> to be a finite value and using Java's BigDecimal and/or BigInteger negatively 
> impacts performance.
> An example would be the average of long values where the intermediate sum 
> overflows or the conversion to a double loses bits:
> {code:java}
> long[] values = {Long.MAX_VALUE, Long.MAX_VALUE}; 
> System.out.println(Arrays.stream(values).average().getAsDouble()); 
> System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values.length)).doubleValue());
> long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE}; 
> System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
>  System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values2.length)).doubleValue());
> {code}
> Outputs:
> {noformat}
> -1.0
> 9.223372036854776E18
> 0.0
> -0.5{noformat}


