[
https://issues.apache.org/jira/browse/NUMBERS-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738504#comment-17738504
]
Alex Herbert commented on NUMBERS-193:
--------------------------------------
I have created a JMH benchmark for 3 variants of the internal DD class in the
Statistics distribution package:
||Implementation||Description||
|static mutable|Has mutable fields for the high and low part of the double-double number. All methods are static and write results to a mutable result argument. This is the current implementation in Statistics.|
|static immutable|Has immutable (final) fields for the high and low part. All methods are static and return a new DD instance.|
|OO immutable|Has immutable (final) fields for the high and low part. All methods are instance methods and return a new DD instance.|
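For reference, the three API shapes can be sketched as follows (a minimal illustration using a trivial operation; class and method names are hypothetical, not the actual DD code):
{code:java}
// Three API shapes for a double-double (hi, lo) pair; negate is used as a
// minimal example operation.

// static mutable: the result is written into a caller-supplied object,
// so no allocation occurs per operation.
final class MutableDD {
    double hi;
    double lo;

    static void negate(MutableDD x, MutableDD result) {
        result.hi = -x.hi;
        result.lo = -x.lo;
    }
}

// static immutable: final fields; static methods return a new instance.
final class StaticDD {
    final double hi;
    final double lo;

    StaticDD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    static StaticDD negate(StaticDD x) {
        return new StaticDD(-x.hi, -x.lo);
    }
}

// OO immutable: final fields; instance methods return a new instance.
final class ImmutableDD {
    final double hi;
    final double lo;

    ImmutableDD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    ImmutableDD negate() {
        return new ImmutableDD(-hi, -lo);
    }
}
{code}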
I tested the implementations using the computation for the one-sided one-sample
Kolmogorov-Smirnov distribution. This accepts arguments n for the sample size
and x for the statistic (in [0, 1]). All implementations compute the same
result.
The computation can be performed in two ways and the choice is based on the
arguments. The test parameters were chosen either to exercise the more
time-consuming code path in the computation, or to span a range of arguments
that exercises both code paths.
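The benchmark has approximately the following JMH shape (a hypothetical sketch: the class name, parameter values and the KolmogorovSmirnov.sf entry point are illustrative, not the actual benchmark code):
{code:java}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class KSDistributionBenchmark {
    /** Sample size. */
    @Param({"10000", "100000"})
    private int n;

    /** Statistic in [0, 1]. */
    @Param({"0.01", "0.05"})
    private double x;

    /** Times one computation using the static mutable DD implementation. */
    @Benchmark
    public double staticMutable() {
        // hypothetical entry point delegating to the DD variant under test
        return KolmogorovSmirnov.sf(x, n);
    }
}
{code}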
Using a simple counter I tracked the number of object instances created for the
single test cases and the range test case:
||n||x min||x max||samples||Objects||
|10000|0.01| |1|282500|
|10000|0.05| |1|510159|
|100000|0.01| |1|2578505|
|100000|0.05| |1|4300223|
|1000|0.001|0.5|500|8543836|
For the range test case the samples were linearly spaced between x min and max.
The table shows that these cases create a non-trivial number of new instances
of a double-double number.
Note: Object tracking was done for reference. All JMH tests had this code
removed. The mutable implementation will create only 3 object instances per
computation.
h2. Results
Tested using JDK 11:
OpenJDK 64-Bit Server VM (build 11.0.19+7-post-Ubuntu-0ubuntu120.04.1, mixed
mode, sharing). The benchmark scores are times, so lower is better; each
Relative column is the score divided by the static mutable score.
*Single Cases*
||n||x||static mutable||static immutable||Relative||OO immutable||Relative||
|10000|0.01|4758677.4|4766630.6|1.002|4888738.5|1.027|
| |0.05|3833546.9|3929037.5|1.025|4058091.9|1.059|
|100000|0.01|51637595.6|52368277.9|1.014|53024335.0|1.027|
| |0.05|34571369.7|34994716.9|1.012|35030851.3|1.013|
*Range Case*
||n||x min||x max||samples||static mutable||static immutable||Relative||OO immutable||Relative||
|1000|0.001|0.5|500|113759933|114414561|1.006|110909383|0.975|
h2. Observations
There is very little difference in speed across the implementations. For the
single cases the mutable implementation is slightly faster. For the case using
500 samples across a range, the OO immutable implementation is slightly faster.
This is despite the large number of new objects created: these objects are
transient within the computation method, so the JVM should be able to optimise
their memory allocation using escape analysis.
The relative differences have been observed to shift slightly across different
parameterisations of the tests, favouring one or other of the implementations.
The differences are small enough to justify rewriting the DD class as an
immutable object with little or no loss of performance.
Note that I also tested on JDK 17 and the results were similar. No test was
done on JDK 8 as the Numbers JMH project in which I ran the code targets Java 9
(for reasons unrelated to this code).
h2. Method Implementations
There are different ways to implement some of the computations on a
double-double number. These trade accuracy for speed. The following table
shows the methods I tested, with the accuracy reported in terms of the epsilon
(eps) of a double, 2^-53:
||Method||Description||
|fastAdd|Error of 4 eps^2|
|add|Error of 1 eps^2|
|fastMultiply|multiplication without checks; error of 16 eps^2|
|multiply|multiplication with checks for intermediate overflow and non-finite input arguments; error of 16 eps^2|
The operations were tested using random double-double numbers with the high
part in the range [-1, 1], i.e. not hitting edge cases for the multiply checks.
||Method||samples||regular||fast||Relative (fast/regular)||
|add|1000|17772.6|12952.4|0.729|
|multiply|1000|13820.5|13354.6|0.966|
There is a significant difference in speed between fastAdd and add. The
difference between fastMultiply and multiply is small.
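For reference, the two addition variants can be sketched using the standard double-double algorithms (a minimal sketch assuming a Dekker-style "sloppy" add for fastAdd and the accurate add from Hida, Li and Bailey's QD library for add; names are illustrative, not the actual DD code):
{code:java}
final class DDAdd {
    /** fastAdd: one two-sum of the high parts; the low parts are folded into
     * the round-off. Error of order 4 eps^2. Returns {hi, lo}. */
    static double[] fastAdd(double xh, double xl, double yh, double yl) {
        // two-sum of the high parts
        double s = xh + yh;
        double v = s - xh;
        double e = (xh - (s - v)) + (yh - v);
        // fold in the low parts and renormalise
        e += xl + yl;
        double zh = s + e;
        return new double[] {zh, e - (zh - s)};
    }

    /** add: two-sums of both the high and low parts with two renormalisation
     * steps. Error of order 1 eps^2. Returns {hi, lo}. */
    static double[] add(double xh, double xl, double yh, double yl) {
        // two-sum of the high parts
        double s1 = xh + yh;
        double v = s1 - xh;
        double s2 = (xh - (s1 - v)) + (yh - v);
        // two-sum of the low parts
        double t1 = xl + yl;
        double w = t1 - xl;
        double t2 = (xl - (t1 - w)) + (yl - w);
        // first renormalisation
        s2 += t1;
        double h = s1 + s2;
        s2 -= h - s1;
        // second renormalisation
        s2 += t2;
        double zh = h + s2;
        return new double[] {zh, s2 - (zh - h)};
    }
}
{code}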
h3. Notes on Add
The two implementations of add were created for use in the Kolmogorov-Smirnov
computation. One computation has a summation of thousands of terms; here
fastAdd can be used, as any cumulative error is unlikely to affect the final
rounding of the double-double to a double. The other computation has a
summation of alternately signed terms, where correct maintenance of the
round-off is more important for the final rounding. Thus there is a use case
for both, but adding both to an API may be confusing.
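As a hypothetical usage example, a summation of many same-signed terms where fastAdd suffices (reusing the DDAdd sketch above):
{code:java}
/** Sum the terms in double-double precision; the cumulative error of
 * fastAdd is unlikely to change the final rounding to a double. */
static double sum(double[] terms) {
    double hi = 0;
    double lo = 0;
    for (double t : terms) {
        double[] s = DDAdd.fastAdd(hi, lo, t, 0);
        hi = s[0];
        lo = s[1];
    }
    // hi is the double nearest the double-double sum
    return hi;
}
{code}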
h3. Note on Multiply
Note that fastMultiply and multiply compute the same result. The only
difference is that for a single operation that produces a non-finite result,
fastMultiply will compute NaN where multiply will compute the correct
infinity. This difference may be significant for a single operation, but when
the double-double number is used as part of a larger set of arithmetic the
difference is moot. This is because addition of infinite double-double numbers
creates NaN, which results from subtraction of same-signed infinite values
during the round-off computations, e.g. Inf - Inf == NaN.
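For reference, an unchecked multiply can be sketched with an FMA-based two-product (Math.fma returns the exact round-off of the product; names are illustrative, and a checked multiply would additionally guard against intermediate overflow and non-finite inputs):
{code:java}
/** fastMultiply: unchecked double-double product. Returns {hi, lo}. */
static double[] fastMultiply(double xh, double xl, double yh, double yl) {
    double p = xh * yh;
    double e = Math.fma(xh, yh, -p); // exact round-off of xh * yh
    e += xh * yl + xl * yh;          // cross terms; xl * yl is negligible
    double zh = p + e;
    return new double[] {zh, e - (zh - p)};
}
{code}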
Supporting correct propagation of infinite values through all the operations
(add, subtract, multiply, divide) would entail a significantly larger and more
complex implementation of the double-double number. I would suggest that this
support is omitted and the double-double be used for computations well within
the finite range of a double. This can be assisted by providing a method to
represent the number scaled to the range [0.5, 1) with a base 2 exponent
([frexp|https://www.cplusplus.com/reference/cmath/frexp/]) and a method to
scale numbers by a power of 2
([ldexp|https://www.cplusplus.com/reference/cmath/ldexp/]; scalb).
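A minimal sketch of frexp/ldexp semantics for a single double, assuming Math.getExponent and Math.scalb (a DD version would apply the same scaling to both parts):
{code:java}
/** frexp: returns the mantissa in [0.5, 1) and stores the base 2 exponent in
 * exp[0], such that x == mantissa * 2^exp[0]. Normal numbers only; zero,
 * sub-normal and non-finite inputs would need special handling. */
static double frexp(double x, int[] exp) {
    int e = Math.getExponent(x) + 1;
    exp[0] = e;
    return Math.scalb(x, -e);
}

/** ldexp: returns x * 2^exp, correctly rounded. */
static double ldexp(double x, int exp) {
    return Math.scalb(x, exp);
}
{code}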
Note that scaling to a normal range with maintenance of a separate exponent is
already used within the Kolmogorov-Smirnov computation. This is in part
simplified by knowing that individual terms should be probabilities when
rescaled after intermediate computation; thus overflow is avoided and underflow
simply computes a p-value of zero.
h1. Conclusion
Implementation of an immutable double-double extended precision floating-point
number is viable. There is no significant performance difference to the current
internal implementation in Statistics.
I propose adding a public version of the class to a new package for extended
precision numbers, e.g.:
{noformat}
o.a.c.numbers.ext : DD
{noformat}
The initial implementation can forgo any checks for overflow and thus carry a
warning that NaNs will be generated by standard IEEE arithmetic that creates
infinite values.
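As a quick illustration of that warning (using the hypothetical fastMultiply sketch above): the exact FMA round-off of an overflowing product is -Infinity, and Infinity + -Infinity is NaN:
{code:java}
// 1e308 * 1e308 overflows: the product p is Infinity, the FMA round-off e
// is -Infinity, and p + e renormalises to NaN.
double[] z = fastMultiply(1e308, 0, 1e308, 0);
System.out.println(java.util.Arrays.toString(z)); // [NaN, NaN]
{code}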
h2. TBD
* Whether to support multiple implementations of operations, such as add and
fastAdd
* Final package name and class name
> Add support for extended precision floating-point numbers
> ---------------------------------------------------------
>
> Key: NUMBERS-193
> URL: https://issues.apache.org/jira/browse/NUMBERS-193
> Project: Commons Numbers
> Issue Type: New Feature
> Reporter: Alex Herbert
> Priority: Major
> Labels: full-time, gsoc2023, part-time
>
> Add implementations of extended precision floating point numbers.
> An extended precision floating point number is a series of floating-point
> numbers that are non-overlapping such that:
> {noformat}
> double-double (a, b):
> |a| > |b|
> a == a + b{noformat}
> Common representations are double-double and quad-double (see for example
> David Bailey's paper on a quad-double library:
> [QD|https://www.davidhbailey.com/dhbpapers/qd.pdf]).
> Many computations in the Commons Numbers and Statistics libraries use
> extended precision computations where the accumulated error of a double would
> lead to complete cancellation of all significant bits; or create intermediate
> overflow of integer values.
> This project would formalise the code underlying these use cases with a
> generic library applicable for use in the case where the result is expected
> to be a finite value and using Java's BigDecimal and/or BigInteger negatively
> impacts performance.
> An example would be the average of long values where the intermediate sum
> overflows or the conversion to a double loses bits:
> {code:java}
> // LongStream.average() sums into a long: MAX_VALUE + MAX_VALUE overflows to -2
> long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
> System.out.println(Arrays.stream(values).average().getAsDouble());
> // exact reference result computed with BigDecimal
> System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values.length)).doubleValue());
> // converting to double first rounds both values to +/-2^63, which cancel to 0.0
> long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
> System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
> // exact result: (MAX_VALUE + MIN_VALUE) / 2 == -0.5
> System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values2.length)).doubleValue());
> {code}
> Outputs:
> {noformat}
> -1.0
> 9.223372036854776E18
> 0.0
> -0.5{noformat}