[
https://issues.apache.org/jira/browse/NUMBERS-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738504#comment-17738504
]
Alex Herbert commented on NUMBERS-193:
--------------------------------------
I have created a JMH benchmark for 3 variants of the internal DD class in the
Statistics distribution package:
||Implementation||Description||
|static mutable|Has mutable fields for the high and low part of the double-double number. All methods are static and write results to a mutable result argument. This is the current implementation in Statistics.|
|static immutable|Has immutable (final) fields for the high and low part. All methods are static and return a new DD instance.|
|OO immutable|Has immutable (final) fields for the high and low part. All methods are instance methods and return a new DD instance.|
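For reference, the three API shapes can be sketched as follows (a minimal illustration using a trivial operation; class and method names are hypothetical, not the actual DD code):
{code:java}
// Three API shapes for a double-double (hi, lo) pair; negate is used as a
// minimal example operation.

// static mutable: the result is written into a caller-supplied object,
// so no allocation occurs per operation.
final class MutableDD {
    double hi;
    double lo;

    static void negate(MutableDD x, MutableDD result) {
        result.hi = -x.hi;
        result.lo = -x.lo;
    }
}

// static immutable: final fields; static methods return a new instance.
final class StaticDD {
    final double hi;
    final double lo;

    StaticDD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    static StaticDD negate(StaticDD x) {
        return new StaticDD(-x.hi, -x.lo);
    }
}

// OO immutable: final fields; instance methods return a new instance.
final class ImmutableDD {
    final double hi;
    final double lo;

    ImmutableDD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    ImmutableDD negate() {
        return new ImmutableDD(-hi, -lo);
    }
}
{code}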
I tested the implementations using the computation for the one-sided one-sample
Kolmogorov-Smirnov distribution. This accepts arguments n for the sample size
and x for the statistic (in [0, 1]). All implementations compute the same
result.
The computation can be performed in two ways and the choice is based on the
arguments. The test parameters were chosen either to exercise the more
time-consuming code path in the computation, or to span a range of arguments
that exercises both code paths.
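The benchmark has approximately the following JMH shape (a hypothetical sketch: the class name, parameter values and the KolmogorovSmirnov.sf entry point are illustrative, not the actual benchmark code):
{code:java}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class KSDistributionBenchmark {
    /** Sample size. */
    @Param({"10000", "100000"})
    private int n;

    /** Statistic in [0, 1]. */
    @Param({"0.01", "0.05"})
    private double x;

    /** Times one computation using the static mutable DD implementation. */
    @Benchmark
    public double staticMutable() {
        // hypothetical entry point delegating to the DD variant under test
        return KolmogorovSmirnov.sf(x, n);
    }
}
{code}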
Using a simple counter I tracked the number of object instances created for the
single test cases and the range test case:
||n||x min||x max||samples||Objects||
|10000|0.01| |1|282500|
|10000|0.05| |1|510159|
|100000|0.01| |1|2578505|
|100000|0.05| |1|4300223|
|1000|0.001|0.5|500|8543836|
For the range test case the samples were linearly spaced between x min and max.
The table shows that these cases create a non-trivial number of new instances
of a double-double number.
Note: Object tracking was done for reference. All JMH tests had this code
removed. The mutable implementation will create only 3 object instances per
computation.
h2. Results
Tested using JDK 11:
OpenJDK 64-Bit Server VM (build 11.0.19+7-post-Ubuntu-0ubuntu120.04.1, mixed
mode, sharing). The benchmark scores are times, so lower is better; each
Relative column is the score divided by the static mutable score.
*Single Cases*
||n||x||static mutable||static immutable||Relative||OO immutable||Relative||
|10000|0.01|4758677.4|4766630.6|1.002|4888738.5|1.027|
| |0.05|3833546.9|3929037.5|1.025|4058091.9|1.059|
|100000|0.01|51637595.6|52368277.9|1.014|53024335.0|1.027|
| |0.05|34571369.7|34994716.9|1.012|35030851.3|1.013|
*Range Case*
||n||x min||x max||samples||static mutable||static immutable||Relative||OO immutable||Relative||
|1000|0.001|0.5|500|113759933|114414561|1.006|110909383|0.975|
h2. Observations
There is very little difference in speed across the implementations. For the
single cases the mutable implementation is slightly faster. For the case using
500 samples across a range, the OO immutable implementation is slightly faster.
This is despite the large number of new objects created: these objects are
transient within the computation method, so the JVM should be able to optimise
their memory allocation using escape analysis.
The relative differences have been observed to shift slightly across different
parameterisations of the tests, favouring one or other of the implementations.
The differences are small enough to justify rewriting the DD class as an
immutable object with little or no loss of performance.
Note that I also tested on JDK 17 and the results were similar. No test was
done on JDK 8 as the Numbers JMH project in which I ran the code targets Java 9
(for reasons unrelated to this code).
h2. Method Implementations
There are different ways to implement some of the computations on a
double-double number. These trade accuracy for speed. The following table
shows the methods I tested, with the accuracy reported in terms of the epsilon
(eps) of a double, 2^-53:
||Method||Description||
|fastAdd|Error of 4 eps^2|
|add|Error of 1 eps^2|
|fastMultiply|multiplication without checks; error of 16 eps^2|
|multiply|multiplication with checks for intermediate overflow and non-finite input arguments; error of 16 eps^2|
The operations were tested using random double-double numbers with the high
part in the range [-1, 1], i.e. not hitting edge cases for the multiply checks.
||Method||samples||regular||fast||Relative (fast/regular)||
|add|1000|17772.6|12952.4|0.729|
|multiply|1000|13820.5|13354.6|0.966|
There is a significant difference in speed between fastAdd and add. The
difference between fastMultiply and multiply is small.
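For reference, the two addition variants can be sketched using the standard double-double algorithms (a minimal sketch assuming a Dekker-style "sloppy" add for fastAdd and the accurate add from Hida, Li and Bailey's QD library for add; names are illustrative, not the actual DD code):
{code:java}
final class DDAdd {
    /** fastAdd: one two-sum of the high parts; the low parts are folded into
     * the round-off. Error of order 4 eps^2. Returns {hi, lo}. */
    static double[] fastAdd(double xh, double xl, double yh, double yl) {
        // two-sum of the high parts
        double s = xh + yh;
        double v = s - xh;
        double e = (xh - (s - v)) + (yh - v);
        // fold in the low parts and renormalise
        e += xl + yl;
        double zh = s + e;
        return new double[] {zh, e - (zh - s)};
    }

    /** add: two-sums of both the high and low parts with two renormalisation
     * steps. Error of order 1 eps^2. Returns {hi, lo}. */
    static double[] add(double xh, double xl, double yh, double yl) {
        // two-sum of the high parts
        double s1 = xh + yh;
        double v = s1 - xh;
        double s2 = (xh - (s1 - v)) + (yh - v);
        // two-sum of the low parts
        double t1 = xl + yl;
        double w = t1 - xl;
        double t2 = (xl - (t1 - w)) + (yl - w);
        // first renormalisation
        s2 += t1;
        double h = s1 + s2;
        s2 -= h - s1;
        // second renormalisation
        s2 += t2;
        double zh = h + s2;
        return new double[] {zh, s2 - (zh - h)};
    }
}
{code}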
h3. Notes on Add
The two implementations of add were created for use in the Kolmogorov-Smirnov
computation. One computation has a summation of thousands of terms; here
fastAdd can be used, as any cumulative error is unlikely to affect the final
rounding of the double-double to a double. The other computation has a
summation of alternately signed terms, where correct maintenance of the
round-off is more important for the final rounding. Thus there is a use case
for both, but adding both to an API may be confusing.
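As a hypothetical usage example, a summation of many same-signed terms where fastAdd suffices (reusing the DDAdd sketch above):
{code:java}
/** Sum the terms in double-double precision; the cumulative error of
 * fastAdd is unlikely to change the final rounding to a double. */
static double sum(double[] terms) {
    double hi = 0;
    double lo = 0;
    for (double t : terms) {
        double[] s = DDAdd.fastAdd(hi, lo, t, 0);
        hi = s[0];
        lo = s[1];
    }
    // hi is the double nearest the double-double sum
    return hi;
}
{code}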
h3. Note on Multiply
Note that fastMultiply and multiply compute the same result. The only
difference is that for a single operation that produces a non-finite result,
fastMultiply will compute NaN where multiply will compute the correct
infinity. This difference may be significant for a single operation, but when
the double-double number is used as part of a larger set of arithmetic the
difference is moot. This is because addition of infinite double-double numbers
creates NaN, which results from subtraction of same-signed infinite values
during the round-off computations, e.g. Inf - Inf == NaN.
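For reference, an unchecked multiply can be sketched with an FMA-based two-product (Math.fma returns the exact round-off of the product; names are illustrative, and a checked multiply would additionally guard against intermediate overflow and non-finite inputs):
{code:java}
/** fastMultiply: unchecked double-double product. Returns {hi, lo}. */
static double[] fastMultiply(double xh, double xl, double yh, double yl) {
    double p = xh * yh;
    double e = Math.fma(xh, yh, -p); // exact round-off of xh * yh
    e += xh * yl + xl * yh;          // cross terms; xl * yl is negligible
    double zh = p + e;
    return new double[] {zh, e - (zh - p)};
}
{code}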
Supporting correct propagation of infinite values through all the operations
(add, subtract, multiply, divide) would entail a significantly larger and more
complex implementation of the double-double number. I would suggest that this
support is omitted and the double-double be used for computations well within
the finite range of a double. This can be assisted by providing a method to
represent the number scaled to the range [0.5, 1) with a base 2 exponent
([frexp|https://www.cplusplus.com/reference/cmath/frexp/]) and a method to
scale numbers by a power of 2
([ldexp|https://www.cplusplus.com/reference/cmath/ldexp/]; scalb).
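A minimal sketch of frexp/ldexp semantics for a single double, assuming Math.getExponent and Math.scalb (a DD version would apply the same scaling to both parts):
{code:java}
/** frexp: returns the mantissa in [0.5, 1) and stores the base 2 exponent in
 * exp[0], such that x == mantissa * 2^exp[0]. Normal numbers only; zero,
 * sub-normal and non-finite inputs would need special handling. */
static double frexp(double x, int[] exp) {
    int e = Math.getExponent(x) + 1;
    exp[0] = e;
    return Math.scalb(x, -e);
}

/** ldexp: returns x * 2^exp, correctly rounded. */
static double ldexp(double x, int exp) {
    return Math.scalb(x, exp);
}
{code}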
Note that scaling to a normal range with maintenance of a separate exponent is
already used within the Kolmogorov-Smirnov computation. This is in part
simplified by knowing that individual terms should be probabilities when
rescaled after intermediate computation; thus overflow is avoided and underflow
simply computes a p-value of zero.
h1. Conclusion
Implementation of an immutable double-double extended precision floating-point
number is viable. There is no significant performance difference to the current
internal implementation in Statistics.
I propose adding a public version of the class to a new package for extended
precision numbers, e.g.:
{noformat}
o.a.c.numbers.ext : DD
{noformat}
The initial implementation can forgo any checks for overflow and thus carry a
warning that NaNs will be generated by standard IEEE arithmetic that creates
infinite values.
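As a quick illustration of that warning (using the hypothetical fastMultiply sketch above): the exact FMA round-off of an overflowing product is -Infinity, and Infinity + -Infinity is NaN:
{code:java}
// 1e308 * 1e308 overflows: the product p is Infinity, the FMA round-off e
// is -Infinity, and p + e renormalises to NaN.
double[] z = fastMultiply(1e308, 0, 1e308, 0);
System.out.println(java.util.Arrays.toString(z)); // [NaN, NaN]
{code}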
h2. TBD
* Whether to support multiple implementations of operations, such as add and
fastAdd
* Final package name and class name
> Add support for extended precision floating-point numbers
> ---------------------------------------------------------
>
> Key: NUMBERS-193
> URL: https://issues.apache.org/jira/browse/NUMBERS-193
> Project: Commons Numbers
> Issue Type: New Feature
> Reporter: Alex Herbert
> Priority: Major
> Labels: full-time, gsoc2023, part-time
>
> Add implementations of extended precision floating point numbers.
> An extended precision floating point number is a series of floating-point
> numbers that are non-overlapping such that:
> {noformat}
> double-double (a, b):
> |a| > |b|
> a == a + b{noformat}
> Common representations are double-double and quad-double (see for example
> David Bailey's paper on a quad-double library:
> [QD|https://www.davidhbailey.com/dhbpapers/qd.pdf]).
> Many computations in the Commons Numbers and Statistics libraries use
> extended precision computations where the accumulated error of a double would
> lead to complete cancellation of all significant bits; or create intermediate
> overflow of integer values.
> This project would formalise the code underlying these use cases with a
> generic library applicable for use in the case where the result is expected
> to be a finite value and using Java's BigDecimal and/or BigInteger negatively
> impacts performance.
> An example would be the average of long values where the intermediate sum
> overflows or the conversion to a double loses bits:
> {code:java}
> // LongStream.average() sums into a long: MAX_VALUE + MAX_VALUE overflows to -2
> long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
> System.out.println(Arrays.stream(values).average().getAsDouble());
> // exact reference result computed with BigDecimal
> System.out.println(Arrays.stream(values).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values.length)).doubleValue());
> // converting to double first rounds both values to +/-2^63, which cancel to 0.0
> long[] values2 = {Long.MAX_VALUE, Long.MIN_VALUE};
> System.out.println(Arrays.stream(values2).asDoubleStream().average().getAsDouble());
> // exact result: (MAX_VALUE + MIN_VALUE) / 2 == -0.5
> System.out.println(Arrays.stream(values2).mapToObj(BigDecimal::valueOf)
>     .reduce(BigDecimal.ZERO, BigDecimal::add)
>     .divide(BigDecimal.valueOf(values2.length)).doubleValue());
> {code}
> Outputs:
> {noformat}
> -1.0
> 9.223372036854776E18
> 0.0
> -0.5{noformat}