[
https://issues.apache.org/jira/browse/NUMBERS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330797#comment-17330797
]
Alex Herbert commented on NUMBERS-142:
--------------------------------------
As discussed on the mailing list regarding geometry and numbers [1] I have
updated LinearCombination to use the dot2s algorithm. This replaces the current
2-fold precision algorithm that uses array allocation to maintain round-off
errors. The affect should be a marginal increase in speed with no major loss of
robustness in dot products involving cancellation.
Changes are in [PR 90.|https://github.com/apache/commons-numbers/pull/90]
[1] [Re: [geometry]
1.0-beta2|https://lists.apache.org/thread.html/r103ce078505e618511d0f9f1a7d75a0804abbf115ae6ff376a39ca20%40%3Cdev.commons.apache.org%3E]
> Improve LinearCombination accuracy during summation of the round-off errors
> ---------------------------------------------------------------------------
>
> Key: NUMBERS-142
> URL: https://issues.apache.org/jira/browse/NUMBERS-142
> Project: Commons Numbers
> Issue Type: Improvement
> Components: arrays
> Affects Versions: 1.0
> Reporter: Alex Herbert
> Assignee: Alex Herbert
> Priority: Minor
> Attachments: array_performance.jpg, cond_no.jpg, dot.jpg,
> error_vs_condition_no.jpg, inline_perfomance.jpg, linear_cached.jpg,
> linear_inline_vs_array.jpg
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The algorithm in LinearCombination is an implementation of dot2 from [Ogita
> el al|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547]
> (algorithm 5.3). There is a subtle difference in that the original dot2
> algorithm sums the round-off from the products and the round-off from the
> product summation together. The method in LinearCombination sums them
> separately (using an extra variable) and adds them at the end. This actually
> improves the accuracy under conditions where the round-off is of greater sum
> than the products, as described below.
> The dot2 algorithm suffers when the total sum is close to 0 but the
> intermediate sum is far enough from zero that there is a large difference
> between the exponents of summed terms and the final result. In this case the
> sum of the round-off is more important than the sum of the products which due
> to massive cancellation is zero. The case is important for Complex numbers
> which require a computation of log1p(x^2+y^2-1) when x^2+y^2 is close to 1
> such that log(1) would be ~zero yet the true logarithm is representable to
> very high precision.
> This can be protected against by using the dotK algorithm of Ogita et al with
> K>2. This saves all the round-off parts from each product and the running
> sum. These are subject to an error free transform that repeatedly adds
> adjacent pairs to generate a new split pair with a closer upper and lower
> part. Over time this will order the parts from low to high and these can be
> summed low first for an error free dot product.
> Using this algorithm with a single pass (K=3 for dot3) removes the
> cancellation error observed in the mentioned use case. Adding a single pass
> over the parts changes the algorithm from 25n floating point operations
> (flops) to 31n flops for the sum of n products.
> A second change for the algorithm is to switch to using
> [Dekker's|https://doi.org/10.1007/BF01397083] algorithm (Dekker, 1971) to
> split the number. This extracts two 26-bit mantissas from a 53-bit mantis (in
> IEEE 754 the leading bit in front of the of the 52-bit mantissa is assumed to
> be 1). This is done by multiplication by 2^s+1 with s = ceil(53/2) = 27:
> big = (2^s+1) * a
> a_hi = (big - (big - a))
> The extra bit of precision is carried into the sign of the low part of the
> split number.
> This is in contrast to the method in LinearCombination that uses a simple
> mask on the long representation to obtain the a_hi part in 26-bits and the
> lower part will be 27 bits.
> The advantage of Dekker's method is it produces 2 parts with 26 bits in the
> mantissa that can be multiplied exactly. The disadvantage is the potential
> for overflow requiring a branch condition to check for extreme values.
> It also appropriately handles very small sub-normal numbers that would be
> masked to create a 0 high part with all the non-zero bits left in the low
> part using the current method. This will have knock on effects on split
> multiplication which requires the high part to be larger.
> A simple change to the current implementation to use Dekker's split improves
> the precision on a wide range of test data (not shown).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)