Hi Alex. Thanks a lot for this work (and all the "Complex" stuff)!
Le ven. 10 janv. 2020 à 18:18, Alex Herbert <alex.d.herb...@gmail.com> a écrit : > > I have been investigating the accuracy of the LinearCombination class. This > was because I was using it to perform high precision computations in the > Complex class and it did not achieve the desired precision. > > The basis of the class is to compute dot products: sum a_i b_i. It does this > by splitting numbers into a higher precision representation, multiplying them > and using additions to track the round-off of the standard summation. This is > then added at the end. > > The case where it fails is for a linear combination whose total sum is close > to 0 but the intermediate sum is far enough from zero that there is a large > difference between the exponents of summed terms and the final result. The > case is important for Complex numbers which require a computation of > log1p(x^2+y^2-1) when x^2+y^2 is close to 1 such that log(1) would be ~zero > and the true logarithm is representable to very high precision. > > LinearCombination has a few things that could be changed. However I note that > the class has been written to avoid conditional statements on magnitude and > so far I have not done a speed analysis of the implications of changes. I can > only estimate based on a guess that 1 conditional (|x| < |y|) is equal to 3 > floating point operations (flops). This is a figure used by Ogita et al > (2005) in the paper upon which LinearCombination is based [1]. > > For reference I have also referred to Dekker (1971) [2] and Shewchuk (1997) > [3]. Dekker is widely used as the originator of split product and split > addition where a double of 53-bits is split into two doubles of 26-bits each > (1-bit is moved to the sign component) and the terms multiplied separately > with exact precision. Shewchuk's paper expands on splits into 2 parts to > produce algorithms that express a double as an expansion of n-parts. In > either case the sum of the expansion is the original number without loss of > accuracy. Dekker’s split addition requires knowing which part is larger in > magnitude. An alternative is an algorithm from Knuth that uses more flops but > does not require ordering comparators. > > The algorithm in LinearCombination is an implementation of dot2 from Ogita el > al (algorithm 5.3). There is subtle difference in that the original dot2 > algorithm sums the round-off from the products and the round-off from the > product summation together. The method in LinearCombination sums them > separately (using an extra variable) and adds them at the end. This actually > increases precision in my testing as the round-offs are on a different scale. > > Here are the things I have tried changing: > > 1. Using Dekker’s algorithm to split the number. As stated earlier this > extracts two 26-bit mantissas from a 53-bit mantis (in IEEE 754 the leading > bit in front of the of the 52-bit mantissa is assumed to be 1). This is done > by multiplication by 2^s+1 with s = ceil(53/2) = 27: > > big = (2^s+1) * a > a_hi = (big - (big - a)) > > The extra bit of precision is carried into the sign of the low part of the > split number. > > This is in contrast to the method in LinearCombination that uses a simple > mask on the long representation to obtain the a_hi part in 26-bits and the > lower part will be 27 bits. > > The advantage of Dekker’s method is it produces 2 parts with 26 bits in the > mantissa that can be multiplied exactly. The disadvantage is the potential > for overflow. > > It also appropriately handles very small sub-normal numbers that would be > masked to create a 0 high part with all the bits in the low part using the > current method. This will have knock on effects on split multiplication which > wants the high part to be larger. > > 2. Using Dekker’s dot product algorithm. This maintains a running split sum > of the individual products. It requires ordering comparison per summation to > ensure the error term is appropriately maintained. > > 3. I have tried a variant with higher order summation using 3 doubles to > represent the expanded split sum. This would be a Klein summation [4] of the > sum. However although the implementation was better it was not significantly > better and I will not show results. > > 4. Using the dotK algorithm (algorithm 5.10) of Ogita et al. This method > stores all the round-off parts from the products and sums in an array. If > simply summed at the end it will match dot2 (K=2). When K>2 then repeated > loops over the array are used with a split sum to create an error free > transformation of the data. This is then summed. The transformation has the > effect that data is distilled so that it is less effected by differences in > the scale of each part to be summed. > > The paper by Ogita state that sum3 is approximately 2.2x the run-time of > sum2. The error is that of a K-fold working precision summation. Thus the sum > is 3-fold precision verses 2-fold precision. > > Results > > I created increasingly large arrays composed of the quartet of numbers [a, b, > c, d] with the condition that: > > a^2 + b^2 = 1 > c^2 + d^2 = 1 > > These are multiplied as the dot product [a, b, c, d] * [a, b, -c, -d] > > So when summed the result should be 0. But there is a lot of floating-point > error. I compute the correct result using BigDecimal. The scale of BigDecimal > is clipped to -324 such that the BigDecimal does not go under > Double.MIN_VALUE. This never happens in the test conditions AFAIK but it > creates a realistic limit for the computation. I measure the error using > units of least precision (ulps). > > In the following results the first column is the total length of the data. > Then each method is broken into: > > Mean ulps > Standard deviation ulps > Median ups > Percentile 99 ulps > Percentile 99.9 ulps > Percentile 100 ulps (i.e. max error) > Error count (i.e. when ulps != 0) > Sign error (when the result has the wrong sign) > > For all metrics lower is better. 100000 samples were run. > > The methods are: > > 1. LinearCombination.value > 2. LinearCombination.value modified to use Dekker’s split > 3. Dekker’s dot product > 4. Dot2s as described in Ogita el al > 5. DotK with K=2 > 6. DotK with K=3 > > 4 : 2.76,50.3,0,32,272,8220,40510,0 : 1.17,28.7,0,15,128,5888,21154,0 : > 1.05,37.4,0,9,118,8192,19680,0 : 1.39,39.0,0,16,128,6144,23002,0 : > 1.21,28.9,0,16,128,5888,21961,0 : 0.00,0.00,0,0,0,0,0,0 > 8 : 9.01,1.66e+03,1,37,352,524288,53469,0 : 1.96,57.2,0,18,168,12288,33874,0 > : 1.46,55.9,0,15,128,16384,31287,0 : 2.32,58.1,0,23,240,12288,40110,0 : > 2.42,118,0,20,192,32768,35870,0 : 0.00,0.00,0,0,0,0,0,0 > 12 : 5.04,280,1,40,426,73728,58035,0 : 2.53,88.2,0,23,256,23032,40911,0 : > 1.96,96.3,0,16,160,23032,36646,0 : 3.36,103,0,31,272,15872,49167,0 : > 2.70,89.2,0,24,266,23032,43062,0 : 0.00,0.00,0,0,0,0,0,0 > 16 : 4.93,207,1,40,428,45784,59624,0 : 3.44,198,0,26,248,45784,44578,0 : > 1.88,63.1,0,16,152,13824,38925,0 : 4.05,106,1,36,404,13968,53602,0 : > 3.64,198,0,29,276,45784,46609,0 : 0.00,0.00,0,0,0,0,0,0 > 20 : 5.37,209,1,44,449,42880,61169,0 : 3.45,139,0,30,300,31744,47876,0 : > 2.05,85.4,0,18,191,24576,40393,0 : 4.55,133,1,42,448,31744,57253,0 : > 3.73,141,0,32,336,31744,49605,0 : 0.00,0.00,0,0,0,0,0,0 > 24 : 5.41,241,1,44,430,66816,61880,0 : 3.98,225,0,30,294,66816,49771,0 : > 1.89,43.5,0,18,192,7936,41070,0 : 4.96,226,1,42,432,66816,59481,0 : > 4.20,226,1,32,332,66816,51326,0 : 0.00,0.00,0,0,0,0,0,0 > 28 : 7.27,424,1,48,499,106904,62698,0 : 4.37,180,1,35,388,41368,52005,0 : > 2.34,74.7,0,19,198,14048,41758,0 : 6.86,330,1,48,461,61440,61403,0 : > 4.58,181,1,38,396,41368,53506,0 : 0.00,0.00,0,0,0,0,0,0 > 32 : 7.99,802,1,50,467,245760,63450,0 : 7.73,1.00e+03,1,37,350,311296,53600,0 > : 2.36,83.7,0,20,226,16384,42182,0 : 6.23,251,1,52,502,49152,62968,0 : > 7.94,1.00e+03,1,39,366,311296,54999,0 : 0.00,0.00,0,0,0,0,0,0 Table are not nice to read in email. :-} I'll trust the conclusion. > > Conclusions > > Dot3 is perfect on this data. The algorithm effectively handles the poorly > composed dot product as the final summation reorders the round-off errors. > > The LinearCombination method can be made approximately 30% more accurate > (measured by a drop in mean, median and 99 percentile) by switching to > Dekker’s split algorithm. > > The Dekker-split variant of LinearCombination outperforms the dot2s and dot2 > algorithm from the Ogita paper. All methods are doing a standard precision > summation of the round-off terms. Dot2s combines alternating product > round-off and sum round-off in a single term. Dot2 will sum all product > round-off and continue to add the summation round-off. LinearCombination > keeps these separate thus starts summing the sum round-off from zero. In the > case where the total sum is fluctuating around zero this is an advantage and > the method has better performance. > > Dekker’s dot product outperforms all the other methods that use a summation > to 2-fold precision. This method requires a (|a| < |b|) comparison for each > summation and 1 additional float addition over LinearCombination.value. They > share 7muitply+10add per split product and 8add per split sum. So 25n flops > for length n as per Ogita et al and so Dekker’s method is 29n flops (assuming > an absolute float compare is 3 flops). Ogita et al state that the float > compare for Dekker’s two-sum addition requires 6 flops but is up to 50% > slower then the method from Knuth requiring 6 flops without branching. I have > not verified this but with a modern processor the branch prediction and > parallel computation per cycle may change this (the paper was in 2005). > > The dot3 algorithm is fine with this test. I think there is a case for > including this in LinearCombination when a higher accuracy is required > without switching to BigDecimal. Here is an example of the code for dotK. I > did not include the split, product and sum methods for brevity: > > /** > * Dot product to K-fold precision. > * > * @param a Factors. > * @param b Factors. > * @param k The precision. > * @return \( \sum_i a_i b_i \). > * @throws IllegalArgumentException if the sizes of the arrays are different. > */ > public static double dotK(double[] a, > double[] b, int k) { > if (a.length != b.length) { > throw new IllegalArgumentException("Dimension mismatch: " + a.length > + " != " + b.length); > } > > final int len = a.length; > > if (len == 1) { > // Revert to scalar multiplication. > return a[0] * b[0]; > } > > // Store all round-off parts > final double[] r = new double[len * 2]; > > // p is the standard scalar product sum initialised with the first product > double p = a[0] * b[0]; > r[0] = productLow(a[0], b[0], p); > > // Remaining split products added to the current sum and round-off stored > for (int i = 1; i < len; i++) { > final double h = a[i] * b[i]; > r[i] = productLow(a[i], b[i], h); > > final double x = p + h; > r[i + len - 1] = sumLow(p, h, x); > p = x; > } > > if (Double.isNaN(p)) { > // Either we have split infinite numbers or some coefficients were > NaNs, > // return the naive implementation for the IEEE754 result > return p; > } > > // Sum the round-off with the standard sum as the final component > r[r.length - 1] = p; > return sumK(r, k - 1); > } > > /** > * Sum to K-fold precision. > * > * @param p Data to sum. > * @param k The precision. > * @return the sum > */ > private static double sumK(double[] p, int k) { > // k=1 will skip the vector transformation and sum in standard precision. > for (int i = 1; i < k; i++) { > vecSum(p); > } > double sum = 0; > for (final double pi : p) { > sum += pi; > } > return sum; > } > > /** > * Error free vector transformation for summation. > * > * @param p Data. > */ > private static void vecSum(double[] p) { > for (int i = 1; i < p.length; i++) { > final double x = p[i] + p[i - 1]; > p[i - 1] = sumLow(p[i], p[i - 1], x); > p[i] = x; > } > } > > It is possible to efficiently unroll this for small vectors as has been done > in the class already. I did this in the Complex class while testing and the > sumK part looks like this: > > // Cascade summation of the terms p_i > s = p0 + p1; > p0 = sumLow(p0, p1, s); > p1 = s; > s += p2; > p1 = sumLow(p1, p2, s); > p2 = s; > s += p3; > p2 = sumLow(p2, p3, s); > p3 = s; > Etc … > return p0 + p1 + p2 + p3 + ... > > > For discussion: > > 1. Leave LinearCombination alone. > > 2. Update LinearCombination to use Dekker’s split algorithm and the rest > leave unchanged. The split algorithm must be guarded against overflow. This > adds two float comparison operations in extraction of the split part: > > if (value > SAFE_UPPER || value < -SAFE_UPPER) { > // handle overflow > return …; > } > // normal conversion > final double c = MULTIPLIER * value; > return c - (c - value); > > > In the nominal case this condition is false and the rest of the split should > be the same speed as the current conversion to long bits, masking and back to > a double. > > The change would improve the accuracy by ensuring split multiply has no error > due to loss of spurious round-off bits from the final bit of the original > mantissa. > > I note that cm4.util.FastMathCalc uses this method but the factor is (2^30) > and not the correct (2^27 + 1). > > 3. Add a method value(double[] a. double[] b, int k) with k-fold precision to > complement the current 2-fold precision value(double[] a. double[] b). > > 4. Switch to Dekker’s algorithm. However I have no error analysis of whether > this is comparable on different test sets. It is just better here. > > There is a MatLab script in Ogita et al to generate vectors for > multiplication. This may be worth investigating. For now I would not > recommend switching to Dekker’s dot product unless there is evidence it is > generally applicable. > > 5. Add methods value3(…) that use 3-fold precision and unroll the loops for > small vectors. > > > I think that option 2 is preferred as all split multiply methods should use > the correct split from Dekker. > > I would vote for option 3 to add a higher precision version if a user > desires. Since this may be 2x slower it may not be the best option for the > default. > > I would opt to implement option 5 and test it with JMH to see if dot3 is 2.2x > slower than dot2 in the JVM. > > > Thus most of this can be put into a JMH module and tested against the current > class to see the performance effect. > > The type of changes that may end up in LinearCombination comes down to > whether you want precision over speed. IIUC, precision (without resorting to "BigDecimal") was the purpose of "LinearCombination". +1 for making the appropriate changes to the existing "value" method. [I'd suggest to open a JIRA issue and mention the rest of the alternatives there, for future reference; but I wouldn't add them to the API until there is a need for it.] Thanks, Gilles > > > Opinions welcome. > > Alex > > > [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547 > <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547> > [2] https://doi.org/10.1007/BF01397083 <https://doi.org/10.1007/BF01397083> > [3] > http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps > > <http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps> > [4] > https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements > <https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org