Marc Glisse <marc.gli...@inria.fr> writes: > On Sat, 28 Oct 2017, Niels Möller wrote: > >> It would be nice if we could find a portable way to add two floating >> point values without rounding up. Would something like this work? >> >> s = a + b; /* Assume a > b */ >> r = (s - a) - b; /* No rounding expected here. */ >> if (r > 0) >> s -= 2*r; > > Not sure where the 2*r is coming from. And I am not very optimistic > that there is such a simple formula that works for any rounding mode, > though I could easily be wrong.
Sorry for the confusion, I mistakenly thought that since the rounding error affects the lsb, it would be of the same order. But I now realize that r may be much smaller, and affect the lsb only after carry propagation through low (otherwise ignored) mantissa of b. So on second thought, I don't think we can fix the rounding by only examining a, b, and s. We could compute rounded_b = b + r, and then we know that s == a + b_rounded, with no rounding, but that still doesn't tell us the position of the lsb, only that it's somewhere between the least signficant 1-bit of b_rounded and the most significant 1-bit of r. > There is also the question of how portable exactly it needs to be. C99 > nextafter can be helpful. So if (r > 0) s = nextafter(s, -infinity) might work. Except that (i) c99 might not be available on obscure systems where this code is used, and (ii) we try to avoid depending on linking with -lm. > It may be easier to assert that FLT_RADIX==2 and use DBL_MANT_DIG to > avoid any rounding. That's probably the sane way to do it. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel