ni...@lysator.liu.se (Niels Möller) writes: > ni...@lysator.liu.se (Niels Möller) writes: > >> If we can get away with that, it gets a lot >> simpler, >> >> if (r >= q_0) /* Needs to be branch free */ >> { >> q--; >> r += d_1; >> } >> return q + (r >= d_1); > > I've given it a try. Below patch almost works (as in "passes most but > not all tests")... I don't yet know if the failure is due to lack of > accuracy, or some less subtle type of bug.
The first failure was for an input with n2 == d1, n1 == d1-1. static void test_divappr_2(void) { mp_limb_t q = divappr_2(0x8000000000000001ULL, 0xffffffffbfffffffULL, 0x8000000000000001ULL, 0xffffffffc0000000ULL, 0xfffffffffffffff8ULL); ASSERT_ALWAYS (q != 0); } Then quotient should be B-1, but divappr tried to return the one of quotient B, which then fails due to overflow. The below variant passes tests: static inline mp_limb_t divappr_2 (mp_limb_t n2, mp_limb_t n1, mp_limb_t d1, mp_limb_t d0, mp_limb_t dinv) { mp_limb_t q1, q0, r1, t1, t0, mask; umul_ppmm (q1, q0, n2, dinv); add_ssaaaa (q1, q0, q1, q0, n2, n1); umul_ppmm (t1, t0, q1, d0); q1++; /* May overflow */ r1 = n1 - d1 * q1 - t1; /* FIXME: See if we can get sufficient accuracy without this? */ t0 += d0; r1 -= (t0 < d0); mask = - (r1 >= q0); q1 += mask; r1 += mask & d1; /* Add t0-related carry? */ #if 0 q1 += (r1 >= d1); #else /* FIXME: Take care of this case more efficiently, need analysis of the case n1 == d1. */ if (r1 >= d1 && q1 != GMP_NUMB_MAX) q1++; #endif return q1; } This needs more analysis. Since r in this code is also approximate, we get subtleties when incrementing or decrementing r. Incrementing r may be needed to ensure that we don't produce a quotient which is too small in the context of schoolbook division. But in the failure case, that pushes r1 over the limit and triggers the final r1 >= d1 check, even though q1 was already maximal B - 1. And we can't make arbitrary tweaks to r1, since we must at all times ensure that the r1 >= q0 test is accurate, and that's pretty tight already. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel