t...@gmplib.org (Torbjörn Granlund) writes: > Ah, so you're thinking of not just a+b*tab_b[...] but > a*tab_a[...]_b*tab_b[...]? I understand the latter much less, and don't > know how to deal with the introduced spurious factors which seem > inevitable.
If you do a <-- s a + t b, b unchanged then you can get spurious factors from s. And in the case of the binary algorithm, that's fine if s is a power of 2. (I think you mentioned that earlier: Try to find k so that 2^k mod b is a few bits smaller than b, and set a <-- 2^k a - q b, to increase euclid progress). If we update both a and b with a matrix multiply, (a;b) <-- M (a;b), we can get spurious factors from the determinant of M. In an earlier mail I sketched a variant that update both a and b only in a few of the cases, and then only using matrices where the determinant happens to be a power of 2. > I see two complications: > > 1. We need an efficient way to find out which of |u| and |v| is > smallest, for CMP_SWAP. But we can likely get away with something a > bit sloppy, and accept arbitrary results in the case that u and v are > "close", e.g., same bit size. > > We do that now. We just compare the top limbs. Only assuming u and v unsigned, right? For two's complement, we need to take absolute values into account in the comparison in some way. > 2. For the variant without branches, we'd need an mpn_addmul_1s working > with two's complement. It could be implemented as mpn_addmul_1 + > mpn_cnd_sub, but not sure if there's some easy way to implement with > a single pass, somehow sign extending on the fly. > > This might be easy or it might be hard. We have signed 64b x 64b -> > 128b on some machines (including x86) which might be useful. I doubt sign multiply would be that useful. An instruction to do signed 64b x unsigned 64b --> signed 128b, would be nice, but I don't know any arch having that. > > cy = mpn_addmul_1 (up, vp, n, m); > > a = up[0]; > > count_trailing_zeros (cnt, a); > > One could move the count_trailing_zeros before mpn_addmul_1 by > duplicating the mul, a = up[0] + m * vp[0]. Would make more sense if we > also have an addmul + shift loop. > > I don't follow your reasoning here. Just suggesting that we'd may get slightly less latency with a = up[0] - m*vp[0]; count_trailing_zeros (cnt, a); cy = mpn_addmul_1 (up, vp, n, m); Maybe not so useful on it's own, but with assembly, there are further opportunities. > This addmul_1-only, positive-only table is a strange gcd animal. It > will never yield U = 0. Individual limbs are not unlikely to become > zero. Ooops. Then one definitely need a different stop condition than comparing something to zero. > My latest loop loks like this: > > while (n > 2) > { > mp_limb_t a, b, m, cy; > int cnt, ucnt, vcnt; > > CMP_SWAP (up, vp, n); > > a = up[0]; > b = vp[0]; > m = tabp[((a & mask) << (4 - 1)) + ((b & mask) >> 1)]; > > count_leading_zeros (ucnt, up[n - 1]); > count_leading_zeros (vcnt, vp[n - 1]); If your willing to have both count_leading_zeros and count_trailing_zeros in the loop, maybe it's simpler and more efficient with a table-based binary euclid? CND_SWAP count_leading_zeros(cnt, up[n-1] | vp[n-1]) q = tab[...high bits...] if (q odd) u <-- u - q v else u <-- (q+1) u - v count_trailing_zeros(cnt, up[0]); /* At least one */ u >>= cnt; (plus handling of less likely special cases). It seems like an mpn_rsbmul_1 would be a useful primitive for several variants. > 2. The idea with m = (CNST_LIMB(1) << cnt - 1) - m is to chop of a bit > also in the top end of U, not just get >= 5 bits of zeros in the low > end. It is a generalisation of the tables I played with before for > gcd_11; any entry can be offset by a multiple of 2^5 = 32 without > making less than 5 low bit become 0. I actually believe this is a > great idea, but I haven't checked if it is as effective as I think it > is... Sounds like the opposite of binary euclid. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel