> Should we rename div1 to put it into the GMP name space? How about > mpn_div_11? (We could encode in its name that it is for use for small > q, but I'd suggest that we don't do that.) > > That would allow for some assembly experiments.
If you think assembly will make a big difference (which seems likely), that makes sense. And define it only when it's faster than a a div instruction generated by the compiler? I believe METHOD 1 could almost always be pointless if METHOD 3's straight line code is made really efficient. That should in particular be true if the q = 1 code is removed from hgcd2, for machines with good multiplication throughput. $ ./tune/speed -c -p1000000 -s1 mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3 overhead 6.00 cycles, precision 1000000 units of 8.33e-10 secs, CPU freq 1200.00 MHz mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3 1 1492.62 2064.18 #1375.18 So this is 50% speedup over the old method. Not bad! :-) Comments on the patch and old code: The normalisation code only works if operands are not already normalised. Cannot they be that? Should be revive the commented-out div2? It looks good to me, the comment about that it is slower than the branchy div2 is probably from the era where branchy div1/div2 was great. (I suspect that to be when Vax and m68020 was hot.) I think there might be a slight bug or inefficiency here (from hgcd2.c): ASSERT (ah >= bh); ah -= bh; if (ah < (CNST_LIMB (1) << (GMP_LIMB_BITS / 2 + 1))) break; if (ah <= bh) { /* Use q = 1 */ u01 += u00; u11 += u10; } else { mp_double_limb_t rq = div1 (ah, bh); If, in the the last 'if' above ah = bh, we have as a matter of fact q = 2 (with zero remainder). If I read the code right that will be doscovered one iteration later. If seem (ah < bh) is the correct criterion. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel