I pushed new files for 2-adic division, using the agreed-upon semantics and interface. In addition to the _q and _qr variants, I added a pure _r variant, in order to lower the register pressure for asm variants (no need for a qp parameter).
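
To make the remainder-only idea concrete, here is a minimal C sketch of a schoolbook 2-adic division loop that never stores quotient limbs. It is an illustration only, not the pushed code: the toy_ names, the 32-bit limb type, and the convention that dinv is the negated inverse of the low divisor limb are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* toy limb size, chosen so uint64_t holds a product */

/* Inverse of an odd d modulo 2^32 by Newton iteration.  x = d is already
   correct to 3 bits; each step doubles the number of correct low bits. */
static limb_t toy_binvert(limb_t d)
{
    limb_t x = d;
    for (int i = 0; i < 5; i++)
        x *= 2 - d * x;
    return x;
}

/* up[0..dn-1] += dp[0..dn-1] * q; returns the carry-out limb. */
static limb_t toy_addmul_1(limb_t *up, const limb_t *dp, size_t dn, limb_t q)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < dn; i++) {
        uint64_t t = (uint64_t)dp[i] * q + up[i] + cy;
        up[i] = (limb_t)t;
        cy = t >> 32;
    }
    return (limb_t)cy;
}

/* Remainder-only 2-adic division (hypothetical interface).
   Requires un > dn, dp[0] odd, and dinv == -1/dp[0] mod 2^32 (assumed sign
   convention).  Each quotient limb is used once and then dropped, so there
   is no qp argument.  On return the low un-dn limbs of up are zero, the
   remainder sits in up[un-dn .. un-1], and the return value is the carry
   out of the top limb. */
static limb_t toy_bdiv_r(limb_t *up, size_t un,
                         const limb_t *dp, size_t dn, limb_t dinv)
{
    limb_t cy = 0;
    for (size_t i = 0; i + dn < un; i++) {
        limb_t q = up[i] * dinv;        /* makes up[i] + q*dp[0] == 0 mod 2^32 */
        limb_t hi = toy_addmul_1(up + i, dp, dn, q);
        uint64_t t = (uint64_t)up[i + dn] + hi + cy;
        up[i + dn] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}
```

A caller would compute dinv once as the negation of toy_binvert(dp[0]) and then read the result from the top dn limbs of up. The result is only congruent to the true remainder; a real interface would still need to say how the returned carry is folded in.
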
The only asm file so far is for AMD Zen. This is a more thorough implementation than our old redc_1.asm code. The new code has special loops for operands up to 8 limbs, and also does software pipelining of the quotient computation. (The final quotient limb computation will be wasted, but that's no real harm.) The new code runs just a tad slower than plain mul_basecase.

We should use this in lieu of redc_1 in mpn/generic/powm.c and mpn/generic/sec_powm.c. It is a non-trivial thing, as redc_1 and sbpi1_bdiv_r leave the remainder in different places; redc_1 puts it in place of the low input dividend limbs while sbpi1_bdiv_r puts it toward the upper end of the same operand.

The end goal is to get rid of the redc_* interfaces completely.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
