ni...@lysator.liu.se (Niels Möller) writes: > I've written several div1 in asm (arm v5 method 2, 64-bit arm v8 method > 2 and 3, 64-bit x86 method 2).
Nice.
x64-div_11-m2.asm
Description: Binary data
arm32-div_11-m2.asm
Description: Binary data
arm64-div_11-m2.asm
Description: Binary data
arm64-div_11-m3.asm
Description: Binary data
ppc64-div_11-m2.asm
Description: Binary data
> I now believe a table-based thing which gets the quotient right > 90% of > the time will be the best approach. What should the table look like? If one looks up only the divisor bits, (i.e., a reciprocal table) one gets better precision with smaller table size. If one looks up bits of both inputs, table gets larger. One would then tabulate something like 1uuuuu / 1.ddddd, and shift the result depending on the bit size difference. Both variants might be considered. It would seem that getting a quotient directly, i.e., indexing with bits from both dividend and divisor would be fastest. But forming such an index might take some cycles extra compared to indexing with just the divisor. The extra multiply needed when indexing with just the divisor might be faster. We cannot expect the quotient to close to correct when n / d is large, i.e., a single adjustment step will not be sufficient. We need to detect that (perhaps by comparing the count_leading_zeros of the dividend and divisor) and do a full /. That'll be rare, so mispredicton penalty will be low. -- Torbjörn Please encrypt, key id 0xC8601622
_______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel