ni...@lysator.liu.se (Niels Möller) writes:

  > I've written several div1 in asm (arm v5 method 2, 64-bit arm v8 method
  > 2 and 3, 64-bit x86 method 2).  

  Nice.

Attachment: x64-div_11-m2.asm
Description: Binary data

Attachment: arm32-div_11-m2.asm
Description: Binary data

Attachment: arm64-div_11-m2.asm
Description: Binary data

Attachment: arm64-div_11-m3.asm
Description: Binary data

Attachment: ppc64-div_11-m2.asm
Description: Binary data

  > I now believe a table-based thing which gets the quotient right > 90% of
  > the time will be the best approach.

  What should the table look like? If one looks up only the divisor bits,
  (i.e., a reciprocal table) one gets better precision with smaller table
  size. If one looks up bits of both inputs, table gets larger. One would
  then tabulate something like 1uuuuu / 1.ddddd, and shift the result
  depending on the bit size difference.

Both variants might be considered.  It would seem that getting a
quotient directly, i.e., indexing with bits from both dividend and
divisor would be fastest.  But forming such an index might take some
cycles extra compared to indexing with just the divisor.  The extra
multiply needed when indexing with just the divisor might be faster.

We cannot expect the quotient to close to correct when n / d is large,
i.e., a single adjustment step will not be sufficient.  We need to
detect that (perhaps by comparing the count_leading_zeros of the
dividend and divisor) and do a full /.  That'll be rare, so mispredicton
penalty will be low.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Reply via email to