Re: [fpc-devel] Division optimisations

J. Gareth Moreton via fpc-devel Fri, 10 Sep 2021 13:59:57 -0700

I suppose in truth, I can, and that in itself is probably fairly cross-platform (although I'll stick with x86 for the moment and get that working). Sometimes the simple solution eludes me! Is there anything I need to take into account when it comes to range checking (that is, if a third party tries to compile a unit with range checking enabled), since "numerator * $AAAAAAAB" when constrained to 32 bits will almost always overflow?


Gareth aka. Kit


On 10/09/2021 20:53, Florian Klämpfl via fpc-devel wrote:

On 10.09.21 at 21:17, J. Gareth Moreton via fpc-devel wrote:
Hi everyone,
I'm looking at ways to optimise div and mod, starting with x86 and then probably AArch64. The obvious one is attempting to merge "Q := N div D; R := N mod D;", where D is a variable (but invariant between the two instructions), since DIV returns the quotient in R/EAX and the remainder in R/EDX in a single operation, or converting the latter equation to "R := N - (Q * D);" if D is a constant.
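The two rewrites described above can be sketched in C as follows. This is only an illustration of the source-level transformation, not compiler code: the type `divmod_u32` and the functions `divmod_variable` and `divmod_const10` are hypothetical names, the constant 10 is an arbitrary example divisor, and whether a given compiler actually emits a single DIV for the first function is up to that compiler.

```c
#include <stdint.h>

/* Hypothetical result type holding both outputs of a division. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} divmod_u32;

/* Variable divisor: both results come from the same N and D, so one
   hardware DIV (quotient in EAX, remainder in EDX on x86) can serve
   both expressions. d must be non-zero. */
static divmod_u32 divmod_variable(uint32_t n, uint32_t d)
{
    divmod_u32 r;
    r.quot = n / d;
    r.rem  = n % d;
    return r;
}

/* Constant divisor (10 chosen arbitrarily): the remainder is rebuilt
   from the quotient as R := N - (Q * D), avoiding a second division. */
static divmod_u32 divmod_const10(uint32_t n)
{
    divmod_u32 r;
    r.quot = n / 10u;
    r.rem  = n - r.quot * 10u;
    return r;
}
```

For example, `divmod_const10(1234)` yields a quotient of 123 and a remainder of 4.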
However, inspired somewhat by "Hacker's Delight", I would like to first see if I can optimise the Boolean condition "(X mod C) = 0", where C is a constant. By calculating the multiplicative reciprocal of C (it may or may not be equal to the 'magic div' constant), you can perform it with just a multiplication and a comparison - for example, when testing whether the numerator is divisible by 3:
mov (numerator),%reg1
mov $AAAAAAAB,%reg2 { 3 * $AAAAAAAB = 1 (mod 2^32) }
imul %reg1,%reg2
cmp $55555555,%reg2 { 2^32 div 3 = $55555555 }
If %reg2 is less than or equal to $55555555, then the numerator is an exact multiple of 3, and if it's greater, then it is not an exact multiple. The proof for this is explained in Hacker's Delight, but relies on the fact that 3 and 2^32 are relatively prime and the exact multiples of 3 multiplied by 3's reciprocal modulo 2^32 map onto the values 0 to $55555555 (if the divisor is even, which means it's not relatively prime to 2^32, you have to do a bit of trickery with a bit rotation, but done properly, it's only 1 extra instruction).
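As a rough sketch of the test described above, here is the same check in C, assuming unsigned 32-bit operands (where multiplication wraps modulo 2^32 by definition). The function names `divisible_by_3` and `divisible_by_6` and the helper `ror32` are made up for illustration. The divisor 6 is an arbitrary even example: it reuses the reciprocal of its odd part (3), rotates right by one bit for the factor of 2, and compares against (2^32 - 1) div 6.

```c
#include <stdbool.h>
#include <stdint.h>

#define INV3   UINT32_C(0xAAAAAAAB)  /* 3 * INV3 = 1 (mod 2^32) */
#define LIMIT3 UINT32_C(0x55555555)  /* (2^32 - 1) div 3 */
#define LIMIT6 UINT32_C(0x2AAAAAAA)  /* (2^32 - 1) div 6 */

/* Rotate a 32-bit value right by k bits, 0 < k < 32. */
static uint32_t ror32(uint32_t v, unsigned k)
{
    return (v >> k) | (v << (32u - k));
}

/* Odd divisor: one multiply and one unsigned compare. */
static bool divisible_by_3(uint32_t x)
{
    return (uint32_t)(x * INV3) <= LIMIT3;
}

/* Even divisor 6 = 3 * 2^1: multiply by the reciprocal of the odd
   part, then rotate right by 1. Any x that is odd leaves a set low
   bit in the product, which the rotation moves to the top bit and
   so pushes the value above the limit. */
static bool divisible_by_6(uint32_t x)
{
    return ror32((uint32_t)(x * INV3), 1u) <= LIMIT6;
}
```

Both functions can be compared against the plain `x % 3 == 0` and `x % 6 == 0` over a range of inputs as a sanity check.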
I'm trying to think of a way to make this clean and flexible, especially where future expansion is concerned. One idea I had was to create a new platform-specific node such as "tx86divisible", which takes an integer variable (x) and an integer constant (c) and returns True if x mod c = 0, and "(X mod C) = 0" code is converted to this node via tx86addnode.simplify (the node used for comparisons), so it can be quickly converted into the optimal code in pass_generate_code. The other option is to do this conversion in pass_generate_code, where a new node type isn't required but might be a little trickier to make cross-platform... if it's possible to make "tx86divisible" completely cross-platform - that is, have an implementation on every target - the node conversion code only has to exist in a single place, thus improving maintainability.
What do you suggest?
Can't you generate a mul and cmp node in tx86addnode.simplify which simulates this behavior?
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

