Re: [fpc-devel] Register renaming and false dependency question

J. Gareth Moreton via fpc-devel Mon, 18 Oct 2021 02:14:34 -0700

What I'm figuring is that it should be fine for -Os, since thatoptimises for size over speed, and -Os uses "or $-1, %rax" for setting aregister to -1 instead of "mov $-1, %rax". I copied this behaviour fromMSVC after finding a thread about the false dependency that OR causes(Florian warned me of the false dependency when I proposed theoptimisation regardless of size or speed).

Outside of -Os, if the immediate previous instruction uses the sameregister, then it's guaranteed to have been loaded and there shouldn'tbe a pipeline stall, although out-of-order execution may cause someproblems. This one is going to be interesting to time.

In regards to how frequently it appears in code, if it turns out thatit's very infrequent (normally I compile and analyse the RTL for thispurpose, and sometimes the compiler itself), I likely won't code it,although in the case of the "fast mod" algorithm, it appears veryfrequently, so should probably be coded directly, especially as themultiplication can take 3 cycles, depending on the processor, and is thelimiting factor.


Gareth aka. Kit

On 17/10/2021 23:53, Stefan Glienke via fpc-devel wrote:

According to compiler explorer clang, gcc and msvc compile this to thesame code with -O3 as FPC does. So I would assume that is fine.
Am 17.10.2021 um 13:25 schrieb J. Gareth Moreton via fpc-devel:
Hi everyone,
While reading up on some algorithms, I came across a recommendationof using a shorter arithmetic function to change the value of aconstant in a register rather than loading the new value directly. However, the algorithm assumes a RISC-like processor, so I'm not sureif it applies to an Intel x86-64 processor. Consider the following:
movq $0xaaaaaaaaaaaaaaab,%rax
imulq   %rax,%rcx
movq $0x5555555555555555,%rax
cmpq    %rax,%rcx
setle  %al
This algorithm sets %al to 1 if %rcx is divisible by 3, and 0 if it'snot, and was compiled from the following Pascal code (under -O3, but-O1 produces almost exactly the same):
function IsDivisible3(Numerator: QWord): Boolean;
begin
  Result := (Numerator * $AAAAAAAAAAAAAAAB) <= $5555555555555555;
end;
(One of my merge requests produces this code from "Result := (x mod3) = 0")
My question is this: can "movq $0x5555555555555555,%rax" be replacedwith "shrq $0x1,%rax" without incurring an additional pipelinestall? The MOV instruction takes 10 bytes to store, while "SHR 1"takes only 3. Given that %rax is used beforehand and the CMPinstruction has to wait until the IMUL instruction has finishedexecuting, logic tells me that I can get away with it here, but I'mnot sure if the metric to go by is the execution speed of IMUL (i.e.the IMUL instruction is the limiting factor before CMP can beexecuted), or the simple fact that the previous value of %rax wasused and will be loaded with $AAAAAAAAAAAAAAAB by the time it comesto load it with a new value.
Gareth aka. Kit
_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Re: [fpc-devel] Register renaming and false dependency question

Reply via email to