Re: [fpc-devel] Producing assembly with less branches?

Stefan Glienke Sun, 19 Jul 2020 18:48:53 -0700

On 20.07.2020 at 02:37, J. Gareth Moreton wrote:

On 19/07/2020 22:37, Stefan Glienke wrote:
clang and gcc emit this - I would guess they detect quite a few common patterns like this.
 ...
  cmp     eax, edx
  mov     edx, -1
  setg    al
  movzx   eax, al
  cmovl   eax, edx
  ret
I think I can make improvements to that already! (Note the sequence above and below are in Intel notation)
CMP   EAX, EDX
MOV   EAX, 0 ; Note: don't use XOR EAX, EAX because this scrambles the FLAGS register
MOV   EDX, -1
SETG   AL
CMOVL EAX, EDX
RET
I believe that executes one cycle faster (20% faster for the entire sequence) on modern processors because it shortens the dependency chain that exists between "SETG AL; MOVZX EAX, AL; CMOVL EAX, EDX". It might require some testing though to be sure.

That is what clang does (the first snippet I posted): it uses ecx for the 0, and it does so with the shorter xor placed before the cmp, which results in 16 bytes of code - gcc is 17, yours 19.

Anyhow, none of them differ in execution speed, but all are 2.5 times faster than the double cmp and conditional jump galore ;)
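
For readers without the earlier messages at hand, here is a minimal C sketch of the kind of source that the sequences above correspond to. It assumes the function under discussion is a three-way signed integer compare returning -1, 0, or 1 (which is what the SETG/CMOVL pair computes); the names compare_branchy and compare_branchless are made up for illustration and do not come from the thread. Whether a given compiler actually emits setcc/cmov for either form depends on the compiler and its optimization settings.

```c
/* Straightforward version: written with two comparisons and two
   conditional returns, which a compiler may translate into two
   compare-and-jump pairs. */
int compare_branchy(int a, int b)
{
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* Branch-free formulation: each relational expression evaluates to
   0 or 1 in C, so the difference is -1, 0, or 1. No subtraction of
   the operands themselves is involved, so it cannot overflow. */
int compare_branchless(int a, int b)
{
    return (a > b) - (a < b);
}
```

Both functions return the same result for every pair of inputs; only the shape of the generated code can differ.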




_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
