On 19/07/2020 22:37, Stefan Glienke wrote:
clang and gcc emit this - I would guess they detect quite some common patterns like this.

 ...
  cmp     eax, edx
  mov     edx, -1
  setg    al
  movzx   eax, al
  cmovl   eax, edx
  ret

I think I can improve on that already! (Note that the sequences above and below are both in Intel syntax.)

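CMP   EAX, EDX
MOV   EAX, 0   ; Zero the result. Note: don't use XOR EAX, EAX here because it overwrites the flags set by CMP
MOV   EDX, -1  ; CMOV can't take an immediate, so -1 goes into a register
SETG  AL       ; AL = 1 if greater (signed), otherwise 0
CMOVL EAX, EDX ; EAX = -1 if less (signed)
RET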

I believe that executes one cycle faster (20% faster for the entire sequence) on modern processors, because it shortens the dependency chain "SETG AL; MOVZX EAX, AL; CMOVL EAX, EDX" by removing the MOVZX: the zeroing now happens in a MOV that doesn't depend on the comparison. It would need some testing to be sure, though.
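For context, a sequence like this typically comes from a three-way comparison. The C sketch below is only an assumption about the kind of source pattern involved, since the original source isn't shown in this thread; `compare3` and `compare3_steps` are hypothetical names. The second function mirrors the proposed instruction order step by step, although a compiler is free to emit something different for it.

```c
#include <stdint.h>

/* Three-way comparison: returns -1, 0 or 1.
   Hypothetical example; the thread does not show the original source. */
int32_t compare3(int32_t a, int32_t b)
{
    if (a > b)
        return 1;
    if (a < b)
        return -1;
    return 0;
}

/* Same result, written to mirror the proposed sequence:
   zero the result, load -1 into a temporary, then apply
   the SETG-like and CMOVL-like steps. */
int32_t compare3_steps(int32_t a, int32_t b)
{
    int32_t result = 0;        /* MOV   EAX, 0                        */
    int32_t minus_one = -1;    /* MOV   EDX, -1                       */
    result = (a > b);          /* SETG  AL (upper bits already zero)  */
    if (a < b)                 /* CMOVL EAX, EDX                      */
        result = minus_one;
    return result;
}
```

Both functions return the same value for every pair of inputs; the second simply makes the ordering of the steps explicit.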

The difficulty with CMOV is that its destination must be a register (and not an 8-bit one): it can read from a memory address but not write to one, and it cannot take an immediate operand.  If there are registers free at that point in the code though, one could potentially write the constants to temporary registers beforehand, and then assign them to the registers that matter via CMOV (e.g. as shown above with the -1 value).
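To illustrate the register-only restriction, here is a rough C sketch. The function names are hypothetical, and whether a compiler actually emits CMOV for either function depends on the compiler, target and options. A select between register values maps naturally onto CMOV; a conditional store does not, so a branchless version has to load the old value, select in a register and then store unconditionally, which is only valid when the location is always safe to read and write.

```c
#include <stdint.h>

/* Select between two values held in registers: a natural CMOV candidate.
   The constant is placed in a temporary first because CMOV has no
   immediate-operand form. */
int32_t clamp_negative_to_minus_one(int32_t x)
{
    int32_t minus_one = -1;    /* constant loaded into a register first */
    return (x < 0) ? minus_one : x;
}

/* Conditional store: CMOV cannot write to memory, so the branchless
   form selects in a register and then stores unconditionally.
   Assumes *dest is always readable and writable. */
void store_if_negative(int32_t *dest, int32_t x)
{
    int32_t old_value = *dest;                    /* load (CMOV may read memory) */
    int32_t new_value = (x < 0) ? x : old_value;  /* select in a register        */
    *dest = new_value;                            /* plain, unconditional store  */
}
```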

I'm all for improving the generated assembly language where I can.  There are some traps that one has to be careful of though, usually involving false dependencies.  For example, when setting registers to -1, some compilers would use "OR EAX, -1" instead of "MOV EAX, -1" on account of it taking fewer bytes to encode.  Both Visual C++ and GCC did this at one point, but it creates a false dependency on the previous value of EAX and so incurs a performance penalty.

The final thing to remember is that, by default, the i386 target produces code that will run on the oldest 80386 processors, whereas CMOV was only introduced with the Intel Pentium Pro in 1995.  CMOV will therefore only be used when compiling for x86_64, or when you specify compiler parameters that raise the minimum supported processor.

(It also just made me realise that Pass 2 of the peephole optimiser would not work with virtual registers, because CMOV can't write to memory addresses, including the stack.)

Gareth aka. Kit

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
