Re: [fpc-devel] x86: Efficiency of opposing CMOVs

Florian Klämpfl via fpc-devel Sat, 16 Apr 2022 02:19:07 -0700


> On 16.04.2022 at 06:49, J. Gareth Moreton via fpc-devel 
> <[email protected]> wrote:
> 
> Hi everyone,
> 
> In the x86_64 assembly dumps, I frequently come across combinations such as 
> the following:
> 
>     cmpl    %ebx,%edx
>     cmovll    %ebx,%eax
>     cmovnll    %edx,%eax
> 
> This is essentially the ternary C operator "x = cond ? trueval : falseval", 
> or in Pascal "if (cond) then x := trueval else x := falseval;".  However, 
> because the CMOV instructions have exact opposite conditions, is it better to 
> optimise it into this?
> 
>     movl    %ebx,%eax
>     cmpl    %ebx,%edx
>     cmovnll    %edx,%eax
> 
> It's smaller, but is it actually faster (or the same speed)?  At the very 
> least, the two CMOV instructions depend on the CMP instruction being 
> completed, but I'm not sure if the second CMOV depends on the first one being 
> evaluated (because of %eax).  With the second block of code, the MOV and CMP 
> instructions can execute simultaneously.
> 
> My educated guess tells me that MOV/CMP/CMOV(~c) is faster than 
> CMP/CMOVc/CMOV(~c), but I haven't been able to find an authoritative source on 
> this yet.


cmov is normally slow, so the latter (CMP/CMOVc/CMOV(~c), with two cmovs) 
should be slower; a brief test also shows this.

$ cat tbench1.pp


procedure p;
var
  a,b,c : array[0..100] of longint;
  i,j,e,f,g : longint;
begin
    for j:=low(a) to high(a) do
      begin
        a[j]:=random(10);
        b[j]:=random(10);
      end;
    for i:=1 to 10000000 do
      for j:=low(a) to high(a) do
        begin
          e:=a[j];
          f:=b[j];
          { default assignment, overwritten only if e<f }
          g:=e;
          if e<f then
            g:=f;
          c[j]:=g;
        end;
end;

begin
  p;
end.

$ time ./tbench1

real    0m0.752s
user    0m0.748s
sys     0m0.004s


$ cat tbench2.pp
procedure p;
var
  a,b,c : array[0..100] of longint;
  i,j,e,f,g : longint;
begin
    for j:=low(a) to high(a) do
      begin
        a[j]:=random(10);
        b[j]:=random(10);
      end;
    for i:=1 to 10000000 do
      for j:=low(a) to high(a) do
        begin
          e:=a[j];
          f:=b[j];
          { g is assigned on both branches }
          if e<f then
            g:=f
          else
            g:=e;
          c[j]:=g;
        end;
end;

begin
  p;
end.


$ time ./tbench2

real    0m0.997s
user    0m0.997s
sys     0m0.000s
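
To compare the two shapes in a single run, both loops can be put into one 
program and timed separately. This is only a sketch and not what was measured 
above: the names tbenchboth, TData, Rounds, RunDefaultFirst and RunIfElse are 
made up for illustration, the arrays are globals instead of locals (which may 
change the generated code), and GetTickCount64 from SysUtils only reports 
milliseconds.

```pascal
program tbenchboth;

uses
  SysUtils;  { for GetTickCount64 }

const
  Rounds = 10000000;  { same outer iteration count as tbench1/tbench2 }

type
  TData = array[0..100] of longint;

var
  a, b, c1, c2 : TData;

{ tbench1 shape: default assignment, then a conditional overwrite }
procedure RunDefaultFirst(var c : TData);
var
  i,j,e,f,g : longint;
begin
  for i:=1 to Rounds do
    for j:=low(a) to high(a) do
      begin
        e:=a[j];
        f:=b[j];
        g:=e;
        if e<f then
          g:=f;
        c[j]:=g;
      end;
end;

{ tbench2 shape: if/else that assigns g on both branches }
procedure RunIfElse(var c : TData);
var
  i,j,e,f,g : longint;
begin
  for i:=1 to Rounds do
    for j:=low(a) to high(a) do
      begin
        e:=a[j];
        f:=b[j];
        if e<f then
          g:=f
        else
          g:=e;
        c[j]:=g;
      end;
end;

var
  j : longint;
  t0,t1,t2 : QWord;
begin
  { same input setup as in tbench1/tbench2 }
  for j:=low(a) to high(a) do
    begin
      a[j]:=random(10);
      b[j]:=random(10);
    end;

  t0:=GetTickCount64;
  RunDefaultFirst(c1);
  t1:=GetTickCount64;
  RunIfElse(c2);
  t2:=GetTickCount64;

  { both variants must compute the same result for every element }
  for j:=low(a) to high(a) do
    if c1[j]<>c2[j] then
      begin
        writeln('mismatch at index ',j);
        halt(1);
      end;

  writeln('default first: ',t1-t0,' ms');
  writeln('if/else:       ',t2-t1,' ms');
end.
```

Compiling with -al keeps the assembler file with the source lines interleaved, 
which makes it possible to check which cmov pattern each loop actually got 
before reading anything into the timings.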

_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
