Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
Hi everyone,

I've found a potential optimisation for conditions of the form "(x <> 0) and (y <> 0)", which are very common because this is semantically equivalent to "Assigned(x) and Assigned(y)", for example, and such a construct is generated implicity in TObject.Destroy, for example.  The node tree for TObject.Destroy example (after the first pass) is as follows (it checks the value of Self and VMT):

The node tree inside the if-node's condition branch is as follows:

<andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="3">
   <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="1">       <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done" complexity="0">
          <symbol>self</symbol>
       </loadn>
      <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done" complexity="0">
       </niln>
    </unequaln>
   <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="1">       <typeconvn resultdef="$char_pointer" pos="329,9" flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0" convtype="tc_equal">          <typeconvn resultdef="Pointer" pos="329,9" flags="nf_pass1_done" complexity="0" convtype="tc_equal">             <loadn resultdef="&lt;no type symbol&gt;" pos="329,9" flags="nf_pass1_done" complexity="0">
                <symbol>vmt</symbol>
             </loadn>
          </typeconvn>
       </typeconvn>
      <pointerconstn resultdef="$char_pointer" pos="329,9" flags="nf_pass1_done,nf_explicit" complexity="0">
          <value>$0000000000000000</value>
       </pointerconstn>
    </unequaln>
</andn>

On x86-64_win64, the following assembly language is generated (smart pipelining and out-of-order execution MIGHT permit it to be executed in 3 cycles, but it's more likely going to take 4 or 5):

     testq    %rbx,%rbx
     setneb   %al
     testq    %rsi,%rsi
     setneb   %dl
     andb     %dl,%al

DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not B", and "not (A and B) = not A or not B", and using these, we can develop much more efficient assembly language (which will almost certainly take 3 cycles to run, and is much smaller):

     movq     %rbx,%rdx
     orq      %rsi,%rdx
     seteb    %al

For this particular routine, %rsi and %dl/%rdx are not used afterwards, and can be simplified further (this bit will probably be a peephole optimisation), dropping the cycle count to 2:

     orq      %rbx,%rsi
     seteb    %al

I am not sure how this is equal:

consider rbx=1, esi=1: current code results in al=1; the last sample results in al=0. Or do I miss something?
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Reply via email to