Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
Hi everyone,
I've found a potential optimisation for conditions of the form "(x <> 0)
and (y <> 0)", which are very common because this is semantically
equivalent to "Assigned(x) and Assigned(y)", for example, and such a
construct is generated implicity in TObject.Destroy, for example. The
node tree for TObject.Destroy example (after the first pass) is as
follows (it checks the value of Self and VMT):
The node tree inside the if-node's condition branch is as follows:
<andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="3">
<unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
complexity="1">
<loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done"
complexity="0">
<symbol>self</symbol>
</loadn>
<niln resultdef="TObject" pos="329,9" flags="nf_pass1_done"
complexity="0">
</niln>
</unequaln>
<unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
complexity="1">
<typeconvn resultdef="$char_pointer" pos="329,9"
flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0"
convtype="tc_equal">
<typeconvn resultdef="Pointer" pos="329,9"
flags="nf_pass1_done" complexity="0" convtype="tc_equal">
<loadn resultdef="<no type symbol>" pos="329,9"
flags="nf_pass1_done" complexity="0">
<symbol>vmt</symbol>
</loadn>
</typeconvn>
</typeconvn>
<pointerconstn resultdef="$char_pointer" pos="329,9"
flags="nf_pass1_done,nf_explicit" complexity="0">
<value>$0000000000000000</value>
</pointerconstn>
</unequaln>
</andn>
On x86-64_win64, the following assembly language is generated (smart
pipelining and out-of-order execution MIGHT permit it to be executed in
3 cycles, but it's more likely going to take 4 or 5):
testq %rbx,%rbx
setneb %al
testq %rsi,%rsi
setneb %dl
andb %dl,%al
DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not
B", and "not (A and B) = not A or not B", and using these, we can
develop much more efficient assembly language (which will almost
certainly take 3 cycles to run, and is much smaller):
movq %rbx,%rdx
orq %rsi,%rdx
seteb %al
For this particular routine, %rsi and %dl/%rdx are not used afterwards,
and can be simplified further (this bit will probably be a peephole
optimisation), dropping the cycle count to 2:
orq %rbx,%rsi
seteb %al
I am not sure how this is equal:
consider rbx=1, esi=1: current code results in al=1; the last sample
results in al=0. Or do I miss something?
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel