> On Thu, Aug 07, 2025 at 09:37:24AM +0000, Thomas de Bock wrote:
> > Why does that it not matter that the destination of all the chain blocks'
> > FALSE edge is the same, if we have:
> > <bb 1> :
> > _1 = this_13(D)->a;
> > _2 = _14(D)->a;
> > if (_1 == _2)
> >   goto <bb 2>; [INV]
> > else
> >   goto <bb X>; [INV]
> > <bb 2> :
> > _3 = this_13(D)->b;
> > _4 = _14(D)->b;
> > if (_3 == _4)
> >   goto <bb 3>; [INV]
> > else
> >   goto <bb Y>; [INV]
> > ...
> > Where X != B, and we replace bb 1 and 2 with, for example, a single bb:
> >
> > <bb 4> :
> > _5 = &this_13(D)->a;
> > _6 = &_14(D)->a;
> > _7 = __builtin_memcmp(_5, _6, somelen)
> > if (_1 == _2)
> >   goto <bb 3>; [INV]
> > else
> >   goto <bb Z>; [INV]
> >
> > How are we going to decide what block Z is, since <bb 1> and <bb 2> go 
> > somewhere
> > different when their respective field was found to not match, but in <bb 4> 
> > we
> > do not know which field caused the mismatch so we can only branch to 1 
> > block.
> > But using either X or Y for Z will result in an "optimization" that changes 
> > the
> > program's behaviour.

> In order to replace the series of equality comparisons with memcmp, all the
> edges for cases when the two SSA_NAMEs aren't equal (so for == the
> EDGE_FALSE_VALUE destination, for != EDGE_TRUE_VALUE destination) have to be
> the same.  So, X has to be the same as Y in your example above.
> And the other edges have to point to (except for the last bb in the series)
> point to the bb with next comparison.

Ah it seems I misunderstood you, that's exactly the point I was
trying to illustrate.

> Furthermore, not just that, but if there are any PHIs in that basic block,
> for all PHIs the PHI arguments from all those edges have to be the same.
> Otherwise the optimization needs to give up.

> BTW, I don't think the pattern recognition should be done too early, I think
> it should be done shortly after IPA optimizations, so that all inlining is
> done and you can actually see everything.

Agreed, though if the optimization ends up only being applied to default ==
operators, an alternative could be to just recurse into fields' default
== operators and merging them, before inlining happens.
Since applying it after inlining of the == operator happens,
(I assume) makes it impossible to apply it only to the default equality 
operators.

>         Jakub


Btw, was hoping you could look at this part of my previous e-mail
and tell me what you think. Since it's very hard to do more work
on the optimization currently without a solid knowledge of what
use-cases should be respected:

"I see what you mean, I think there is probably no way to apply the
optimization to all functions then indeed, since there is no way to
know if the early break on inequality of a field was arbitrary
or because it indicates the following data will be out of range.
My main goal currently though is applying the optimization to
the default comparison operator, and is it not undefined
behaviour to call it on an incomplete struct?

For -O2:
bool __attribute__((noinline))
f(std::array<int32_t, 4>* x, std::array<int32_t, 4>* y) {
    return *x == *y;
}

int main() {

  std::array<int32_t, 4>* a;
  std::array<int32_t, 4>* b;
  a = (std::array<int32_t, 4>*)malloc(sizeof(std::array<int32_t, 3>));
  b = (std::array<int32_t, 4>*)malloc(sizeof(std::array<int32_t, 3>));
  return f(a, b);
}
Also generates a memcmp call which reads out of bounds of the allocated
memory, doesn't this show that for default equality we can assume
that all the structure's data was allocated?

https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4114.htm
Does specify "a comparison of the subobjects that have the same position
in both objects against each other until one subobject is not equal to
the other."
But I don't see anything regarding when the memory should be loaded.

Either way, I think the effect this issue has on run-time does
demand a way to, at the very least, have a way of applying
this (possibly through different means) eventual vectorization of
comparisons. Even if it's through a flag or attribute etc, when
properly implemented this would speed up default comparisons
and decrease code size:
(https://docs.google.com/document/d/1CKp8cIfURXbPLSap0jFio7LW4suzR10u5gX4RBV0k7c/edit?tab=t.0#heading=h.mub2vspvhz42).
It would also help make code that currently uses memcmp to achieve
vectorization in struct comparisons much cleaner.
"



________________________________
From: Jakub Jelinek <ja...@redhat.com>
Sent: 07 August 2025 14:28:19
To: Thomas de Bock
Cc: Richard Biener; gcc@gcc.gnu.org
Subject: [ext] Re: Help with comparison-merging optimization pass

On Thu, Aug 07, 2025 at 09:37:24AM +0000, Thomas de Bock wrote:
> Why does that it not matter that the destination of all the chain blocks'
> FALSE edge is the same, if we have:
> <bb 1> :
> _1 = this_13(D)->a;
> _2 = _14(D)->a;
> if (_1 == _2)
>   goto <bb 2>; [INV]
> else
>   goto <bb X>; [INV]
> <bb 2> :
> _3 = this_13(D)->b;
> _4 = _14(D)->b;
> if (_3 == _4)
>   goto <bb 3>; [INV]
> else
>   goto <bb Y>; [INV]
> ...
> Where X != B, and we replace bb 1 and 2 with, for example, a single bb:
>
> <bb 4> :
> _5 = &this_13(D)->a;
> _6 = &_14(D)->a;
> _7 = __builtin_memcmp(_5, _6, somelen)
> if (_1 == _2)
>   goto <bb 3>; [INV]
> else
>   goto <bb Z>; [INV]
>
> How are we going to decide what block Z is, since <bb 1> and <bb 2> go 
> somewhere
> different when their respective field was found to not match, but in <bb 4> we
> do not know which field caused the mismatch so we can only branch to 1 block.
> But using either X or Y for Z will result in an "optimization" that changes 
> the
> program's behaviour.

In order to replace the series of equality comparisons with memcmp, all the
edges for cases when the two SSA_NAMEs aren't equal (so for == the
EDGE_FALSE_VALUE destination, for != EDGE_TRUE_VALUE destination) have to be
the same.  So, X has to be the same as Y in your example above.
And the other edges have to point to (except for the last bb in the series)
point to the bb with next comparison.
Furthermore, not just that, but if there are any PHIs in that basic block,
for all PHIs the PHI arguments from all those edges have to be the same.
Otherwise the optimization needs to give up.

BTW, I don't think the pattern recognition should be done too early, I think
it should be done shortly after IPA optimizations, so that all inlining is
done and you can actually see everything.

        Jakub


This e-mail and any attachments may contain information that is confidential 
and proprietary and otherwise protected from disclosure. If you are not the 
intended recipient of this e-mail, do not read, duplicate or redistribute it by 
any means. Please immediately delete it and any attachments and notify the 
sender that you have received it by mistake. Unintended recipients are 
prohibited from taking action on the basis of information in this e-mail or any 
attachments. The DRW Companies make no representations that this e-mail or any 
attachments are free of computer viruses or other defects.

Reply via email to