> On Thu, Aug 07, 2025 at 09:37:24AM +0000, Thomas de Bock wrote: > > Why does that it not matter that the destination of all the chain blocks' > > FALSE edge is the same, if we have: > > <bb 1> : > > _1 = this_13(D)->a; > > _2 = _14(D)->a; > > if (_1 == _2) > > goto <bb 2>; [INV] > > else > > goto <bb X>; [INV] > > <bb 2> : > > _3 = this_13(D)->b; > > _4 = _14(D)->b; > > if (_3 == _4) > > goto <bb 3>; [INV] > > else > > goto <bb Y>; [INV] > > ... > > Where X != B, and we replace bb 1 and 2 with, for example, a single bb: > > > > <bb 4> : > > _5 = &this_13(D)->a; > > _6 = &_14(D)->a; > > _7 = __builtin_memcmp(_5, _6, somelen) > > if (_1 == _2) > > goto <bb 3>; [INV] > > else > > goto <bb Z>; [INV] > > > > How are we going to decide what block Z is, since <bb 1> and <bb 2> go > > somewhere > > different when their respective field was found to not match, but in <bb 4> > > we > > do not know which field caused the mismatch so we can only branch to 1 > > block. > > But using either X or Y for Z will result in an "optimization" that changes > > the > > program's behaviour.
> In order to replace the series of equality comparisons with memcmp, all the > edges for cases when the two SSA_NAMEs aren't equal (so for == the > EDGE_FALSE_VALUE destination, for != EDGE_TRUE_VALUE destination) have to be > the same. So, X has to be the same as Y in your example above. > And the other edges have to point to (except for the last bb in the series) > point to the bb with next comparison. Ah it seems I misunderstood you, that's exactly the point I was trying to illustrate. > Furthermore, not just that, but if there are any PHIs in that basic block, > for all PHIs the PHI arguments from all those edges have to be the same. > Otherwise the optimization needs to give up. > BTW, I don't think the pattern recognition should be done too early, I think > it should be done shortly after IPA optimizations, so that all inlining is > done and you can actually see everything. Agreed, though if the optimization ends up only being applied to default == operators, an alternative could be to just recurse into fields' default == operators and merging them, before inlining happens. Since applying it after inlining of the == operator happens, (I assume) makes it impossible to apply it only to the default equality operators. > Jakub Btw, was hoping you could look at this part of my previous e-mail and tell me what you think. Since it's very hard to do more work on the optimization currently without a solid knowledge of what use-cases should be respected: "I see what you mean, I think there is probably no way to apply the optimization to all functions then indeed, since there is no way to know if the early break on inequality of a field was arbitrary or because it indicates the following data will be out of range. My main goal currently though is applying the optimization to the default comparison operator, and is it not undefined behaviour to call it on an incomplete struct? For -O2: bool __attribute__((noinline)) f(std::array<int32_t, 4>* x, std::array<int32_t, 4>* y) { return *x == *y; } int main() { std::array<int32_t, 4>* a; std::array<int32_t, 4>* b; a = (std::array<int32_t, 4>*)malloc(sizeof(std::array<int32_t, 3>)); b = (std::array<int32_t, 4>*)malloc(sizeof(std::array<int32_t, 3>)); return f(a, b); } Also generates a memcmp call which reads out of bounds of the allocated memory, doesn't this show that for default equality we can assume that all the structure's data was allocated? https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4114.htm Does specify "a comparison of the subobjects that have the same position in both objects against each other until one subobject is not equal to the other." But I don't see anything regarding when the memory should be loaded. Either way, I think the effect this issue has on run-time does demand a way to, at the very least, have a way of applying this (possibly through different means) eventual vectorization of comparisons. Even if it's through a flag or attribute etc, when properly implemented this would speed up default comparisons and decrease code size: (https://docs.google.com/document/d/1CKp8cIfURXbPLSap0jFio7LW4suzR10u5gX4RBV0k7c/edit?tab=t.0#heading=h.mub2vspvhz42). It would also help make code that currently uses memcmp to achieve vectorization in struct comparisons much cleaner. " ________________________________ From: Jakub Jelinek <ja...@redhat.com> Sent: 07 August 2025 14:28:19 To: Thomas de Bock Cc: Richard Biener; gcc@gcc.gnu.org Subject: [ext] Re: Help with comparison-merging optimization pass On Thu, Aug 07, 2025 at 09:37:24AM +0000, Thomas de Bock wrote: > Why does that it not matter that the destination of all the chain blocks' > FALSE edge is the same, if we have: > <bb 1> : > _1 = this_13(D)->a; > _2 = _14(D)->a; > if (_1 == _2) > goto <bb 2>; [INV] > else > goto <bb X>; [INV] > <bb 2> : > _3 = this_13(D)->b; > _4 = _14(D)->b; > if (_3 == _4) > goto <bb 3>; [INV] > else > goto <bb Y>; [INV] > ... > Where X != B, and we replace bb 1 and 2 with, for example, a single bb: > > <bb 4> : > _5 = &this_13(D)->a; > _6 = &_14(D)->a; > _7 = __builtin_memcmp(_5, _6, somelen) > if (_1 == _2) > goto <bb 3>; [INV] > else > goto <bb Z>; [INV] > > How are we going to decide what block Z is, since <bb 1> and <bb 2> go > somewhere > different when their respective field was found to not match, but in <bb 4> we > do not know which field caused the mismatch so we can only branch to 1 block. > But using either X or Y for Z will result in an "optimization" that changes > the > program's behaviour. In order to replace the series of equality comparisons with memcmp, all the edges for cases when the two SSA_NAMEs aren't equal (so for == the EDGE_FALSE_VALUE destination, for != EDGE_TRUE_VALUE destination) have to be the same. So, X has to be the same as Y in your example above. And the other edges have to point to (except for the last bb in the series) point to the bb with next comparison. Furthermore, not just that, but if there are any PHIs in that basic block, for all PHIs the PHI arguments from all those edges have to be the same. Otherwise the optimization needs to give up. BTW, I don't think the pattern recognition should be done too early, I think it should be done shortly after IPA optimizations, so that all inlining is done and you can actually see everything. Jakub This e-mail and any attachments may contain information that is confidential and proprietary and otherwise protected from disclosure. If you are not the intended recipient of this e-mail, do not read, duplicate or redistribute it by any means. Please immediately delete it and any attachments and notify the sender that you have received it by mistake. Unintended recipients are prohibited from taking action on the basis of information in this e-mail or any attachments. The DRW Companies make no representations that this e-mail or any attachments are free of computer viruses or other defects.