On Tue, Aug 5, 2025 at 7:20 PM Thomas de Bock via Gcc <gcc@gcc.gnu.org> wrote:
>
> I have been working on a GCC optimization pass that merges comparisons of
> consecutive memory regions into memcmp calls, which later get vectorized at
> -O2 (regarding https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953).
>
> This is already implemented in LLVM
> (https://github.com/llvm/llvm-project/blob/b470ac419d7e8eec6c8a27539096e38a1751ee12/llvm/lib/Transforms/Scalar/MergeICmps.cpp),
> which has been very helpful in bringing the optimization to GCC.
>
> There are some parts I would like feedback on, however. My current approach
> is implemented as an early optimization pass (after pass_modref).
>
> When encountering a PHI node, it goes through its arguments and stores the
> corresponding (non-empty) basic blocks. It then recognizes chains of
> INTEGRAL_TYPE (but not boolean) comparisons in these blocks and merges them
> as much as possible (so 2 consecutive comparisons of 1 byte each become 1
> comparison of 2 bytes).
>
> For each chain, it then removes most of the old blocks, creates the new
> merged comparison structure, and puts it in their place.
>
> This has some issues though:
>
> 1. C array comparisons, when getting to this pass, are already implemented
> as loops in the CFG, making them more difficult to recognize.

For loops, the canonical place to perform such an optimization is the loop
distribution pass, which already recognizes memcpy and also strlen (strcmp
is more like strlen). For straight-line code there's also a bug report
about supporting vectorization of such sequences in the basic-block
vectorizer. Passes with related transforms are ifcombine (it now also
does limited load merging), store-merging and phiopt.

> 2. When a struct A has a struct B field and we compare 2 A instances, when
> getting to this pass, A::operator== still contains a call to B::operator==,
> making it hard to merge the comparisons of B together with those of A. What
> is weird to me is that even after the einline and IPA inline passes, it
> seems the call is still there instead of being inlined.

I'd not do this as an early pass (and I'd not do it as a separate pass
anyway).

> These are of course 2 cases that could be handled by implementing special
> cases (e.g. detecting the loop structure, recursing into the called function,
> then coming back and merging anyway), but I feel as though tackling the
> problems in this way will result in infinite complexity as I discover more
> cases. That's why I was hoping for some feedback on this: does the general
> approach I am taking seem logical for GCC, should I handle these as special
> cases, or is there some other way of going about it?
> Any tips very much appreciated, thank you!

As for the case with the call, I'd figure out why we do not inline.

Richard.

>
> This e-mail and any attachments may contain information that is confidential
> and proprietary and otherwise protected from disclosure. If you are not the
> intended recipient of this e-mail, do not read, duplicate or redistribute it
> by any means. Please immediately delete it and any attachments and notify the
> sender that you have received it by mistake. Unintended recipients are
> prohibited from taking action on the basis of information in this e-mail or
> any attachments. The DRW Companies make no representations that this e-mail
> or any attachments are free of computer viruses or other defects.