Hi all, I'm pinging to discuss again if we want to move this forward for GCC14.
I did some testing again and I haven't been able to find obvious regressions, including testing the code from PR86270 and PR70359 that Richard mentioned. I still believe that zero can be considered a special case even for hardware that doesn't directly benefit in the comparison. For example it happens that the testcase from the commit compiles to one instruction less in x86: .LFB0: movl (%rdi), %eax leal 1(%rax), %edx movl %edx, (%rdi) testl %eax, %eax je .L4 ret .L4: jmp g vs .LFB0: movl (%rdi), %eax addl $1, %eax movl %eax, (%rdi) cmpl $1, %eax je .L4 ret .L4: xorl %eax, %eax jmp g (The xorl is not emitted when testl is used. LLVM uses testl but also does xor eax, eax :) ) Although this is accidental, I believe it also showcases that zero is a preferential value in various ways. I'm running benchmarks comparing the effects of this change and I'm also still looking for testcases that result in problematic regressions. Any feedback or other concerns about this are appreciated! Thanks, Manolis On Wed, Apr 26, 2023 at 9:43 AM Richard Biener <richard.guent...@gmail.com> wrote: > > On Wed, Apr 26, 2023 at 4:30 AM Jeff Law <jeffreya...@gmail.com> wrote: > > > > > > > > On 4/25/23 01:21, Richard Biener wrote: > > > On Tue, Apr 25, 2023 at 1:05 AM Jeff Law <jeffreya...@gmail.com> wrote > > >> > > >> > > >> > > >> > > >> On 4/24/23 02:06, Richard Biener via Gcc-patches wrote: > > >>> On Fri, Apr 21, 2023 at 11:01 PM Philipp Tomsich > > >>> <philipp.toms...@vrull.eu> wrote: > > >>>> > > >>>> Any guidance on the next steps for this patch? > > >>> > > >>> I think we want to perform this transform later, in particular when > > >>> the test is a loop exit test we do not want to do it as it prevents > > >>> coalescing of the IV on the backedge at out-of-SSA time. > > >>> > > >>> That means doing the transform in folding and/or before inlining > > >>> (the test could become a loop exit test) would be a no-go. In fact > > >>> for SSA coalescing we'd want the reverse transform in some cases, see > > >>> PRs 86270 and 70359. > > >>> > > >>> If we can reliably undo for the loop case I suppose we can do the > > >>> canonicalization to compare against zero. In any case please split > > >>> up the patch (note > > >> I've also > > >>> hoped we could eventually get rid of that part of > > >>> tree-ssa-forwprop.cc > > >> in favor > > >>> of match.pd patterns since it uses GENERIC folding :/). > > >>> > > >> Do we have enough information to do this at expansion time? That would > > >> avoid introducing the target dependencies to drive this in gimple. > > > > > > I think so, but there isn't any convenient place to do this I think. I > > > suppose > > > there's no hope to catch it during RTL opts? > > Combine would be the most natural place in the RTL pipeline, but it'd be > > a 2->2 combination which would be rejected. > > > > We could possibly do it as a define_insn_and_split, but the gimple->RTL > > interface seems like a better fit to me. If TER has done its job, we > > should see a complex enough tree node to do the right thing. > > Of course we'd want to get rid of TER in favor of ISEL > > Richard. > > > jeff