[Bug tree-optimization/120996] [16 regression][AArch64] 15% regression in microBUDE since r16-1108-gb7960a3f966a0f

tnfchris at gcc dot gnu.org via Gcc-bugs Wed, 25 Feb 2026 01:44:22 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120996


--- Comment #17 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> It's questionable whether the reduction is reflecting the orginal issue and
> what the original issue actually is.

I think the "regression" is an improvement in sinking.  I don't think there's
much to be done there, as it's one of those cases where the improvement is made
on the scalar code and is correct, but once vectorized the codegen degenerates.

I was looking at this before, but the reported regression is at -O3, which
limits what we can optimize. (The project itself requires -Ofast where there's
no regression).

Aside from not being able to simplify x * 1 to x at -O3 with -ftrapping-math,
the other issue the code points out is that comparisons aren't shared.

This testcase shows the exposed issue https://godbolt.org/z/or5aTYMvc

There, for instance, we have 5 compares. On most SVE cores compares have very
low throughput, so it's better for us to generate binary operations on top of a
shared compare.

E.g. the above should be just 2 compares, 2 BICs and 1 NOR.  I have tried to
fix this in two ways:

1. I taught ifcvt to share compares, such that we get more operations on masks
vs compares. SVE has a large range of mask operations: BIC, NOR, ORN, AND,
NAND, NOT, etc. This works, but then match.pd undoes it because it assumes
that compares are cheaper than mask operations.

2. I moved it inside the vectorizer and extended the mask tracking code to allow
generation of ~mask whenever mask is already available.  This was somewhat
simpler, but I wasn't able to get the same quality out as from the ifcvt patch
(I still need to investigate why).  But again match.pd undid it.

I then added a restriction to the match.pd patterns so that they do not fold
the ~ for vector operations.  But this isn't beneficial in general (for
instance, for Adv. SIMD the compares are better).
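
As a rough scalar illustration of the compare sharing described above, here is
a minimal sketch.  It is a hypothetical example, not the testcase from the
Compiler Explorer link, and the function names are made up for illustration.
The first function re-evaluates its comparisons in every condition; the second
does 2 compares and derives the remaining conditions with the kind of mask
operations that correspond to BIC (x & ~y) and NOR (~(x | y)).

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical illustration only; names and shape are assumptions.  */

/* Naive form: each condition spells out its own comparisons.  The inverse is
   written as !(a > t) rather than a <= t, since the two differ for NaN.  */
unsigned classify_naive (float a, float b, float t)
{
  if (a > t && !(b > t))
    return 1;
  if (!(a > t) && b > t)
    return 2;
  if (!(a > t) && !(b > t))
    return 3;
  return 0;
}

/* Shared form: 2 compares, then only mask operations on their results.  */
unsigned classify_shared (float a, float b, float t)
{
  unsigned m1 = a > t;                 /* compare 1 */
  unsigned m2 = b > t;                 /* compare 2 */
  unsigned only_a = m1 & ~m2;          /* BIC-style: m1 and not m2 */
  unsigned only_b = m2 & ~m1;          /* BIC-style: m2 and not m1 */
  unsigned neither = ~(m1 | m2) & 1u;  /* NOR-style, kept to one bit */
  return only_a * 1u + only_b * 2u + neither * 3u;
}

/* Count inputs on which the two forms disagree, including NaN inputs.  */
int count_mismatches (void)
{
  const float vals[] = { -1.0f, 0.0f, 0.5f, 2.0f, NAN };
  const size_t n = sizeof vals / sizeof vals[0];
  int bad = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (classify_naive (vals[i], vals[j], 0.5f)
          != classify_shared (vals[i], vals[j], 0.5f))
        bad++;
  return bad;
}
```

Both forms compute the same result; the second only makes explicit that every
condition can be built from the two compare results and their inverses.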

I have pushed the patches for these out to GCC 17 (as I started on them during
stage 4 and the changes became rather big).

Fixing that brings the performance of -O3 closer to what it was.
