[Bug tree-optimization/122308] Inefficient vectorization on inner loop due to unroll-and-jam

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 17 Oct 2025 16:05:11 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122308


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-10-17
            Summary|Inefficient vectorization   |Inefficient vectorization
                   |on inner loop               |on inner loop due to
                   |                            |unroll-and-jam
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Feng Xue from comment #2)
> (In reply to Richard Biener from comment #1)
> > I don't see us vectorize the outer loop:
> > 
> > t.c:7:21: note:   analyze in outer loop: *(&b + (sizetype) index_40 * 2)
> > analyze_innermost: t.c:12:18: missed:   failed: evolution of base is not
> > affine.
> > t.c:7:21: missed:  bad data references.
> > t.c:7:21: note:  ***** Analysis failed with vector mode V4SI
> 
> I may have mis-described the problem. On aarch64, LLVM generates much more
> compact vectorized code for the inner loop: the addition statement maps
> directly to a vector<short> add. GCC does not do this.
> 
> https://godbolt.org/z/Wadvdzozq

I think it's fine; the issue seems to be that we apply unroll-and-jam,
and the resulting version of the loop somehow confuses us.
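To make the transformation concrete, here is a minimal C sketch of unroll-and-jam by a factor of 2. The original test case is only available through the Godbolt link, so the outer loop, the `idx` array, `N`, and the function names `add_windows`/`add_windows_uj` are hypothetical. The sketch assumes only an outer loop that picks a start offset `index` per iteration, with `short` elements as the `vector<short>` remark suggests.

```c
#define N 1024 /* inner trip count (assumed from the loop bound in this report) */

/* Hypothetical original nest: each outer iteration adds a window of b,
   starting at idx[i], into a. */
void add_windows(short *restrict a, const short *restrict b,
                 const int *idx, int n)
{
  for (int i = 0; i < n; ++i)
    {
      int index = idx[i];
      for (int j = 0; j < N; ++j)
        a[j] += b[index + j];
    }
}

/* The same nest after unroll-and-jam by 2: the outer loop is unrolled
   twice and the two copies of the inner loop are fused ("jammed") into
   one inner loop with two statements per j. */
void add_windows_uj(short *restrict a, const short *restrict b,
                    const int *idx, int n)
{
  int i = 0;
  for (; i + 1 < n; i += 2)
    {
      int index = idx[i];
      int index2 = idx[i + 1];
      for (int j = 0; j < N; ++j)
        {
          a[j] += b[index + j];
          a[j] += b[index2 + j];
        }
    }
  /* Epilogue for an odd outer trip count. */
  for (; i < n; ++i)
    {
      int index = idx[i];
      for (int j = 0; j < N; ++j)
        a[j] += b[index + j];
    }
}
```

The transformation is legal here because each `a[j]` still receives its additions in the same outer-iteration order; only the interleaving across different `j` changes.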

  for (int j = 0; j < 1024; ++j)
    {
      a[j] += b[index + j];
      a[j] += b[index2 + j];
    }

is fine.  On x86 we don't vectorize the unrolled-and-jammed version at all:

t.c:11:25: note:   ==> examining statement: _48 = b[_47];
t.c:11:25: missed:   unsupported vector types for emulated gather.
t.c:12:18: missed:   not vectorized: relevant stmt not supported: _48 = b[_47];
t.c:11:25: note:   unsupported SLP instance starting from: a[j_23] = _51;
t.c:11:25: missed:  unsupported SLP instances

so it also shows that we fail to analyze one of the data-refs and fall back
to gathers.
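A toy C model of the two kinds of vector load involved may help. When data-ref analysis proves that the lane addresses of `b[index + j]` advance by one element per `j`, a single contiguous load suffices; when it cannot, each lane is loaded separately from a vector of offsets, which is what an emulated gather amounts to. The `vshort` type, `VF`, and all function names are hypothetical, and this models the idea only, not how GCC implements either load.

```c
#include <stddef.h>
#include <string.h>

#define VF 4 /* hypothetical vectorization factor: 4 lanes of short */

/* Hypothetical model of one vector register holding VF shorts. */
typedef struct { short lane[VF]; } vshort;

/* Contiguous vector load: valid when lane k reads b[base + k],
   i.e. the access is affine in j with unit step. */
vshort load_contiguous(const short *b, ptrdiff_t base)
{
  vshort v;
  memcpy(v.lane, b + base, sizeof v.lane);
  return v;
}

/* Emulated gather: one scalar load per lane from an arbitrary offset
   vector; the fallback when the offsets cannot be proven affine. */
vshort load_gather(const short *b, const ptrdiff_t off[VF])
{
  vshort v;
  for (int k = 0; k < VF; ++k)
    v.lane[k] = b[off[k]];
  return v;
}

/* Returns 1 if the offsets are off[0], off[0]+1, ..., off[0]+VF-1, in
   which case the gather is equivalent to a contiguous load at off[0]. */
int offsets_are_unit_stride(const ptrdiff_t off[VF])
{
  for (int k = 1; k < VF; ++k)
    if (off[k] != off[0] + k)
      return 0;
  return 1;
}
```

For `b[index + j]` the per-lane offsets are `index + j + k`, so they are always unit-stride; the cost of the gather comes purely from the analysis not being able to prove that.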
