https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97064

            Bug ID: 97064
           Summary: BB vectorization behaves sub-optimal
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The testcase g++.dg/vect/slp-pr87105.cc ends in

  _64 = MIN_EXPR <_32, _87>;
  bBox_6(D)->x0 = _64;
  _67 = MIN_EXPR <_33, _86>;
  bBox_6(D)->y0 = _67;
  _70 = MAX_EXPR <_36, _87>;
  bBox_6(D)->x1 = _70;
  _73 = MAX_EXPR <_39, _86>;
  bBox_6(D)->y1 = _73;

thus feeding a 4-element store group from a non-uniform SLP opportunity
whose root operations are { MIN, MIN, MAX, MAX }.  With 2-element
vector types this eventually gets vectorized by splitting the group,
since splitting is prioritized over just building the
{ MIN, MIN, MAX, MAX } vector from scalars.  With 4-element vector
types no splitting is considered, so we end up successfully vectorizing
just the store, without ever considering the smaller vector size.
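
For illustration, here is a minimal sketch of source code with this shape.
It is not the actual testcase: the Box type, the function names, and the
use of double (which would give 2 lanes with SSE and 4 lanes with AVX) are
assumptions made for the example.  The second function shows by hand what
splitting the store group amounts to, namely two uniform 2-lane groups,
one all-MIN and one all-MAX.

```cpp
#include <algorithm>

// Hypothetical stand-in for the bounding box; field names follow the
// GIMPLE dump above, the element type double is an assumption.
struct Box { double x0, y0, x1, y1; };

// Four adjacent stores fed by { MIN, MIN, MAX, MAX }: as a single
// 4-lane group the operations are non-uniform.
void extend(Box *bBox, const Box &cur, double x, double y)
{
  bBox->x0 = std::min(cur.x0, x);
  bBox->y0 = std::min(cur.y0, y);
  bBox->x1 = std::max(cur.x1, x);
  bBox->y1 = std::max(cur.y1, y);
}

// The same computation written as two 2-lane groups, each with a
// uniform operation.  This mirrors the group split by hand.
void extend_split(Box *bBox, const Box &cur, double x, double y)
{
  const double pt[2] = { x, y };
  const double lo[2] = { cur.x0, cur.y0 };
  const double hi[2] = { cur.x1, cur.y1 };
  double mn[2], mx[2];
  for (int i = 0; i < 2; ++i)
    mn[i] = std::min(lo[i], pt[i]);   // uniform MIN group
  for (int i = 0; i < 2; ++i)
    mx[i] = std::max(hi[i], pt[i]);   // uniform MAX group
  bBox->x0 = mn[0];
  bBox->y0 = mn[1];
  bBox->x1 = mx[0];
  bBox->y1 = mx[1];
}
```

Both functions compute the same result; only the grouping of the four
lanes differs.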

So at the moment the testcase PASSes with SSE but FAILs with AVX.
