https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97064
            Bug ID: 97064
           Summary: BB vectorization behaves sub-optimal
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The testcase g++.dg/vect/slp-pr87105.cc ends in

  _64 = MIN_EXPR <_32, _87>;
  bBox_6(D)->x0 = _64;
  _67 = MIN_EXPR <_33, _86>;
  bBox_6(D)->y0 = _67;
  _70 = MAX_EXPR <_36, _87>;
  bBox_6(D)->x1 = _70;
  _73 = MAX_EXPR <_39, _86>;
  bBox_6(D)->y1 = _73;

thus feeding a 4-element store with a non-uniform SLP opportunity
starting with { MIN, MIN, MAX, MAX }.  With 2-element vector types
this eventually gets vectorized by splitting the group, which is
prioritized over just building the { MIN, ..., MAX } vector from scalars.
With 4-element vector types no splitting is considered, and we end up
successfully vectorizing just the store without ever considering the
smaller vector size.

So at the moment the testcase PASSes with SSE but FAILs with AVX.
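
For readers without the testsuite at hand, here is a minimal sketch of source that could plausibly lower to a store group of this shape.  It is not the actual content of slp-pr87105.cc; the names BBox and extend are hypothetical, and whether GCC emits MIN_EXPR/MAX_EXPR for the std::min/std::max calls on doubles depends on the optimization and floating-point flags in use.

```cpp
#include <algorithm>

// Hypothetical stand-in for the bounding box type.  The four fields are
// adjacent in memory, so the four stores below can form one store group.
struct BBox {
    double x0, y0, x1, y1;
};

// Grow the box so that it contains the point (x, y).
// Lanes 0-1 are computed with min, lanes 2-3 with max, so the operation
// feeding the 4-element store group is { MIN, MIN, MAX, MAX }.
void extend(BBox *bBox, double x, double y)
{
    bBox->x0 = std::min(bBox->x0, x);  // lane 0: MIN
    bBox->y0 = std::min(bBox->y0, y);  // lane 1: MIN
    bBox->x1 = std::max(bBox->x1, x);  // lane 2: MAX
    bBox->y1 = std::max(bBox->y1, y);  // lane 3: MAX
}
```

Split after the second store, each half is a uniform 2-lane operation ({ MIN, MIN } and { MAX, MAX }), which is the form the 2-element vector case ends up vectorizing.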