https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176

            Bug ID: 102176
           Summary: BB SLP scalar costing is off with extern promoted
                    nodes
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

On aarch64 we can see

int foo(long *restrict res, long *restrict foo, long a, long b)
{
  res[0] = ((foo[0] * a) >> 1) + foo[0];
  res[1] = ((foo[1] * b) >> 1) + foo[1];
}

being vectorized as

t.c:3:10: note: Costing subgraph:
t.c:3:10: note: node 0x35f03b0 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: *res_12(D) = _4;
t.c:3:10: note:         stmt 0 *res_12(D) = _4;
t.c:3:10: note:         stmt 1 MEM[(long int *)res_12(D) + 8B] = _8;
t.c:3:10: note:         children 0x35f0440
t.c:3:10: note: node 0x35f0440 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: _4 = _1 + _3;
t.c:3:10: note:         stmt 0 _4 = _1 + _3;
t.c:3:10: note:         stmt 1 _8 = _5 + _7;
t.c:3:10: note:         children 0x35f04d0 0x35f0560
t.c:3:10: note: node 0x35f04d0 (max_nunits=2, refcnt=2)
t.c:3:10: note: op template: _1 = *foo_10(D);
t.c:3:10: note:         stmt 0 _1 = *foo_10(D);
t.c:3:10: note:         stmt 1 _5 = MEM[(long int *)foo_10(D) + 8B];
t.c:3:10: note: node 0x35f0560 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: _3 = _2 >> 1;
t.c:3:10: note:         stmt 0 _3 = _2 >> 1;
t.c:3:10: note:         stmt 1 _7 = _6 >> 1;
t.c:3:10: note:         children 0x35f05f0 0x35f0710
t.c:3:10: note: node (external) 0x35f05f0 (max_nunits=2, refcnt=1)
t.c:3:10: note:         stmt 0 _2 = _1 * a_11(D);
t.c:3:10: note:         stmt 1 _6 = _5 * b_14(D);
t.c:3:10: note:         children 0x35f04d0 0x35f0680
t.c:3:10: note: node (external) 0x35f0680 (max_nunits=1, refcnt=1)
t.c:3:10: note:         { a_11(D), b_14(D) }
t.c:3:10: note: node (constant) 0x35f0710 (max_nunits=1, refcnt=1)
t.c:3:10: note:         { 1, 1 }

so the promoted external node 0x35f05f0 should keep the scalar loads
live (its stmts still use _1 and _5), and scalar costing should account
for that.  vect_bb_slp_scalar_cost relies on PURE_SLP_STMT, but that is
unreliable here since a per-stmt flag cannot capture the different
uses: the same load is used both from a vectorized node and from a stmt
that stays scalar.  The code shares intent (and some bugs) with
vect_bb_slp_mark_live_stmts, and the problem in general is a bit
difficult given the lack of a back-mapping from a stmt to the SLP nodes
referencing it.
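
To illustrate why a per-stmt flag is not enough, here is a toy model of
the testcase above.  It is only a sketch and not GCC code: struct
toy_stmt, the unit cost per stmt, and both cost functions are
hypothetical names made up for this illustration.  Each stmt records
the stmts defining its operands and whether it is a lane of a
non-external SLP node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPS 2
#define NO_OP (-1)
#define MAX_STMTS 64

/* Hypothetical toy statement; this is not a GCC data structure.  */
struct toy_stmt {
  int ops[MAX_OPS];    /* indices of the defining stmts, NO_OP if unused */
  bool in_vector_node; /* lane of a non-external SLP node */
};

/* The ten scalar stmts of the testcase, in definition order.  */
static const struct toy_stmt example[] = {
  { { NO_OP, NO_OP }, true  }, /* 0: _1 = foo[0]                 */
  { { NO_OP, NO_OP }, true  }, /* 1: _5 = foo[1]                 */
  { { 0, NO_OP },     false }, /* 2: _2 = _1 * a  (stays scalar) */
  { { 1, NO_OP },     false }, /* 3: _6 = _5 * b  (stays scalar) */
  { { 2, NO_OP },     true  }, /* 4: _3 = _2 >> 1                */
  { { 3, NO_OP },     true  }, /* 5: _7 = _6 >> 1                */
  { { 0, 4 },         true  }, /* 6: _4 = _1 + _3                */
  { { 1, 5 },         true  }, /* 7: _8 = _5 + _7                */
  { { 6, NO_OP },     true  }, /* 8: res[0] = _4                 */
  { { 7, NO_OP },     true  }, /* 9: res[1] = _8                 */
};
#define EXAMPLE_N (sizeof example / sizeof example[0])

/* Per-stmt flag only: every stmt in a vectorized node counts as
   removed scalar code, one cost unit each.  */
int naive_scalar_cost(const struct toy_stmt *s, size_t n)
{
  int cost = 0;
  for (size_t i = 0; i < n; i++)
    if (s[i].in_vector_node)
      cost++;
  return cost;
}

/* Liveness-aware: a stmt that stays scalar keeps its operands' scalar
   definitions alive, transitively.  */
int live_aware_scalar_cost(const struct toy_stmt *s, size_t n)
{
  bool removable[MAX_STMTS];
  int cost = 0;

  assert(n <= MAX_STMTS);
  for (size_t i = 0; i < n; i++)
    removable[i] = s[i].in_vector_node;

  /* Walk backwards so that users are visited before their definitions.  */
  for (size_t i = n; i-- > 0;)
    if (!removable[i])
      for (int k = 0; k < MAX_OPS; k++)
        if (s[i].ops[k] != NO_OP) {
          assert(s[i].ops[k] >= 0 && (size_t) s[i].ops[k] < i);
          removable[s[i].ops[k]] = false;
        }

  for (size_t i = 0; i < n; i++)
    if (removable[i])
      cost++;
  return cost;
}
```

In this model naive_scalar_cost (example, EXAMPLE_N) is 8 while
live_aware_scalar_cost (example, EXAMPLE_N) is 6: the two loads are
lanes of a vectorized node, yet they must stay because the scalar
multiplications still use them.  The toy has the operand links at hand;
the difficulty described above is that the real code lacks the
corresponding stmt to SLP node back-mapping.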
