https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176
Bug ID: 102176
Summary: BB SLP scalar costing is off with extern promoted nodes
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

On aarch64 we can see

int foo(long *restrict res, long *restrict foo, long a, long b)
{
  res[0] = ((foo[0] * a) >> 1) + foo[0];
  res[1] = ((foo[1] * b) >> 1) + foo[1];
}

being vectorized as

t.c:3:10: note: Costing subgraph:
t.c:3:10: note: node 0x35f03b0 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: *res_12(D) = _4;
t.c:3:10: note: stmt 0 *res_12(D) = _4;
t.c:3:10: note: stmt 1 MEM[(long int *)res_12(D) + 8B] = _8;
t.c:3:10: note: children 0x35f0440
t.c:3:10: note: node 0x35f0440 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: _4 = _1 + _3;
t.c:3:10: note: stmt 0 _4 = _1 + _3;
t.c:3:10: note: stmt 1 _8 = _5 + _7;
t.c:3:10: note: children 0x35f04d0 0x35f0560
t.c:3:10: note: node 0x35f04d0 (max_nunits=2, refcnt=2)
t.c:3:10: note: op template: _1 = *foo_10(D);
t.c:3:10: note: stmt 0 _1 = *foo_10(D);
t.c:3:10: note: stmt 1 _5 = MEM[(long int *)foo_10(D) + 8B];
t.c:3:10: note: node 0x35f0560 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: _3 = _2 >> 1;
t.c:3:10: note: stmt 0 _3 = _2 >> 1;
t.c:3:10: note: stmt 1 _7 = _6 >> 1;
t.c:3:10: note: children 0x35f05f0 0x35f0710
t.c:3:10: note: node (external) 0x35f05f0 (max_nunits=2, refcnt=1)
t.c:3:10: note: stmt 0 _2 = _1 * a_11(D);
t.c:3:10: note: stmt 1 _6 = _5 * b_14(D);
t.c:3:10: note: children 0x35f04d0 0x35f0680
t.c:3:10: note: node (external) 0x35f0680 (max_nunits=1, refcnt=1)
t.c:3:10: note: { a_11(D), b_14(D) }
t.c:3:10: note: node (constant) 0x35f0710 (max_nunits=1, refcnt=1)
t.c:3:10: note: { 1, 1 }

so the promoted external node 0x35f05f0 should keep the loads live: _1 and _5 (node 0x35f04d0, refcnt=2) are used both by the vectorized add and by the scalar multiplications _2 and _6 that the external node is built from. vect_bb_slp_scalar_cost relies on PURE_SLP_STMT, but that is unreliable here since a per-stmt flag cannot capture the different uses of one stmt.
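To illustrate the difference between a per-stmt flag and a per-use view, here is a minimal C sketch over a hypothetical, heavily simplified SLP graph. The types and functions (struct node, find_live, sum_cost, bb_slp_scalar_cost) are illustrative assumptions, not GCC's slp_tree, stmt_vec_info or vect_bb_slp_scalar_cost, and every scalar stmt is assumed to cost 1. With use_live false it counts every stmt in an internal node as removed, which is what a per-stmt "pure SLP" flag amounts to; with use_live true it first marks everything an external node still needs in scalar form.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of a BB SLP graph (not GCC types).  */
enum node_kind { NODE_INTERNAL, NODE_EXTERNAL, NODE_CONSTANT };

#define MAX_LANES 2
#define MAX_CHILDREN 2
#define MAX_STMTS 32

struct node {
  enum node_kind kind;
  int nstmts;                          /* 0 for invariants like { a, b } */
  int stmts[MAX_LANES];                /* scalar stmt ids, one per lane */
  int nchildren;
  struct node *children[MAX_CHILDREN];
};

/* Mark every scalar stmt at or below N as still computed in scalar form.  */
static void mark_scalar_live (const struct node *n, bool *live)
{
  for (int i = 0; i < n->nstmts; i++)
    live[n->stmts[i]] = true;
  for (int i = 0; i < n->nchildren; i++)
    mark_scalar_live (n->children[i], live);
}

/* An external node built from scalar defs keeps those defs, and
   transitively their operands, live, whichever other node lists them.  */
static void find_live (const struct node *n, bool *live)
{
  if (n->kind == NODE_EXTERNAL)
    {
      mark_scalar_live (n, live);
      return;
    }
  for (int i = 0; i < n->nchildren; i++)
    find_live (n->children[i], live);
}

/* Sum the unit cost of stmts in internal nodes that the vector code
   replaces, counting each stmt at most once.  */
static int sum_cost (const struct node *n, const bool *live, bool *seen,
                     bool use_live)
{
  if (n->kind != NODE_INTERNAL)
    return 0;
  int cost = 0;
  for (int i = 0; i < n->nstmts; i++)
    {
      int s = n->stmts[i];
      if (seen[s] || (use_live && live[s]))
        continue;
      seen[s] = true;
      cost += 1;  /* hypothetical uniform scalar cost */
    }
  for (int i = 0; i < n->nchildren; i++)
    cost += sum_cost (n->children[i], live, seen, use_live);
  return cost;
}

/* Scalar cost the vectorized subgraph rooted at ROOT saves.  */
int bb_slp_scalar_cost (const struct node *root, bool use_live)
{
  bool live[MAX_STMTS], seen[MAX_STMTS];
  memset (live, 0, sizeof live);
  memset (seen, 0, sizeof seen);
  if (use_live)
    find_live (root, live);
  return sum_cost (root, live, seen, use_live);
}
```

Encoding the subgraph above with stmt ids 0 to 9 (stores, adds, loads, shifts, multiplies) and the load node shared between the add node and the external node 0x35f05f0, bb_slp_scalar_cost(root, false) returns 8 while bb_slp_scalar_cost(root, true) returns 6: the loads _1 and _5 are no longer counted as removed because the scalar _2 and _6 still read them.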
The scalar costing code shares intent (and some bugs) with vect_bb_slp_mark_live_stmts, and the problem in general is difficult given the lack of a back-mapping from a stmt to the SLP nodes referencing it.
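One possible shape for such a back-mapping is sketched below in C. Everything here is hypothetical (struct snode, struct backmap, build_backmap, stmt_scalar_live are not GCC internals), and the graph is assumed to be acyclic and stored as an array of nodes. It keeps two tables, stmt to the nodes listing it and node to its parents, so that liveness of one stmt can be answered per use by walking upwards: a node must also be evaluated in scalar form if a parent is an external node built from scalars, or if a parent itself must be kept in scalar form.

```c
#include <stdbool.h>

/* Hypothetical, simplified SLP graph stored as an array (not GCC types).  */
enum kind { K_INTERNAL, K_EXTERNAL, K_CONSTANT };

#define MAX_NODES 16
#define MAX_STMTS 32
#define MAX_REFS 4

struct snode {
  enum kind kind;
  int nstmts, stmts[2];        /* scalar stmt ids, one per lane */
  int nchildren, children[2];  /* indices into the node array */
};

/* The back-mapping: stmt -> nodes listing it, node -> parent nodes.  */
struct backmap {
  int nrefs[MAX_STMTS], refs[MAX_STMTS][MAX_REFS];
  int nparents[MAX_NODES], parents[MAX_NODES][MAX_REFS];
};

void build_backmap (const struct snode *g, int n, struct backmap *bm)
{
  for (int s = 0; s < MAX_STMTS; s++)
    bm->nrefs[s] = 0;
  for (int i = 0; i < MAX_NODES; i++)
    bm->nparents[i] = 0;
  for (int i = 0; i < n; i++)
    {
      for (int l = 0; l < g[i].nstmts; l++)
        {
          int s = g[i].stmts[l];
          bm->refs[s][bm->nrefs[s]++] = i;
        }
      for (int c = 0; c < g[i].nchildren; c++)
        {
          int ch = g[i].children[c];
          bm->parents[ch][bm->nparents[ch]++] = i;
        }
    }
}

/* Node I is needed in scalar form if some parent builds its operand
   from scalars (external) or is itself needed in scalar form.  */
static bool node_scalar_needed (const struct snode *g,
                                const struct backmap *bm, int i)
{
  for (int p = 0; p < bm->nparents[i]; p++)
    {
      int par = bm->parents[i][p];
      if (g[par].kind == K_EXTERNAL || node_scalar_needed (g, bm, par))
        return true;
    }
  return false;
}

/* Stmt S stays live if any node listing it is not vectorized or has to
   be kept in scalar form for one of its uses.  */
bool stmt_scalar_live (const struct snode *g, const struct backmap *bm, int s)
{
  for (int r = 0; r < bm->nrefs[s]; r++)
    {
      int i = bm->refs[s][r];
      if (g[i].kind != K_INTERNAL || node_scalar_needed (g, bm, i))
        return true;
    }
  return false;
}
```

For the subgraph in this report, stmt_scalar_live reports the loads _1 and _5 as live (their node has the external node 0x35f05f0 as a parent) and the adds, shifts and stores as fully vectorized. The upward walk is recomputed per query, which keeps the sketch short but may revisit nodes; caching the per-node answer would be the obvious refinement.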