https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
--- Comment #40 from rguenther at suse dot de <rguenther at suse dot de> --- On Wed, 30 Jul 2025, hjl.tools at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 > > --- Comment #39 from H.J. Lu <hjl.tools at gmail dot com> --- > (In reply to rguent...@suse.de from comment #38) > > On Wed, 30 Jul 2025, hjl.tools at gmail dot com wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 > > > > > > --- Comment #37 from H.J. Lu <hjl.tools at gmail dot com> --- > > > (In reply to Richard Biener from comment #35) > > > > (In reply to H.J. Lu from comment #33) > > > > > Created attachment 61995 [details] > > > > > An updated patch > > > > > > > > > > Please try this. > > > > > > > > Looking at the patch I do wonder about > > > > > > > > static void > > > > ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs, > > > > rtx inner_scalar = nullptr) > > > > { > > > > basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, > > > > bbs); > > > > while (bb->loop_father->latch > > > > != EXIT_BLOCK_PTR_FOR_FN (cfun)) > > > > bb = get_immediate_dominator (CDI_DOMINATORS, > > > > bb->loop_father->header); > > > > > > > > when the nearest common dominator is a BB in a loop nest like > > > > > > > > loop { > > > > loop { > > > > } > > > > > > > > loop { > > > > BB; > > > > } > > > > BB'; > > > > } > > > > > > > > this will skip an arbitrary number of earlier sibling loops. I think > > > > if we want to do such additional hoisting at all - for a splat of a > > > > non-constant we have to ensure the set of the source we splat is still > > > > dominating the insertion point (where's that done?) - it IMO only > > > > makes sense (without extra costing) to hoist the set out of a perfect > > > > nest, thus never across earlier sibling loops. Even for BB' this is > > > > likely problematic. > > > > > > Since my patch works, I'd like to keep it as is. Will it work for you? > > > > Sure. Where is it ensured the splat isn't inserted before it is set? > > With my patch, we got > > basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs); > /* For X86_CSE_VEC_DUP, don't place the vector set outside of the loop > to avoid extra spills. */ > if (!load || load->kind != X86_CSE_VEC_DUP) > { > while (bb->loop_father->latch > != EXIT_BLOCK_PTR_FOR_FN (cfun)) > bb = get_immediate_dominator (CDI_DOMINATORS, > bb->loop_father->header); > } > > When load->kind == X86_CSE_VEC_DUP, bb is the nearest common dominator which > may be inside the loop. That ensures this when there's a single reaching def for all of the uses in 'bbs'. So your patch then is also a correctness fix.