https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
--- Comment #35 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #33)
> Created attachment 61995 [details]
> An updated patch
>
> Please try this.

Looking at the patch I do wonder about

static void
ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs,
                              rtx inner_scalar = nullptr)
{
  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
  while (bb->loop_father->latch
         != EXIT_BLOCK_PTR_FOR_FN (cfun))
    bb = get_immediate_dominator (CDI_DOMINATORS,
                                  bb->loop_father->header);

when the nearest common dominator is a BB in a loop nest like

  loop {
    loop {
    }
    loop {
      BB;
    }
    BB';
  }

this will skip an arbitrary number of earlier sibling loops.  I think if
we want to do such additional hoisting at all - for a splat of a
non-constant we have to ensure the set of the source we splat is still
dominating the insertion point (where's that done?) - it IMO only makes
sense (without extra costing) to hoist the set out of a perfect nest, thus
never across earlier sibling loops.  Even for BB' this is likely
problematic.

That would be done with sth like

  while (loop_outer (bb->loop_father))
    {
      auto cand = loop_outer (bb->loop_father);
      /* Do not hoist out of loops with siblings.  */
      if (cand->next)
        break;
      bb = get_immediate_dominator (CDI_DOMINATORS, cand->header);
    }

note as you compute loops without CFG manipulations there are no
preheaders and there can be multiple entry edges into a loop header.  This
means there's no trivial insertion place before each header and the
immediate dominator might be very far away.  So an additional safety
measure would be to do

      cand_bb = get_immediate_dominator (CDI_DOMINATORS, cand->header);
      if (!find_edge (cand_bb, cand->header))
        break;
      bb = cand_bb;

or to compute loops with
LOOPS_MAY_HAVE_MULTIPLE_LATCHES|LOOPS_HAVE_PREHEADERS

That said, I wonder about the correctness thing.
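
For illustration only, the "hoist out of a perfect nest, never across sibling loops" restriction can be modelled on a toy loop tree. This is a standalone sketch and does not use GCC's loop infrastructure: `ToyLoop`, `nest` and `outermost_hoist_loop` are hypothetical names, and the dominance and preheader questions raised above are deliberately left out.

```cpp
#include <cassert>
#include <vector>

// Toy loop-tree node.  Hypothetical; NOT GCC's `class loop`.
struct ToyLoop {
  ToyLoop *outer = nullptr;      // enclosing loop, nullptr for the root
  std::vector<ToyLoop *> inner;  // directly nested loops, in program order
};

// Register `child` as a loop directly nested in `parent`.
static void nest(ToyLoop &parent, ToyLoop &child) {
  child.outer = &parent;
  parent.inner.push_back(&child);
}

// Starting from loop `l`, return the outermost loop we would hoist out of.
// We climb only while the enclosing loop is a real loop (not the root of
// the tree) and contains exactly one nested loop, i.e. while the nest is
// perfect as far as loop structure goes.  A sibling loop stops the climb.
static ToyLoop *outermost_hoist_loop(ToyLoop *l) {
  while (l->outer && l->outer->outer && l->outer->inner.size() == 1)
    l = l->outer;
  return l;
}
```

With the nest from the example (an outer loop containing two sibling loops), starting from the second inner loop the function returns that same loop, so nothing is moved past the earlier sibling. In a perfect nest of three loops it returns the outermost one.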