https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

--- Comment #40 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 30 Jul 2025, hjl.tools at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
> 
> --- Comment #39 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to rguent...@suse.de from comment #38)
> > On Wed, 30 Jul 2025, hjl.tools at gmail dot com wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
> > > 
> > > --- Comment #37 from H.J. Lu <hjl.tools at gmail dot com> ---
> > > (In reply to Richard Biener from comment #35)
> > > > (In reply to H.J. Lu from comment #33)
> > > > > Created attachment 61995 [details]
> > > > > An updated patch
> > > > > 
> > > > > Please try this.
> > > > 
> > > > Looking at the patch I do wonder about
> > > > 
> > > > static void 
> > > > ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs,
> > > >                               rtx inner_scalar = nullptr)
> > > > {                        
> > > >   basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, 
> > > > bbs);
> > > >   while (bb->loop_father->latch              
> > > >          != EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > >     bb = get_immediate_dominator (CDI_DOMINATORS,
> > > >                                   bb->loop_father->header);
> > > > 
> > > > when the nearest common dominator is a BB in a loop nest like
> > > > 
> > > >  loop {
> > > >    loop {
> > > >    }
> > > > 
> > > >    loop {
> > > >       BB;
> > > >    }
> > > >    BB';
> > > >  }
> > > > 
> > > > this will skip an arbitrary number of earlier sibling loops.  I think
> > > > if we want to do such additional hoisting at all - for a splat of a
> > > > non-constant we have to ensure the set of the source we splat is still
> > > > dominating the insertion point (where's that done?) - it IMO only
> > > > makes sense (without extra costing) to hoist the set out of a perfect
> > > > nest, thus never across earlier sibling loops.  Even for BB' this is
> > > > likely problematic.
> > > 
> > > Since my patch works, I'd like to keep it as is.  Will it work for you?
> > 
> > Sure.  Where is it ensured the splat isn't inserted before it is set?
> 
> With my patch, we got
> 
>  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
>   /* For X86_CSE_VEC_DUP, don't place the vector set outside of the loop
>      to avoid extra spills.  */
>   if (!load || load->kind != X86_CSE_VEC_DUP)
>     {    
>       while (bb->loop_father->latch
>              != EXIT_BLOCK_PTR_FOR_FN (cfun))
>         bb = get_immediate_dominator (CDI_DOMINATORS,
>                                       bb->loop_father->header);
>     }    
> 
> When load->kind == X86_CSE_VEC_DUP, bb is the nearest common dominator which
> may be inside the loop.

That ensures this when there's a single reaching def for all of the uses
in 'bbs'.  So your patch then is also a correctness fix.

Reply via email to