https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

--- Comment #35 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #33)
> Created attachment 61995 [details]
> An updated patch
> 
> Please try this.

Looking at the patch I do wonder about

static void
ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs,
                              rtx inner_scalar = nullptr)
{
  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
  while (bb->loop_father->latch
         != EXIT_BLOCK_PTR_FOR_FN (cfun))
    bb = get_immediate_dominator (CDI_DOMINATORS,
                                  bb->loop_father->header);

when the nearest common dominator is a BB in a loop nest like

 loop {
   loop {
   }

   loop {
      BB;
   }
   BB';
 }

this will skip an arbitrary number of earlier sibling loops.  If we
want to do such additional hoisting at all, I think it only makes
sense (without extra costing) to hoist the set out of a perfect nest,
thus never across earlier sibling loops.  Note also that for a splat
of a non-constant we have to ensure the set of the source we splat
still dominates the insertion point (where's that done?).  Even for
BB' such hoisting is likely problematic.

That would be done with something like

  while (loop_outer (bb->loop_father))
    {
      auto cand = loop_outer (bb->loop_father);
      /* Do not hoist out of loops with siblings.  */
      if (cand->next)
        break;
      bb = get_immediate_dominator (CDI_DOMINATORS, cand->header);
    }
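
To make the "perfect nest" rule concrete, here is a small standalone
model.  It is only a sketch: ToyLoop, add_child, only_child and
outermost_perfect_nest are hypothetical names invented for this
illustration, not GCC's class loop or any cfgloop.h API, and the rule
it implements (stop at the first loop that has any sibling, earlier or
later) is one possible reading of the suggestion above.

```cpp
#include <cassert>

// Toy stand-in for a loop tree node (assumption: NOT GCC's `class loop`).
struct ToyLoop {
  ToyLoop *outer = nullptr;  // enclosing loop; nullptr only for the root
  ToyLoop *inner = nullptr;  // first child loop
  ToyLoop *next = nullptr;   // next sibling within the same parent
};

// Append `child` as the last child of `parent`.
void add_child(ToyLoop &parent, ToyLoop &child) {
  child.outer = &parent;
  ToyLoop **slot = &parent.inner;
  while (*slot)
    slot = &(*slot)->next;
  *slot = &child;
}

// True if `loop` has neither an earlier nor a later sibling.
bool only_child(const ToyLoop &loop) {
  assert(loop.outer);
  return loop.outer->inner == &loop && loop.next == nullptr;
}

// Walk outward from `loop` while the nest stays perfect.  Returns the
// outermost loop in front of which a set could be placed without
// crossing any sibling loop.  The root (function body) is never returned.
ToyLoop *outermost_perfect_nest(ToyLoop *loop) {
  assert(loop && loop->outer);
  while (loop->outer->outer && only_child(*loop))
    loop = loop->outer;
  return loop;
}
```

In the nest from the example, starting from the second inner loop the
walk stops immediately because that loop has an earlier sibling, so the
set would stay inside the outer loop.  In a three-deep nest without
siblings the walk reaches the outermost loop.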

Note that since you compute loops without CFG manipulations, there are
no preheaders and there can be multiple entry edges into a loop header.
This means there's no trivial insertion place before each header,
and the immediate dominator might be very far away.  So an additional
safety measure would be to do

     cand_bb = get_immediate_dominator (CDI_DOMINATORS, cand->header);
     if (!find_edge (cand_bb, cand->header))
       break;
     bb = cand_bb;

or to compute loops with
LOOPS_MAY_HAVE_MULTIPLE_LATCHES|LOOPS_HAVE_PREHEADERS
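
The direct-edge check can be illustrated with a toy CFG.  This is a
sketch under stated assumptions: ToyCfg, has_edge and hoist_step are
hypothetical names for this example, blocks are plain integers, and
the immediate dominators are supplied by hand rather than computed.

```cpp
#include <map>
#include <set>
#include <utility>

using Block = int;

// Toy CFG (assumption: not GCC's basic_block / edge representation).
struct ToyCfg {
  std::set<std::pair<Block, Block>> edges;  // (from, to) pairs
  std::map<Block, Block> idom;              // immediate dominator per block
};

// True if there is a direct edge from `from` to `to`.
bool has_edge(const ToyCfg &cfg, Block from, Block to) {
  return cfg.edges.count(std::make_pair(from, to)) != 0;
}

// One hoisting step: move to the immediate dominator of `header` only if
// it is a direct predecessor of the header; otherwise stay at `current`.
Block hoist_step(const ToyCfg &cfg, Block current, Block header) {
  auto it = cfg.idom.find(header);
  if (it == cfg.idom.end())
    return current;
  Block cand = it->second;
  if (!has_edge(cfg, cand, header))
    return current;
  return cand;
}
```

With a header entered from two different blocks, the immediate
dominator sits above both of them and has no direct edge to the header,
so the step refuses to hoist.  With a single entry edge it moves to the
predecessor.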

That said, I wonder about the correctness thing.
