> > +/* If a basic block is empty, we can remove the conditionals that control
> > +   its execution.  However in some cases the empty BB can stay live
> > +   (such as when it was the header of an empty loop).  In this case we
> > +   need to update its count.  Since regions of dead BBs are acyclic
> > +   we simply propagate counts forward from live BBs to dead ones.  */
> 
> I can't follow how this works.  A region of dead BBs might fully
> contain a cycle btw.

It is the usual topological sorting algorithm for DAGs based on reference
counting (which runs only within regions of dead BBs). The cnt map holds the
reference counts (number of incoming edges whose profile is not yet
known) and is only populated for BBs which do not yet have a final count.
Queue is a worklist (it should have been called that, since it is a stack)
of those BBs having 0 unknown incoming edges.
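
In standalone form the scheme looks roughly like this (only an illustrative
sketch, not the patch itself: plain std containers instead of
auto_vec/hash_map, a toy CFG, integer counts instead of profile_count, and
the simplifying assumption that every predecessor has a single successor so
its whole count flows along its one outgoing edge; all names are made up):

  #include <cstdio>
  #include <vector>

  struct block
  {
    bool live;               /* Block still contains live statements.  */
    long count;              /* Profile count; already final for live blocks.  */
    std::vector<int> preds;  /* Indices of predecessor blocks.  */
    std::vector<int> succs;  /* Indices of successor blocks.  */
  };

  /* Propagate counts from live blocks into regions of dead blocks.
     A dead block becomes ready once none of its dead predecessors has an
     unknown count; its count is then the sum of its predecessors' counts.
     Dead blocks on a cycle never become ready and keep whatever count
     they had.  */

  static void
  propagate_dead_counts (std::vector<block> &cfg)
  {
    std::vector<int> unknown (cfg.size (), 0); /* Dead preds not yet final.  */
    std::vector<int> worklist;

    for (int i = 0; i < (int) cfg.size (); i++)
      if (!cfg[i].live)
        {
          for (int p : cfg[i].preds)
            if (!cfg[p].live)
              unknown[i]++;
          if (unknown[i] == 0)
            worklist.push_back (i);
        }

    while (!worklist.empty ())
      {
        int b = worklist.back ();
        worklist.pop_back ();

        long c = 0;
        for (int p : cfg[b].preds)
          c += cfg[p].count;
        cfg[b].count = c;

        for (int s : cfg[b].succs)
          if (!cfg[s].live && --unknown[s] == 0)
            worklist.push_back (s);
      }
  }

  int
  main ()
  {
    /* live (count 100) -> dead -> dead; both dead blocks end up with 100.  */
    std::vector<block> cfg = {
      { true, 100, {}, { 1 } },
      { false, 0, { 0 }, { 2 } },
      { false, 0, { 1 }, {} },
    };
    propagate_dead_counts (cfg);
    for (const block &b : cfg)
      printf ("%ld\n", b.count);
    return 0;
  }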

My first version just used RPO order of the whole CFG, but then I was
thinking about what to do with loops of dead BBs.  Since dead BBs have only
one exit edge, they can only form infinite loops.

For infinite loops with non-zero entry count we have no way to represent
a profile that is consistent, since profile_count cannot represent
infinity.  In practice we do not want to have very large counts on
infinite loops.  They are either erroneous code paths and should have a
count of 0, while a large count would make the valid part of the program
cold; or they are waiting loops which are not actually infinite, but we do
not represent their exit (interrupt).  The static profile guesses an
infinite loop to have --param max-predicted-iterations.  With a real
profile they should either be known to be not reached (count 0) or have a
count representing their wait time (which is probably not useful and may
confuse hot/cold heuristics).

In both cases I think it makes sense to simply not touch the existing
profile of infinite loops in DCE.  If it was measured, it is still valid;
if it was guessed, it is at least capped so as not to confuse the rest of
the compiler.

The algorithm iterating in RPO order would increase the counts of the
infinite loops each time it is run.  This is why I do reference counting,
since that simply stops at the entry of the loop.
It was not clear to me whether I could detect those loops easily in the RPO
walk, since they may have multiple entry points, so I went for this
option.
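
Concretely: take a live block L with count 100 feeding a dead cycle
A <-> B.  A has one dead predecessor (B) and B has one dead predecessor
(A), so neither ever drops to 0 unknown in-edges, the worklist never
reaches them, and their counts stay as they were.  An RPO walk would
instead add L's count into A (and on into B) every time it is run,
inflating the loop's profile.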

Honza
> 
> > +
> > +static void
> > +propagate_counts ()
> > +{
> > +  basic_block bb;
> > +  auto_vec<basic_block, 16> queue;
> > +  hash_map <basic_block, int> cnt;
> > +
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +    if (!bitmap_bit_p (bb_contains_live_stmts, bb->index))
> > +      {
> > +       int n = 0;
> > +       for (edge e : bb->preds)
> > +         if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
> > +             && !bitmap_bit_p (bb_contains_live_stmts, e->src->index))
> > +           n++;
> > +       if (!n)
> > +         queue.safe_push (bb);
> 
> So everything we push has no live stmts itself and all of its incoming
> edge srcs have live stmts (thus the src blocks are all not removed).
> We don't know whether we actually elided BB though, bb->flags & BB_REACHABLE
> says so.

Unreachable BBs would stop propagation too.  I put this after eliding
unreachable BBs, so I believe they should be removed.  I had an assert
testing that during bootstrap the algorithm always finishes with all
counts updated (no infinite loops).  In the testsuite we have plenty of
infinite loops, of course.

I will update the comment.  Also I think I can avoid allocating the
map if I process BBs with 0 unknown in-edges immediately, since chained
dead BBs are quite rare in practice.
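
One possible shape of that variant, again as a standalone sketch reusing the
toy block struct from above (it sidesteps the map by re-checking readiness
when a block is popped rather than keeping per-block reference counts, so it
is an illustration of the idea rather than the exact change I have in mind):

  /* Variant without per-block reference counts: finalize a dead block as
     soon as all of its predecessors (live, or dead but already finalized)
     have known counts, re-checking readiness each time the block is
     popped.  Blocks on a dead cycle are never finalized and keep their
     counts.  */

  static void
  propagate_dead_counts_no_map (std::vector<block> &cfg)
  {
    std::vector<bool> finalized (cfg.size ());
    std::vector<int> worklist;

    for (int i = 0; i < (int) cfg.size (); i++)
      finalized[i] = cfg[i].live;

    /* Seed with dead blocks whose predecessors are all live; in practice
       this is the common case, so most blocks are handled right here.  */
    for (int i = 0; i < (int) cfg.size (); i++)
      if (!cfg[i].live)
        {
          bool ready = true;
          for (int p : cfg[i].preds)
            if (!finalized[p])
              ready = false;
          if (ready)
            worklist.push_back (i);
        }

    while (!worklist.empty ())
      {
        int b = worklist.back ();
        worklist.pop_back ();
        if (finalized[b])
          continue;

        /* A block may be pushed before all of its dead predecessors are
           finalized; if so, skip it now and it will be pushed again when
           the remaining predecessors are done.  */
        bool ready = true;
        long c = 0;
        for (int p : cfg[b].preds)
          {
            if (!finalized[p])
              ready = false;
            c += cfg[p].count;
          }
        if (!ready)
          continue;

        cfg[b].count = c;
        finalized[b] = true;
        for (int s : cfg[b].succs)
          if (!cfg[s].live && !finalized[s])
            worklist.push_back (s);
      }
  }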

Does this make sense?
Honza
