https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110423

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So there is another broad approach we can take here.

As Vineet mentioned, this isn't really a job for PRE/LCM as those are
formulated around a requirement that they never insert an expression evaluation
in any path that did not have an evaluation before.  ie no speculative constant
loads.

We could potentially relax that condition.  I'm not sure we'd formulate it as a
PRE/LCM problem, but it gives you a sense of how we could tackle this.  The
difficulty would be in the heuristics for when to apply this transformation
since it will make some codes slower and may increase register pressure.  This
is derived heavily from Click's work in the 90s.  This would happen in gimple
most likely, though I guess one could do it in RTL if they have a high pain
threshold.

In the simplest way to think about the placement algorithm is to find the
blocks where all the uses of any given constant C occur.  A trivially correct
placement of load of that constant would be the entry block as it must dominate
every block in that set.  Of course that would make the placement quite
speculative and lengthen live ranges.  That's usually referred to an an early
placement.

Next find the latest placement for the constant load that covers all the uses. 
That will be the lowest common ancestor in the dominator tree of the set of
blocks that use the constant.

If you were to imagine a path through the dominator tree starting at the early
placement (entry) and ending at the lowest common ancestor, any block on that
path could be selected for generating the constant load and would cover every
use with that single load.  Within the set of blocks on that path, find the set
with the lowest loop nesting, then within that reduced set find those with the
deepest control nesting (or lowest estimated frequency counts).  There may be
more than one block in that final set.  Any are valid and "reasonable" choices.


Click's paper is much more general, but the same concepts apply.  His paper
doesn't cover anything like bifurcating the graph (thus allowing multiple
constant loads in an effort to reduce undesired speculation or register
allocation conflicts).

We might be able to get away with this precisely because these are constant
loads and thus subject to rematerialization later if register pressure is high.

https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf

Reply via email to