https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110423
Jeffrey A. Law <law at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |law at gcc dot gnu.org --- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> --- So there is another broad approach we can take here. As Vineet mentioned, this isn't really a job for PRE/LCM as those are formulated around a requirement that they never insert an expression evaluation in any path that did not have an evaluation before. ie no speculative constant loads. We could potentially relax that condition. I'm not sure we'd formulate it as a PRE/LCM problem, but it gives you a sense of how we could tackle this. The difficulty would be in the heuristics for when to apply this transformation since it will make some codes slower and may increase register pressure. This is derived heavily from Click's work in the 90s. This would happen in gimple most likely, though I guess one could do it in RTL if they have a high pain threshold. In the simplest way to think about the placement algorithm is to find the blocks where all the uses of any given constant C occur. A trivially correct placement of load of that constant would be the entry block as it must dominate every block in that set. Of course that would make the placement quite speculative and lengthen live ranges. That's usually referred to an an early placement. Next find the latest placement for the constant load that covers all the uses. That will be the lowest common ancestor in the dominator tree of the set of blocks that use the constant. If you were to imagine a path through the dominator tree starting at the early placement (entry) and ending at the lowest common ancestor, any block on that path could be selected for generating the constant load and would cover every use with that single load. Within the set of blocks on that path, find the set with the lowest loop nesting, then within that reduced set find those with the deepest control nesting (or lowest estimated frequency counts). There may be more than one block in that final set. Any are valid and "reasonable" choices. Click's paper is much more general, but the same concepts apply. His paper doesn't cover anything like bifurcating the graph (thus allowing multiple constant loads in an effort to reduce undesired speculation or register allocation conflicts). We might be able to get away with this precisely because these are constant loads and thus subject to rematerialization later if register pressure is high. https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf