> Although RTL expansion may introduce new loops, these tend to be rare, and the expanders have all the information they need to hoist/sink invariant expressions and unroll/peel themselves.

I disagree. In order to make the proper decisions about merging givs and choosing which giv should represent a biv, you have to know a lot about the valid addressing modes on the machine, and this isn't something the tree-level optimizers should have to deal with. And there is still the issue of addressing calculations, which I don't think have been completely exposed yet.

> Certainly there should be no need for RTL-level loop optimizations to do loop unrolling or other large scale reorganization.

Agreed there.

> Similarly, CSE shouldn't need to process more than a single basic block,

Again, not clear. Certainly the costly stuff I put in ages ago to walk through comparisons and around loops needs to go, but there's no reason to tie CSE to a basic block: it can operate until the next label, as it does now. Admittedly, the number of CSE opportunities won't be great, but why restrict them to a basic block?

> and GCSE shouldn't need to move anything other than simple expressions.

Why would we need a GCSE at the RTL level at all? I'd guess the number of wins it would produce would be very small.

> The quality of alias analysis at the RTL-level shouldn't be an issue.

Here I disagree most strongly! Instruction scheduling is rapidly becoming one of the most critical optimizations and must be done at the RTL level. The quality of instruction scheduling depends quite heavily on the quality of the aliasing information.