On 5/15/19 10:44 AM, Segher Boessenkool wrote:
> On Wed, May 15, 2019 at 10:53:43AM +0200, Richard Biener wrote:
>> I wonder if making the doloop patterns (tried to find them in rs6000.md,
>> but I only see define_expands with no predicates/alternatives...)
>
> "doloop_end" --> "ctr<mode>" --> "<bd>_<mode>"
> (all consecutive in rs6000.md btw.) Alternative 0 in "<bd>_<mode>"
> are the actual looping instructions; the other alternatives are for
> the uncommon case where we ended up not being able to use this insn
> after all.
>
>> accept any counter register, just have a preference on that special
>> counter reg and have the define_insn deal with RA allocating another
>> one by emitting a regular update & branch-on-zero?
>
> That is what those other alternatives do. It is expensive, and cannot
> even *work* in all cases.
>
> We have no generic "branch on (not) zero" in Power, btw. Archs that do
> can use that as a doloop, if they choose IVs that end the loop at 0.
>
>> That is, the penalty of doing that shouldn't be too big and thus
>> we can more optimistically cost & handle "doloops"?
>
> We have done that for ages, in the RTL level doloop handling. With
> newer hardware it becomes more and more expensive to guess wrong.
>
>> I guess
>> the doloop.c checks are quite too strict because we have to
>> rely on RA being able to allocate that reg and as soon as we
>> need to spill it using a general reg with update & branch-on-zero
>> will be cheaper anyways?
>
> (Update, compare, branch, for us).
>
> We can predict quite well where the count register will be unavailable
> to our doloops. The cost if we are allocated a GPR isn't so bad: it
> costs an insn or maybe two more than if we made optimal code (without
> doloop).
>
> But we can be allocated a floating point register, or memory, instead.
> That is heavily discouraged (by making it more expensive), but it can
> still happen. This is a jump_insn so it cannot get any reloads, either;
> but even if it could, that is an *expensive* thing to do.

Right. And that's consistent with what other architectures have needed
to do. I can't describe the pain of what happens on the PA when you find
out that the loop counter got allocated to the shift amount register or
a floating point register. It's rare, but you had to handle it. Ugh.

jeff