On Tue, Jan 7, 2020 at 6:48 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> on 2020/1/7 5:14 PM, Richard Biener wrote:
> > On Mon, 6 Jan 2020, Kewen.Lin wrote:
> >
> >> We are thinking whether it can be handled in IVOPTs instead of one RTL
> >> pass.
> >>
> >> During IVOPTs selecting IV cands, it doesn't know the loop will be
> >> unrolled so it doesn't count the possible step cost in with X-form.
> >> If we can teach it to consider the case, the IV cands which plays with
> >> D-form can be preferred. Currently unrolling (incomplete) happens in
> >> RTL, it looks we have to predict the loop whether unroll in IVOPTs.
> >> Since there is some parameter checks on RTL insn counts and target
> >> hooks, it seems not easy to get that. Besides, we need to check the
> >> step is valid to put into D-form field (eg: DQ-form requires divide 16
> >> exactly), to ensure no extra ADDIs needed.
> >>
> >> I'm not sure whether it's a good idea to implement in IVOPTs, but I did
> >> some changes in IVOPTs to prove it's doable to get expected codes, the
> >> patch is attached.
> >>
> >> Any comments/suggestions are highly appreciated!
> >
> > Is the unrolled code better than the not unrolled code (assuming
> > optimal IV choice)? Then IMHO IVOPTs should drive the unrolling,
> > either by actually doing it or by forcing it via the loop->unroll
> > setting. I don't think second-guessing the RTL unroller at this
> > point is going to work. Alternatively turn X-form into D-form during
> > RTL unrolling?
> >
>
> Hi Richard,
>
> Thanks for the comments!
>
> Yes, unrolled version is better on Power9 for both forms, but D-form
> unrolled is better than X-form unrolled. If we drive unrolling in IVOPTs,
> not sure it will be a concern that IVOPTs becomes too heavy? or too rude
> with forced UF if imprecise? Do we still have the plan to introduce one
> middle-end unroll pass, does it help if yes?

I am a bit worried that this would make IVOPTs too heavy as well. It might be possible to compute a heuristic for whether the loop should be unrolled, and then do the unrolling as a post-IVOPTs transformation. Of course, that transformation needs to do more work than simply unrolling in order to take advantage of the aforementioned addressing mode. BTW, I guess the unrolled loop won't perform as well as it does on ppc if the target doesn't support the [base + register + offset] addressing mode?
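To make the kind of check such a heuristic would need a bit more concrete, here is a rough standalone C sketch. It is not GCC code: the function name and its parameters are made up for illustration. It only assumes the Power displacement rules (a signed 16-bit displacement, which must be a multiple of 4 for DS-form and of 16 for DQ-form), and it tests whether every copy of an access in a loop unrolled UF times could use base + constant offset without extra ADDIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC API.  Return true if the offsets
   0, STEP, ..., (UF - 1) * STEP produced by unrolling a loop UF times
   all fit a displacement field.  ALIGN is the multiple the displacement
   must be (1 for plain D-form, 4 for DS-form, 16 for DQ-form) and is
   assumed to be positive.  The signed 16-bit range is the Power limit.  */
bool
offsets_fit_displacement (int64_t step, unsigned uf, int64_t align)
{
  for (unsigned i = 0; i < uf; i++)
    {
      int64_t off = (int64_t) i * step;

      /* Out of the signed 16-bit range: an extra add would be needed.  */
      if (off < -32768 || off > 32767)
        return false;

      /* Not encodable in the DS/DQ field: also needs an extra add.  */
      if (off % align != 0)
        return false;
    }
  return true;
}
```

For example, a step of 16 with UF 8 passes for DQ-form (align 16), while a step of 8 does not, because the second copy would need offset 8. A real heuristic would of course also have to look at the insn-count parameters and target hooks mentioned above.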
Another point: if multiple passes do unrolling, the "already unrolled" information may need to be recorded as a flag in the loop properties.

Thanks,
bin

> The quoted RTL patch is to propose one RTL pass after RTL loop passes, it
> also sounds good to check whether RTL unrolling is a good place!
>
>
> BR,
> Kewen
>