2012/4/12 Andrey Belevantsev <a...@ispras.ru>:
> On 12.04.2012 17:54, Richard Guenther wrote:
>>
>> 2012/4/12 Andrey Belevantsev <a...@ispras.ru>:
>>>
>>> On 12.04.2012 16:38, Richard Guenther wrote:
>>>>
>>>> On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:
>>>>>
>>>>> On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:
>>>>>>>
>>>>>>>> Can atom execute two IMUL in parallel?  Or what exactly is the
>>>>>>>> pipeline behavior?
>>>>>>>
>>>>>>> As I understand from Intel's optimization reference manual, the
>>>>>>> behavior is as follows: if the instruction immediately following IMUL
>>>>>>> has shorter latency, execution is stalled for 4 cycles (which is
>>>>>>> IMUL's latency); otherwise, a 4-or-more cycles latency instruction can
>>>>>>> be issued after IMUL without a stall.  In other words, IMUL is
>>>>>>> pipelined with respect to other long-latency instructions, but not to
>>>>>>> short-latency instructions.
>>>>>>
>>>>>> It seems to be modeled in the pipeline description though:
>>>>>>
>>>>>> ;;; imul insn has 5 cycles latency
>>>>>> (define_reservation "atom-imul-32"
>>>>>>                     "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
>>>>>>                      atom-port-0")
>>>>>>
>>>>>> ;;; imul instruction excludes other non-FP instructions.
>>>>>> (exclusion_set "atom-eu-0, atom-eu-1"
>>>>>>                "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")
>>>>>>
>>>>> The main idea is quite simple:
>>>>>
>>>>> If we are going to schedule an IMUL instruction (it is on the top of the
>>>>> ready list), we try to find the producer of another (independent) IMUL
>>>>> instruction that is also in the ready list.
>>>>> The goal is to schedule
>>>>> such a producer so that another IMUL appears in the ready list, and we
>>>>> get two successive IMUL instructions scheduled.
>>>>
>>>> Why does that not happen without your patch?  Does it never happen
>>>> without your patch, or does it merely not happen for one EEMBC benchmark
>>>> (can you provide a testcase?)?
>>>
>>> It does not happen because the scheduler by itself does not do such
>>> specific reordering.  That said, it is easy to imagine cases where this
>>> patch will make things worse rather than better.
>>
>> That surprises me.  What is so specific about this reordering?
>
> I mean that the scheduler does things like "sort the ready list according to
> a number of heuristics and to the insn priority, then choose the insn that
> would allow the maximum of ready insns to be issued on the current cycle".
> The heuristics are rather general.  The scheduler does not do things like
> "if some random insn is ready, then choose some other random insn from the
> ready list and schedule it" (which is what the patch does).  This is what
> scheduler hooks are for: to account for some target-specific heuristic.
>
> The problem is that this particular implementation looks somewhat like
> overkill and is also motivated by a single benchmark.  Testing on a wider
> set of benchmarks and checking the compile-time hit would make the
> motivation more convincing.
Yeah, and especially looking at _why_ the generic heuristics are not working
and whether they could be improved.  After all, the issue seems to be
properly modeled in the DFA.

Richard.

> Andrey
>
>>
>>> Igor, why not try different, subtler mechanisms like adjust_priority,
>>> which is called when an insn is added to the ready list?  E.g. increase
>>> the producer's priority.
>>>
>>> The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out
>>> when you have more than one imul in the ready list?  Don't you want to
>>> bump the priority of the other imul found?
>>>
>>> Andrey
>>>
>>>>> And the MD only allows us to prefer scheduling of successive IMUL
>>>>> instructions, i.e. if an IMUL was just scheduled and the ready list
>>>>> contains another IMUL instruction, then it will be chosen as the best
>>>>> candidate for scheduling.
>>>>>
>>>>>> at least from my very limited guessing of what the above does.  So,
>>>>>> did you analyze why the scheduler does not properly schedule things
>>>>>> for you?
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> From reading the patch, I could not understand the link between
>>>>>>> pipeline behavior and what the patch appears to do.
>>>>>>>
>>>>>>> Hope that helps.
>>>>>>>
>>>>>>> Alexander