2012/4/12 Andrey Belevantsev <a...@ispras.ru>:
> On 12.04.2012 17:54, Richard Guenther wrote:
>>
>> 2012/4/12 Andrey Belevantsev <a...@ispras.ru>:
>>>
>>> On 12.04.2012 16:38, Richard Guenther wrote:
>>>>
>>>> On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:
>>>>>
>>>>> On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:
>>>>>>>
>>>>>>>> Can atom execute two IMUL in parallel?  Or what exactly is the
>>>>>>>> pipeline behavior?
>>>>>>>
>>>>>>> As I understand from Intel's optimization reference manual, the
>>>>>>> behavior is as follows: if the instruction immediately following IMUL
>>>>>>> has shorter latency, execution is stalled for 4 cycles (which is
>>>>>>> IMUL's latency); otherwise, a 4-or-more cycles latency instruction can
>>>>>>> be issued after IMUL without a stall.  In other words, IMUL is
>>>>>>> pipelined with respect to other long-latency instructions, but not to
>>>>>>> short-latency instructions.
>>>>>>
>>>>>> It seems to be modeled in the pipeline description though:
>>>>>>
>>>>>> ;;; imul insn has 5 cycles latency
>>>>>> (define_reservation "atom-imul-32"
>>>>>>                     "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
>>>>>>                      atom-port-0")
>>>>>>
>>>>>> ;;; imul instruction excludes other non-FP instructions.
>>>>>> (exclusion_set "atom-eu-0, atom-eu-1"
>>>>>>                "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")
>>>>>>
>>>>> The main idea is quite simple:
>>>>>
>>>>> If we are going to schedule an IMUL instruction (it is on the top of the
>>>>> ready list), we try to find the producer of another (independent) IMUL
>>>>> instruction that is also in the ready list.
>>>>> The goal is to schedule
>>>>> such a producer so that another IMUL appears in the ready list, and we
>>>>> get two successive IMUL instructions scheduled.
>>>>
>>>> Why does that not happen without your patch?  Does it never happen
>>>> without your patch, or does it merely not happen for one EEMBC benchmark
>>>> (can you provide a testcase?)?
>>>
>>> It does not happen because the scheduler by itself does not do such
>>> specific reordering.  That said, it is easy to imagine cases where this
>>> patch will make things worse rather than better.
>>
>> That surprises me.  What is so specific about this reordering?
>
> I mean that the scheduler does things like "sort the ready list according to
> a number of heuristics and to the insn priority, then choose the insn that
> would allow the maximum of ready insns to be issued on the current cycle".
> The heuristics are rather general.  The scheduler does not do things like
> "if some random insn is ready, then choose some other random insn from the
> ready list and schedule it" (which is what the patch does).  This is what
> scheduler hooks are for: to account for some target-specific heuristic.
>
> The problem is that this particular implementation looks somewhat like
> overkill and is also motivated by a single benchmark.  Testing on a wider
> set of benchmarks and checking the compile-time hit would make the
> motivation more convincing.
Yeah, and especially looking at _why_ the generic heuristics are not working
and whether they could be improved.  After all, the issue seems to be
properly modeled in the DFA.

Richard.

> Andrey
>
>>
>>> Igor, why not try different, subtler mechanisms like adjust_priority,
>>> which is called when an insn is added to the ready list?  E.g. increase
>>> the producer's priority.
>>>
>>> The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out
>>> when you have more than one imul in the ready list?  Don't you want to
>>> bump the priority of the other imul found?
>>>
>>> Andrey
>>>
>>>>> And the MD only allows us to prefer scheduling of successive IMUL
>>>>> instructions, i.e. if an IMUL was just scheduled and the ready list
>>>>> contains another IMUL instruction, then it will be chosen as the best
>>>>> candidate for scheduling.
>>>>>
>>>>>> at least from my very limited guessing of what the above does.  So,
>>>>>> did you analyze why the scheduler does not properly schedule things
>>>>>> for you?
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> From reading the patch, I could not understand the link between
>>>>>>> pipeline behavior and what the patch appears to do.
>>>>>>>
>>>>>>> Hope that helps.
>>>>>>>
>>>>>>> Alexander