On 08/28/2015 03:59 PM, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>
>> -----Original Message-----
>> From: ext Nicolas Morey-Chaisemartin [mailto:[email protected]]
>> Sent: Friday, August 28, 2015 4:40 PM
>> To: Savolainen, Petri (Nokia - FI/Espoo); LNG ODP Mailman List
>> Subject: Re: [lng-odp] Scheduler, QUEUES_PER_PRIO and fairness
>>
>> On 08/28/2015 01:56 PM, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>>>> -----Original Message-----
>>>> From: lng-odp [mailto:[email protected]] On Behalf Of
>>>> ext Nicolas Morey-Chaisemartin
>>>> Sent: Friday, August 28, 2015 11:57 AM
>>>> To: LNG ODP Mailman List
>>>> Subject: [lng-odp] Scheduler, QUEUES_PER_PRIO and fairness
>>>>
>>>> Hi all,
>>>>
>>>> I'm currently diving into the scheduler code from linux-generic to
>>>> understand how it works and to try to write an optimized version for
>>>> our HW.
>>>>
>>>> Maybe I missed something, but I think there is a big fairness issue
>>>> there.
>>>>
>>>> Let me sum up what I understand. We will stick with only one priority
>>>> to make it simpler.
>>>> We have QUEUES_PER_PRIO queues in the scheduler that may contain
>>>> commands. Commands are either:
>>>> - Poll a pktio
>>>> - Poll a queue
>>>> Both commands are pushed back to the queue if we didn't get all the
>>>> pending packets/buffers from them and they need to be polled again.
>>>>
>>>> Threads that call schedule do a round-robin scan of all schedule
>>>> queues, starting at queue threadId % QUEUES_PER_PRIO, jumping to the
>>>> next schedule queue if the current schedule command produced nothing,
>>>> and stopping at the first schedule command that produces packets.
>>>>
>>>> Am I right up to that point?
>>>>
>>>> Now let's assume I'm a user unaware of how this works. I have
>>>> QUEUES_PER_PRIO at 4, and I am running ODP with 4 worker threads.
>>>> I have created a bunch of queues for my classification, so each of
>>>> the schedule queues has one command to poll a queue in it.
>>>>
>>>> For some reason I only want to call odp_schedule() from one thread.
>>>> It could be that the others are directly polling specific
>>>> high-priority queues.
>>>>
>>>> When my thread enters odp_schedule(), it starts with schedule_queue =
>>>> (thId % 4). Let's say 0.
>>>> Schedule queue 0 also contains the schedule command for my pktio.
>>>>
>>>> Now let's see what happens:
>>>> schedule()
>>>>   check sched_queue 0
>>>>     got sched_cmd to poll pktio
>>>>     pktio_poll
>>>>       dispatch 1 packet to each of my queues (this traffic is
>>>>       really well balanced and regular ;) )
>>>>     re-enqueue the sched_cmd
>>>>     continue
>>>>   check sched_queue 1
>>>>     got a sched_cmd to poll queue #1 (which now has a packet)
>>>>     return packet, requeue sched_cmd
>>>>
>>>> schedule()
>>>>   check sched_queue 0
>>>>     got sched_cmd to poll queue #0 (which also has a packet)
>>>>     return packet, requeue sched_cmd
>>>>
>>>> We are now in the exact same state as we were before the first call,
>>>> except queues #2 and #3 now each have 1 packet pending.
>>>> This should keep going on until we run out of packets in the pool...
>>>>
>>>> So here are a few questions now:
>>>> 1) Did I completely miss something, and it actually works?
>>>> 2) Why does linux-generic need multiple queues per prio? Is it to
>>>> reduce contention on the sched_queue?
>>>>
>>>> Nicolas
>>>
>>> Yes, it's optimized for multiple threads (4 or more). Each thread
>>> starts from a different (e.g. thread_id % 4) scheduler queue and keeps
>>> processing it as long as there are events. When a thread's "home"
>>> sched queue is empty, it tries its luck on the next sched queue, and
>>> so on. This lowers sched queue lock contention (and increases the
>>> cache hit rate) when threads try to serve the same set of queues, and
>>> load balances only when needed (when the home sched queue is empty).
>>>
>>> -Petri
>>>
>> The issue I see here is that it introduces a very important design
>> constraint in ODP user code that is purely implementation specific.
>> It assumes that at least QUEUES_PER_PRIO threads will be calling
>> schedule regularly, and that their IDs are different modulo
>> QUEUES_PER_PRIO.
>>
>> I could very easily see cases where a user would use schedule from one
>> thread and redispatch some of the packets to other neighboring threads
>> for additional computations (crypto, CRC, etc.), and even with multiple
>> threads using schedule, not necessarily use consecutive ones.
>> This would greatly impact fairness and, as my example shows, even stall
>> the application at some point in time.
>>
>> I guess something could be added to the documentation, but it's not a
>> very good example to set for the "generic" implementation.
>>
>> Nicolas
>
> It should not stall if the system is not overloaded. Thread #0 should
> move to sched queue #1 as soon as sched queue #0 is empty. If sched
> queue #0 never empties (e.g. you circulate the same events back to the
> original queue) and you have only one thread, the other sched queues
> will pile up events until the pool runs out. But that's overload then.

It does. But if #0 fills back up before #1 is empty (the ping-pong effect
in my example), sched queue #2 will never be checked! The system is not
necessarily overloaded, but it is just not fair to the queues.
>
> The application should not care about the implementation. Performance
> may vary between implementations, but it should not stall.

Agreed.

> If some thread jumps between schedule() and other work, you should do
> pause -> schedule until no more events -> do other stuff -> resume ->
> schedule ...
> Otherwise, the scheduler may spin on other threads waiting for the
> thread that exited its schedule loop.
>
I don't see how pause/resume has an effect on the overall scheme of
things. It allows the local cache to be emptied, but it would not avoid
the potential stall.

Nicolas
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp
