On 08/28/2015 03:59 PM, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>
>> -----Original Message-----
>> From: ext Nicolas Morey-Chaisemartin [mailto:[email protected]]
>> Sent: Friday, August 28, 2015 4:40 PM
>> To: Savolainen, Petri (Nokia - FI/Espoo); LNG ODP Mailman List
>> Subject: Re: [lng-odp] Scheduler, QUEUES_PER_PRIO and fairness
>>
>> On 08/28/2015 01:56 PM, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>>>> -----Original Message-----
>>>> From: lng-odp [mailto:[email protected]] On Behalf Of
>>>> ext Nicolas Morey-Chaisemartin
>>>> Sent: Friday, August 28, 2015 11:57 AM
>>>> To: LNG ODP Mailman List
>>>> Subject: [lng-odp] Scheduler, QUEUES_PER_PRIO and fairness
>>>>
>>>> Hi all,
>>>>
>>>> I'm currently diving into the scheduler code from linux-generic to
>>>> understand how it works and to try to write an optimized version for
>>>> our HW.
>>>>
>>>> Maybe I missed something, but I think there is a big fairness issue
>>>> there.
>>>>
>>>> Let me sum up what I understand. We will stick with only one priority
>>>> to make it simpler.
>>>> We have QUEUES_PER_PRIO queues in the scheduler that may contain
>>>> commands. Commands are either:
>>>> - Poll a pktio
>>>> - Poll a queue
>>>> Both commands are pushed back to the queue if we didn't get all the
>>>> pending packets/buffers from them and they need to be polled again.
>>>>
>>>> Threads that call schedule do a round-robin scan of all schedule
>>>> queues, starting at queue threadId % QUEUES_PER_PRIO, jumping to the
>>>> next schedule queue if the current schedule command produced nothing,
>>>> and stopping at the first schedule command that produces packets.
>>>>
>>>> Am I right up to that point?
>>>>
>>>> Now let's assume I'm a user unaware of how this works. I have
>>>> QUEUES_PER_PRIO at 4, and I am running ODP with 4 worker threads.
>>>> I have created a bunch of queues for my classification, so each of
>>>> the schedule queues has one command to poll a queue in it.
>>>>
>>>> For some reason I only want to call odp_schedule() from one thread.
>>>> It could be that the others are directly polling specific
>>>> high-priority queues.
>>>>
>>>> When my thread enters odp_schedule(), it starts with schedule_queue =
>>>> (thId % 4). Let's say 0.
>>>> Schedule queue 0 also contains the schedule command for my pktio.
>>>>
>>>> Now let's see what happens:
>>>> schedule()
>>>>   check sched_queue 0
>>>>     got sched_cmd to poll pktio
>>>>     pktio_poll
>>>>       dispatch 1 packet to each of my queues (this traffic is
>>>>       really well balanced and regular ;) )
>>>>     re-enqueue the sched_cmd
>>>>     continue
>>>>   check sched_queue 1
>>>>     got a sched_cmd to poll queue #1 (which now has a packet)
>>>>     return packet, requeue sched_cmd
>>>>
>>>> schedule()
>>>>   check sched_queue 0
>>>>     got sched_cmd to poll queue #0 (which also has a packet)
>>>>     return packet, requeue sched_cmd
>>>>
>>>> We are now in the exact same state as we were before the first call,
>>>> except queues #2 and #3 now each have 1 packet pending.
>>>> This should keep going on until we run out of packets in the pool...
>>>>
>>>> So here are a few questions now:
>>>> 1) Did I completely miss something, and it actually works?
>>>> 2) Why does linux-generic need multiple queues per prio? Is it to
>>>> reduce contention on the sched_queue?
>>>>
>>>> Nicolas
>>>
>>> Yes, it's optimized for multiple threads (4 or more). Each thread
>>> starts from a different (e.g. thread_id % 4) scheduler queue and keeps
>>> processing it as long as there are events. When a thread's "home"
>>> sched queue is empty, it tries its luck on the next sched queue, and
>>> so on. This lowers sched queue lock contention (and increases the
>>> cache hit rate) when threads try to serve the same set of queues, and
>>> load balances only when needed (when the home sched queue is empty).
>>>
>>> -Petri
>>>
>> The issue I see here is that it introduces a very important design
>> constraint in ODP user code that is purely implementation specific.
>> It assumes that at least QUEUES_PER_PRIO threads will be calling
>> schedule regularly, and that their IDs are different modulo
>> QUEUES_PER_PRIO.
>>
>> I could very easily see cases where a user would use schedule from one
>> thread and redispatch some of the packets to other neighboring threads
>> for additional computations (crypto, CRC, etc.), and even with multiple
>> threads using schedule, not necessarily use consecutive ones.
>> This would greatly impact fairness and, as my example shows, even stall
>> the application at some point in time.
>>
>> I guess something could be added to the documentation, but it's not a
>> very good example to set for the "generic" implementation.
>>
>> Nicolas
>
> It should not stall if the system is not overloaded. Thread #0 should
> move to sched queue #1 as soon as sched queue #0 is empty. If sched
> queue #0 never empties (e.g. you circulate the same events back to the
> original queue) and you have only one thread, the other sched queues
> will pile up events until the pool runs out. But that's overload then.

It does. But if #0 fills back up before #1 is empty (the ping-pong effect
in my example), sched queue #2 will never be checked! The system is not
necessarily overloaded, but it is just not fair to the queues.
>
> The application should not care about the implementation. Performance
> may vary between implementations, but it should not stall.

Agreed.

> If some thread jumps between schedule() and other work, you should do
> pause -> schedule until no more events -> do other stuff -> resume ->
> schedule ...
> Otherwise, the scheduler may spin on other threads waiting for the
> thread that exited its schedule loop.
>
I don't see how pause/resume has an effect on the overall scheme of
things. It allows the local cache to be emptied, but it would not avoid
the potential stall.

Nicolas
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp
