Thanks, Alex. This does imply that we're missing a critical bit of buffer metadata. Since odp_queue_enq() specifies only a target queue and a buffer, there needs to be some way to relate the call to the queue that the buffer was previously sourced from so that the atomic/ordered semantics can be maintained.
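To make that concrete, a minimal sketch follows. It assumes a per-buffer back-reference to the source queue (called last_queue here) that the scheduler sets and that enqueue and free consult. All toy_* types and functions are hypothetical stand-ins, not ODP APIs, and only the atomic case is modelled; ordered queues would need sequence tracking that is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy queue: tracks only a count and the atomic lock state. */
typedef struct toy_queue {
    bool is_atomic; /* true for an atomic queue */
    bool locked;    /* atomic context currently held by a consumer */
    int  depth;     /* number of buffers waiting on the queue */
} toy_queue_t;

/* Hypothetical buffer metadata: remembers where the buffer came from. */
typedef struct toy_buf {
    toy_queue_t *last_queue; /* set by the scheduler, NULL otherwise */
} toy_buf_t;

/* Scheduler side: hand out one buffer from src, locking it if atomic. */
static bool toy_schedule_from(toy_queue_t *src, toy_buf_t *buf)
{
    assert(src != NULL && buf != NULL);
    if (src->depth == 0 || (src->is_atomic && src->locked))
        return false;            /* nothing to give, or context is held */
    src->depth--;
    if (src->is_atomic)
        src->locked = true;
    buf->last_queue = src;       /* record the source for later release */
    return true;
}

/* Release whatever scheduling context the buffer still holds. */
static void toy_release_context(toy_buf_t *buf)
{
    toy_queue_t *src = buf->last_queue;
    if (src != NULL && src->is_atomic)
        src->locked = false;     /* source queue may be scheduled again */
    buf->last_queue = NULL;
}

/* Enqueue takes only (queue, buffer); the source comes from metadata. */
static void toy_queue_enq(toy_queue_t *dst, toy_buf_t *buf)
{
    assert(dst != NULL && buf != NULL);
    toy_release_context(buf);
    dst->depth++;
}

/* Freeing a buffer must release the context as well. */
static void toy_buf_free(toy_buf_t *buf)
{
    assert(buf != NULL);
    toy_release_context(buf);
}
```

In this model a second toy_schedule_from() on the same atomic queue fails until the first buffer is enqueued elsewhere or freed, which is the implicit release that enqueue and free calls would perform.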
Should we have a last_queue field in the buffers that gets set by odp_schedule() and referenced as part of subsequent enq operations?

Bill

On Wed, Oct 15, 2014 at 1:44 PM, Alexandru Badicioiu <[email protected]> wrote:

> Bill, I have the same understanding as yours regarding these aspects. Free
> calls should be aware of the source queue of a buffer to inform the
> scheduler that the context should be released too. The same for enqueue
> calls. I'm not sure of the use of explicit release context calls -
> eventually any buffer/event/packet (i.e. entity delivered by the scheduler)
> would be enqueued or freed so the scheduler will be informed.
>
> Alex
>
> On 15 October 2014 20:23, Bill Fischofer <[email protected]>
> wrote:
>
>> If I call odp_schedule() and get back an event associated with an atomic
>> queue, my understanding is that I owe the implementation a subsequent call
>> to odp_schedule_release_atomic() to dispose of it. Similarly, if I receive
>> an event from an ordered queue, I need to enq that event somewhere else or
>> otherwise there will be a gap in the downstream order that will cause a
>> stall.
>>
>> The interplay between queues and the scheduler is why we need a design
>> that spells this out in detail. That's where what is and is not API vs.
>> implementation is also spelled out. Right now my understanding is we have
>> the following types of queues and their scheduling
>> implications/interactions:
>>
>> - Parallel: Anything on the queue can be given to anyone without
>>   restriction. There are no restrictions relating to subsequent event
>>   disposal or downstream processing.
>>
>> - Atomic: Anyone can get something from a queue, but once they get it
>>   the queue is not able to give out subsequent events to anyone else until
>>   the queue is either explicitly (via odp_schedule_release_atomic()) or
>>   implicitly (via a subsequent enq of the event to some other queue) made
>>   available for rescheduling.
>>
>> - Ordered: Anything on the queue can be given to anyone without
>>   restriction; however, subsequent enqs of those events onto other queues
>>   must be order preserving. This implies that if an application wishes to
>>   dispose of an event without a subsequent enq it needs to inform the
>>   scheduler of this to prevent downstream stalls. So it appears there needs
>>   to be an ordered equivalent to odp_schedule_release_atomic() that we
>>   currently don't have.
>>
>> Is this understanding correct? If not what is the correct way to view
>> this? In any event, we just need to write this out and get agreement as to
>> what the meanings and conventions are associated with this.
>>
>> On Wed, Oct 15, 2014 at 11:42 AM, Ola Liljedahl <[email protected]>
>> wrote:
>>
>>> It should be part of the implementation. But it is not. Because
>>> prescheduled events might not be processed if the thread the events are
>>> prefetched to stops calling schedule() and processing those events. And
>>> the corresponding queues (if atomic) will be locked forever... Also
>>> problems with ordered queues as later packets will also be stalled until
>>> those prefetched packets are released.
>>>
>>> On 15 October 2014 17:54, Bill Fischofer <[email protected]>
>>> wrote:
>>>
>>>> Whether or not events are prefetched is an implementation consideration
>>>> that is part of the implementation, not the application, no? Again, I
>>>> don't think this is something we need to worry about for ODP v1.0. It
>>>> should be properly addressed in a wider context post-v1.0.
>>>>
>>>> On Wed, Oct 15, 2014 at 10:51 AM, Ola Liljedahl <[email protected]>
>>>> wrote:
>>>>
>>>>> What if a thread wants to stop consuming and processing events? We
>>>>> don't (can't) leave some events prescheduled (and stashed in some
>>>>> per-core "portal") after the thread has stopped processing.
>>>>> So a thread must be able to stop prefetching and then consume and
>>>>> process all remaining (prefetched) events before it completely stops
>>>>> processing. How would this work on Freescale or TI ODP implementations?
>>>>>
>>>>> On 15 October 2014 15:54, Savolainen, Petri (NSN - FI/Espoo)
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> System will deadlock if your application decides to step out from
>>>>>> the schedule loop, and a throughput optimized scheduler has already
>>>>>> pre-scheduled a number of buffers to that core (== locked a number of
>>>>>> atomic queues).
>>>>>>
>>>>>> Application has to be sure that scheduler has not locked anything for
>>>>>> that core before stepping out of the schedule loop. Typically, it’s
>>>>>> impossible for the HW scheduler to rewind scheduling decision afterwards
>>>>>> (when application tells it wants to exit).
>>>>>>
>>>>>> -Petri
>>>>>>
>>>>>> *From:* ext Bill Fischofer [mailto:[email protected]]
>>>>>> *Sent:* Wednesday, October 15, 2014 4:38 PM
>>>>>> *To:* Savolainen, Petri (NSN - FI/Espoo)
>>>>>> *Cc:* ext Alexandru Badicioiu; Ola Liljedahl; [email protected]
>>>>>> *Subject:* Re: [lng-odp] odp_schedule() vs. odp_schedule_one()
>>>>>>
>>>>>> It's not clear why you'd want to expose implementation considerations
>>>>>> through the API. That's what DPDK does and it gets them into all sorts
>>>>>> of portability trouble. odp_schedule() is how a thread discovers the
>>>>>> next thing it's supposed to do. From that standpoint there doesn't
>>>>>> appear to be any application-visible distinction between odp_schedule()
>>>>>> and odp_schedule_one(). In both cases, the application gets a buffer,
>>>>>> as well as the queue it was drawn from.
>>>>>> That's all the application needs to know--everything else is
>>>>>> behind-the-scenes implementation mechanics that will vary from platform
>>>>>> to platform.
>>>>>>
>>>>>> On Wed, Oct 15, 2014 at 8:13 AM, Savolainen, Petri (NSN - FI/Espoo)
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It’s not only push vs pull. It can be also “pull many” vs “pull one”.
>>>>>> Alex, I think your HW supports both: pull many or pull only one.
>>>>>>
>>>>>> Global scheduling == SoC level scheduling, not scheduling from e.g.
>>>>>> per core level stash of (pre-scheduled) buffers/queues.
>>>>>>
>>>>>> The first goal of the function is to streamline the application main
>>>>>> loop when the application has to step out of the schedule loop often
>>>>>> (e.g. in addition to ODP scheduler, poll a third party lib). So instead
>>>>>> of ...
>>>>>>
>>>>>> main_odp_loop
>>>>>> {
>>>>>>     odp_schedule_resume()
>>>>>>
>>>>>>     buf = odp_schedule(...)
>>>>>>
>>>>>>     <process it>
>>>>>>
>>>>>>     odp_schedule_pause()
>>>>>>
>>>>>>     while ( (buf = odp_schedule(...)) != INVALID)
>>>>>>     {
>>>>>>         <process it>
>>>>>>     }
>>>>>>
>>>>>>     odp_schedule_release_atomic()
>>>>>>
>>>>>>     return
>>>>>> }
>>>>>>
>>>>>> ... you can do ...
>>>>>>
>>>>>> main_odp_loop
>>>>>> {
>>>>>>     buf = odp_schedule_one(...)
>>>>>>
>>>>>>     <process it>
>>>>>>
>>>>>>     odp_schedule_release_atomic()
>>>>>>
>>>>>>     return
>>>>>> }
>>>>>>
>>>>>> The second goal is to optimize for QoS response time. It could be
>>>>>> handled with another call that tells ODP to optimize for QoS instead of
>>>>>> throughput.
>>>>>>
>>>>>> -Petri
>>>>>>
>>>>>> From: [email protected] [mailto:[email protected]]
>>>>>> On Behalf Of ext Alexandru Badicioiu
>>>>>> Sent: Wednesday, October 15, 2014 3:52 PM
>>>>>> To: Ola Liljedahl
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()
>>>>>>
>>>>>> The documentation suggests that these two calls can be used in the
>>>>>> same application, which may be a problem also for platforms which do
>>>>>> support both modes, but not at the same time or without
>>>>>> re-initialization, re-configuration, etc. By modes I mean PUSH
>>>>>> (odp_schedule()), when the scheduler runs independently of the
>>>>>> application and pushes frames to the application, and PULL
>>>>>> (odp_schedule_one()), when the scheduler runs when the application
>>>>>> decides and the application pulls the frames from the scheduler.
>>>>>> Also the term "global scheduling" is confusing and may not reflect
>>>>>> the reality of the HW.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> On 15 October 2014 15:15, Ola Liljedahl <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>  * Schedule one buffer
>>>>>>  *
>>>>>>  * Like odp_schedule(), but is guaranteed to schedule only one buffer
>>>>>>  * at a time.
>>>>>>  * Each call will perform global scheduling and will reserve one buffer
>>>>>>  * per thread in maximum. When called after other schedule functions,
>>>>>>  * returns locally stored buffers (if any) first, and then continues in
>>>>>>  * the global scheduling mode.
>>>>>>  *
>>>>>>  * This function optimises priority scheduling (over throughput).
>>>>>>
>>>>>> As Taras commented, some implementations will not be able to truly
>>>>>> schedule only one event at a time. Scheduler implementations could use a
>>>>>> pipelined design where events are scheduled in advance so that the next
>>>>>> event can be prefetched while the current event is being processed. This
>>>>>> will limit concurrent processing (e.g. an idle core could have received
>>>>>> that second event and processed it concurrently, which would have
>>>>>> reduced latency for that event).
>>>>>>
>>>>>> odp_schedule_one() has the same functionality as odp_schedule().
>>>>>> However it is supposed to guarantee only one event at a time is
>>>>>> scheduled in order to prioritize latency to the potential detriment of
>>>>>> throughput.
>>>>>>
>>>>>> We question whether odp_schedule_one() actually has to guarantee only
>>>>>> one event at a time. The functionality provided is the same for these
>>>>>> two calls. One call is focused on throughput (and minimizing overhead,
>>>>>> e.g. by allowing prescheduling and prefetching), the other is focused
>>>>>> on latency (at the cost of overhead). An ODP implementation could use
>>>>>> the same implementation for both functions (some ODP implementations
>>>>>> will always schedule events in advance, other implementations will
>>>>>> always only schedule one event at a time). odp_schedule_one() just hints
>>>>>> to the ODP implementation that latency and concurrent processing are
>>>>>> more important, but this is not a strict requirement.
>>>>>>
>>>>>> Maybe we only need one schedule call and possibly use a different
>>>>>> mechanism to hint the ODP scheduler whether to optimize for throughput
>>>>>> (e.g. preschedule/prefetch) or latency.
>>>>>>
>>>>>> --Ola
>>>>>>
>>>>>> _______________________________________________
>>>>>> lng-odp mailing list
>>>>>> [email protected]
>>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
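
Ola's closing suggestion quoted above, a single schedule call plus a separate hint, can be sketched as a toy model. Everything here is an assumption made for illustration: toy_sched_t, toy_hint_t and the toy_* functions are hypothetical names, not ODP APIs, and the prefetch depth of 4 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_GLOBAL_MAX 8    /* capacity of the toy global queue */
#define TOY_STASH_MAX  4    /* arbitrary prefetch depth in throughput mode */
#define TOY_NO_EVENT   (-1) /* returned when nothing is left to schedule */

/* Hypothetical hint: what the single schedule call should optimize for. */
typedef enum { TOY_HINT_THROUGHPUT, TOY_HINT_LATENCY } toy_hint_t;

typedef struct {
    int        global[TOY_GLOBAL_MAX]; /* events not yet given to any thread */
    size_t     g_head, g_count;
    int        stash[TOY_STASH_MAX];   /* events pre-scheduled to this thread */
    size_t     s_head, s_count;
    toy_hint_t hint;
} toy_sched_t;

/* Fill the global queue with num_events events numbered from 100. */
static void toy_init(toy_sched_t *s, toy_hint_t hint, size_t num_events)
{
    assert(s != NULL && num_events <= TOY_GLOBAL_MAX);
    s->g_head = 0;
    s->g_count = 0;
    s->s_head = 0;
    s->s_count = 0;
    s->hint = hint;
    for (size_t i = 0; i < num_events; i++)
        s->global[s->g_count++] = 100 + (int)i;
}

/* One schedule call for both modes; only the refill depth differs. */
static int toy_schedule(toy_sched_t *s)
{
    assert(s != NULL);
    if (s->s_count == 0) {
        /* Stash is empty: pull one event (latency) or a batch (throughput). */
        size_t depth = (s->hint == TOY_HINT_LATENCY) ? 1 : TOY_STASH_MAX;
        size_t filled = 0;
        s->s_head = 0;
        while (filled < depth && s->g_count > 0) {
            s->stash[filled++] = s->global[s->g_head++];
            s->g_count--;
        }
        s->s_count = filled;
    }
    if (s->s_count == 0)
        return TOY_NO_EVENT;
    s->s_count--;
    return s->stash[s->s_head++];
}

/* Number of events still parked in this thread's stash. */
static size_t toy_stashed(const toy_sched_t *s)
{
    assert(s != NULL);
    return s->s_count;
}
```

With the throughput hint, the first call leaves three events parked in the caller's stash, which is the pre-scheduling that can keep atomic queues locked if the thread stops calling the scheduler. With the latency hint nothing is parked, so a thread can leave the schedule loop after any call.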
