release_atomic is needed when you have finished working on the current buffer from an atomic queue but are not going to make another schedule call for a while (as in this example, when you step out of main_odp_loop). A schedule call releases the atomic context implicitly, but here the application doesn’t want another buffer; it wants to step out.
-Petri

From: ext Jacob, Jerin [mailto:[email protected]]
Sent: Wednesday, October 15, 2014 5:17 PM
To: Bill Fischofer; Savolainen, Petri (NSN - FI/Espoo)
Cc: [email protected]
Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

For the Octeon platform, odp_schedule_one() and odp_schedule() are identical. We can't tell the hardware to prefetch "n" work items to a specified core.

Petri, could you please share the significance of odp_schedule_release_atomic() in the code snippet below, which you shared earlier in this mail chain?

    main_odp_loop {
        buf = odp_schedule_one(...)
        <process it>
        odp_schedule_release_atomic()
        return
    }

________________________________
From: [email protected] on behalf of Bill Fischofer <[email protected]>
Sent: Wednesday, October 15, 2014 7:43 PM
To: Savolainen, Petri (NSN - FI/Espoo)
Cc: [email protected]
Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

How is an application stepping outside of the schedule loop any different from an application taking an excessively long time to complete the processing for a single schedule? These seem to be issues with the application design rather than with the meaning of specific ODP APIs.

On Wed, Oct 15, 2014 at 9:01 AM, Savolainen, Petri (NSN - FI/Espoo) <[email protected]> wrote:

Hi,

Alex, are you sure it’s init level only? I think there’s a mode where a core can specify in each dequeue command: give me “only 1” vs “1..3”. Anyway, QMan does the SoC level scheduling and cores just process frames from their per “core stash” (SoC level scheduling has already been done by the time a core sees a frame).
-Petri

From: ext Alexandru Badicioiu [mailto:[email protected]]
Sent: Wednesday, October 15, 2014 4:47 PM
To: Savolainen, Petri (NSN - FI/Espoo)
Cc: Ola Liljedahl; [email protected]
Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

Hi Petri,

yes, my hardware supports both push and pull models, and up to three frames can be returned as the result of executing a dequeue command. But these modes are configured at initialization on a per-core basis and can't be changed under traffic or without re-initialization.

ODP "global scheduling" is still confusing to me. It still looks to me as if it means executing the scheduler as a result of a schedule call. The "core stash" you mention is, for me, the memory area from which a core gets the results of scheduling commands (i.e. dequeue commands), regardless of the working mode of the scheduler (push vs pull). With DPAA the scheduling happens on a per-core (sw portal) basis, not per SoC.

Ola, I think the scheduler provides the work (i.e. frames) to the cores, not the classifier. The classifier only associates a frame with a queue.

Alex

On 15 October 2014 16:13, Savolainen, Petri (NSN - FI/Espoo) <[email protected]> wrote:

Hi,

It’s not only push vs pull. It can also be “pull many” vs “pull one”. Alex, I think your HW supports both: pull many or pull only one.

Global scheduling == SoC level scheduling, not scheduling from e.g. a per-core stash of (pre-scheduled) buffers/queues.

The first goal of the function is to streamline the application main loop when the application has to step out of the schedule loop often (e.g. to poll a third-party lib in addition to the ODP scheduler). So instead of ...

    main_odp_loop {
        odp_schedule_resume()
        buf = odp_schedule(...)
        <process it>
        odp_schedule_pause()
        while ( (buf = odp_schedule(...)) != INVALID) {
            <process it>
        }
        odp_schedule_release_atomic()
        return
    }

... you can do ...

    main_odp_loop {
        buf = odp_schedule_one(...)
        <process it>
        odp_schedule_release_atomic()
        return
    }

The second goal is to optimize for QoS response time. It could be handled with another call that tells ODP to optimize for QoS instead of throughput.

-Petri

From: [email protected] [mailto:[email protected]] On Behalf Of ext Alexandru Badicioiu
Sent: Wednesday, October 15, 2014 3:52 PM
To: Ola Liljedahl
Cc: [email protected]
Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

The documentation suggests that these two calls can be used in the same application, which may be a problem even for platforms that support both modes but not at the same time, or not without re-initialization, re-configuration, etc. By modes I mean PUSH (odp_schedule()), when the scheduler runs independently of the application and pushes frames to the application, and PULL (odp_schedule_one()), when the scheduler runs when the application decides and the application pulls the frames from the scheduler.

Also, the term "global scheduling" is confusing and may not reflect the reality of the HW.

Alex

On 15 October 2014 15:15, Ola Liljedahl <[email protected]> wrote:

 * Schedule one buffer
 *
 * Like odp_schedule(), but is guaranteed to schedule only one buffer at a time.
 * Each call will perform global scheduling and will reserve one buffer per
 * thread in maximum. When called after other schedule functions, returns
 * locally stored buffers (if any) first, and then continues in the global
 * scheduling mode.
 *
 * This function optimises priority scheduling (over throughput).

As Taras commented, some implementations will not be able to truly schedule only one event at a time. Scheduler implementations could use a pipelined design where events are scheduled in advance so that the next event can be prefetched while the current event is being processed. This will limit concurrent processing (e.g. an idle core could have received that second event and processed it concurrently, which would have reduced latency for that event).

odp_schedule_one() has the same functionality as odp_schedule(). However, it is supposed to guarantee that only one event at a time is scheduled, in order to prioritize latency to the potential detriment of throughput. We question whether odp_schedule_one() actually has to guarantee only one event at a time.

The functionality provided is the same for these two calls. One call is focused on throughput (and minimizing overhead, e.g. by allowing prescheduling and prefetching), the other is focused on latency (at the cost of overhead). An ODP implementation could use the same implementation for both functions (some ODP implementations will always schedule events in advance, other implementations will always schedule only one event at a time). odp_schedule_one() just hints to the ODP implementation that latency and concurrent processing are more important, but this is not a strict requirement.

Maybe we only need one schedule call, and possibly a different mechanism to hint to the ODP scheduler whether to optimize for throughput (e.g. preschedule/prefetch) or latency.

--Ola

_______________________________________________
lng-odp mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/lng-odp
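The two main-loop shapes discussed in this thread can be sketched in C against a small mock scheduler. This is only an illustration under stated assumptions: every mock_* name, the mock_sched_t type and the burst size of 3 are hypothetical stand-ins, not the ODP API, and the mock merely models a global queue plus a per-thread stash of pre-scheduled buffers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the calls named in the thread; NOT the ODP API. */
#define MOCK_INVALID  (-1)
#define MOCK_PREFETCH 3            /* assumed "pull many" burst size */

typedef struct {
    int next_buf;                  /* next buffer id in the global queue */
    int total;                     /* number of buffers in the global queue */
    int stash[MOCK_PREFETCH];      /* per-thread pre-scheduled buffers */
    int stash_len, stash_pos;
    int paused;                    /* when set, no new global scheduling */
    int atomic_held;               /* models the atomic context */
} mock_sched_t;

void mock_init(mock_sched_t *s, int nbufs)
{
    assert(nbufs >= 0);
    s->next_buf = 0;
    s->total = nbufs;
    s->stash_len = s->stash_pos = 0;
    s->paused = 0;
    s->atomic_held = 0;
}

void mock_schedule_pause(mock_sched_t *s)          { s->paused = 1; }
void mock_schedule_resume(mock_sched_t *s)         { s->paused = 0; }
void mock_schedule_release_atomic(mock_sched_t *s) { s->atomic_held = 0; }

/* Throughput-oriented call: may pre-schedule a burst into the local stash. */
int mock_schedule(mock_sched_t *s)
{
    s->atomic_held = 0;            /* a schedule call releases implicitly */
    if (s->stash_pos == s->stash_len && !s->paused) {
        s->stash_len = s->stash_pos = 0;
        while (s->stash_len < MOCK_PREFETCH && s->next_buf < s->total)
            s->stash[s->stash_len++] = s->next_buf++;
    }
    if (s->stash_pos == s->stash_len)
        return MOCK_INVALID;
    s->atomic_held = 1;
    return s->stash[s->stash_pos++];
}

/* Latency-oriented call: drains the stash first, then takes one buffer. */
int mock_schedule_one(mock_sched_t *s)
{
    s->atomic_held = 0;
    if (s->stash_pos < s->stash_len) {
        s->atomic_held = 1;
        return s->stash[s->stash_pos++];
    }
    if (s->next_buf < s->total) {
        s->atomic_held = 1;
        return s->next_buf++;
    }
    return MOCK_INVALID;
}

/* First shape: resume, schedule, pause, drain the stash, release, return. */
int loop_with_pause(mock_sched_t *s)
{
    int processed = 0;
    int buf;

    mock_schedule_resume(s);
    buf = mock_schedule(s);
    if (buf != MOCK_INVALID)
        processed++;               /* <process it> */
    mock_schedule_pause(s);
    while ((buf = mock_schedule(s)) != MOCK_INVALID)
        processed++;               /* <process it> */
    mock_schedule_release_atomic(s);
    return processed;
}

/* Second shape: one buffer, explicit release, return. */
int loop_with_schedule_one(mock_sched_t *s)
{
    int processed = 0;

    if (mock_schedule_one(s) != MOCK_INVALID)
        processed++;               /* <process it> */
    mock_schedule_release_atomic(s);
    return processed;
}
```

In this mock, the first shape has to drain whatever was pre-scheduled before it can step out, while the second shape takes exactly one buffer per pass. Both end with an explicit release because no further schedule call follows to release the atomic context implicitly.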
