On the Octeon platform, odp_schedule_one() and odp_schedule() are identical:
we can't tell the hardware to prefetch "n" work items to a specific core.

Petri,

Could you please explain the significance of the 
"odp_schedule_release_atomic()" call
in the code snippet below, which you shared earlier in this mail chain?

main_odp_loop
{

  buf = odp_schedule_one(...)

  <process it>

  odp_schedule_release_atomic()

  return
}


________________________________
From: [email protected] <[email protected]> on 
behalf of Bill Fischofer <[email protected]>
Sent: Wednesday, October 15, 2014 7:43 PM
To: Savolainen, Petri (NSN - FI/Espoo)
Cc: [email protected]
Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

How is an application stepping outside of the schedule loop any different from 
an application taking an excessively long time to complete the processing for a 
single schedule?  These seem to be more issues with the application design than 
with the meaning of specific ODP APIs.

On Wed, Oct 15, 2014 at 9:01 AM, Savolainen, Petri (NSN - FI/Espoo) 
<[email protected]> wrote:
Hi,

Alex, are you sure it’s init level only? I think there’s a mode where a core 
can specify in each dequeue command whether it wants “only 1” or “1..3” frames.

Anyway, QMan is doing the SoC level scheduling and the cores just process 
frames from their per “core stash” (SoC level scheduling is already done at the 
point when a core sees a frame).


-Petri


From: ext Alexandru Badicioiu 
[mailto:[email protected]]
Sent: Wednesday, October 15, 2014 4:47 PM
To: Savolainen, Petri (NSN - FI/Espoo)
Cc: Ola Liljedahl; [email protected]

Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

Hi Petri,
yes, my hardware supports both push and pull models, and up to three frames can 
be returned as the result of executing a dequeue command.  But these modes are 
configured at initialization on a per-core basis and can't be changed on the 
fly, only by re-initialization.
ODP "global scheduling" is still confusing to me. It still looks to me as if it 
means executing the scheduler as the result of a schedule call. The 
"core stash" you mention is, for me, the memory area from which a core gets the 
results of scheduling commands (i.e. dequeue commands), regardless of the 
working mode of the scheduler (push vs pull). With DPAA the scheduling happens 
on a per-core (sw portal) basis, not per SoC.
Ola,
I think the scheduler provides the work (i.e. frames) to the cores, not the 
classifier. Classifier only associates a frame with a queue.

Alex







On 15 October 2014 16:13, Savolainen, Petri (NSN - FI/Espoo) 
<[email protected]> wrote:
Hi,

It’s not only push vs pull. It can also be “pull many” vs “pull one”. Alex, I 
think your HW supports both: pull many or pull only one.

Global scheduling == SoC level scheduling, not scheduling from e.g. per core 
level stash of (pre-scheduled) buffers/queues.

The first goal of the function is to streamline the application main loop when 
the application has to step out of the schedule loop often (e.g. to poll a 
third party lib in addition to the ODP scheduler). So instead of ...

main_odp_loop
{
  odp_schedule_resume()

  buf = odp_schedule(...)

  <process it>

  odp_schedule_pause()

  while ( (buf = odp_schedule(...)) != INVALID)
  {
    <process it>
  }

  odp_schedule_release_atomic()

  return
}

... you can do ...

main_odp_loop
{

  buf = odp_schedule_one(...)

  <process it>

  odp_schedule_release_atomic()

  return
}
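
To make the difference between the two loops concrete, here is a minimal toy
model in C. It is only a sketch under stated assumptions, not ODP code: all
toy_* names are hypothetical, and the burst size of 3 is borrowed from the
"1..3" dequeue discussion in this thread. It shows why a burst-style schedule
call leaves events parked in a per-thread stash that must be drained (pause,
then poll until INVALID) before leaving the loop, while a schedule-one style
call leaves nothing behind.

```c
#define STASH_MAX 3     /* assumption: a dequeue may return up to 3 events */
#define INVALID   (-1)  /* stands in for an invalid buffer handle */

/* Toy scheduler state: a global run of event ids plus a per-thread stash. */
typedef struct {
    int next_event;        /* next event id still in the global queue */
    int last_event;        /* last event id available globally */
    int stash[STASH_MAX];  /* events already prescheduled to this thread */
    int stash_len;         /* number of valid entries in stash */
    int stash_pos;         /* next stash entry to hand out */
    int paused;            /* when set, no new global scheduling happens */
} toy_sched_t;

void toy_init(toy_sched_t *s, int num_events)
{
    s->next_event = 0;
    s->last_event = num_events - 1;
    s->stash_len = 0;
    s->stash_pos = 0;
    s->paused = 0;
}

/* Burst-style call: refills the local stash when it runs empty. */
int toy_schedule(toy_sched_t *s)
{
    if (s->stash_pos == s->stash_len) {
        if (s->paused)
            return INVALID;  /* stash drained and scheduling is paused */
        s->stash_len = 0;
        s->stash_pos = 0;
        while (s->stash_len < STASH_MAX && s->next_event <= s->last_event)
            s->stash[s->stash_len++] = s->next_event++;
        if (s->stash_len == 0)
            return INVALID;  /* nothing left globally */
    }
    return s->stash[s->stash_pos++];
}

/* One-at-a-time call: hands out stashed events first, then pulls exactly one. */
int toy_schedule_one(toy_sched_t *s)
{
    if (s->stash_pos < s->stash_len)
        return s->stash[s->stash_pos++];
    if (s->paused || s->next_event > s->last_event)
        return INVALID;
    return s->next_event++;
}

void toy_pause(toy_sched_t *s)  { s->paused = 1; }
void toy_resume(toy_sched_t *s) { s->paused = 0; }

/* Events held by this thread and therefore invisible to other cores. */
int toy_held(const toy_sched_t *s)
{
    return s->stash_len - s->stash_pos;
}
```

In this model, one toy_schedule() call on a queue of five events returns
event 0 while events 1 and 2 stay held by the thread, which is what the
pause-and-drain sequence in the first loop cleans up. toy_schedule_one()
never leaves toy_held() above zero on its own.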


The second goal is to optimize for QoS response time. It could be handled with 
another call that tells ODP to optimize for QoS instead of throughput.


-Petri


From: [email protected] 
[mailto:[email protected]]
 On Behalf Of ext Alexandru Badicioiu
Sent: Wednesday, October 15, 2014 3:52 PM
To: Ola Liljedahl
Cc: [email protected]
Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

The documentation suggests that these two calls can be used in the same 
application. That may be a problem even for platforms that support both modes, 
but not at the same time or not without re-initialization, re-configuration, 
etc. By modes I mean PUSH (odp_schedule()), where the scheduler runs 
independently of the application and pushes frames to it, and PULL 
(odp_schedule_one()), where the scheduler runs when the application decides 
and the application pulls frames from the scheduler.
Also, the term "global scheduling" is confusing and may not reflect the reality 
of the HW.


Alex

On 15 October 2014 15:15, Ola Liljedahl 
<[email protected]> wrote:
 * Schedule one buffer
 *
 * Like odp_schedule(), but is guaranteed to schedule only one buffer at a time.
 * Each call will perform global scheduling and will reserve one buffer per
 * thread in maximum. When called after other schedule functions, returns
 * locally stored buffers (if any) first, and then continues in the global
 * scheduling mode.
 *
 * This function optimises priority scheduling (over throughput).

As Taras commented, some implementations will not be able to truly schedule 
only one event at a time. Scheduler implementations could use a pipelined 
design where events are scheduled in advance so that the next event can be 
prefetched while the current event is being processed. This will limit 
concurrent processing (e.g. an idle core could have received that second event 
and processed it concurrently, which would have reduced latency for that event).

odp_schedule_one() has the same functionality as odp_schedule(). However it is 
supposed to guarantee only one event at a time is scheduled in order to 
prioritize latency to the potential detriment of throughput.

We question whether odp_schedule_one() actually has to guarantee only one event 
at a time. The functionality provided is the same for these two calls. One call 
is focused on throughput (and minimizing overhead, e.g. by allowing 
prescheduling and prefetching), the other is focused on latency (at 
the cost of overhead). An ODP implementation could use the same implementation 
for both functions (some ODP implementations will always schedule events in 
advance, other implementations will always schedule only one event at a time). 
odp_schedule_one() just hints to the ODP implementation that latency and 
concurrent processing are more important, but this is not a strict requirement.

Maybe we only need one schedule call, and possibly a different mechanism to 
hint to the ODP scheduler whether to optimize for throughput (e.g. 
preschedule/prefetch) or latency.
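
A rough sketch of what such a hint mechanism might look like, in C. Nothing
here is an existing ODP API: sched_hint_t, sched_cfg_t, sched_set_hint() and
sched_effective_burst() are hypothetical names used only for illustration.
The point is that the hint is advisory: a platform whose burst size is fixed
at initialization simply ignores it.

```c
#include <stdbool.h>

/* Hypothetical optimization hint passed by the application. */
typedef enum {
    SCHED_HINT_THROUGHPUT,  /* allow prescheduling/prefetching */
    SCHED_HINT_LATENCY      /* prefer one event at a time */
} sched_hint_t;

/* Hypothetical per-thread scheduler configuration. */
typedef struct {
    int          hw_max_burst;    /* largest burst the platform can preschedule */
    bool         hw_fixed_burst;  /* true if burst size is fixed at init time */
    sched_hint_t hint;            /* current application preference */
} sched_cfg_t;

/* Record the application's preference; it is a hint, not a guarantee. */
void sched_set_hint(sched_cfg_t *cfg, sched_hint_t hint)
{
    cfg->hint = hint;
}

/* Number of events one scheduling pass may reserve for the calling thread. */
int sched_effective_burst(const sched_cfg_t *cfg)
{
    if (cfg->hw_fixed_burst)
        return cfg->hw_max_burst;  /* platform cannot honor the hint */
    return cfg->hint == SCHED_HINT_LATENCY ? 1 : cfg->hw_max_burst;
}
```

With this shape a single schedule call would consult
sched_effective_burst() internally, so applications that never set a hint
keep the throughput behaviour.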

--Ola


_______________________________________________
lng-odp mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/lng-odp

