Re: [lng-odp] odp_schedule() vs. odp_schedule_one()

Bill Fischofer Wed, 15 Oct 2014 12:23:10 -0700

That's exactly what meta data is used for.  This particular bit of meta
data would only be settable by a scheduler, so there would be no
application-accessible setter for it.  But I see no harm in making the
getter available.  That would also potentially eliminate the need for the
second argument to odp_schedule() since the application could always
retrieve the source queue information from the returned buffer if desired.
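
To make the idea concrete, here is a minimal sketch in C of scheduler-owned
meta data with a getter but no setter.  Every name in it (queue_t, buffer_t,
sched_deliver(), buffer_last_queue(), queue_enq()) is hypothetical and chosen
only for illustration; none of this is the actual ODP API, and the real
implementation may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the real ODP definitions. */
typedef struct queue {
    bool atomic;  /* true if this is an atomic queue */
    bool locked;  /* atomic context currently held by a thread */
} queue_t;

typedef struct buffer {
    queue_t *last_queue;  /* meta data written only by the scheduler */
} buffer_t;

/* Scheduler-internal: record the source queue when a buffer is delivered. */
static void sched_deliver(buffer_t *buf, queue_t *src)
{
    assert(buf != NULL && src != NULL);
    buf->last_queue = src;
    if (src->atomic)
        src->locked = true;  /* queue gives out nothing else until released */
}

/* Application-visible getter; there is deliberately no matching setter. */
static queue_t *buffer_last_queue(const buffer_t *buf)
{
    return buf->last_queue;
}

/* Enqueue consults the recorded source queue to release its atomic context. */
static void queue_enq(queue_t *dst, buffer_t *buf)
{
    queue_t *src = buf->last_queue;

    (void)dst;  /* the actual queueing is omitted in this sketch */
    if (src != NULL && src->atomic)
        src->locked = false;  /* implicit release via enq */
    buf->last_queue = NULL;
}
```

With something along these lines, the application could call the getter on
the returned buffer whenever it wants the source queue, and the enq path
could find the queue to release without any extra argument.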


On Wed, Oct 15, 2014 at 2:14 PM, Ola Liljedahl <[email protected]>
wrote:

> I think the ODP implementation is supposed to remember the source queue
> for an atomically scheduled packet. Same for ordered queues. No need to
> expose this to the application and less risk of doing it wrong.
>
> -- Ola
>
>
> On 15 October 2014 20:57, Bill Fischofer <[email protected]>
> wrote:
>
>> Thanks, Alex.  This does imply that we're missing a critical bit of
>> buffer meta data.  Since odp_queue_enq() only specifies a target queue and
>> buffer, there needs to be some way to relate this call to the queue that
>> the buffer was previously sourced from so that the atomic/ordered semantics
>> can be maintained.
>>
>> Should we have a last_queue field in the buffers that gets set by
>> odp_schedule() and referenced as part of subsequent enq operations?
>>
>> Bill
>>
>> On Wed, Oct 15, 2014 at 1:44 PM, Alexandru Badicioiu <
>> [email protected]> wrote:
>>
>>> Bill, I have the same understanding as yours regarding these aspects.
>>> Free calls should be aware of the source queue of a buffer to inform the
>>> scheduler that the context should be released too. The same for enqueue
>>> calls. I'm not sure of the use of explicit release context calls -
>>> eventually any buffer/event/packet (i.e. entity delivered by the scheduler)
>>> would be enqueued or freed so the scheduler will be informed.
>>>
>>> Alex
>>>
>>> On 15 October 2014 20:23, Bill Fischofer <[email protected]>
>>> wrote:
>>>
>>>> If I call odp_schedule() and get back an event associated with an
>>>> atomic queue, my understanding is that I owe the implementation a
>>>> subsequent call to odp_schedule_release_atomic() to dispose of it.
>>>> Similarly, if I receive an event from an ordered queue, I need to enq that
>>>> event somewhere else or otherwise there will be a gap in the downstream
>>>> order that will cause a stall.
>>>>
>>>> The interplay between queues and the scheduler is why we need a design
>>>> that spells this out in detail.  That's where what is and is not API vs.
>>>> implementation is also spelled out.  Right now my understanding is we have
>>>> the following types of queues and their scheduling
>>>> implications/interactions:
>>>>
>>>>    - Parallel: Anything on the queue can be given to anyone without
>>>>    restriction.  There are no restrictions relating to subsequent event
>>>>    disposal or downstream processing.
>>>>
>>>>
>>>>    - Atomic: Anyone can get something from a queue, but once they get
>>>>    it the queue is not able to give out subsequent events to anyone else
>>>>    until the queue is either explicitly (via odp_schedule_release_atomic())
>>>>    or implicitly (via a subsequent enq of the event to some other queue)
>>>>    made available for rescheduling.
>>>>
>>>>
>>>>    - Ordered: Anything on the queue can be given to anyone without
>>>>    restriction, however subsequent enqs of those events onto other queues
>>>>    must be order preserving.  This implies that if an application wishes to
>>>>    dispose of an event without a subsequent enq it needs to inform the
>>>>    scheduler of this to prevent downstream stalls.  So it appears there
>>>>    needs to be an ordered equivalent to odp_schedule_release_atomic() that
>>>>    we currently don't have.
>>>>
>>>> Is this understanding correct?  If not, what is the correct way to view
>>>> this?  In any event, we just need to write this out and get agreement on
>>>> the meanings and conventions associated with it.
>>>>
>>>> On Wed, Oct 15, 2014 at 11:42 AM, Ola Liljedahl <
>>>> [email protected]> wrote:
>>>>
>>>>> It should be part of the implementation. But it is not. Because
>>>>> prescheduled events might not be processed if the thread the events are
>>>>> prefetched to stops calling schedule() and processing those events. And
>>>>> the corresponding queues (if atomic) will be locked forever... There are
>>>>> also problems with ordered queues, as later packets will be stalled until
>>>>> those prefetched packets are released.
>>>>>
>>>>> On 15 October 2014 17:54, Bill Fischofer <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Whether or not events are prefetched is an implementation
>>>>>> consideration that is part of the implementation, not the application, 
>>>>>> no?
>>>>>> Again, I don't think this is something we need to worry about for ODP
>>>>>> v1.0.  It should be properly addressed in a wider context post-v1.0.
>>>>>>
>>>>>> On Wed, Oct 15, 2014 at 10:51 AM, Ola Liljedahl <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> What if a thread wants to stop consuming and processing events? We
>>>>>>> don't (can't) leave some events prescheduled (and stashed in some
>>>>>>> per-core "portal") after the thread has stopped processing. So a thread
>>>>>>> must be able to stop prefetching and then consume and process all
>>>>>>> remaining (prefetched) events before it completely stops processing.
>>>>>>> How would this work on Freescale or TI ODP implementations?
>>>>>>>
>>>>>>>
>>>>>>> On 15 October 2014 15:54, Savolainen, Petri (NSN - FI/Espoo) <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> The system will deadlock if your application decides to step out of
>>>>>>>> the schedule loop and a throughput-optimized scheduler has already
>>>>>>>> pre-scheduled a number of buffers to that core (== locked a number of
>>>>>>>> atomic queues).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The application has to be sure that the scheduler has not locked
>>>>>>>> anything for that core before stepping out of the schedule loop.
>>>>>>>> Typically, it’s impossible for the HW scheduler to rewind a scheduling
>>>>>>>> decision afterwards (when the application says it wants to exit).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -Petri
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* ext Bill Fischofer [mailto:[email protected]]
>>>>>>>> *Sent:* Wednesday, October 15, 2014 4:38 PM
>>>>>>>> *To:* Savolainen, Petri (NSN - FI/Espoo)
>>>>>>>> *Cc:* ext Alexandru Badicioiu; Ola Liljedahl;
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> *Subject:* Re: [lng-odp] odp_schedule() vs. odp_schedule_one()
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It's not clear why you'd want to expose implementation
>>>>>>>> considerations through the API.  That's what DPDK does and it gets them
>>>>>>>> into all sorts of portability trouble.  odp_schedule() is how a thread
>>>>>>>> discovers the next thing it's supposed to do. From that standpoint
>>>>>>>> there doesn't appear to be any application-visible distinction between
>>>>>>>> odp_schedule() and odp_schedule_one().  In both cases, the application
>>>>>>>> gets a buffer, as well as the queue it was drawn from.  That's all the
>>>>>>>> application needs to know--everything else is behind-the-scenes
>>>>>>>> implementation mechanics that will vary from platform to platform.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 15, 2014 at 8:13 AM, Savolainen, Petri (NSN - FI/Espoo)
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> It’s not only push vs pull. It can be also “pull many” vs “pull
>>>>>>>> one”. Alex, I think your HW supports both: pull many or pull only one.
>>>>>>>>
>>>>>>>> Global scheduling == SoC level scheduling, not scheduling from e.g.
>>>>>>>> per core level stash of (pre-scheduled) buffers/queues.
>>>>>>>>
>>>>>>>> The first goal of the function is to streamline the application main
>>>>>>>> loop when the application has to step out of the schedule loop often
>>>>>>>> (e.g. to poll a third-party lib in addition to the ODP scheduler). So
>>>>>>>> instead of ...
>>>>>>>>
>>>>>>>> main_odp_loop
>>>>>>>> {
>>>>>>>>   odp_schedule_resume()
>>>>>>>>
>>>>>>>>   buf = odp_schedule(...)
>>>>>>>>
>>>>>>>>   <process it>
>>>>>>>>
>>>>>>>>   odp_schedule_pause()
>>>>>>>>
>>>>>>>>   while ( (buf = odp_schedule(...)) != INVALID)
>>>>>>>>   {
>>>>>>>>     <process it>
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   odp_schedule_release_atomic()
>>>>>>>>
>>>>>>>>   return
>>>>>>>> }
>>>>>>>>
>>>>>>>> ... you can do ...
>>>>>>>>
>>>>>>>> main_odp_loop
>>>>>>>> {
>>>>>>>>
>>>>>>>>   buf = odp_schedule_one(...)
>>>>>>>>
>>>>>>>>   <process it>
>>>>>>>>
>>>>>>>>   odp_schedule_release_atomic()
>>>>>>>>
>>>>>>>>   return
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> The second goal is to optimize for QoS response time. It could be
>>>>>>>> handled with another call that tells ODP to optimize for QoS instead of
>>>>>>>> throughput.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Petri
>>>>>>>>
>>>>>>>>
>>>>>>>> From: [email protected] [mailto:
>>>>>>>> [email protected]] On Behalf Of ext Alexandru
>>>>>>>> Badicioiu
>>>>>>>> Sent: Wednesday, October 15, 2014 3:52 PM
>>>>>>>> To: Ola Liljedahl
>>>>>>>> Cc: [email protected]
>>>>>>>> Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()
>>>>>>>>
>>>>>>>>
>>>>>>>> The documentation suggests that these two calls can be used in the
>>>>>>>> same application, which may be a problem also for platforms which do
>>>>>>>> support both modes, but not at the same time or without
>>>>>>>> re-initialization, re-configuration, etc. By modes I mean PUSH
>>>>>>>> (odp_schedule()), when the scheduler runs independently of the
>>>>>>>> application and pushes frames to the application, and PULL
>>>>>>>> (odp_schedule_one()), when the scheduler runs when the application
>>>>>>>> decides and the application pulls the frames from the scheduler.
>>>>>>>> Also the term "global scheduling" is confusing and may not reflect
>>>>>>>> the reality of the HW.
>>>>>>>>
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On 15 October 2014 15:15, Ola Liljedahl <[email protected]>
>>>>>>>> wrote:
>>>>>>>>  * Schedule one buffer
>>>>>>>>  *
>>>>>>>>  * Like odp_schedule(), but is guaranteed to schedule only one buffer
>>>>>>>>  * at a time. Each call will perform global scheduling and will reserve
>>>>>>>>  * at most one buffer per thread. When called after other schedule
>>>>>>>>  * functions, returns locally stored buffers (if any) first, and then
>>>>>>>>  * continues in the global scheduling mode.
>>>>>>>>  *
>>>>>>>>  * This function optimises priority scheduling (over throughput).
>>>>>>>>
>>>>>>>> As Taras commented, some implementations will not be able to truly
>>>>>>>> schedule only one event at a time. Scheduler implementations could use
>>>>>>>> a pipelined design where events are scheduled in advance so that the
>>>>>>>> next event can be prefetched while the current event is being
>>>>>>>> processed. This will limit concurrent processing (e.g. an idle core
>>>>>>>> could have received that second event and processed it concurrently,
>>>>>>>> which would have reduced latency for that event).
>>>>>>>>
>>>>>>>> odp_schedule_one() has the same functionality as odp_schedule().
>>>>>>>> However it is supposed to guarantee only one event at a time is
>>>>>>>> scheduled in order to prioritize latency to the potential detriment of
>>>>>>>> throughput.
>>>>>>>>
>>>>>>>> We question whether odp_schedule_one() actually has to guarantee
>>>>>>>> only one event at a time. The functionality provided is the same for
>>>>>>>> these two calls. One call is focused on throughput (and minimizing
>>>>>>>> overhead, e.g. by allowing prescheduling and prefetching), the other is
>>>>>>>> focused on latency (at the cost of overhead). An ODP implementation
>>>>>>>> could use the same implementation for both functions (some ODP
>>>>>>>> implementations will always schedule events in advance, other
>>>>>>>> implementations will always only schedule one event at a time).
>>>>>>>> odp_schedule_one() just hints to the ODP implementation that latency
>>>>>>>> and concurrent processing are more important, but this is not a strict
>>>>>>>> requirement.
>>>>>>>>
>>>>>>>> Maybe we only need one schedule call and possibly use a different
>>>>>>>> mechanism to hint the ODP scheduler whether to optimize for throughput
>>>>>>>> (e.g. preschedule/prefetch) or latency.
>>>>>>>>
>>>>>>>> --Ola
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> lng-odp mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
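
For readers following the quoted discussion, the pause-then-drain pattern
from Petri's first main_odp_loop can be sketched with a toy model in C.
This is only a sketch under stated assumptions: toy_sched_t and the toy_*
functions are hypothetical stand-ins, not the ODP scheduler API, and the
per-thread "stash" simply models events that a throughput-optimized
scheduler has already prefetched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STASH_MAX   4
#define TOY_GLOBAL_MAX  8
#define TOY_BUF_INVALID (-1)

/* Toy model: one global queue plus one per-thread stash of prefetched events. */
typedef struct {
    int    global[TOY_GLOBAL_MAX];
    size_t g_head, g_count;
    int    stash[TOY_STASH_MAX];
    size_t s_head, s_count;
    bool   paused;  /* when true, no new events are prefetched */
} toy_sched_t;

static void toy_schedule_pause(toy_sched_t *s)  { s->paused = true; }
static void toy_schedule_resume(toy_sched_t *s) { s->paused = false; }

/* Return the next event, refilling the stash only while not paused. */
static int toy_schedule(toy_sched_t *s)
{
    assert(s != NULL);
    if (s->s_count == 0 && !s->paused) {
        s->s_head = 0;
        while (s->s_count < TOY_STASH_MAX && s->g_count > 0) {
            s->stash[s->s_count++] = s->global[s->g_head++];
            s->g_count--;
        }
    }
    if (s->s_count == 0)
        return TOY_BUF_INVALID;
    s->s_count--;
    return s->stash[s->s_head++];
}

/* Stop prefetching, then consume everything already stashed for this thread. */
static size_t toy_drain(toy_sched_t *s)
{
    size_t processed = 0;

    toy_schedule_pause(s);
    while (toy_schedule(s) != TOY_BUF_INVALID)
        processed++;  /* <process it> */
    return processed;
}
```

After toy_drain() returns, nothing remains stashed for the thread, so in this
model it can leave the schedule loop without stranding prefetched events;
toy_schedule_resume() re-enables prefetching when it comes back.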
