Thanks, Alex.  This does imply that we're missing a critical bit of buffer
metadata.  Since odp_queue_enq() only specifies a target queue and buffer,
there needs to be some way to relate this call to the queue that the buffer
was previously sourced from so that the atomic/ordered semantics can be
maintained.

Should we have a last_queue field in the buffers that gets set by
odp_schedule() and referenced as part of subsequent enq operations?
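
To make the last_queue idea concrete, here is a minimal sketch in C of how
such a field might be set at schedule time and consumed at enq or free time.
It is a toy model under stated assumptions, not the ODP API: queue_t,
buffer_t, toy_schedule(), toy_enq(), toy_free() and toy_release_context()
are all hypothetical names, and only the atomic case is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every type and function here is hypothetical. */
typedef enum { SYNC_PARALLEL, SYNC_ATOMIC, SYNC_ORDERED } sync_t;

typedef struct queue {
    sync_t sync;
    bool   locked;       /* atomic context currently held by some thread */
} queue_t;

typedef struct buffer {
    queue_t *last_queue; /* set by the scheduler, consumed by enq/free */
} buffer_t;

/* Hand out a buffer from 'src' and record which queue it came from. */
static bool toy_schedule(queue_t *src, buffer_t *buf)
{
    if (src->sync == SYNC_ATOMIC) {
        if (src->locked)
            return false; /* an event from this queue is still in flight */
        src->locked = true;
    }
    buf->last_queue = src;
    return true;
}

/* Release whatever scheduling context the buffer still carries. */
static void toy_release_context(buffer_t *buf)
{
    queue_t *src = buf->last_queue;

    if (src != NULL && src->sync == SYNC_ATOMIC)
        src->locked = false;
    buf->last_queue = NULL;
}

/* The caller names only the destination; the source is in the metadata. */
static void toy_enq(queue_t *dst, buffer_t *buf)
{
    (void)dst; /* the actual queue insertion is omitted in this sketch */
    toy_release_context(buf);
}

/* Freeing a scheduled buffer has to release the context as well. */
static void toy_free(buffer_t *buf)
{
    toy_release_context(buf);
}

/* Small self-check showing the implicit release on enq and on free. */
static void last_queue_demo(void)
{
    queue_t atomic_q = { SYNC_ATOMIC, false };
    queue_t out_q    = { SYNC_PARALLEL, false };
    buffer_t first = { NULL }, second = { NULL };

    assert(toy_schedule(&atomic_q, &first));
    assert(!toy_schedule(&atomic_q, &second)); /* queue is held */
    toy_enq(&out_q, &first);                   /* implicit release */
    assert(toy_schedule(&atomic_q, &second));
    toy_free(&second);
    assert(!atomic_q.locked);
}
```

In this sketch the enq signature stays (queue, buffer), which is the point of
carrying the source queue in the buffer rather than in the call.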

Bill

On Wed, Oct 15, 2014 at 1:44 PM, Alexandru Badicioiu <
[email protected]> wrote:

> Bill, I have the same understanding as yours regarding these aspects. Free
> calls should be aware of the source queue of a buffer to inform the
> scheduler that the context should be released too. The same for enqueue
> calls. I'm not sure of the use of explicit release context calls -
> eventually any buffer/event/packet (i.e. entity delivered by the scheduler)
> would be enqueued or freed so the scheduler will be informed.
>
> Alex
>
> On 15 October 2014 20:23, Bill Fischofer <[email protected]>
> wrote:
>
>> If I call odp_schedule() and get back an event associated with an atomic
>> queue, my understanding is that I owe the implementation a subsequent call
>> to odp_schedule_release_atomic() to dispose of it.  Similarly, if I receive
>> an event from an ordered queue, I need to enq that event somewhere else or
>> otherwise there will be a gap in the downstream order that will cause a
>> stall.
>>
>> The interplay between queues and the scheduler is why we need a design
>> that spells this out in detail.  That's where what is and is not API vs.
>> implementation is also spelled out.  Right now my understanding is we have
>> the following types of queues and their scheduling
>> implications/interactions:
>>
>>    - Parallel: Anything on the queue can be given to anyone without
>>    restriction.  There are no restrictions relating to subsequent event
>>    disposal or downstream processing.
>>
>>
>>    - Atomic: Anyone can get something from a queue, but once they get it
>>    the queue is not able to give out subsequent events to anyone else until
>>    the queue is either explicitly (via odp_schedule_release_atomic()) or
>>    implicitly (via a subsequent enq of the event to some other queue) made
>>    available for rescheduling.
>>
>>
>>    - Ordered: Anything on the queue can be given to anyone without
>>    restriction; however, subsequent enqs of those events onto other queues
>>    must be order preserving.  This implies that if an application wishes to
>>    dispose of an event without a subsequent enq it needs to inform the
>>    scheduler of this to prevent downstream stalls.  So it appears there
>>    needs to be an ordered equivalent to odp_schedule_release_atomic() that
>>    we currently don't have.
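
The ordered case described in the list above can be illustrated with a small
reorder model in C. This is only a sketch under assumptions: ordered_ctx_t,
ord_schedule(), ord_enq() and ord_release() are hypothetical names, the
window size is arbitrary, and ord_release() stands in for the missing
"ordered equivalent" of the atomic release call.

```c
#include <assert.h>
#include <stdbool.h>

#define WINDOW 8 /* hypothetical limit on outstanding events */

/* Toy ordered-queue context; the caller keeps fewer than WINDOW
 * events outstanding at any time. */
typedef struct {
    unsigned next_in;         /* sequence number for the next scheduled event */
    unsigned next_out;        /* sequence number allowed downstream next */
    bool     done[WINDOW];    /* event was enqueued or released */
    bool     forward[WINDOW]; /* true if the finished event goes downstream */
    unsigned forwarded;       /* events delivered downstream, in order */
} ordered_ctx_t;

/* Scheduling an event assigns it the next sequence number. */
static unsigned ord_schedule(ordered_ctx_t *c)
{
    return c->next_in++;
}

/* Let finished events pass for as long as there is no gap in the order. */
static void ord_drain(ordered_ctx_t *c)
{
    while (c->done[c->next_out % WINDOW]) {
        unsigned slot = c->next_out % WINDOW;

        if (c->forward[slot])
            c->forwarded++;
        c->done[slot] = false;
        c->forward[slot] = false;
        c->next_out++;
    }
}

/* Enqueue downstream: held back until all earlier events are done. */
static void ord_enq(ordered_ctx_t *c, unsigned seq)
{
    c->done[seq % WINDOW] = true;
    c->forward[seq % WINDOW] = true;
    ord_drain(c);
}

/* Dispose of an event without an enq, closing the gap it would leave. */
static void ord_release(ordered_ctx_t *c, unsigned seq)
{
    c->done[seq % WINDOW] = true;
    c->forward[seq % WINDOW] = false;
    ord_drain(c);
}

/* Self-check: a dropped event stalls later ones until it is released. */
static void ordered_demo(void)
{
    ordered_ctx_t c = { 0 };
    unsigned s0 = ord_schedule(&c);
    unsigned s1 = ord_schedule(&c);
    unsigned s2 = ord_schedule(&c);

    ord_enq(&c, s2);
    assert(c.forwarded == 0); /* s2 waits for s0 and s1 */
    ord_enq(&c, s0);
    assert(c.forwarded == 1); /* s0 passes, s2 still waits for s1 */
    ord_release(&c, s1);      /* s1 is dropped, but the scheduler is told */
    assert(c.forwarded == 2); /* s2 is no longer stalled */
}
```

Without the ord_release() call for s1, s2 would never leave the context,
which is the downstream stall mentioned above.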
>>
>> Is this understanding correct?  If not what is the correct way to view
>> this?  In any event, we just need to write this out and get agreement as to
>> what the meanings and conventions are associated with this.
>>
>> On Wed, Oct 15, 2014 at 11:42 AM, Ola Liljedahl <[email protected]
>> > wrote:
>>
>>> It should be part of the implementation. But it is not, because
>>> prescheduled events might not be processed if the thread the events are
>>> prefetched to stops calling schedule() and processing those events. And the
>>> corresponding queues (if atomic) will be locked forever... Also problems
>>> with ordered queues as later packets will also be stalled until those
>>> prefetched packets are released.
>>>
>>> On 15 October 2014 17:54, Bill Fischofer <[email protected]>
>>> wrote:
>>>
>>>> Whether or not events are prefetched is an implementation consideration
>>>> that is part of the implementation, not the application, no?  Again, I
>>>> don't think this is something we need to worry about for ODP v1.0.  It
>>>> should be properly addressed in a wider context post-v1.0.
>>>>
>>>> On Wed, Oct 15, 2014 at 10:51 AM, Ola Liljedahl <
>>>> [email protected]> wrote:
>>>>
>>>>> What if a thread wants to stop consuming and processing events? We
>>>>> don't (can't) leave some events prescheduled (and stashed in some
>>>>> per-core "portal") after the thread has stopped processing. So a thread
>>>>> must be able to stop prefetching and then consume and process all
>>>>> remaining (prefetched) events before it completely stops processing.
>>>>> How would this work on Freescale or TI ODP implementations?
>>>>>
>>>>>
>>>>> On 15 October 2014 15:54, Savolainen, Petri (NSN - FI/Espoo) <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> The system will deadlock if your application decides to step out from
>>>>>> the schedule loop and a throughput-optimized scheduler has already
>>>>>> pre-scheduled a number of buffers to that core (== locked a number of
>>>>>> atomic queues).
>>>>>>
>>>>>>
>>>>>>
>>>>>> Application has to be sure that scheduler has not locked anything for
>>>>>> that core before stepping out of the schedule loop. Typically, it’s
>>>>>> impossible for the HW scheduler to rewind scheduling decision afterwards
>>>>>> (when application tells it wants to exit).
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Petri
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* ext Bill Fischofer [mailto:[email protected]]
>>>>>> *Sent:* Wednesday, October 15, 2014 4:38 PM
>>>>>> *To:* Savolainen, Petri (NSN - FI/Espoo)
>>>>>> *Cc:* ext Alexandru Badicioiu; Ola Liljedahl;
>>>>>> [email protected]
>>>>>>
>>>>>> *Subject:* Re: [lng-odp] odp_schedule() vs. odp_schedule_one()
>>>>>>
>>>>>>
>>>>>>
>>>>>> It's not clear why you'd want to expose implementation considerations
>>>>>> through the API.  That's what DPDK does and it gets them into all sorts
>>>>>> of portability trouble.  odp_schedule() is how a thread discovers the
>>>>>> next thing it's supposed to do.  From that standpoint there doesn't
>>>>>> appear to be any application-visible distinction between odp_schedule()
>>>>>> and odp_schedule_one().  In both cases, the application gets a buffer,
>>>>>> as well as the queue it was drawn from.  That's all the application
>>>>>> needs to know--everything else is behind-the-scenes implementation
>>>>>> mechanics that will vary from platform to platform.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 15, 2014 at 8:13 AM, Savolainen, Petri (NSN - FI/Espoo) <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It’s not only push vs pull. It can be also “pull many” vs “pull one”.
>>>>>> Alex, I think your HW supports both: pull many or pull only one.
>>>>>>
>>>>>> Global scheduling == SoC level scheduling, not scheduling from e.g.
>>>>>> per core level stash of (pre-scheduled) buffers/queues.
>>>>>>
>>>>>> The first goal of the function is to streamline the application main
>>>>>> loop when the application has to step out of the schedule loop often
>>>>>> (e.g. in addition to the ODP scheduler, poll a third-party lib). So
>>>>>> instead of ...
>>>>>>
>>>>>> main_odp_loop
>>>>>> {
>>>>>>   odp_schedule_resume()
>>>>>>
>>>>>>   buf = odp_schedule(...)
>>>>>>
>>>>>>   <process it>
>>>>>>
>>>>>>   odp_schedule_pause()
>>>>>>
>>>>>>   while ( (buf = odp_schedule(...)) != INVALID)
>>>>>>   {
>>>>>>     <process it>
>>>>>>   }
>>>>>>
>>>>>>   odp_schedule_release_atomic()
>>>>>>
>>>>>>   return
>>>>>> }
>>>>>>
>>>>>> ... you can do ...
>>>>>>
>>>>>> main_odp_loop
>>>>>> {
>>>>>>
>>>>>>   buf = odp_schedule_one(...)
>>>>>>
>>>>>>   <process it>
>>>>>>
>>>>>>   odp_schedule_release_atomic()
>>>>>>
>>>>>>   return
>>>>>> }
>>>>>>
>>>>>>
>>>>>> The second goal is to optimize for QoS response time. It could be
>>>>>> handled with another call that tells ODP to optimize for QoS instead of
>>>>>> throughput.
>>>>>>
>>>>>>
>>>>>> -Petri
>>>>>>
>>>>>>
>>>>>> From: [email protected] [mailto:
>>>>>> [email protected]] On Behalf Of ext Alexandru
>>>>>> Badicioiu
>>>>>> Sent: Wednesday, October 15, 2014 3:52 PM
>>>>>> To: Ola Liljedahl
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: [lng-odp] odp_schedule() vs. odp_schedule_one()
>>>>>>
>>>>>>
>>>>>> The documentation suggests that these two calls can be used in the
>>>>>> same application, which may also be a problem for platforms which do
>>>>>> support both modes, but not at the same time or without
>>>>>> re-initialization, re-configuration, etc. By modes I mean PUSH
>>>>>> (odp_schedule()), when the scheduler runs independently of the
>>>>>> application and pushes frames to the application, and PULL
>>>>>> (odp_schedule_one()), when the scheduler runs when the application
>>>>>> decides and the application pulls the frames from the scheduler.
>>>>>> Also the term "global scheduling" is confusing and may not reflect
>>>>>> the reality of the HW.
>>>>>>
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> On 15 October 2014 15:15, Ola Liljedahl <[email protected]>
>>>>>> wrote:
>>>>>>  * Schedule one buffer
>>>>>>  *
>>>>>>  * Like odp_schedule(), but is guaranteed to schedule only one buffer
>>>>>>  * at a time. Each call will perform global scheduling and will reserve
>>>>>>  * one buffer per thread in maximum. When called after other schedule
>>>>>>  * functions, returns locally stored buffers (if any) first, and then
>>>>>>  * continues in the global scheduling mode.
>>>>>>  *
>>>>>>  * This function optimises priority scheduling (over throughput).
>>>>>>
>>>>>> As Taras commented, some implementations will not be able to truly
>>>>>> schedule only one event at a time. Scheduler implementations could use a
>>>>>> pipelined design where events are scheduled in advance so that the next
>>>>>> event can be prefetched while the current event is being processed. This
>>>>>> will limit concurrent processing (e.g. an idle core could have received
>>>>>> that second event and processed it concurrently, which would have
>>>>>> reduced latency for that event).
>>>>>>
>>>>>> odp_schedule_one() has the same functionality as odp_schedule().
>>>>>> However it is supposed to guarantee only one event at a time is scheduled
>>>>>> in order to prioritize latency to the potential detriment of throughput.
>>>>>>
>>>>>> We question whether odp_schedule_one() actually has to guarantee only
>>>>>> one event at a time. The functionality provided is the same for these two
>>>>>> calls. One call is focused on throughput (and minimizing overhead, e.g.
>>>>>> by allowing prescheduling and prefetching), the other is focused on
>>>>>> latency (at the cost of overhead). An ODP implementation could use the
>>>>>> same implementation for both functions (some ODP implementations will
>>>>>> always schedule events in advance, other implementations will always
>>>>>> only schedule one event at a time). odp_schedule_one() just hints to the
>>>>>> ODP implementation that latency and concurrent processing are more
>>>>>> important, but this is not a strict requirement.
>>>>>>
>>>>>> Maybe we only need one schedule call and possibly use a different
>>>>>> mechanism to hint the ODP scheduler whether to optimize for throughput
>>>>>> (e.g. preschedule/prefetch) or latency.
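
As a rough illustration of the "one schedule call plus a hint" idea, the
sketch below models a single schedule function whose prefetch depth depends
on a per-thread hint. Everything in it is an assumption made for the example:
sched_hint_t, global_q_t, thread_sched_t, hinted_schedule() and STASH_MAX are
hypothetical and do not correspond to any existing ODP call.

```c
#include <assert.h>
#include <stddef.h>

#define STASH_MAX 4 /* hypothetical per-thread prefetch depth */

typedef enum { HINT_THROUGHPUT, HINT_LATENCY } sched_hint_t;

/* Toy global queue: a fixed array of event ids consumed from 'head'. */
typedef struct {
    const int *events;
    size_t     count;
    size_t     head;
} global_q_t;

/* Per-thread scheduler state with a local stash of prefetched events. */
typedef struct {
    sched_hint_t hint;
    int          stash[STASH_MAX];
    size_t       stash_len;
    size_t       stash_pos;
} thread_sched_t;

/* Number of events already reserved for this thread but not yet returned. */
static size_t stashed(const thread_sched_t *t)
{
    return t->stash_len - t->stash_pos;
}

/* Single schedule call. Locally stashed events are returned first; the
 * hint only controls how many events one global pass may reserve.
 * Returns 1 and stores the event in *ev, or 0 if nothing is available. */
static int hinted_schedule(global_q_t *g, thread_sched_t *t, int *ev)
{
    if (t->stash_pos == t->stash_len) {
        size_t want = (t->hint == HINT_LATENCY) ? 1 : STASH_MAX;

        t->stash_len = 0;
        t->stash_pos = 0;
        while (t->stash_len < want && g->head < g->count)
            t->stash[t->stash_len++] = g->events[g->head++];
    }
    if (t->stash_pos == t->stash_len)
        return 0;
    *ev = t->stash[t->stash_pos++];
    return 1;
}

/* Self-check: throughput mode prefetches, latency mode reserves one. */
static void hint_demo(void)
{
    const int evs[] = { 10, 11, 12, 13, 14, 15 };
    global_q_t g = { evs, 6, 0 };
    thread_sched_t t = { HINT_THROUGHPUT, { 0 }, 0, 0 };
    int ev = 0;

    assert(hinted_schedule(&g, &t, &ev) && ev == 10);
    assert(stashed(&t) == 3); /* three events hidden from other threads */

    t.hint = HINT_LATENCY;    /* stash is drained before the hint applies */
    assert(hinted_schedule(&g, &t, &ev) && ev == 11);
    assert(hinted_schedule(&g, &t, &ev) && ev == 12);
    assert(hinted_schedule(&g, &t, &ev) && ev == 13);
    assert(hinted_schedule(&g, &t, &ev) && ev == 14);
    assert(stashed(&t) == 0); /* nothing reserved beyond the current event */
}
```

An implementation that always prefetches, or never does, could ignore the
hint entirely, which matches the "hint, not strict requirement" reading.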
>>>>>>
>>>>>> --Ola
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> lng-odp mailing list
>>>>>> [email protected]
>>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
