Re: [lng-odp] [ARCH DESIGN] Queues and Synchronization/Scheduling models

Bill Fischofer Mon, 20 Oct 2014 10:11:30 -0700

Ok, thanks for the clarification.

I agree it makes sense for atomic queues to be implicitly released by
subsequent queueing operations on buffers taken from them.  But that
reinforces the notion that one of the pieces of buffer metadata needed
is the last_queue that the buffer was on, so that this release can take
place.
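
To make the last_queue point concrete, here is a minimal sketch of how an
enqueue could implicitly release the source atomic queue. All names here
(toy_queue_t, toy_buffer_t, toy_deq, toy_enq, toy_release_atomic) are
hypothetical illustrations and not the ODP API or the linux-generic
implementation; real queues would also need locking, which is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the real ODP structures. */
typedef struct toy_queue {
    bool atomic;   /* queue uses atomic synchronization */
    bool locked;   /* true while a buffer from this queue is in process */
    int  depth;    /* number of buffers currently on the queue */
} toy_queue_t;

typedef struct toy_buffer {
    toy_queue_t *last_queue; /* queue this buffer was last dequeued from */
} toy_buffer_t;

/* Dequeue: record the source queue in the buffer and lock it if atomic.
 * Returns false if the queue is empty or held by an in-flight buffer. */
static bool toy_deq(toy_queue_t *q, toy_buffer_t *buf)
{
    if (q->depth == 0 || (q->atomic && q->locked))
        return false;
    q->depth--;
    if (q->atomic)
        q->locked = true;
    buf->last_queue = q;
    return true;
}

/* Explicit release: unlock the atomic queue recorded in the buffer. */
static void toy_release_atomic(toy_buffer_t *buf)
{
    toy_queue_t *src = buf->last_queue;

    if (src != NULL && src->atomic)
        src->locked = false;
    buf->last_queue = NULL; /* a second release becomes a no-op */
}

/* Enqueue: implicitly releases the source queue, then queues the buffer. */
static void toy_enq(toy_queue_t *dst, toy_buffer_t *buf)
{
    assert(dst != NULL);
    toy_release_atomic(buf);
    dst->depth++;
}
```

With this shape, a second toy_deq() on a locked atomic queue fails until
the first buffer is either enqueued elsewhere or explicitly released.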


Regarding odp_queue_enq_multi(), the question is whether the queue types
(parallel/atomic/ordered) imply the use of a scheduler or if they are
intrinsic to the queue itself.  While we could say that these are only
applicable to scheduled queues, the fact that many of these operations are
triggered by queue APIs suggests that this is not necessary.  For example, a
polled queue could easily be parallel, or atomic, or ordered at essentially
no additional cost.  We might want to rename odp_schedule_release_atomic()
to something like odp_buffer_release_atomic() to make that clear,
especially since odp_schedule_release_atomic() is currently defined to take
a void argument when it would seem to need an odp_buffer_t argument for
completeness.

Having these be applicable only to scheduled queues for ODP v1.0 is
probably simpler at this point, but something we may need to revisit post
v1.0, especially as we move away from the notion of a monolithic scheduler
to something more parameterized.  If we have multiple schedulers acting on
queues then it becomes clear that these operations are inherent to the
queues themselves and not just the scheduler.

Regarding atomic queues being ordered: by that definition we currently
do not implement atomic queues.  If atomic queues are ordered then if
Threads T1 and T2 each dequeue a buffer from an atomic queue A in the order
A1 and A2 and then T1 issues an odp_schedule_release_atomic() call that
unlocks A (allowing T2 to obtain A2 before T1 has disposed of A1) then a
downstream queue would need to ensure that A1 appears first even if T2
enqueues A2 before T1 enqueues A1.  We do not currently ensure this.  If
Atomic does not imply Ordered then all Atomic is really doing is protecting
the queue context from parallel access. This in itself seems useful and
need not be combined with ordering semantics.
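
The T1/T2 interleaving above can be replayed deterministically. This is
only a single-threaded sketch with made-up helpers (fifo_t, atomic_q_t,
atomic_deq, atomic_release); it is not ODP code. It shows that an early
release on an atomic-but-not-ordered queue lets A2 reach the downstream
queue ahead of A1.

```c
#include <assert.h>
#include <stdbool.h>

#define CAP 4  /* toy FIFO capacity */

typedef struct fifo {
    int item[CAP];
    int head;
    int count;
} fifo_t;

/* Hypothetical atomic queue: a FIFO plus a single "in process" lock. */
typedef struct atomic_q {
    fifo_t q;
    bool   locked;
} atomic_q_t;

static void fifo_push(fifo_t *f, int v)
{
    assert(f->count < CAP);
    f->item[(f->head + f->count) % CAP] = v;
    f->count++;
}

/* Returns -1 when the FIFO is empty. */
static int fifo_pop(fifo_t *f)
{
    int v;

    if (f->count == 0)
        return -1;
    v = f->item[f->head];
    f->head = (f->head + 1) % CAP;
    f->count--;
    return v;
}

/* Returns -1 when the queue is locked or empty. */
static int atomic_deq(atomic_q_t *a)
{
    int v;

    if (a->locked)
        return -1;
    v = fifo_pop(&a->q);
    if (v >= 0)
        a->locked = true;
    return v;
}

static void atomic_release(atomic_q_t *a)
{
    a->locked = false;
}

/* Replays the scenario; out[] receives the downstream arrival order. */
static void early_release_scenario(int out[2])
{
    atomic_q_t a = {0};
    fifo_t downstream = {0};
    int t1, t2;

    fifo_push(&a.q, 1);          /* A1 */
    fifo_push(&a.q, 2);          /* A2 */

    t1 = atomic_deq(&a);         /* T1 obtains A1; A is now locked */
    atomic_release(&a);          /* T1 releases early, still holding A1 */
    t2 = atomic_deq(&a);         /* T2 obtains A2 */
    atomic_release(&a);

    fifo_push(&downstream, t2);  /* T2 happens to enqueue first */
    fifo_push(&downstream, t1);

    out[0] = fifo_pop(&downstream);
    out[1] = fifo_pop(&downstream);
}
```

Nothing in the atomic lock itself prevents this outcome; restoring A1
before A2 would need separate sequence tracking at the downstream queue.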

In an earlier thread on this topic it was suggested that perhaps we'd want
to have an ORDERED_ATOMIC (or ATOMIC_ORDERED) queue scheduling type in
addition to ATOMIC for when such semantics are needed.  I could see that
making sense.

The question of order preservation through multiple queueing levels seems
highly germane and necessary for predictable behavior.  If we take the
above proposed definition then it is quite precise.  Buffer order
propagates from one ordered queue to the next, ignoring non-ordered
intermediates.  This relaxation should facilitate implementations where
ordering must be handled in SW since it means that HW offloads that may not
in themselves handle ordering as ODP defines it could be accommodated.  The
only ambiguity is how gaps are handled.  We have that question independent
of intermediates.  Perhaps an odp_queue_release_order(buf) might be a good
counterpart to odp_queue_release_atomic(buf)?  The former call would say
that the sequence owned by the referenced buffer should be deemed filled so
that it no longer blocks subsequent buffers originating from the same
ordered queue.
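
As a rough sketch of what "deemed filled" could mean, consider a small
reorder window at the downstream ordered queue, indexed by the sequence
number each buffer carries from its source ordered queue. The names
(reorder_t, toy_enq_ordered, toy_release_order) and the fixed window size
are assumptions for illustration only, not a proposed ODP implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define WINDOW  8   /* hypothetical reorder window size */
#define OUT_MAX 32  /* capacity of the delivered-payload log */

typedef struct reorder {
    unsigned next_seq;         /* next sequence number allowed to pass */
    bool     filled[WINDOW];   /* slot holds a buffer or a released gap */
    int      payload[WINDOW];  /* payload id, or -1 for a released gap */
    int      out[OUT_MAX];     /* payloads delivered, in restored order */
    int      out_len;
} reorder_t;

/* Deliver every consecutive filled slot starting at next_seq. */
static void reorder_drain(reorder_t *r)
{
    unsigned slot = r->next_seq % WINDOW;

    while (r->filled[slot]) {
        if (r->payload[slot] >= 0) {
            assert(r->out_len < OUT_MAX);
            r->out[r->out_len++] = r->payload[slot];
        }
        r->filled[slot] = false;
        r->next_seq++;
        slot = r->next_seq % WINDOW;
    }
}

/* Fill one sequence slot. Fails if seq is stale, too far ahead, or
 * already filled (unsigned wraparound makes stale values look large). */
static bool reorder_fill(reorder_t *r, unsigned seq, int payload)
{
    unsigned slot = seq % WINDOW;

    if (seq - r->next_seq >= WINDOW || r->filled[slot])
        return false;
    r->filled[slot] = true;
    r->payload[slot] = payload;
    reorder_drain(r);
    return true;
}

/* Enqueue a buffer carrying sequence number seq. */
static bool toy_enq_ordered(reorder_t *r, unsigned seq, int payload)
{
    assert(payload >= 0);
    return reorder_fill(r, seq, payload);
}

/* The buffer owning seq was consumed: mark its slot filled so that it
 * no longer blocks later buffers from the same ordered source. */
static bool toy_release_order(reorder_t *r, unsigned seq)
{
    return reorder_fill(r, seq, -1);
}
```

In this sketch a buffer with sequence 2 waits until sequences 0 and 1 have
either arrived or been released, which is the blocking behavior the
release call is meant to bound.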

Regarding the point of queues being proxies for a flow, even in that model
a flow will be proxied by multiple queues unless you require that
processing occur in a single stage.  For applications structured in a more
pipelined fashion (which can occur even with single-stages that make use of
HW offload that involves reschedules) you have multiple queues representing
the same flow at different stages of processing (and possibly having unique
per-stage per-flow contexts associated with them).  So maintaining
end-to-end order on a per-flow basis is still required.

On the final point, we're not talking about ordering on the wire (that's
clearly unpredictable and in any case beyond the scope of ODP).  But within
a single ODP application order preservation from ingress to egress, while
allowing parallelism in between, would seem to be one of the main design
points for ODP.

Bill

On Mon, Oct 20, 2014 at 10:02 AM, Ola Liljedahl <[email protected]>
wrote:

> Bill, some spelling errors... I was listing the current calls that release
> the scheduling lock for a queue.
>
> * odp_queue_end() was supposed to be odp_queue_enq()  (q is a d upside
> down, maybe I am getting dyslexic?)
> * odp_queue_end_multi() was of course odp_queue_enq_multi().  Not sure how
> this call would work with atomic or ordered queues (all buffers must come
> from same queue), I guess it can degenerate into only returning one buffer
> at a time.
> * odp_buffer_free(odp_buffer_t buf)
> * odp_schedule_release(odp_buffer_t buf)  <--- new call
> (or odp_schedule_release_atomic() morphed)
>
> -- Ola
>
>
>
>
> On 20 October 2014 16:50, Bill Fischofer <[email protected]>
> wrote:
>
>> Thanks, Ola.  I need to think about this and respond more carefully, but
>> in the meantime could you propose the syntax/semantics of odp_queue_end(),
>>  odp_queue_end_multi(), and odp_schedule_release() in a bit more detail?
>>
>> These seem to be new APIs and we need to be clear about their proposed
>> semantics and intended use.
>>
>> Thanks.
>>
>> Bill
>>
>> On Mon, Oct 20, 2014 at 9:40 AM, Ola Liljedahl <[email protected]>
>> wrote:
>>
>>> On 17 October 2014 10:01, Alexandru Badicioiu <
>>> [email protected]> wrote:
>>>
>>>> Hi Bill, check my thoughts inline.
>>>> Thanks,
>>>> Alex
>>>>
>>>> On 17 October 2014 03:31, Bill Fischofer <[email protected]>
>>>> wrote:
>>>>
>>>>> Based on discussions we had yesterday and today, I'd like to outline
>>>>> the open issues regarding queues and synchronization/scheduling models.
>>>>> We'd like to get consensus on this in time for next week's Tuesday call.
>>>>>
>>>>> ODP identifies three different synchronization/scheduling models for
>>>>> queues: Parallel, Atomic, and Ordered.  Here are my current understandings
>>>>> of what these mean:
>>>>>
>>>>>    - Parallel: Buffers on a parallel queue can be dequeued by the
>>>>>    scheduler for any caller without restriction.  This permits maximum
>>>>>    scale-out and concurrency for events that are truly independent.
>>>>>
>>>>>
>>>>>    - Atomic: Buffers on an atomic queue can be dequeued by the
>>>>>    scheduler for any caller. However, only one buffer from an atomic
>>>>>    queue may be in process at any given time. When the scheduler
>>>>>    dequeues a buffer from an atomic queue, the queue is locked and
>>>>>    cannot dequeue further buffers until it is released.  Releasing an
>>>>>    atomic queue can occur in two ways:
>>>>>
>>>>>
>>>>>    - The dequeued buffer is enqueued to another queue via an
>>>>>       odp_queue_enq() call. This action implicitly unlocks the atomic
>>>>>       queue the buffer was sourced from.  Note that this is the most
>>>>>       common way in which atomic queues are unlocked.
>>>>>
>>>>>
>>>>>    - A call is made to odp_schedule_release_atomic() for the locked
>>>>>       queue.  This tells the scheduler that the queue's atomicity
>>>>>       guarantee is deemed satisfied by the application and the queue is
>>>>>       free to dequeue items to other scheduler callers. This method MUST
>>>>>       be used if the caller consumes the buffer (e.g., frees it instead
>>>>>       of enqueuing it to another queue) and MAY be used as a performance
>>>>>       optimization if the caller is done with any references to data
>>>>>       that was serialized by the queue (e.g., the queue's context).  It
>>>>>       is an application programming error to release a queue prematurely
>>>>>       as references subsequent to the release will not be synchronized.
>>>>>
>>>> Third way - odp_buffer_free() called on a buffer which was dequeued
>>>> from an atomic queue.
>>>>
>>> odp_queue_end(): implicit release of atomic queue
>>> odp_queue_end_multi(): ? I assume multi-calls cannot be used with atomic
>>> scheduling
>>> odp_buffer_free(): (buffer is consumed) implicit release of atomic queue
>>> odp_schedule_release_atomic(): explicit release of atomic queue
>>>
>>>
>>>
>>>>
>>>>>    - Ordered: Buffers on an ordered queue can be dequeued by the
>>>>>    scheduler for any caller; however, buffers on an ordered queue retain
>>>>>    knowledge of their sequence on their source queue and this sequence
>>>>>    will be restored whenever they are enqueued to a subsequent ordered
>>>>>    queue.  That is, if ordered queue A contains buffers A1, A2, and A3,
>>>>>    and these are dequeued for processing by three different threads,
>>>>>    then when they are subsequently enqueued to another ordered queue B
>>>>>    by these threads, they will appear on B as B1, B2, and B3 regardless
>>>>>    of the order in which their processing threads issued
>>>>>    odp_queue_enq() calls to place them on B.
>>>>>
>>>> Why does ordering have to be restored only if the destination is an
>>>> ordered queue? I think it should be restored regardless of the destination
>>>> queue type. Also it should be restored if there are multiple destination
>>>> queues, if the implementation supports it.
>>>>
>>> Atomic queues are also ordered queues.
>>>
>>> If you enqueue a packet from an atomic or ordered queue onto a parallel
>>> queue, ordering would normally be lost. If there was ever only one thread
>>> scheduling packets from this parallel queue (how do you guarantee that?),
>>> then you could argue that ordering is still maintained. But this seems like
>>> a fragile situation.
>>>
>>> Nothing would stop an implementation from always restoring order for
>>> packets scheduled from ordered queues when enqueuing them.
>>>
>>>
>>>>> Implicit in these definitions is the fact that all queues associated
>>>>> with odp_pktio_t objects are ordered queues.
>>>>>
>>>> Why is this implicit? The order restoration happens when buffers are
>>>> enqueued to the destination queue(s). Aren't the queues associated with
>>>> odp_pktio_t the first queues seen by a packet? If a pktio queue is
>>>> parallel, there is no requirement at all to ensure any ordering at next
>>>> enqueue.
>>>>
>>> Packet I/O ingress queues may be atomic and thus also ordered in some
>>> sense.
>>> I think the application should decide what ordering requirements there
>>> are for packet I/O ingress and egress queues.
>>>
>>>
>>>
>>>>> First question: Are these definitions accurate and complete?  If not,
>>>>> then what are the correct definitions for these types we wish to define?
>>>>> Assuming these are correct, then there are several areas in ODP API that
>>>>> seem to need refinement:
>>>>>
>>>>>    - It seems that ODP buffers need at least two additional pieces of
>>>>>    system meta data that are missing:
>>>>>
>>>>>
>>>>>    - Buffers need to have a last_queue that is the odp_queue_t of the
>>>>>       last queue they were dequeued from.  Needed so that
>>>>>       odp_queue_enq() can unlock a previous atomic queue.
>>>>>
>>>>>
>>>>>    - Buffers need to retain sequence knowledge from the first ordered
>>>>>       queue they were sourced from (i.e., their ingress odp_pktio_t) so
>>>>>       this can be used for order restoration as they are enqueued to
>>>>>       downstream ordered queues.
>>>>>
>>>>>
>>>>>    - odp_schedule_release_atomic() currently takes a void argument
>>>>>    and this is ambiguous. It should either take an odp_queue_t which is
>>>>>    the atomic queue that is to be unlocked or else an odp_buffer_t and
>>>>>    use that buffer's last_queue to find the queue to be unlocked. Taking
>>>>>    an odp_buffer_t argument would seem to be more consistent.
>>>>>
>>> The idea is probably that the ODP implementation knows which queue is
>>> associated with each thread. Better that the implementation keeps track of
>>> this than the application. As both odp_queue_end() and odp_buffer_free()
>>> take the buffer as a parameter, I think it makes sense to specify the buffer
>>> also to the odp_schedule_release() call (use the same call for atomic and
>>> ordered scheduling).
>>>
>>> odp_schedule.h:
>>> void odp_schedule_release(odp_buffer_t buf);
>>>
>>>
>>>> I think any argument is superfluous for odp_schedule_release_atomic(), if
>>>> the application really needs to continue the buffer processing outside the
>>>> atomic context. Otherwise the context release happens on free. There can be
>>>> only one atomic context associated with a thread, at any given moment, and
>>>> the scheduler implementation has to keep track of it. Of course when
>>>> freeing the buffer the free call has to know that the atomic context of
>>>> that buffer has been previously released.
>>>>
>>> An application could store a packet on some internal data structure and
>>> continue processing later. Thus the packet would be "consumed" from a
>>> scheduler perspective and the application wants to inform the scheduler of
>>> this and unlock the corresponding queue.
>>>
>>>
>>>
>>>>>    - Given the above definition of ordered queues, it would seem that
>>>>>    there needs to be some ordered equivalent to
>>>>>    odp_schedule_release_atomic() to handle the case where a buffer from
>>>>>    an ordered queue is consumed rather than propagated.  This is to
>>>>>    avoid creating un-fillable gaps in the ordering sequence downstream.
>>>>>
>>>> Any consumption will end up calling odp_buffer_free(), and free
>>>> has to inform the order restoration logic. Do we really need to extract a
>>>> buffer from the ordered flow before the consumption?
>>>>
>>> You have no idea when the buffer will be freed. We don't want to pause
>>> scheduling (for the involved queue) for an indeterminate time.
>>>
>>>
>>>
>>>>
>>>>
>>>>> Second question: If an application generates packets (via
>>>>> odp_packet_alloc()), how are these sequenced with respect to other packets
>>>>> on downstream ordered queues?  Same question for cloned/copied packets.
>>>>>
>>>> Clones/copies share the same sequencing information. The same goes for
>>>> fragments. If locally generated packets have to be inserted in an ordered
>>>> flow they have to explicitly request sequencing information relevant to
>>>> the source queue at the moment of insertion. Then they can be enqueued/freed
>>>> in the same way as the other buffers in the flow.
>>>>
>>>>
>>>>> Third question: If an application takes packets from a source ordered
>>>>> queue and enqueues them to different target ordered queues, how is
>>>>> sequencing handled here since by definition there are gaps in the
>>>>> individual downstream ordered queues.  The simplest example is a switch or
>>>>> router that takes packets from one odp_pktio_t and sends them to multiple
>>>>> target odp_pktio_ts. A similar question arises for packets sourced from
>>>>> multiple input ordered queues going to the same target ordered queue.
>>>>>
>>>> I think that order is important to maintain per flow, not
>>>> absolutely between unrelated flows.
>>>> I think the type of the destination queues and their number may also have
>>>> no importance. Ordering should not be associated with a particular
>>>> destination; it is rather a source-defined thing. Ordering works between two
>>>> points - a definition point (where the order is observed) and a restoration
>>>> point, which can be logical (e.g. next enqueue) rather than physical (a
>>>> given queue). The order definition point is usually a queue. If it's feasible
>>>> for the HW that the order can be observed across multiple queues, then
>>>> ordering can work this way too. Maybe we can view the API this way - define
>>>> order definition/restoration points and associate queues with them.
>>>>
>>> Why would you maintain order across multiple queues? I don't think we
>>> are interested in global packet ordering, just per-flow. And a queue is a
>>> proxy for a flow.
>>>
>>>
>>>
>>>>
>>>>> Fourth question: How does this intersect with flow identification from
>>>>> the classifier?  It would seem that the classifier should override the raw
>>>>> packet sequence and re-write this information as a flow sequence which
>>>>> would be honored as the ordering sequence by subsequent downstream ordered
>>>>> queues.  Note that flows would still have the same downstream gap issues
>>>>> if they are enqueued to multiple downstream ordered queues.  This would
>>>>> definitely arise in multihoming support for SCTP, though this example is
>>>>> not an ODP v1.0 consideration.
>>>>>
>>> End-to-end packet order is not guaranteed anyway so termination
>>> software at the end must always handle misordered (and missing) packets.
>>>
>>>
>>>
>>>>> We may not have a complete set of answers to all of these questions
>>>>> for ODP v1.0 but we need to be precise about what is and is not done for
>>>>> them in ODP v1.0 so that we can do accurate testing for v1.0 as well as
>>>>> identify the areas that need further work next year as we move beyond
>>>>> v1.0.
>>>>>
>>>>> Thanks for your thoughts and suggestions on these.  If there are
>>>>> additional questions along these lines feel free to add them, but I'd
>>>>> really like to scope this discussion to what is in ODP v1.0.
>>>>>
>>>>> Bill
>>>>>
>>>>> _______________________________________________
>>>>> lng-odp mailing list
>>>>> [email protected]
>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
