Re: [lng-odp] [ARCH DESIGN] Queues and Synchronization/Scheduling models

Bill Fischofer Mon, 20 Oct 2014 12:56:07 -0700

See inline comments.

On Mon, Oct 20, 2014 at 1:09 PM, Ola Liljedahl <[email protected]>
wrote:


> On 20 October 2014 19:09, Bill Fischofer <[email protected]>
> wrote:
>
>> Ok, thanks for the clarification.
>>
>> I agree it makes sense for atomic queues to be implicitly released by
>> subsequent queueing operations on buffers taken from them.  But that sort
>> of reinforces the notion that one of the pieces of buffer meta data needed
>> is the last_queue that the buffer was on so that this release can take
>> place.
>>
> However it is implemented, I think this should be kept hidden from the
> application. Buffers may remember which queue they were scheduled from or
> there is some state in the scheduler which remembers which CPU (thread) is
> associated with which queue, etc. But to keep all options open, all calls
> that release an atomic or ordered queue should take the buffer as a parameter.
>
>
That makes sense.  There are system meta data items that we explicitly
expose to applications via APIs and others that it is assumed an
implementation will make use of.  This would fall into the latter
category.  The only reason for exposing a last_queue meta data item to the
application would be to enable the odp_schedule() call to not take an
output queue parameter since that information would be retrievable from the
returned buffer if the application needed to know that.


>
>> Regarding odp_buffer_enq_multi(), the question is whether the queue types
>> (parallel/atomic/ordered) imply the use of a scheduler or if they are
>> intrinsic to the queue itself.  While we could say that these are only
>> applicable to scheduled queues, the fact that many of these operations are
>> triggered by queue APIs suggests that this is not necessary.  For example, a
>> polled queue could easily be parallel, or atomic, or ordered at essentially
>> no additional cost.  We might want to rename odp_schedule_release_atomic()
>> to something like odp_buffer_release_atomic() to make that clear,
>> especially since odp_schedule_release_atomic() is currently defined to take
>> a void argument when it would seem to need an odp_buffer_t argument for
>> completeness.
>>
> If the semantics of a "polled" queue is only that a thread specifies
> the queue directly when dequeuing a packet, I think parallel and ordered
> scheduling makes sense. Atomic scheduling does not make much sense because
> if that queue already has a packet outstanding, we cannot dequeue another
> packet from the same queue (this would break the atomic guarantee). Of course the
> dequeue call can then return ODP_INVALID_BUFFER and force the application
> to do something else (e.g. dequeue from some other queue) but now we have
> just pushed scheduling into the application.
>

That's exactly how I'd expect a polled atomic queue to behave, otherwise
you'd have potential deadlock.
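
To make that behavior concrete, here is a minimal sketch in C of a polled
atomic queue. It is a toy model and not ODP code: toy_queue_t, toy_enq(),
toy_deq(), toy_release(), toy_poll_any() and TOY_INVALID_BUFFER are
hypothetical names invented for illustration (TOY_INVALID_BUFFER stands in for
ODP_INVALID_BUFFER), and buffers are plain ints. The dequeue refuses to hand
out a second buffer while one is outstanding, so the caller has to poll some
other queue, which is the "scheduling pushed into the application" that Ola
describes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INVALID_BUFFER (-1)  /* hypothetical stand-in for ODP_INVALID_BUFFER */
#define TOY_QUEUE_DEPTH 8u

/* Hypothetical toy queue; not an ODP type. */
typedef struct {
    int items[TOY_QUEUE_DEPTH];
    size_t head;   /* index of the next buffer to dequeue */
    size_t tail;   /* index of the next free slot */
    int atomic;    /* non-zero: at most one buffer may be outstanding */
    int locked;    /* non-zero while a dequeued buffer is outstanding */
} toy_queue_t;

/* Enqueue a buffer; returns 0 on success, -1 if the queue is full. */
int toy_enq(toy_queue_t *q, int buf)
{
    assert(q != NULL);
    if (q->tail - q->head == TOY_QUEUE_DEPTH)
        return -1;
    q->items[q->tail % TOY_QUEUE_DEPTH] = buf;
    q->tail++;
    return 0;
}

/* Polled dequeue. A locked atomic queue yields TOY_INVALID_BUFFER
 * instead of blocking, so the caller can never deadlock on it. */
int toy_deq(toy_queue_t *q)
{
    assert(q != NULL);
    if (q->head == q->tail)
        return TOY_INVALID_BUFFER;   /* empty */
    if (q->atomic && q->locked)
        return TOY_INVALID_BUFFER;   /* a buffer is still outstanding */
    int buf = q->items[q->head % TOY_QUEUE_DEPTH];
    q->head++;
    if (q->atomic)
        q->locked = 1;
    return buf;
}

/* Release the atomic lock once the outstanding buffer is dealt with. */
void toy_release(toy_queue_t *q)
{
    assert(q != NULL);
    q->locked = 0;
}

/* Application-level "scheduling": try each queue in turn. */
int toy_poll_any(toy_queue_t *queues[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int buf = toy_deq(queues[i]);
        if (buf != TOY_INVALID_BUFFER)
            return buf;
    }
    return TOY_INVALID_BUFFER;
}
```

With one atomic and one parallel queue, a second toy_deq() on the atomic queue
returns TOY_INVALID_BUFFER until toy_release() is called, while
toy_poll_any() falls through to the parallel queue in the meantime.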


>
>
>> Having these be applicable only to scheduled queues for ODP v1.0 is
>> probably simpler at this point, but something we may need to revisit post
>> v1.0, especially as we move away from the notion of a monolithic scheduler
>> to something more parameterized.  If we have multiple schedulers acting on
>> queues then it becomes clear that these operations are inherent to the
>> queues themselves and not just the scheduler.
>>
>> Regarding atomic queues being ordered then by that definition we
>> currently do not implement atomic queues.  If atomic queues are ordered
>> then if Threads T1 and T2 each dequeue a buffer from an atomic queue A
>>
> This is exactly what cannot happen with atomic queues. Only one
> outstanding buffer at a time. Atomic is a poor word, methinks; "exclusive"
> might have been more descriptive.
>
> I was merely saying that the atomic guarantee implicitly ensures ordering
> as well. Except perhaps when the application is explicitly releasing the
> scheduler before enqueuing the packet on a queue which would allow another
> (really quick) thread to schedule another packet from the same atomic queue
> and enqueue it first. What happens then?
>

But the definition of ordered queues explicitly covers precisely this
case.  Ordering is only interesting if you can have more than one buffer
from an ordered queue "in flight" at the same time because it's only then
that system-mediated order restoration guarantees come into play.
Otherwise you just have serial processing of an atomic queue and that
requires no extra effort on the part of the system to ensure order
preservation.  The can of worms is opened by odp_schedule_release_atomic()
here.  That causes no problems *unless* we say that atomic implies
ordered.  To me the two seem independent and while we may wish to have
ATOMIC_ORDERED queues as an additional category, atomic queues that do not
have an ordering guarantee would also seem to be useful.
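
A small C sketch may help separate the two properties. It is a toy model under
stated assumptions, not ODP code: sketch_buf_t, sketch_fifo_q_t,
sketch_ordered_q_t and the two enqueue functions are hypothetical names, and
each buffer is assumed to carry the sequence number it had on its source
queue. The FIFO destination shows what happens after an early release when T2
enqueues A2 before T1 enqueues A1; the ordered destination shows the extra
bookkeeping that order restoration needs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_WINDOW 8u

/* Hypothetical buffer: an id plus its sequence number on the source queue. */
typedef struct {
    int id;
    unsigned seq;
} sketch_buf_t;

/* Plain FIFO destination: buffers become visible in arrival order. */
typedef struct {
    int out[SKETCH_WINDOW];
    size_t out_len;
} sketch_fifo_q_t;

/* Order-restoring destination: early arrivals are parked until every
 * earlier sequence number has arrived. */
typedef struct {
    sketch_buf_t pending[SKETCH_WINDOW];  /* parked buffers, by seq % window */
    int present[SKETCH_WINDOW];           /* non-zero if the slot is occupied */
    unsigned next_seq;                    /* next sequence allowed to appear */
    int out[SKETCH_WINDOW];               /* ids in the order they appeared */
    size_t out_len;
} sketch_ordered_q_t;

void sketch_fifo_enq(sketch_fifo_q_t *q, sketch_buf_t b)
{
    assert(q->out_len < SKETCH_WINDOW);
    q->out[q->out_len++] = b.id;
}

void sketch_ordered_enq(sketch_ordered_q_t *q, sketch_buf_t b)
{
    /* The buffer must fall inside the current reorder window. */
    assert(b.seq - q->next_seq < SKETCH_WINDOW);
    q->pending[b.seq % SKETCH_WINDOW] = b;
    q->present[b.seq % SKETCH_WINDOW] = 1;

    /* Drain every buffer that is now in sequence. */
    while (q->present[q->next_seq % SKETCH_WINDOW]) {
        unsigned slot = q->next_seq % SKETCH_WINDOW;
        assert(q->out_len < SKETCH_WINDOW);
        q->out[q->out_len++] = q->pending[slot].id;
        q->present[slot] = 0;
        q->next_seq++;
    }
}
```

Enqueuing A2 (seq 1) and then A1 (seq 0) leaves the FIFO holding A2, A1, while
the ordered queue holds nothing until A1 arrives and then exposes A1, A2. An
atomic queue without an ordering guarantee needs none of this state.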


>
>
>> in the order A1 and A2 and then T1 issues an odp_schedule_release_atomic()
>> call that unlocks A (allowing T2 to obtain A2 before T1 has disposed of A1)
>> then a downstream queue would need to ensure that A1 appears first even if
>> T2 enqueues A2 before T1 enqueues A1.  We do not currently ensure this.  If
>> Atomic does not imply Ordered then all Atomic is really doing is protecting
>> the queue context from parallel access. This in itself seems useful and
>> need not be combined with ordering semantics.
>>
> Agree that this is a separate function and the one we are primarily after
> with atomic scheduling.
>
>
>> In an earlier thread on this topic it was suggested that perhaps we'd
>> want to have an ORDERED_ATOMIC (or ATOMIC_ORDERED) queue scheduling type in
>> addition to ATOMIC for when such semantics are needed.  I could see that
>> making sense.
>>
>> The question of order preservation through multiple queueing levels seems
>> highly germane and necessary for predictable behavior.  If we take the
>> above proposed definition then it is quite precise.  Buffer order
>> propagates from one ordered queue to the next, ignoring non-ordered
>> intermediates.  This relaxation should facilitate implementations where
>> ordering must be handled in SW since it means that HW offloads that may not
>> in themselves handle ordering as ODP defines it could be accommodated.  The
>> only ambiguity is how are gaps handled?  We have that question independent
>> of intermediates.  Perhaps an odp_queue_release_order(buf) might be a good
>> counterpart to odp_queue_release_atomic(buf) ?  The former call would say
>> that the sequence owned by the referenced buffer should be deemed filled so
>> that it no longer blocks subsequent buffers originating from the same
>> ordered queue.
>>
> This was my odp_schedule_release(buf) call. Atomic or ordered does not
> matter.
>

OK.  Whether we call it odp_queue_release() or odp_schedule_release() or
perhaps better, odp_buffer_release(), the proposal would be that this
single call announces that whatever scheduling guarantees were promised for
the referenced buffer should be considered satisfied and would therefore no
longer block any other buffers originating from the same queue.  It would
mean that an application could not release a buffer "in stages" but that
may not be much of a restriction since it would eliminate potential race
conditions with partially-released buffers.  Needing to support partial
releases would significantly complicate implementations with little added
application benefit.
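
As a rough illustration of that single-call proposal, here is a minimal C
sketch. Everything in it is hypothetical (sk_queue_t, sk_buf_t, sk_release()
and the SK_* constants are invented for this example and are not ODP
definitions). It assumes the implementation keeps, as hidden meta data, the
source queue and sequence number of each scheduled buffer. One call satisfies
whatever guarantee the source queue made, and a repeated call is a no-op, so
there is no partially released state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_WINDOW 8u

typedef enum { SK_PARALLEL, SK_ATOMIC, SK_ORDERED } sk_sync_t;

/* Hypothetical source-queue state kept by the implementation. */
typedef struct {
    sk_sync_t sync;
    bool locked;           /* atomic: a buffer is outstanding */
    bool done[SK_WINDOW];  /* ordered: sequence slot already resolved */
    unsigned next_seq;     /* ordered: oldest unresolved sequence number */
} sk_queue_t;

/* Hypothetical buffer meta data, hidden from the application. */
typedef struct {
    sk_queue_t *src;   /* queue the buffer was scheduled from */
    unsigned seq;      /* its sequence number on that queue */
    bool released;     /* guarantees already declared satisfied */
} sk_buf_t;

/* Declare all scheduling guarantees for this buffer satisfied. */
void sk_release(sk_buf_t *buf)
{
    assert(buf != NULL);
    if (buf->released || buf->src == NULL)
        return;  /* all-or-nothing: nothing left to release */

    sk_queue_t *q = buf->src;
    switch (q->sync) {
    case SK_ATOMIC:
        q->locked = false;  /* another buffer may now be scheduled */
        break;
    case SK_ORDERED:
        /* Mark this sequence slot as filled so it no longer blocks
         * later buffers, then advance past every resolved slot. */
        q->done[buf->seq % SK_WINDOW] = true;
        while (q->done[q->next_seq % SK_WINDOW]) {
            q->done[q->next_seq % SK_WINDOW] = false;
            q->next_seq++;
        }
        break;
    case SK_PARALLEL:
        break;  /* no guarantee to release */
    }
    buf->released = true;
}
```

In this model an enqueue or free would simply call sk_release() first, which
is one way the implicit and explicit release paths could share the same code.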


>
>
>>
>> Regarding the point of queues being proxies for a flow, even in that
>> model a flow will be proxied by multiple queues unless you require that
>> processing occur in a single stage.  For applications structured in a more
>> pipelined fashion (which can occur even with single-stages that make use of
>> HW offload that involves reschedules) you have multiple queues representing
>> the same flow at different stages of processing (and possibly having unique
>> per-stage per-flow contexts associated with them).  So maintaining
>> end-to-end order on a per-flow basis is still required.
>>
> As long as each processing segment maintains ingress-to-egress order, the
> whole chain of segments will maintain order. The queues between the
> processing segments are always FIFO (there is no parallelism involved). I
> think maintaining per-segment ordering is a simpler problem than
> maintaining order from physical ingress to physical egress.
>

That might be a reasonable application restriction for ODP v1.0.


>
> Packets from atomic or ordered queues do not have to be enqueued
> immediately on some queue in order to preserve ordering. Packets can be
> sent in-order to some external block (e.g. crypto engine) which responds in
> order (possibly to an atomic or parallel queue) and then one or more SW
> processing stages continue the processing and the packets are eventually
> enqueued in-order (with the help of some reordering) on the transmit queue.
>
> In-order is the natural state, only when there is parallelism involved is
> there a risk of order being lost. Ingress-to-egress order over parallel
> processing segments (in HW or in SW) is maintained by scheduling packets
> from ordered or parallel queues.
>
>
>
>>
>> On the final point, we're not talking about ordering on the wire (that's
>> clearly unpredictable and in any case beyond the scope of ODP).  But within
>> a single ODP application order preservation from ingress to egress, while
>> allowing parallelism in between, would seem to be one of the main design
>> points for ODP.
>>
> Please find a specific scenario where a chain of ordered processing stages
> will not maintain global ingress-to-egress ordering.
>

Consider IP fragment reassembly in the context of flow handling.  Not every
packet is fragmented, and since fragments are self-identifying they can be
processed in parallel.  Once reassembled they can be assigned their proper
sequence and sync up with other (non-fragmented) packets from the same flow
already in flight.  So one could imagine the order being intermixed with
parallel stages (for the reassembly) that then gets merged back into the
ordered processing.  It's this downstream merge back onto ordered queues
that does the trick.


>
> -- Ola
>
>
>>
>> Bill
>>
>> On Mon, Oct 20, 2014 at 10:02 AM, Ola Liljedahl <[email protected]
>> > wrote:
>>
>>> Bill, some spelling errors... I was listing the current calls that
>>> release the scheduling lock for a queue.
>>>
>>> * odp_queue_end() was supposed to be odp_queue_enq()  (q is a d upside
>>> down, maybe I am getting dyslexic?)
>>> * odp_queue_end_multi() was of course odp_queue_enq_multi().  Not sure
>>> how this call would work with atomic or ordered queues (all buffers must
>>> come from same queue), I guess it can degenerate into only returning one
>>> buffer at a time.
>>> * odp_buffer_free(odp_buffer_t buf)
>>> * odp_schedule_release(odp_buffer_t buf)  <--- new call
>>> (or odp_schedule_release_atomic() morphed)
>>>
>>> -- Ola
>>>
>>>
>>>
>>>
>>> On 20 October 2014 16:50, Bill Fischofer <[email protected]>
>>> wrote:
>>>
>>>> Thanks, Ola.  I need to think about this and respond more carefully,
>>>> but in the meantime could you propose the syntax/semantics of
>>>> odp_queue_end(),  odp_queue_end_multi(), and odp_schedule_release() in a
>>>> bit more detail?
>>>>
>>>> These seem to be new APIs and we need to be clear about their proposed
>>>> semantics and intended use.
>>>>
>>>> Thanks.
>>>>
>>>> Bill
>>>>
>>>> On Mon, Oct 20, 2014 at 9:40 AM, Ola Liljedahl <
>>>> [email protected]> wrote:
>>>>
>>>>> On 17 October 2014 10:01, Alexandru Badicioiu <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Bill, check my thoughts inline.
>>>>>> Thanks,
>>>>>> Alex
>>>>>>
>>>>>> On 17 October 2014 03:31, Bill Fischofer <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Based on discussions we had yesterday and today, I'd like to outline
>>>>>>> the open issues regarding queues and synchronization/scheduling models.
>>>>>>> We'd like to get consensus on this in time for next week's Tuesday call.
>>>>>>>
>>>>>>> ODP identifies three different synchronization/scheduling models for
>>>>>>> queues: Parallel, Atomic, and Ordered.  Here are my current
>>>>>>> understandings of what these mean:
>>>>>>>
>>>>>>>    - Parallel: Buffers on a parallel queue can be dequeued by the
>>>>>>>    scheduler for any caller without restriction.  This permits maximum
>>>>>>>    scale-out and concurrency for events that are truly independent.
>>>>>>>
>>>>>>>
>>>>>>>    - Atomic: Buffers on an atomic queue can be dequeued by the
>>>>>>>    scheduler for any caller. However, only one buffer from an atomic
>>>>>>>    queue may be in process at any given time. When the scheduler
>>>>>>>    dequeues a buffer from an atomic queue, the queue is locked and
>>>>>>>    cannot dequeue further buffers until it is released.  Releasing an
>>>>>>>    atomic queue can occur in two ways:
>>>>>>>
>>>>>>>
>>>>>>>    - The dequeued buffer is enqueued to another queue via an
>>>>>>>       odp_queue_enq() call. This action implicitly unlocks the atomic
>>>>>>>       queue the buffer was sourced from.  Note that this is the most
>>>>>>>       common way in which atomic queues are unlocked.
>>>>>>>
>>>>>>>
>>>>>>>    - A call is made to odp_schedule_release_atomic() for the locked
>>>>>>>       queue.  This tells the scheduler that the queue's atomicity
>>>>>>>       guarantee is deemed satisfied by the application and the queue is
>>>>>>>       free to dequeue items to other scheduler callers. This method MUST
>>>>>>>       be used if the caller consumes the buffer (e.g., frees it instead
>>>>>>>       of enqueuing it to another queue) and MAY be used as a performance
>>>>>>>       optimization if the caller is done with any references to data
>>>>>>>       that was serialized by the queue (e.g., the queue's context).  It
>>>>>>>       is an application programming error to release a queue prematurely
>>>>>>>       as references subsequent to the release will not be synchronized.
>>>>>>>
>>>>>> Third way - odp_buffer_free() called on a buffer which was dequeued
>>>>>> from an atomic queue.
>>>>>>
>>>>> odp_queue_end(): implicit release of atomic queue
>>>>> odp_queue_end_multi(): ? I assume multi-calls cannot be used with
>>>>> atomic scheduling
>>>>> odp_buffer_free(): (buffer is consumed) implicit release of atomic
>>>>> queue
>>>>> odp_schedule_release_atomic(): explicit release of atomic queue
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>    - Ordered: Buffers on an ordered queue can be dequeued by the
>>>>>>>    scheduler for any caller, however buffers on an ordered queue retain
>>>>>>>    knowledge of their sequence on their source queue and this sequence
>>>>>>>    will be restored whenever they are enqueued to a subsequent ordered
>>>>>>>    queue.  That is, if ordered queue A contains buffers A1, A2, and A3,
>>>>>>>    and these are dequeued for processing by three different threads,
>>>>>>>    then when they are subsequently enqueued to another ordered queue B
>>>>>>>    by these threads, they will appear on B as B1, B2, and B3 regardless
>>>>>>>    of the order in which their processing threads issued
>>>>>>>    odp_queue_enq() calls to place them on B.
>>>>>>>
>>>>>> Why does ordering have to be restored only if the destination is an
>>>>>> ordered queue? I think it should be restored regardless of the
>>>>>> destination queue type. Also it should be restored if there are
>>>>>> multiple destination queues, if the implementation supports it.
>>>>>>
>>>>> Atomic queues are also ordered queues.
>>>>>
>>>>> If you enqueue a packet from an atomic or ordered queue onto a
>>>>> parallel queue, ordering would normally be lost. If there was ever only
>>>>> one thread scheduling packets from this parallel queue (how do you
>>>>> guarantee that?), then you could argue that ordering is still
>>>>> maintained. But this seems like a fragile situation.
>>>>>
>>>>> Nothing would stop an implementation from always restoring order for
>>>>> packets scheduled from ordered queues when enqueuing them.
>>>>>
>>>>>
>>>>>>> Implicit in these definitions is the fact that all queues associated
>>>>>>> with odp_pktio_t objects are ordered queues.
>>>>>>>
>>>>>> Why is this implicit? The order restoration happens when buffers are
>>>>>> enqueued to the destination queue(s). Aren't the queues associated with
>>>>>> odp_pktio_t the first queues seen by a packet? If a pktio queue is
>>>>>> parallel, there is no requirement at all to ensure any ordering at next
>>>>>> enqueue.
>>>>>>
>>>>> Packet I/O ingress queues may be atomic and thus also ordered in some
>>>>> sense.
>>>>> I think the application should decide what ordering requirements there
>>>>> are for packet I/O ingress and egress queues.
>>>>>
>>>>>
>>>>>
>>>>>>> First question: Are these definitions accurate and complete?  If
>>>>>>> not, then what are the correct definitions for these types we wish to
>>>>>>> define?  Assuming these are correct, then there are several areas in ODP
>>>>>>> API that seem to need refinement:
>>>>>>>
>>>>>>>    - It seems that ODP buffers need at least two additional pieces
>>>>>>>    of system meta data that are missing:
>>>>>>>
>>>>>>>
>>>>>>>    - Buffers need to have a last_queue that is the odp_queue_t of
>>>>>>>       the last queue they were dequeued from.  Needed so that
>>>>>>>       odp_queue_enq() can unlock a previous atomic queue.
>>>>>>>
>>>>>>>
>>>>>>>    - Buffers need to retain sequence knowledge from the first
>>>>>>>       ordered queue they were sourced from (i.e., their ingress
>>>>>>>       odp_pktio_t) so this can be used for order restoration as they
>>>>>>>       are enqueued to downstream ordered queues
>>>>>>>
>>>>>>>
>>>>>>>    - odp_schedule_release_atomic() currently takes a void argument
>>>>>>>    and this is ambiguous. It should either take an odp_queue_t which is
>>>>>>>    the atomic queue that is to be unlocked or else an odp_buffer_t and
>>>>>>>    use that buffer's last_queue to find the queue to be unlocked.
>>>>>>>    Taking an odp_buffer_t argument would seem to be more consistent.
>>>>>>>
>>>>> The idea is probably that the ODP implementation knows which queue
>>>>> is associated with each thread. Better that the implementation keeps track
>>>>> of this than the application. As both odp_queue_end() and
>>>>> odp_buffer_free() take the buffer as parameter, I think it makes sense
>>>>> to specify the buffer also to the odp_schedule_release() call (use the
>>>>> same call for atomic and ordered scheduling).
>>>>>
>>>>> odp_schedule.h:
>>>>> void odp_schedule_release(odp_buffer_t buf);
>>>>>
>>>>>
>>>>>> I think any argument is superfluous for odp_schedule_release_atomic,
>>>>>> if the application really needs to continue the buffer processing outside
>>>>>> the atomic context. Otherwise the context release happens on free. There
>>>>>> can be only one atomic context associated with a thread, at any given
>>>>>> moment, and the scheduler implementation has to keep track of it. Of
>>>>>> course when freeing the buffer the free call has to know that the atomic
>>>>>> context of that buffer has been previously released.
>>>>>>
>>>>> An application could store a packet on some internal data structure
>>>>> and continue processing later. Thus the packet would be "consumed" from a
>>>>> scheduler perspective and the application wants to inform the scheduler of
>>>>> this and unlock the corresponding queue.
>>>>>
>>>>>
>>>>>
>>>>>>>    - Given the above definition of ordered queues, it would seem
>>>>>>>    that there needs to be some ordered equivalent to
>>>>>>>    odp_schedule_release_atomic() to handle the case where a buffer from
>>>>>>>    an ordered queue is consumed rather than propagated.  This is to
>>>>>>>    avoid creating un-fillable gaps in the ordering sequence downstream.
>>>>>>>
>>>>>> Any consumption will end up with calling odp_buffer_free() and free
>>>>>> has to inform the order restoration logic. Do we really need to extract a
>>>>>> buffer from the ordered flow before the consumption?
>>>>>>
>>>>> You have no idea when the buffer will be freed. We don't want to pause
>>>>> scheduling (for the involved queue) for an indeterminate time.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> Second question: If an application generates packets (via
>>>>>>> odp_packet_alloc()), how are these sequenced with respect to other
>>>>>>> packets on downstream ordered queues?  Same question for cloned/copied
>>>>>>> packets.
>>>>>>>
>>>>>> Clones/copies share the same sequencing information. The same for
>>>>>> fragments. If locally generated packets have to be inserted in an ordered
>>>>>> flow they have to explicitly request a sequencing information relevant to
>>>>>> source queue at the moment of insertion. Then they can be enqueued/freed
>>>>>> similarly as the other buffers in the flow.
>>>>>>
>>>>>>
>>>>>>> Third question: If an application takes packets from a source
>>>>>>> ordered queue and enqueues them to different target ordered queues, how
>>>>>>> is sequencing handled here since by definition there are gaps in the
>>>>>>> individual downstream ordered queues.  The simplest example is a switch
>>>>>>> or router that takes packets from one odp_pktio_t and sends them to
>>>>>>> multiple target odp_pktio_ts. A similar question arises for packets
>>>>>>> sourced from multiple input ordered queues going to the same target
>>>>>>> ordered queue.
>>>>>>>
>>>>>> I think that order is important to be maintained per flow, not the
>>>>>> absolute one between unrelated flows.
>>>>>> I think the type of the destination queues and their number also may
>>>>>> have no importance. Ordering should not be associated with a particular
>>>>>> destination; it is rather a source-defined thing. Ordering works between
>>>>>> two points - definition point (where the order is observed) and
>>>>>> restoration point, which can be rather logical (e.g. next enqueue) than
>>>>>> physical (a given queue). Order definition point is usually a queue. If
>>>>>> it's feasible for the HW that the order can be observed across multiple
>>>>>> queues, then ordering can work this way too. Maybe we can view the API
>>>>>> this way - define order definition/restoration points and associate
>>>>>> queues with them.
>>>>>>
>>>>> Why would you maintain order across multiple queues? I don't think we
>>>>> are interested in global packet ordering, just per-flow. And a queue is a
>>>>> proxy for a flow.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> Fourth question: How does this intersect with flow identification
>>>>>>> from the classifier?  It would seem that the classifier should override
>>>>>>> the raw packet sequence and re-write this information as a flow
>>>>>>> sequence which would be honored as the ordering sequence by subsequent
>>>>>>> downstream ordered queues.  Note that flows would still have the same
>>>>>>> downstream gap issues if they are enqueued to multiple downstream
>>>>>>> ordered queues.  This would definitely arise in multihoming support for
>>>>>>> SCTP, though this example is not an ODP v1.0 consideration.
>>>>>>>
>>>>> End-to-end packet order is not guaranteed anyway so termination
>>>>> software at the end must always handle misordered (and missing) packets.
>>>>>
>>>>>
>>>>>
>>>>>>> We may not have a complete set of answers to all of these questions
>>>>>>> for ODP v1.0 but we need to be precise about what is and is not done for
>>>>>>> them in ODP v1.0 so that we can do accurate testing for v1.0 as well as
>>>>>>> identify the areas that need further work next year as we move beyond
>>>>>>> v1.0.
>>>>>>>
>>>>>>> Thanks for your thoughts and suggestions on these.  If there are
>>>>>>> additional questions along these lines feel free to add them, but I'd
>>>>>>> really like to scope this discussion to what is in ODP v1.0.
>>>>>>>
>>>>>>> Bill
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> lng-odp mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>>
>>>>>>>
