Re: [lng-odp] [ARCH DESIGN] Queues and Synchronization/Scheduling models

Gilad Ben Yossef Tue, 21 Oct 2014 05:12:07 -0700


"I agree it makes sense for atomic queues to be implicitly released by 
subsequent queueing operations on buffers taken from them.  But that sort of 
reinforces the notion that one of the pieces of buffer meta data needed is the 
last_queue that the buffer was on so that this release can take place."
[gby] I agree but would prefer that we not force that piece of meta data to be 
the last_queue; it can be a "ticket" that a HW queue scheduler knows how to 
handle correctly, without having to be able to answer the question "which queue 
did you come from?".
Thanks,
Gilad
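
A minimal sketch of the distinction, assuming hypothetical names throughout 
(sched_ticket_t, buf_meta_t, toy_sched_issue() and toy_sched_release() are 
illustrative only and are not ODP APIs). The buffer carries an opaque ticket 
that only the scheduler can interpret, so the application can hand it back for 
release but cannot ask which queue it names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opaque ticket: meaningful only to the scheduler. */
typedef struct { uint32_t opaque; } sched_ticket_t;

/* Hypothetical per-buffer meta data: a ticket instead of a last_queue. */
typedef struct { sched_ticket_t ticket; } buf_meta_t;

#define TOY_NUM_QUEUES 4
static int toy_locked[TOY_NUM_QUEUES]; /* 1 = atomic queue currently locked */

/* Scheduler side: lock a queue and issue a ticket for the dequeued buffer.
 * The encoding (index XOR a constant) is private to the scheduler; real HW
 * could use any token it likes. */
static sched_ticket_t toy_sched_issue(unsigned queue_idx)
{
    assert(queue_idx < TOY_NUM_QUEUES);
    toy_locked[queue_idx] = 1;
    return (sched_ticket_t){ .opaque = queue_idx ^ 0xA5A5u };
}

/* Scheduler side: redeem the ticket and unlock whatever it refers to. */
static void toy_sched_release(sched_ticket_t t)
{
    unsigned queue_idx = t.opaque ^ 0xA5A5u;
    assert(queue_idx < TOY_NUM_QUEUES);
    toy_locked[queue_idx] = 0;
}
```

The application only ever stores and returns the ticket; nothing in the 
interface exposes a queue handle.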

From: [email protected] 
[mailto:[email protected]] On Behalf Of Bill Fischofer
Sent: Monday, October 20, 2014 8:10 PM
To: Ola Liljedahl
Cc: lng-odp-forward
Subject: Re: [lng-odp] [ARCH DESIGN] Queues and Synchronization/Scheduling 
models

Ok, thanks for the clarification.

I agree it makes sense for atomic queues to be implicitly released by 
subsequent queueing operations on buffers taken from them.  But that sort of 
reinforces the notion that one of the pieces of buffer meta data needed is the 
last_queue that the buffer was on so that this release can take place.
[gby] I agree but would prefer that we not force that piece of meta data to be 
the last_queue; it can be a "ticket" that a HW queue scheduler knows how to 
handle correctly, without having to be able to answer the question "which queue 
did you come from?".
Thanks,
Gilad

Gilad Ben-Yossef
Software Architect
EZchip Technologies Ltd.
37 Israel Pollak Ave, Kiryat Gat 82025 ,Israel
Tel: +972-4-959-6666 ext. 576, Fax: +972-8-681-1483
Mobile: +972-52-826-0388, US Mobile: +1-973-826-0388
Email: [email protected], Web: http://www.ezchip.com

"Ethernet always wins."
        — Andy Bechtolsheim

Regarding odp_buffer_enq_multi(), the question is whether the queue types 
(parallel/atomic/ordered) imply the use of a scheduler or if they are intrinsic 
to the queue itself.  While we could say that these are only applicable to 
scheduled queues, the fact that many of these operations are triggered by queue 
APIs suggests that this is not necessary.  For example, a polled queue could 
easily be parallel, or atomic, or ordered at essentially no additional cost.  
We might want to rename odp_schedule_release_atomic() to something like 
odp_buffer_release_atomic() to make that clear, especially since 
odp_schedule_release_atomic() is currently defined to take a void argument when 
it would seem to need an odp_buffer_t argument for completeness.

Having these be applicable only to scheduled queues for ODP v1.0 is probably 
simpler at this point, but something we may need to revisit post v1.0, 
especially as we move away from the notion of a monolithic scheduler to 
something more parameterized.  If we have multiple schedulers acting on queues 
then it becomes clear that these operations are inherent to the queues 
themselves and not just the scheduler.

Regarding atomic queues being ordered: by that definition we do not currently 
implement atomic queues.  Suppose atomic queues are ordered, and threads T1 and 
T2 each dequeue a buffer from atomic queue A, in the order A1 then A2.  If T1 
issues an odp_schedule_release_atomic() call that unlocks A (allowing T2 to 
obtain A2 before T1 has disposed of A1), then a downstream queue would need to 
ensure that A1 appears first even if T2 enqueues A2 before T1 enqueues A1.  We 
do not currently ensure this.  If Atomic does not imply Ordered then 
all Atomic is really doing is protecting the queue context from parallel 
access. This in itself seems useful and need not be combined with ordering 
semantics.

In an earlier thread on this topic it was suggested that perhaps we'd want to 
have an ORDERED_ATOMIC (or ATOMIC_ORDERED) queue scheduling type in addition to 
ATOMIC for when such semantics are needed.  I could see that making sense.

The question of order preservation through multiple queueing levels seems 
highly germane and necessary for predictable behavior.  If we take the above 
proposed definition then it is quite precise.  Buffer order propagates from one 
ordered queue to the next, ignoring non-ordered intermediates.  This relaxation 
should facilitate implementations where ordering must be handled in SW since it 
means that HW offloads that may not in themselves handle ordering as ODP 
defines it could be accommodated.  The only ambiguity is how gaps are handled.  
We have that question independent of intermediates.  Perhaps an 
odp_queue_release_order(buf) might be a good counterpart to 
odp_queue_release_atomic(buf) ?  The former call would say that the sequence 
owned by the referenced buffer should be deemed filled so that it no longer 
blocks subsequent buffers originating from the same ordered queue.
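
A rough sketch of what such a release could mean, under stated assumptions: 
the toy_order_ctx_t type and the toy_order_*() functions are hypothetical, 
sequence numbers are limited to 0..31, and nothing here is an existing ODP 
call. A consumed buffer marks its sequence slot as filled so that later 
buffers from the same ordered queue are no longer blocked behind it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy order context for one source ordered queue (sequences 0..31). */
typedef struct {
    uint32_t resolved;  /* bit n set: sequence n has passed or was released */
    unsigned next_seq;  /* lowest sequence not yet resolved */
} toy_order_ctx_t;

/* Advance next_seq past every contiguous resolved sequence. */
static void toy_advance(toy_order_ctx_t *c)
{
    while (c->next_seq < 32 && (c->resolved & (UINT32_C(1) << c->next_seq)))
        c->next_seq++;
}

/* Try to let the buffer with this sequence through; it passes only if
 * every lower sequence has already passed or been released. */
static bool toy_order_try_pass(toy_order_ctx_t *c, unsigned seq)
{
    assert(seq < 32);
    toy_advance(c);
    if (seq != c->next_seq)
        return false;               /* still blocked behind a gap */
    c->resolved |= UINT32_C(1) << seq;
    toy_advance(c);
    return true;
}

/* Hypothetical analogue of the proposed release-order call: the buffer
 * was consumed, so its sequence is deemed filled. */
static void toy_order_release(toy_order_ctx_t *c, unsigned seq)
{
    assert(seq < 32);
    c->resolved |= UINT32_C(1) << seq;
    toy_advance(c);
}
```

With sequences 0, 1 and 2 outstanding, sequence 2 stays blocked after 0 has 
passed until sequence 1 either passes or is released.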

Regarding the point of queues being proxies for a flow, even in that model a 
flow will be proxied by multiple queues unless you require that processing 
occur in a single stage.  For applications structured in a more pipelined 
fashion (which can occur even with single-stages that make use of HW offload 
that involves reschedules) you have multiple queues representing the same flow 
at different stages of processing (and possibly having unique per-stage 
per-flow contexts associated with them).  So maintaining end-to-end order on a 
per-flow basis is still required.

On the final point, we're not talking about ordering on the wire (that's 
clearly unpredictable and in any case beyond the scope of ODP).  But within a 
single ODP application order preservation from ingress to egress, while 
allowing parallelism in between, would seem to be one of the main design points 
for ODP.

Bill

On Mon, Oct 20, 2014 at 10:02 AM, Ola Liljedahl <[email protected]> wrote:
Bill, some spelling errors... I was listing the current calls that release the 
scheduling lock for a queue.

* odp_queue_end() was supposed to be odp_queue_enq()  (q is a d upside down, 
maybe I am getting dyslexic?)
* odp_queue_end_multi() was of course odp_queue_enq_multi().  Not sure how this 
call would work with atomic or ordered queues (all buffers must come from same 
queue), I guess it can degenerate into only returning one buffer at a time.
* odp_buffer_free(odp_buffer_t buf)
* odp_schedule_release(odp_buffer_t buf)  <--- new call (or 
odp_schedule_release_atomic() morphed)

-- Ola

On 20 October 2014 16:50, Bill Fischofer <[email protected]> wrote:
Thanks, Ola.  I need to think about this and respond more carefully, but in the 
meantime could you propose the syntax/semantics of odp_queue_end(),  
odp_queue_end_multi(), and odp_schedule_release() in a bit more detail?

These seem to be new APIs and we need to be clear about their proposed 
semantics and intended use.

Thanks.

Bill

On Mon, Oct 20, 2014 at 9:40 AM, Ola Liljedahl <[email protected]> wrote:
On 17 October 2014 10:01, Alexandru Badicioiu <[email protected]> wrote:
Hi Bill, check my thoughts inline.
Thanks,
Alex

On 17 October 2014 03:31, Bill Fischofer <[email protected]> wrote:
Based on discussions we had yesterday and today, I'd like to outline the open 
issues regarding queues and synchronization/scheduling models.  We'd like to 
get consensus on this in time for next week's Tuesday call.

ODP identifies three different synchronization/scheduling models for queues: 
Parallel, Atomic, and Ordered.  Here are my current understandings of what 
these mean:

  *   Parallel: Buffers on a parallel queue can be dequeued by the scheduler 
for any caller without restriction.  This permits maximum scale-out and 
concurrency for events that are truly independent.

  *   Atomic: Buffers on an atomic queue can be dequeued by the scheduler for 
any caller. However, only one buffer from an atomic queue may be in process at 
any given time. When the scheduler dequeues a buffer from an atomic queue, the 
queue is locked and cannot dequeue further buffers until it is released.  
Releasing an atomic queue can occur in two ways:

     *   The dequeued buffer is enqueued to another queue via an 
odp_queue_enq() call. This action implicitly unlocks the atomic queue the 
buffer was sourced from.  Note that this is the most common way in which atomic 
queues are unlocked.

     *   A call is made to odp_schedule_release_atomic() for the locked queue.  
This tells the scheduler that the queue's atomicity guarantee is deemed 
satisfied by the application and the queue is free to dequeue items to other 
scheduler callers. This method MUST be used if the caller consumes the buffer 
(e.g., frees it instead of enqueues it to another queue) and MAY be used as a 
performance optimization if the caller is done with any references to data that 
was serialized by the queue (e.g., the queue's context).  It is an application 
programming error to release a queue prematurely as references subsequent to 
the release will not be synchronized.
Third way - odp_buffer_free() called on a buffer which was dequeued from an 
atomic queue.
odp_queue_end(): implicit release of atomic queue
odp_queue_end_multi(): ? I assume multi-calls cannot be used with atomic 
scheduling
odp_buffer_free(): (buffer is consumed) implicit release of atomic queue
odp_schedule_release_atomic(): explicit release of atomic queue
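
The release paths listed above can be sketched with a single-threaded toy 
model. All names here (toy_atomic_queue_t, toy_buf_t, toy_schedule(), 
toy_enq(), toy_free(), toy_release_atomic()) are assumptions made for 
illustration; they mirror the roles of the ODP calls under discussion but are 
not their implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy atomic queue: at most one buffer may be in process at a time. */
typedef struct {
    int  items[8];
    int  head, tail;  /* simple FIFO indices; no wraparound in this sketch */
    bool locked;      /* true while a dequeued buffer is still in process */
} toy_atomic_queue_t;

/* Toy buffer meta data remembering which queue must be unlocked. */
typedef struct {
    int value;
    toy_atomic_queue_t *src;  /* NULL once the atomic context is released */
} toy_buf_t;

static void toy_push(toy_atomic_queue_t *q, int v)
{
    assert(q->tail < 8);
    q->items[q->tail++] = v;
}

/* Scheduler dequeue: fails while the queue is locked or empty. */
static bool toy_schedule(toy_atomic_queue_t *q, toy_buf_t *out)
{
    if (q->locked || q->head == q->tail)
        return false;
    out->value = q->items[q->head++];
    out->src = q;
    q->locked = true;
    return true;
}

/* Explicit release: the caller declares atomicity satisfied. */
static void toy_release_atomic(toy_buf_t *b)
{
    if (b->src != NULL) {
        b->src->locked = false;
        b->src = NULL;
    }
}

/* Implicit release: enqueueing to another queue unlocks the source. */
static void toy_enq(toy_atomic_queue_t *dst, toy_buf_t *b)
{
    toy_release_atomic(b);
    toy_push(dst, b->value);
}

/* Implicit release: consuming (freeing) the buffer unlocks the source. */
static void toy_free(toy_buf_t *b)
{
    toy_release_atomic(b);
}
```

Because every path funnels through the same release step, a buffer whose 
context was already released explicitly can still be enqueued or freed later 
without unlocking the queue a second time.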

  *   Ordered: Buffers on an ordered queue can be dequeued by the scheduler for 
any caller; however, buffers on an ordered queue retain knowledge of their 
sequence on their source queue and this sequence will be restored whenever they 
are enqueued to a subsequent ordered queue.  That is, if ordered queue A 
contains buffers A1, A2, and A3, and these are dequeued for processing by three 
different threads, then when they are subsequently enqueued to another ordered 
queue B by these threads, they will appear on B as B1, B2, and B3 regardless of 
the order in which their processing threads issued odp_queue_enq() calls to 
place them on B.
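
As an illustration of this definition (a sketch only: toy_ordered_queue_t and 
toy_ordered_enq() are hypothetical, and a real implementation would likely use 
HW support or a proper reorder window), the destination queue can park early 
arrivals until all lower sequence numbers from the source queue have been 
enqueued:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_WINDOW 8  /* max sequence numbers handled by this sketch */

/* Toy ordered destination queue B: holds early arrivals until every
 * lower sequence number from the source queue has been enqueued. */
typedef struct {
    int      pending[TOY_WINDOW];  /* payload parked per sequence number */
    bool     present[TOY_WINDOW];  /* true if that sequence is parked */
    unsigned next_seq;             /* next sequence allowed onto the queue */
    int      out[TOY_WINDOW];      /* queue contents, in restored order */
    unsigned out_len;
} toy_ordered_queue_t;

/* Enqueue a buffer carrying the sequence number it had on its source queue. */
static void toy_ordered_enq(toy_ordered_queue_t *q, unsigned seq, int payload)
{
    assert(seq < TOY_WINDOW && seq >= q->next_seq && !q->present[seq]);
    q->pending[seq] = payload;
    q->present[seq] = true;

    /* Drain every buffer that is now in sequence. */
    while (q->next_seq < TOY_WINDOW && q->present[q->next_seq]) {
        q->out[q->out_len++] = q->pending[q->next_seq];
        q->present[q->next_seq] = false;
        q->next_seq++;
    }
}
```

If three threads enqueue A3, A1 and A2 in that order (sequences 2, 0, 1), the 
queue contents still end up in source order.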
Why does ordering have to be restored only if the destination is an ordered queue? I 
think it should be restored regardless of the destination queue type. Also it 
should be restored if there are multiple destination queues, if the 
implementation supports it.
Atomic queues are also ordered queues.

If you enqueue a packet from an atomic or ordered queue onto a parallel queue, 
ordering would normally be lost. If there was ever only one thread scheduling 
packets from this parallel queue (how do you guarantee that?), then you could 
argue that ordering is still maintained. But this seems like a fragile 
situation.

Nothing would stop an implementation from always restoring order for packets 
scheduled from ordered queues when enqueuing them.

Implicit in these definitions is the fact that all queues associated with 
odp_pktio_t objects are ordered queues.
Why is this implicit? The order restoration happens when buffers are enqueued 
to the destination queue(s). Aren't the queues associated with odp_pktio_t the 
first queues seen by a packet? If a pktio queue is parallel, there is no 
requirement at all to ensure any ordering at next enqueue.
Packet I/O ingress queues may be atomic and thus also ordered in some sense.
I think the application should decide what ordering requirements there are for 
packet I/O ingress and egress queues.

First question: Are these definitions accurate and complete?  If not, then what 
are the correct definitions for these types we wish to define?  Assuming these 
are correct, then there are several areas in ODP API that seem to need 
refinement:

  *   It seems that ODP buffers need at least two additional pieces of system 
meta data that are missing:

     *   Buffers need to have a last_queue that is the odp_queue_t of the last 
queue they were dequeued from.  Needed so that odp_queue_enq() can unlock a 
previous atomic queue.

     *   Buffers need to retain sequence knowledge from the first ordered queue 
they were sourced from (i.e., their ingress odp_pktio_t) so this can be used 
for order restoration as they are enqueued to downstream ordered queues

  *   odp_schedule_release_atomic() currently takes a void argument and this is 
ambiguous. It should either take an odp_queue_t which is the atomic queue that 
is to be unlocked or else an odp_buffer_t and use that buffer's last_queue to 
find the queue to be unlocked. Taking an odp_buffer_t argument would seem to be 
more consistent.
The idea is probably that the ODP implementation knows which queue is 
associated with each thread. Better that the implementation keeps track of this 
than the application. As both odp_queue_end() and odp_buffer_free() take the 
buffer as a parameter, I think it makes sense to specify the buffer also to the 
odp_schedule_release() call (use the same call for atomic and ordered 
scheduling).

odp_schedule.h:
void odp_schedule_release(odp_buffer_t buf);

I think any argument is superfluous for odp_schedule_release_atomic(), if the 
application really needs to continue the buffer processing outside the atomic 
context. Otherwise the context release happens on free. There can be only one 
atomic context associated with a thread, at any given moment, and the scheduler 
implementation has to keep track of it. Of course when freeing the buffer the 
free call has to know that the atomic context of that buffer has been 
previously released.
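
One way to picture this bookkeeping is the sketch below. It assumes a 
hypothetical implementation that tracks the current atomic context in 
thread-local storage (C11 _Thread_local); the toy_* names are illustrative and 
are not ODP APIs. The release call takes no argument, and free consults a flag 
in the buffer so that it does not release a context the buffer no longer holds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool locked; } toy_queue_t;
typedef struct { bool holds_atomic; } toy_buf_t;

/* At most one atomic context per thread, tracked by the implementation. */
static _Thread_local toy_queue_t *toy_cur_queue = NULL;
static _Thread_local toy_buf_t   *toy_cur_buf   = NULL;

/* Scheduler hands buffer b from atomic queue q to the calling thread. */
static bool toy_schedule_from(toy_queue_t *q, toy_buf_t *b)
{
    if (q->locked || toy_cur_queue != NULL)
        return false;
    q->locked = true;
    b->holds_atomic = true;
    toy_cur_queue = q;
    toy_cur_buf = b;
    return true;
}

/* Explicit release with a void argument: the thread's context is implied. */
static void toy_schedule_release_atomic(void)
{
    if (toy_cur_queue == NULL)
        return;
    assert(toy_cur_buf != NULL);
    toy_cur_queue->locked = false;
    toy_cur_buf->holds_atomic = false;
    toy_cur_queue = NULL;
    toy_cur_buf = NULL;
}

/* Free releases the context only if this buffer still holds it. */
static void toy_buffer_free(toy_buf_t *b)
{
    if (b->holds_atomic)
        toy_schedule_release_atomic();
}
```

The per-buffer flag is what lets free tell an already-released buffer apart 
from the one that currently owns the thread's atomic context.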
An application could store a packet on some internal data structure and 
continue processing later. Thus the packet would be "consumed" from a scheduler 
perspective and the application wants to inform the scheduler of this and 
unlock the corresponding queue.

  *   Given the above definition of ordered queues, it would seem that there 
needs to be some ordered equivalent to odp_schedule_release_atomic() to handle 
the case where a buffer from an ordered queue is consumed rather than 
propagated.  This is to avoid creating un-fillable gaps in the ordering 
sequence downstream.
Any consumption will end up calling odp_buffer_free(), and free has to 
inform the order restoration logic. Do we really need to extract a buffer from 
the ordered flow before the consumption?
You have no idea when the buffer will be freed. We don't want to pause 
scheduling (for the involved queue) for an indeterminate time.

Second question: If an application generates packets (via odp_packet_alloc()), 
how are these sequenced with respect to other packets on downstream ordered 
queues?  Same question for cloned/copied packets.

Clones/copies share the same sequencing information. The same goes for 
fragments. If locally generated packets have to be inserted in an ordered flow, 
they have to explicitly request sequencing information relevant to the source 
queue at the moment of insertion. Then they can be enqueued/freed in the same 
way as the other buffers in the flow.

Third question: If an application takes packets from a source ordered queue and 
enqueues them to different target ordered queues, how is sequencing handled 
here since by definition there are gaps in the individual downstream ordered 
queues.  The simplest example is a switch or router that takes packets from one 
odp_pktio_t and sends them to multiple target odp_pktio_ts. A similar question 
arises for packets sourced from multiple input ordered queues going to the same 
target ordered queue.
I think it is important that order be maintained per flow, not absolutely 
between unrelated flows.
I think the type of the destination queues and their number may also be of no 
importance. Ordering should not be associated with a particular destination; it 
is rather a source-defined thing. Ordering works between two points: a 
definition point (where the order is observed) and a restoration point, which 
can be logical (e.g. the next enqueue) rather than physical (a given queue). 
The order definition point is usually a queue. If it's feasible for the HW to 
observe order across multiple queues, then ordering can work this way too. 
Maybe we can view the API this way: define order definition/restoration points 
and associate queues with them.
Why would you maintain order across multiple queues? I don't think we are 
interested in global packet ordering, just per-flow. And a queue is a proxy for 
a flow.

Fourth question: How does this intersect with flow identification from the 
classifier?  It would seem that the classifier should override the raw packet 
sequence and re-write this information as a flow sequence which would be 
honored as the ordering sequence by subsequent downstream ordered queues.  Note 
that flows would still have the same downstream gap issues if they are enqueued 
to multiple downstream ordered queues.  This would definitely arise in 
multihoming support for SCTP, though this example is not an ODP v1.0 
consideration.
End-to-end packet order is not guaranteed anyway so termination software at the 
end must always handle misordered (and missing) packets.

We may not have a complete set of answers to all of these questions for ODP 
v1.0 but we need to be precise about what is and is not done for them in ODP 
v1.0 so that we can do accurate testing for v1.0 as well as identify the areas 
that need further work next year as we move beyond v1.0.

Thanks for your thoughts and suggestions on these.  If there are additional 
questions along these lines feel free to add them, but I'd really like to scope 
this discussion to what is in ODP v1.0.

Bill

_______________________________________________
lng-odp mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/lng-odp

