"I agree it makes sense for atomic queues to be implicitly released by subsequent queueing operations on buffers taken from them. But that sort of reinforces the notion that one of the pieces of buffer meta data needed is the last_queue that the buffer was on so that this release can take place." [gby] I agree but would prefer if we don't force that piece of meta data to be the last_queue – it can be a "ticket" which a HW queue scheduler knows to do the right thing with without being able to answer the question (from which queue you came). Thanks, Gilad
From: [email protected] [mailto:[email protected]] On Behalf Of Bill Fischofer
Sent: Monday, October 20, 2014 8:10 PM
To: Ola Liljedahl
Cc: lng-odp-forward
Subject: Re: [lng-odp] [ARCH DESIGN] Queues and Synchronization/Scheduling models

Ok, thanks for the clarification. I agree it makes sense for atomic queues to be implicitly released by subsequent queueing operations on buffers taken from them. But that sort of reinforces the notion that one of the pieces of buffer meta data needed is the last_queue that the buffer was on so that this release can take place.

[gby] I agree, but would prefer that we don't force that piece of meta data to be the last_queue. It can be a "ticket" that a HW queue scheduler knows how to handle correctly, without being able to answer the question of which queue the buffer came from.

Thanks,
Gilad

Gilad Ben-Yossef
Software Architect
EZchip Technologies Ltd.
37 Israel Pollak Ave, Kiryat Gat 82025, Israel
Tel: +972-4-959-6666 ext. 576, Fax: +972-8-681-1483
Mobile: +972-52-826-0388, US Mobile: +1-973-826-0388
Email: [email protected], Web: http://www.ezchip.com
"Ethernet always wins." - Andy Bechtolsheim

Regarding odp_buffer_enq_multi(), the question is whether the queue types (parallel/atomic/ordered) imply the use of a scheduler or if they are intrinsic to the queue itself. While we could say that these are only applicable to scheduled queues, the fact that many of these operations are triggered by queue APIs suggests that this is not necessary. For example, a polled queue could easily be parallel, or atomic, or ordered at essentially no additional cost. We might want to rename odp_schedule_release_atomic() to something like odp_buffer_release_atomic() to make that clear, especially since odp_schedule_release_atomic() is currently defined to take a void argument when it would seem to need an odp_buffer_t argument for completeness.
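To make the buffer-argument form of the release concrete, here is a minimal sketch in C. It is not the ODP API: every sketch_* name is hypothetical, and the "ticket" is modelled as a plain pointer only because that is the simplest stand-in. A HW scheduler could use any opaque token instead. The sketch assumes a single atomic context per buffer and leaves out the actual queue storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the ODP API. */
typedef struct sketch_queue {
    bool atomic;  /* queue uses atomic scheduling */
    bool locked;  /* a buffer from this queue is currently in process */
} sketch_queue_t;

typedef struct sketch_buffer {
    sketch_queue_t *ticket;  /* release handle; NULL when nothing is held */
} sketch_buffer_t;

/* Scheduler side: hand buf out from q unless q is atomic and already locked. */
static bool sketch_schedule(sketch_queue_t *q, sketch_buffer_t *buf)
{
    assert(q != NULL && buf != NULL);
    if (q->atomic && q->locked)
        return false;          /* another buffer from q is still in process */
    if (q->atomic) {
        q->locked = true;
        buf->ticket = q;       /* remember what must be released later */
    }
    return true;
}

/* Explicit release: takes the buffer, not void, so the ticket can be found. */
static void sketch_release_atomic(sketch_buffer_t *buf)
{
    assert(buf != NULL);
    if (buf->ticket != NULL) {
        buf->ticket->locked = false;
        buf->ticket = NULL;    /* a second release becomes a no-op */
    }
}

/* Enqueue to another queue: implicitly releases the source atomic queue. */
static void sketch_enq(sketch_queue_t *dst, sketch_buffer_t *buf)
{
    (void)dst;                 /* insertion into dst is omitted in this sketch */
    sketch_release_atomic(buf);
}
```

Because the release handle travels with the buffer, the same path serves the explicit call, the implicit release on enqueue, and (by extension) a free of the buffer, and releasing twice is harmless.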
Having these be applicable only to scheduled queues for ODP v1.0 is probably simpler at this point, but something we may need to revisit post v1.0, especially as we move away from the notion of a monolithic scheduler to something more parameterized. If we have multiple schedulers acting on queues then it becomes clear that these operations are inherent to the queues themselves and not just the scheduler.

Regarding atomic queues being ordered: by that definition, we currently do not implement atomic queues. If atomic queues are ordered, then consider threads T1 and T2 that dequeue buffers A1 and A2, in that order, from an atomic queue A. If T1 issues an odp_schedule_release_atomic() call that unlocks A (allowing T2 to obtain A2 before T1 has disposed of A1), then a downstream queue would need to ensure that A1 appears first even if T2 enqueues A2 before T1 enqueues A1. We do not currently ensure this.

If Atomic does not imply Ordered then all Atomic is really doing is protecting the queue context from parallel access. This in itself seems useful and need not be combined with ordering semantics. In an earlier thread on this topic it was suggested that perhaps we'd want to have an ORDERED_ATOMIC (or ATOMIC_ORDERED) queue scheduling type in addition to ATOMIC for when such semantics are needed. I could see that making sense.

The question of order preservation through multiple queueing levels seems highly germane and necessary for predictable behavior. If we take the above proposed definition then it is quite precise: buffer order propagates from one ordered queue to the next, ignoring non-ordered intermediates. This relaxation should facilitate implementations where ordering must be handled in SW, since it means that HW offloads that may not in themselves handle ordering as ODP defines it could be accommodated. The only ambiguity is how gaps are handled. We have that question independent of intermediates.
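The order propagation rule and the gap question can be illustrated with a small C sketch. This is an assumption-laden model, not ODP code: the sketch_* names, the fixed window size, and the use of an int payload in place of a buffer handle are all made up for illustration. It assumes each buffer carries the sequence number it was given on its source ordered queue, and that the destination delivers strictly in that sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_WINDOW 8   /* max buffers held while waiting for earlier ones */
#define SKETCH_OUT    32  /* capacity of the delivered list in this sketch */

/* Hypothetical reorder context attached to a destination ordered queue. */
typedef struct {
    unsigned next_seq;                /* next sequence number to deliver */
    bool     present[SKETCH_WINDOW];  /* slot holds a waiting buffer */
    int      held[SKETCH_WINDOW];     /* payload standing in for a buffer */
    int      out[SKETCH_OUT];         /* payloads in delivery order */
    size_t   out_len;
} sketch_reorder_t;

/* Enqueue a payload carrying its source sequence number.
 * Returns false if seq was already delivered or lies beyond the window. */
static bool sketch_ordered_enq(sketch_reorder_t *r, unsigned seq, int payload)
{
    assert(r != NULL);
    if (seq < r->next_seq || seq - r->next_seq >= SKETCH_WINDOW)
        return false;
    r->present[seq % SKETCH_WINDOW] = true;
    r->held[seq % SKETCH_WINDOW] = payload;

    /* Deliver every buffer that is now in sequence; stop at the first gap. */
    while (r->present[r->next_seq % SKETCH_WINDOW] &&
           r->out_len < SKETCH_OUT) {
        unsigned slot = r->next_seq % SKETCH_WINDOW;
        r->out[r->out_len++] = r->held[slot];
        r->present[slot] = false;
        r->next_seq++;
    }
    return true;
}
```

In this model, enqueuing sequence numbers 2, 0, 1 delivers them as 0, 1, 2. If the buffer holding sequence number 1 were consumed and never enqueued, number 2 would stay held indefinitely, which is exactly the gap problem.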
Perhaps an odp_queue_release_order(buf) might be a good counterpart to odp_queue_release_atomic(buf)? The former call would say that the sequence owned by the referenced buffer should be deemed filled so that it no longer blocks subsequent buffers originating from the same ordered queue.

Regarding the point of queues being proxies for a flow, even in that model a flow will be proxied by multiple queues unless you require that processing occur in a single stage. For applications structured in a more pipelined fashion (which can occur even with single stages that make use of HW offload that involves reschedules) you have multiple queues representing the same flow at different stages of processing (and possibly having unique per-stage per-flow contexts associated with them). So maintaining end-to-end order on a per-flow basis is still required.

On the final point, we're not talking about ordering on the wire (that's clearly unpredictable and in any case beyond the scope of ODP). But within a single ODP application, order preservation from ingress to egress, while allowing parallelism in between, would seem to be one of the main design points for ODP.

Bill

On Mon, Oct 20, 2014 at 10:02 AM, Ola Liljedahl <[email protected]> wrote:

Bill, some spelling errors... I was listing the current calls that release the scheduling lock for a queue.

* odp_queue_end() was supposed to be odp_queue_enq() (q is a d upside down, maybe I am getting dyslexic?)
* odp_queue_end_multi() was of course odp_queue_enq_multi(). Not sure how this call would work with atomic or ordered queues (all buffers must come from the same queue), I guess it can degenerate into only returning one buffer at a time.
* odp_buffer_free(odp_buffer_t buf)
* odp_schedule_release(odp_buffer_t buf) <--- new call (or odp_schedule_release_atomic() morphed)

-- Ola

On 20 October 2014 16:50, Bill Fischofer <[email protected]> wrote:

Thanks, Ola.
I need to think about this and respond more carefully, but in the meantime could you propose the syntax/semantics of odp_queue_end(), odp_queue_end_multi(), and odp_schedule_release() in a bit more detail? These seem to be new APIs and we need to be clear about their proposed semantics and intended use.

Thanks.
Bill

On Mon, Oct 20, 2014 at 9:40 AM, Ola Liljedahl <[email protected]> wrote:

On 17 October 2014 10:01, Alexandru Badicioiu <[email protected]> wrote:

Hi Bill, check my thoughts inline.

Thanks,
Alex

On 17 October 2014 03:31, Bill Fischofer <[email protected]> wrote:

Based on discussions we had yesterday and today, I'd like to outline the open issues regarding queues and synchronization/scheduling models. We'd like to get consensus on this in time for next week's Tuesday call.

ODP identifies three different synchronization/scheduling models for queues: Parallel, Atomic, and Ordered. Here are my current understandings of what these mean:

* Parallel: Buffers on a parallel queue can be dequeued by the scheduler for any caller without restriction. This permits maximum scale-out and concurrency for events that are truly independent.

* Atomic: Buffers on an atomic queue can be dequeued by the scheduler for any caller. However, only one buffer from an atomic queue may be in process at any given time. When the scheduler dequeues a buffer from an atomic queue, the queue is locked and cannot dequeue further buffers until it is released. Releasing an atomic queue can occur in two ways:

    * The dequeued buffer is enqueued to another queue via an odp_queue_enq() call. This action implicitly unlocks the atomic queue the buffer was sourced from. Note that this is the most common way in which atomic queues are unlocked.

    * A call is made to odp_schedule_release_atomic() for the locked queue.
This tells the scheduler that the queue's atomicity guarantee is deemed satisfied by the application and the queue is free to dequeue items to other scheduler callers. This method MUST be used if the caller consumes the buffer (e.g., frees it instead of enqueuing it to another queue) and MAY be used as a performance optimization if the caller is done with any references to data that was serialized by the queue (e.g., the queue's context). It is an application programming error to release a queue prematurely, as references subsequent to the release will not be synchronized.

Third way: odp_buffer_free() called on a buffer which was dequeued from an atomic queue.

odp_queue_end(): implicit release of atomic queue
odp_queue_end_multi(): ? I assume multi-calls cannot be used with atomic scheduling
odp_buffer_free(): (buffer is consumed) implicit release of atomic queue
odp_schedule_release_atomic(): explicit release of atomic queue

* Ordered: Buffers on an ordered queue can be dequeued by the scheduler for any caller, however buffers on an ordered queue retain knowledge of their sequence on their source queue and this sequence will be restored whenever they are enqueued to a subsequent ordered queue. That is, if ordered queue A contains buffers A1, A2, and A3, and these are dequeued for processing by three different threads, then when they are subsequently enqueued to another ordered queue B by these threads, they will appear on B as B1, B2, and B3 regardless of the order in which their processing threads issued odp_queue_enq() calls to place them on B.

Why does ordering have to be restored only if the destination is an ordered queue? I think it should be restored regardless of the destination queue type. Also it should be restored if there are multiple destination queues, if the implementation supports it.

Atomic queues are also ordered queues. If you enqueue a packet from an atomic or ordered queue onto a parallel queue, ordering would normally be lost.
If there was ever only one thread scheduling packets from this parallel queue (how do you guarantee that?), then you could argue that ordering is still maintained. But this seems like a fragile situation. Nothing would stop an implementation from always restoring order for packets scheduled from ordered queues when enqueuing them.

Implicit in these definitions is the fact that all queues associated with odp_pktio_t objects are ordered queues.

Why is this implicit? The order restoration happens when buffers are enqueued to the destination queue(s). Aren't the queues associated with odp_pktio_t the first queues seen by a packet?

If a pktio queue is parallel, there is no requirement at all to ensure any ordering at the next enqueue. Packet I/O ingress queues may be atomic and thus also ordered in some sense. I think the application should decide what ordering requirements there are for packet I/O ingress and egress queues.

First question: Are these definitions accurate and complete? If not, then what are the correct definitions for these types we wish to define? Assuming these are correct, then there are several areas in the ODP API that seem to need refinement:

* It seems that ODP buffers need at least two additional pieces of system meta data that are missing:

    * Buffers need to have a last_queue that is the odp_queue_t of the last queue they were dequeued from. This is needed so that odp_queue_enq() can unlock a previous atomic queue.

    * Buffers need to retain sequence knowledge from the first ordered queue they were sourced from (i.e., their ingress odp_pktio_t) so this can be used for order restoration as they are enqueued to downstream ordered queues.

* odp_schedule_release_atomic() currently takes a void argument and this is ambiguous. It should either take an odp_queue_t which is the atomic queue that is to be unlocked or else an odp_buffer_t and use that buffer's last_queue to find the queue to be unlocked. Taking an odp_buffer_t argument would seem to be more consistent.
The idea is probably that the ODP implementation knows which queue is associated with each thread. Better that the implementation keeps track of this than the application. As both odp_queue_end() and odp_buffer_free() take the buffer as a parameter, I think it makes sense to also pass the buffer to the odp_schedule_release() call (use the same call for atomic and ordered scheduling).

odp_schedule.h:
void odp_schedule_release(odp_buffer_t buf);

I think any argument is superfluous for odp_schedule_release_atomic, if the application really needs to continue the buffer processing outside the atomic context. Otherwise the context release happens on free. There can be only one atomic context associated with a thread at any given moment, and the scheduler implementation has to keep track of it. Of course when freeing the buffer the free call has to know that the atomic context of that buffer has been previously released.

An application could store a packet on some internal data structure and continue processing later. Thus the packet would be "consumed" from a scheduler perspective and the application wants to inform the scheduler of this and unlock the corresponding queue.

* Given the above definition of ordered queues, it would seem that there needs to be some ordered equivalent to odp_schedule_release_atomic() to handle the case where a buffer from an ordered queue is consumed rather than propagated. This is to avoid creating un-fillable gaps in the ordering sequence downstream.

Any consumption will end up calling odp_buffer_free(), and free has to inform the order restoration logic. Do we really need to extract a buffer from the ordered flow before the consumption?

You have no idea when the buffer will be freed. We don't want to pause scheduling (for the involved queue) for an indeterminate time.

Second question: If an application generates packets (via odp_packet_alloc()), how are these sequenced with respect to other packets on downstream ordered queues?
Same question for cloned/copied packets.

Clones/copies share the same sequencing information. The same goes for fragments. If locally generated packets have to be inserted in an ordered flow, they have to explicitly request sequencing information relevant to the source queue at the moment of insertion. Then they can be enqueued/freed in the same way as the other buffers in the flow.

Third question: If an application takes packets from a source ordered queue and enqueues them to different target ordered queues, how is sequencing handled here, since by definition there are gaps in the individual downstream ordered queues? The simplest example is a switch or router that takes packets from one odp_pktio_t and sends them to multiple target odp_pktio_ts. A similar question arises for packets sourced from multiple input ordered queues going to the same target ordered queue.

I think it is the order per flow that is important to maintain, not the absolute order between unrelated flows. I think the type of the destination queues and their number may also have no importance. Ordering should not be associated with a particular destination; it is rather a source-defined thing. Ordering works between two points: the definition point (where the order is observed) and the restoration point, which can be logical (e.g. the next enqueue) rather than physical (a given queue). The order definition point is usually a queue. If it's feasible for the HW that the order can be observed across multiple queues, then ordering can work this way too. Maybe we can view the API this way: define order definition/restoration points and associate queues with them.

Why would you maintain order across multiple queues? I don't think we are interested in global packet ordering, just per-flow. And a queue is a proxy for a flow.

Fourth question: How does this intersect with flow identification from the classifier?
It would seem that the classifier should override the raw packet sequence and re-write this information as a flow sequence which would be honored as the ordering sequence by subsequent downstream ordered queues. Note that flows would still have the same downstream gap issues if they are enqueued to multiple downstream ordered queues. This would definitely arise in multihoming support for SCTP, though this example is not an ODP v1.0 consideration.

End-to-end packet order is not guaranteed anyway so termination software at the end must always handle misordered (and missing) packets.

We may not have a complete set of answers to all of these questions for ODP v1.0 but we need to be precise about what is and is not done for them in ODP v1.0 so that we can do accurate testing for v1.0 as well as identify the areas that need further work next year as we move beyond v1.0.

Thanks for your thoughts and suggestions on these. If there are additional questions along these lines feel free to add them, but I'd really like to scope this discussion to what is in ODP v1.0.

Bill

_______________________________________________
lng-odp mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/lng-odp
