On 2 December 2016 at 11:18, Bill Fischofer <bill.fischo...@linaro.org> wrote:
> This is a good discussion. The key is to recognize that the ordering
> provided by ordered queues is of contexts, not events. When the
> scheduler dispatches an event from an ordered queue, it creates an
> ordered context that the receiving thread now runs in. That context is
> ordered with respect to all other contexts originating from the same
> ordered queue and persists until the next odp_schedule() call or until
> an explicit odp_schedule_release_ordered() call is made, though the
> latter call is defined as simply a "hint" that may or may not actually
> release the ordered context.
> An ordered context protects all enqueues made by the thread running in
> that ordered context. That means that no matter how few or how many
> odp_queue_enq() calls that thread makes, all of these calls will be
> ordered with respect to all such calls made by other threads running
> in other ordered contexts originating from the same ordered queue.
> Within the thread itself, the relative order of the odp_queue_enq()
> calls it makes determines the order of events generated by this
> ordered context, since a single thread can only make one such
> call at a time.
> Ordered locks add an interesting wrinkle to this because they make it
> possible for ordered critical sections to exist within the overall
> parallel processing of these ordered contexts. If an ordered queue has
> N ordered locks then there are N possible ordered synchronization
> points that can be entered within the lifespan of a given ordered
> context. Note that there is no requirement that the thread actually
> "visit" any of these points. If another thread in a later ordered
> context is waiting on ordered lock i within its context then it will
> be delayed until such time as either this thread enters and then exits
> ordered lock i, or it releases its ordered context altogether, since
> this would provide definitive proof that it can never attempt to enter
> that lock at some later point.
> One of the key differences between the current order queue code and
> Matias' patch is the question of when the internal structure (queue
> element) is inspected when performing an odp_queue_enq() call. Because
> queue elements are shared objects, they are protected by an internal
> lock that must be acquired prior to manipulating that structure. The
> current code acquires that lock as part of the entry processing to
> odp_queue_enq() and then determines whether this thread's context
> represents the current queue order to determine whether the enqueue
> operation can be completed. If not, then the event is placed in the
> ordered queue's reorder queue where it will be processed at some later
> time when the queue's order "catches up" to this thread's order. In
> Matias' code, by contrast, odp_queue_enq() first determines if its
> order is current and if not it does not lock the queue element but
> rather temporarily stores the event into a local stash. The advantage
> here is that we avoid locking the queue element if we can't really
> complete the operation.
Thanks Bill for the explanation; it helped me learn and understand
Matias' new implementation. Caching in thread-local storage to avoid
queue locking/unlocking is a good idea.
> The disadvantage is that the thread cannot
> release its order until its stash has been emptied, which means that
> it must stall rather than being able to continue processing another
> ordered context when attempting to release its current ordered
> context.
Yes, this is also my concern after reviewing the patchset: a thread
that schedules events from one ordered queue puts itself into a "line"
and may stall. I won't duplicate the comments since Bill has covered
them already.
> I suspect further performance improvement can be made by combining
> these two ideas such that out of order enqueues are stashed locally
> and then flushed to the origin ordered queue's reorder queue upon
> order release as needed to avoid context release blockage. The real
> question is how many enqueues are expected within a given ordered
> context? The assumption has been that the average would be close to
> one, but obviously a stash becomes more beneficial the larger that
> number becomes.
> On Thu, Dec 1, 2016 at 1:32 AM, Elo, Matias (Nokia - FI/Espoo)
> <matias....@nokia-bell-labs.com> wrote:
> > On 1 Dec 2016, at 6:58, Yi He <yi...@linaro.org> wrote:
> > Hi, thanks Matias
> > The example is very helpful, one question follow it:
> > Previously I understood an atomic queue to have multiple-producer/
> > single-consumer behaviour, so producers (enqueue threads) can still run
> > in parallel?
> > Yes. This actually touches another subject which is queue thread safety.
> > odp_queue_create() takes an 'odp_queue_param_t *param’ argument, which
> > includes the enqueue and dequeue modes (ODP_QUEUE_OP_MT (default) /
> > ODP_QUEUE_OP_MT_UNSAFE / ODP_QUEUE_OP_DISABLED). Currently, there is
> > internal locking when doing enqueue in linux-generic, so the argument has
> > no real effect.
> > But in the example the two producer threads behave as if enqueueing
> > sequentially to the atomic queue?
> > A1, A2 (new allocated), A0 (original), B1, B2 (new allocated), B0
> > The threads processing events from an ordered queue have to synchronise
> > their enqueue operations somehow to guarantee event order in the atomic
> > queue. How this is actually done is ODP implementation dependent.
> > For example, in the new ordered queue implementation I try to avoid
> > synchronising/blocking as long as possible by locally caching enqueue
> > operations when it isn't the thread's turn to enqueue.
> > -Matias
> > Best Regards, Yi
> > On 28 November 2016 at 20:40, Elo, Matias (Nokia - FI/Espoo)
> > <matias....@nokia-bell-labs.com> wrote:
> >> On 28 Nov 2016, at 12:19, Yi He <yi...@linaro.org> wrote:
> >> Thanks Matias
> >> Very clearly documented:
> >> “
> >> * Event ordering is based on the
> >> * received event(s), but also other (newly allocated or stored) events
> >> are
> >> * ordered when enqueued within the same ordered context.
> >> “
> >> One question: for scheduled events, their order is clearly
> >> determined by their original sequence in the receiving queue, but
> >> what is the relationship and sequence between newly allocated events
> >> and the received ones, especially if the allocation happens in
> >> multiple threads?
> >> The events enqueued within the same ordered context should be received
> >> in the same order as they were enqueued = FIFO (i.e. if the newly
> >> allocated events were enqueued before the original event, the new
> >> events will be received first). The order holds for the whole ordered
> >> context.
> >> For example:
> >> Two threads (A & B) schedule work from the same ordered queue and
> >> enqueue processed events to a common atomic queue. Both threads call
> >> odp_schedule() and receive a single event -> A gets ordered context 1
> >> and B gets 2. While processing the event, thread A allocates two new
> >> events (A1 & A2). Finally, it enqueues them to the atomic queue in the
> >> following order: A1, A2, A0 (original event). Thread B does the same
> >> thing (B1, B2, B0).
> >> As the ordering holds for the whole ordered context, the events are
> >> stored in the atomic queue in the following order: A1, A2, A0, B1, B2,
> >> B0, independent of which thread performs its enqueue operations first.
> >> -Matias