On 2 December 2016 at 11:18, Bill Fischofer <bill.fischo...@linaro.org> wrote:
> This is a good discussion. The key is to recognize that the ordering
> provided by ordered queues is of contexts, not events. When the
> scheduler dispatches an event from an ordered queue, it creates an
> ordered context that the receiving thread now runs in. That context is
> ordered with respect to all other contexts originating from the same
> ordered queue and persists until the next odp_schedule() call or until
> an explicit odp_schedule_release_ordered() call is made, though the
> latter call is defined as simply a "hint" that may or may not actually
> release the ordered context.
> An ordered context protects all enqueues made by the thread running in
> that ordered context. That means that no matter how few or how many
> odp_queue_enq() calls that thread makes, all of these calls will be
> ordered with respect to all such calls made by other threads running
> in other ordered contexts originating from the same ordered queue.
> Within the thread itself, the relative order of the odp_queue_enq()
> calls it makes determines the order of events generated by this
> ordered context, since a single thread can only make one such
> call at a time.
> Ordered locks add an interesting wrinkle to this because they make it
> possible for ordered critical sections to exist within the overall
> parallel processing of these ordered contexts. If an ordered queue has
> N ordered locks then there are N possible ordered synchronization
> points that can be entered within the lifespan of a given ordered
> context. Note that there is no requirement that the thread actually
> "visit" any of these points. If another thread in a later ordered
> context is waiting on ordered lock i within its context then it will
> be delayed until such time as either this thread enters and then exits
> ordered lock i, or it releases its ordered context altogether, since
> this would provide definitive proof that it can never attempt to enter
> that lock at some later point.
> One of the key differences between the current order queue code and
> Matias' patch is the question of when the internal structure (queue
> element) is inspected when performing an odp_queue_enq() call. Because
> queue elements are shared objects, they are protected by an internal
> lock that must be acquired prior to manipulating that structure. The
> current code acquires that lock as part of the entry processing to
> odp_queue_enq() and then determines whether this thread's context
> represents the current queue order to determine whether the enqueue
> operation can be completed. If not, then the event is placed in the
> ordered queue's reorder queue where it will be processed at some later
> time when the queue's order "catches up" to this thread's order. In
> Matias' code, by contrast, odp_queue_enq() first determines if its
> order is current and if not it does not lock the queue element but
> rather temporarily stores the event into a local stash. The advantage
> here is that we avoid locking the queue element if we can't really
> complete the operation.
Thanks Bill for the explanation; it helped me learn and understand
Matias' new implementation. Caching in thread-local storage to avoid
queue locking/unlocking is a good idea.
> The disadvantage is that the thread cannot
> release its order until its stash has been emptied, which means that
> it must stall rather than being able to continue processing another
> ordered context when attempting to release its current ordered
> context.
Yes, this is also my concern after reviewing the patchset: a thread
that schedules events from one ordered queue puts itself into a "line"
and may stall. I won't duplicate the comments since Bill has covered
them already.
> I suspect further performance improvement can be made by combining
> these two ideas such that out of order enqueues are stashed locally
> and then flushed to the origin ordered queue's reorder queue upon
> order release as needed to avoid context release blockage. The real
> question is how many enqueues are expected within a given ordered
> context? The assumption has been that the average would be close to
> one, but obviously a stash becomes more beneficial the larger that
> number becomes.
> On Thu, Dec 1, 2016 at 1:32 AM, Elo, Matias (Nokia - FI/Espoo)
> <matias....@nokia-bell-labs.com> wrote:
> > On 1 Dec 2016, at 6:58, Yi He <yi...@linaro.org> wrote:
> > Hi, thanks Matias
> > The example is very helpful, one question follow it:
> > Previously I understood an atomic queue to have multiple-producer/
> > single-consumer behaviour, so producers (enqueue threads) can still run
> > in parallel?
> > Yes. This actually touches another subject which is queue thread safety.
> > odp_queue_create() takes an 'odp_queue_param_t *param’ argument, which
> > includes the enqueue and dequeue modes (ODP_QUEUE_OP_MT (default) /
> > ODP_QUEUE_OP_MT_UNSAFE / ODP_QUEUE_OP_DISABLED). Currently, there is
> > internal locking when doing enqueue in linux-generic, so the argument has
> > no real effect.
> > But in the example the two producer threads behave as if enqueueing
> > sequentially to the atomic queue?
> > A1, A2 (new allocated), A0 (original), B1, B2 (new allocated), B0
> > The threads processing events from an ordered queue have to synchronise
> > their enqueue operations somehow to guarantee event order in the atomic
> > queue. How this is actually done is ODP implementation dependent.
> > For example, in the new ordered queue implementation I try to avoid
> > synchronising/blocking as long as possible by locally caching enqueue
> > operations when it isn't the thread's turn to enqueue.
> > -Matias
> > Best Regards, Yi
> > On 28 November 2016 at 20:40, Elo, Matias (Nokia - FI/Espoo)
> > <matias....@nokia-bell-labs.com> wrote:
> >> On 28 Nov 2016, at 12:19, Yi He <yi...@linaro.org> wrote:
> >> Thanks Matias
> >> Very clearly documented:
> >> “
> >> * Event ordering is based on the
> >> * received event(s), but also other (newly allocated or stored) events
> >> are
> >> * ordered when enqueued within the same ordered context.
> >> “
> >> One question: for scheduled events, their order is clearly
> >> determined by their original sequence in the receiving queue, but
> >> what is the relationship and sequence between newly allocated events
> >> and the received ones, especially if the allocation happens in
> >> multiple threads?
> >> The events enqueued within the same ordered context should be received
> >> in the same order as they were enqueued = FIFO (i.e. if the newly
> >> allocated events were enqueued before the original event, the new
> >> events will be received first). The order holds for the whole ordered
> >> context.
> >> For example:
> >> Two threads (A & B) schedule work from the same ordered queue and
> >> enqueue processed events to a common atomic queue. Both threads call
> >> odp_schedule() and receive a single event -> A gets ordered context 1
> >> and B gets 2. While processing the event, thread A allocates two new
> >> events (A1 & A2). Finally, it enqueues them to the atomic queue in the
> >> following order: A1, A2, A0 (original event). Thread B does the same
> >> thing (B1, B2, B0).
> >> As the ordering holds for the whole ordered context, the events are
> >> stored in the atomic queue in the following order: A1, A2, A0, B1, B2,
> >> B0, independent of which thread performs its enqueue operations first.
> >> -Matias