On Sun, Feb 16, 2014 at 6:58 AM, Eric Grant <[email protected]> wrote:
> Colleagues,
>
> I have a thought experiment that might interest you.
>
> For our point of departure, let's revisit Dmitry's
> "Multi-producer/multi-consumer SEH-based queue,"
> https://groups.google.com/forum/#!topic/lock-free/yOMgHaSmowA. Here, I'm
> interested in the enqueue() routine, which Dmitry presented thus:
>
>     void enqueue(node_t* node)
>     {
>         ASSERT(node);
>         node->next_ = 0;
>         node_t** prev = (node_t**)
>             _InterlockedExchange((long*)&tail_, (long)node);
>         ASSERT(prev);
>         // <--- the window of inconsistency is HERE (***)
>         prev[0] = node;
>     }
>
> Let's assume the x86-64 ISA and translate this into assembly language:
>
>     mov   node, prev          // for 'xchg' below
>     movq  $0, NEXT(node)      // node->next = 0
>     xchg  prev, (tail)        // prev = XCHG(&tail, node)
>                               // window of inconsistency
>     mov   node, NEXT(prev)    // prev->next = node
>
> Dmitry observed that the "window of inconsistency" is one machine
> instruction in length. We can see here that the one instruction is the
> final 'mov': if the kernel preempts the enqueue() routine after the 'xchg'
> and before that final 'mov', the queue suffers the trauma that requires
> making the corresponding dequeue() routine so complicated.
>
> What if we could tell the kernel NOT to preempt the routine in this window?
> To ask the same thing, what if we could direct the kernel to allow the
> routine to "roll forward" through the window before the routine may be
> preempted? Something like this:
>
>     mov   node, prev          // for 'xchg' below
>     movq  $0, NEXT(node)      // node->next = 0
>     xchg  prev, (tail)        // prev = XCHG(&tail, node)
>
> .rollforward_enter
>
>     mov   node, NEXT(prev)    // prev->next = node
>
> .rollforward_leave
>
> Would this solve the problem? If we are using "wired" (i.e., non-pageable)
> memory to hold all of the nodes, then I believe so.
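[For readers without the Win32 interlocked intrinsics at hand, the quoted enqueue() can be restated with C11 atomics. The queue_t/node_t layout and the stub-node convention below are illustrative, not Dmitry's original declarations; the window sits in the same place.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
} node_t;

typedef struct {
    _Atomic(node_t *) tail;  /* always points at the most recently enqueued node */
} queue_t;

/* Portable restatement of the quoted enqueue(); a stub node is assumed
   at initialization so that tail is never NULL. */
void enqueue(queue_t *q, node_t *node)
{
    assert(node);
    node->next = NULL;
    /* Swing tail to the new node; 'prev' is the previous tail. */
    node_t *prev = atomic_exchange(&q->tail, node);
    assert(prev);
    /* <--- window of inconsistency: tail already points at 'node',
       but prev->next does not yet. */
    prev->next = node;
}
```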
> But if we are using normally allocated memory, which can be paged out, then
> we face the potential problem that the write to NEXT(prev) could result in
> a page fault, which simply cannot be handled without an indeterminate delay
> (at least on Linux/x86 or xnu/x86). Thus, rollforward alone does not
> satisfy.
>
> What if we read NEXT(prev) before the 'xchg' to ensure that the 'prev' node
> was paged in? We could accomplish this by adding lines 3 and 4 below. Of
> course, you will point out that the node read from TAIL(queue) in line 3
> may not be the same node (implicitly) read from TAIL(queue) in line 5;
> another thread could have enqueued a new node in the meantime. But that
> newly enqueued node will itself be paged in by virtue of the other thread's
> execution of line 2. Like this:
>
>     1: mov   node, prev             // for 'xchg' below
>     2: movq  $0, NEXT(node)         // node->next = 0
>     3: mov   TAIL(queue), scratch   // scratch = queue->tail
>     4: mov   NEXT(scratch), scratch // scratch = scratch->next (ignore result)
>     5: xchg  prev, (tail)           // prev = XCHG(&tail, node)
>
> .rollforward_enter
>
>     6: mov   node, NEXT(prev)       // prev->next = node
>
> .rollforward_leave
>
> Doubtless you will now point out that one or both threads could be
> preempted between lines 2 and 5 and that during a sufficiently lengthy
> interruption, the operating system could page out the very 'prev' node that
> we had counted on being paged in as a result of having executed lines 2
> through 4. This brings us to another "what if?"
>
> What if we could tell the kernel that if it preempts us between line 2 and
> line 5, it should "roll back" the instruction pointer to line 2 when it
> returns control to the preempted thread? That might result in the
> re-execution of lines 2 through 4 (or some subset thereof); however, like
> the re-execution of instructions in a CAS loop, that would not be
> problematic here.
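[The six-line sequence above can be restated in C11; the volatile cast keeps the compiler from discarding the prefaulting reads of lines 3-4, whose values are intentionally unused. The rollforward markers have no existing API and appear only as comments; the type names are illustrative.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
} node_t;

typedef struct {
    _Atomic(node_t *) tail;
} queue_t;

void enqueue_prefault(queue_t *q, node_t *node)
{
    assert(node);
    node->next = NULL;                                  /* line 2 */

    /* lines 3-4: read the current tail node's next pointer purely to
       fault its page in; the value is deliberately discarded. */
    node_t *scratch = atomic_load_explicit(&q->tail, memory_order_relaxed);
    (void)*(node_t * volatile *)&scratch->next;

    node_t *prev = atomic_exchange(&q->tail, node);     /* line 5 */

    /* between line 5 and line 6 is the window that
       .rollforward_enter/.rollforward_leave would protect */
    prev->next = node;                                  /* line 6 */
}
```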
> More importantly, the hypothesized rollback would effectively ensure that
> when line 6 is executed, it will have been executed only at the end of an
> uninterrupted sequence of instructions beginning with line 2. With this
> guarantee, we can be certain that the 'prev' node returned by the 'xchg' in
> line 5 will be paged in, such that the write to NEXT(prev) in line 6 will
> NOT generate a page fault. Like this:
>
>     1: mov   node, prev             // for 'xchg' below
>
> .rollback_enter
>
>     2: movq  $0, NEXT(node)         // node->next = 0
>     3: mov   TAIL(queue), scratch   // scratch = queue->tail
>     4: mov   NEXT(scratch), scratch // scratch = scratch->next (ignore result)
>     5: xchg  prev, (tail)           // prev = XCHG(&tail, node)
>
> .rollforward_enter
>
>     6: mov   node, NEXT(prev)       // prev->next = node
>
> .rollforward_leave
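[As for the "wired" memory case the proposal starts from: in user space, POSIX mlock(2) can pin a preallocated node pool so that loads and stores on node memory cannot take major page faults. A minimal sketch follows; the pool size and helper name are illustrative, and mlock can fail under RLIMIT_MEMLOCK, so the result must be checked.]

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

enum { POOL_BYTES = 16 * 1024 };   /* illustrative pool size */

/* Allocate a node pool and wire it: after a successful mlock(), the
   pages stay resident, so queue operations on them cannot major-fault. */
static void *alloc_wired_pool(void)
{
    void *pool = malloc(POOL_BYTES);
    if (!pool)
        return NULL;
    memset(pool, 0, POOL_BYTES);              /* touch every page first */
    if (mlock(pool, POOL_BYTES) != 0)
        perror("mlock (pool left pageable)"); /* e.g. RLIMIT_MEMLOCK hit */
    return pool;
}
```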
Hi!

How do you prevent the data from being evicted from memory between 5 and 6?

If you consider things like page faults and interrupts, I do not think that
you can solve the problem of forward progress in user space. Even if you
solve the problem on the producer side, what if a consumer hits a page fault
right after reading an element? It will have essentially the same effect: a
lack of forward progress.

If you are in kernel space, then you can ensure some of the properties by
disabling interrupts around the operations. Then you can be sure that each
operation finishes in N machine instructions. But even then, if you have a
simple MOV instruction and the data is in memory, how long can it take?
There is actually no upper bound. And this gets us back to where we started.

So I argue that all that matters is common-case performance. And my
algorithm is already quite good from this point of view.

If you go for hard real time, you need to start from designing and
manufacturing your own hardware, then the OS. Only then can you have
guarantees that some operations will finish in a bounded amount of time.
Until then, lock-free/wait-free algorithms are illusory.

> Problem solved? No more window of inconsistency? Again, I believe so.
>
> Of course, I don't expect you to agree without an empirical demonstration.
> But I am interested in any theoretical flaws that you see. Or any other
> comments that you might have.
>
> Thanks,
> Eric Grant
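[Dmitry's consumer-side point is visible in the shape of any dequeue for this queue: when a producer has swung tail but not yet linked prev->next (whether preempted or stuck in a page fault), the consumer can only wait. Below is a simplified single-consumer sketch of that wait, not Dmitry's SEH-based dequeue; the type names are illustrative.]

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    _Atomic(struct node *) next;
} node_t;

typedef struct {
    _Atomic(node_t *) tail;
    node_t *head;   /* single consumer: last consumed (stub) node */
} queue_t;

/* Returns the next element, or NULL if the queue is empty.  If tail has
   moved past head but head->next is still unset, a producer is inside
   the window of inconsistency and the consumer can only spin -- the
   lack of forward progress described above. */
node_t *dequeue_spin(queue_t *q)
{
    node_t *h = q->head;
    node_t *next = atomic_load_explicit(&h->next, memory_order_acquire);
    if (!next) {
        if (atomic_load(&q->tail) == h)
            return NULL;                 /* genuinely empty */
        do {                             /* producer mid-enqueue: wait */
            next = atomic_load_explicit(&h->next, memory_order_acquire);
        } while (!next);
    }
    q->head = next;
    return next;
}
```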
--
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability, parallel
computing and related topics:
http://www.1024cores.net

--

---
You received this message because you are subscribed to the Google Groups
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/lock-free/CAEeQi3s1qk%3DZ5SZOE4L5DyoyoA8niWXcaJyx4YKhAMAexstMCQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
