Colleagues, I have a thought experiment that might interest you.
For our point of departure, let's revisit Dmitry's "Multi-producer/multi-consumer SEH-based queue," https://groups.google.com/forum/#!topic/lock-free/yOMgHaSmowA. Here, I'm interested in the enqueue() routine, which Dmitry presented thus:

    void enqueue(node_t* node)
    {
        ASSERT(node);
        node->next_ = 0;
        node_t** prev = (node_t**) _InterlockedExchange((long*)&tail_, (long)node);
        ASSERT(prev);
        // <--- the window of inconsistency is HERE (***)
        prev[0] = node;
    }

Let's assume the x86-64 ISA and translate this into assembly language:

    mov   node, prev        // for 'xchg' below
    movq  $0, NEXT(node)    // node->next = 0
    xchg  prev, (tail)      // prev = XCHG(&tail, node)
                            // window of inconsistency
    mov   node, NEXT(prev)  // prev->next = node

Dmitry observed that the "window of inconsistency" is one machine instruction in length. We can see here that the one instruction is the final 'mov': if the kernel preempts the enqueue() routine after the 'xchg' and before that final 'mov', the queue suffers the trauma that requires making the corresponding dequeue() routine so complicated.

What if we could tell the kernel NOT to preempt the routine in this window? To ask the same thing, what if we could direct the kernel to allow the routine to "roll forward" through the window before the routine may be preempted? Something like this:

    mov   node, prev        // for 'xchg' below
    movq  $0, NEXT(node)    // node->next = 0
    xchg  prev, (tail)      // prev = XCHG(&tail, node)
.rollforward_enter
    mov   node, NEXT(prev)  // prev->next = node
.rollforward_leave

Would this solve the problem? If we are using "wired" (i.e., non-pageable) memory to hold all of the nodes, then I believe so. But if we are using normally allocated memory, which can be paged out, then we face the potential problem that the write to NEXT(prev) could result in a page fault, which simply cannot be handled without an indeterminate delay (at least on Linux/x86 or xnu/x86). Thus, rollforward alone does not suffice.
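For readers who prefer portable C to Win32 intrinsics, here is a minimal C11 rendering of the same enqueue, with the window marked; the names (queue_t, stub, etc.) are mine, not Dmitry's, and this sketch does not attempt to show the dequeue side:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *volatile next; } node_t;

typedef struct {
    node_t stub;             /* dummy node so tail is never NULL */
    _Atomic(node_t *) tail;
} queue_t;

static void queue_init(queue_t *q) {
    q->stub.next = NULL;
    atomic_init(&q->tail, &q->stub);
}

static void enqueue(queue_t *q, node_t *node) {
    node->next = NULL;
    /* one atomic RMW swings the tail to the new node... */
    node_t *prev = atomic_exchange(&q->tail, node);
    /* <--- window of inconsistency: 'prev' is in the queue but not
     * yet linked to 'node'; a consumer walking 'next' pointers
     * cannot reach 'node' until the store below completes. */
    prev->next = node;
}
```

Run single-threaded, successive enqueues simply build a linked list hanging off the stub; the whole difficulty discussed below is what happens when a thread stalls inside the marked window.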
What if we read NEXT(prev) before the 'xchg' to ensure that the 'prev' node was paged in? We could accomplish this by adding lines 3 and 4 below. Of course, you will point out that the node read from TAIL(queue) in line 3 may not be the same node (implicitly) read from TAIL(queue) in line 5; another thread could have enqueued a new node in the meantime. But that newly enqueued node will itself be paged in by virtue of the other thread's execution of line 2. Like this:

    1: mov   node, prev             // for 'xchg' below
    2: movq  $0, NEXT(node)         // node->next = 0
    3: mov   TAIL(queue), scratch   // scratch = queue->tail
    4: mov   NEXT(scratch), scratch // scratch = scratch->next (ignore result)
    5: xchg  prev, (tail)           // prev = XCHG(&tail, node)
.rollforward_enter
    6: mov   node, NEXT(prev)       // prev->next = node
.rollforward_leave

Doubtless you will now point out that one or both threads could be preempted between lines 2 and 5 and that, during a sufficiently lengthy interruption, the operating system could page out the very 'prev' node that we had counted on being paged in as a result of having executed lines 2 through 4.

This brings us to another "what if?" What if we could tell the kernel that if it preempts us between line 2 and line 5, it should "roll back" the instruction pointer to line 2 when it returns control to the preempted thread? That might result in the re-execution of lines 2 through 4 (or some subset thereof); however, like the re-execution of instructions in a CAS loop, that would not be problematic here. More importantly, the hypothesized rollback would effectively ensure that when line 6 is executed, it will have been executed only at the end of an uninterrupted sequence of instructions beginning with line 2. With this guarantee, we can be certain that the 'prev' node returned by the 'xchg' in line 5 will be paged in, such that the write to NEXT(prev) in line 6 will NOT generate a page fault.
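In C, the prefault read of lines 3 and 4 is just a discarded load through the current tail; here is a hedged sketch (my names again, and single-threaded C cannot, of course, express the preemption hazard itself):

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *volatile next; } node_t;

/* Touch the current tail's 'next' field so its page is resident
 * before the exchange.  The node touched in lines 3-4 may not be the
 * node the xchg later returns, but (per the argument above) any newer
 * tail was touched by the thread that enqueued it. */
static void enqueue_prefault(_Atomic(node_t *) *tail, node_t *node) {
    node->next = NULL;                          /* line 2 */
    node_t *scratch = atomic_load(tail);        /* line 3 */
    (void)scratch->next;                        /* line 4: load, discard */
    node_t *prev = atomic_exchange(tail, node); /* line 5 */
    prev->next = node;                          /* line 6 */
}
```

Note that 'next' is declared volatile so the compiler cannot elide the discarded load in line 4; without that (or an equivalent barrier), the whole point of the prefault read would be optimized away.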
The combined scheme, with both regions marked, looks like this:

    1: mov   node, prev             // for 'xchg' below
.rollback_enter
    2: movq  $0, NEXT(node)         // node->next = 0
    3: mov   TAIL(queue), scratch   // scratch = queue->tail
    4: mov   NEXT(scratch), scratch // scratch = scratch->next (ignore result)
    5: xchg  prev, (tail)           // prev = XCHG(&tail, node)
.rollforward_enter
    6: mov   node, NEXT(prev)       // prev->next = node
.rollforward_leave

Problem solved? No more window of inconsistency? Again, I believe so. Of course, I don't expect you to agree without an empirical demonstration. But I am interested in any theoretical flaws that you see. Or any other comments that you might have.

Thanks,
Eric Grant

--
You received this message because you are subscribed to the Google Groups "Scalable Synchronization Algorithms" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/lock-free/2413C3A7-19FD-4630-80BB-BE72718A8314%40eagrant.com.
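To make the rollback semantics concrete, here is a single-threaded C simulation in which a caller-supplied 'preempt' callback stands in for the kernel interrupting us between lines 2 and 5; everything here (the callback, the goto, the names) is illustrative scaffolding, not a real kernel interface:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node { struct node *volatile next; } node_t;

/* Simulated preemption: returns true 'preempt_count' times, forcing
 * that many rollbacks, then false. */
static int preempt_count;
static bool preempt_once(void) { return preempt_count-- > 0; }

/* If we are "preempted" between lines 2 and 5, roll back to line 2
 * and re-execute.  Re-running lines 2-4 is harmless because they are
 * idempotent, exactly as with the body of a CAS loop. */
static void enqueue_rollback(_Atomic(node_t *) *tail, node_t *node,
                             bool (*preempt)(void)) {
restart:                                        /* .rollback_enter */
    node->next = NULL;                          /* line 2 */
    node_t *scratch = atomic_load(tail);        /* line 3 */
    (void)scratch->next;                        /* line 4 */
    if (preempt())                              /* kernel interrupts us? */
        goto restart;                           /* ...IP rolled back to 2 */
    node_t *prev = atomic_exchange(tail, node); /* line 5 */
    prev->next = node;                          /* line 6 (rollforward) */
}
```

However many times the region restarts, the queue ends up in the same consistent state, which is the property the hypothetical kernel mechanism would need to preserve.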
