Colleagues,

I have a thought experiment that might interest you.

For our point of departure, let's revisit Dmitry's 
"Multi-producer/multi-consumer SEH-based queue," 
https://groups.google.com/forum/#!topic/lock-free/yOMgHaSmowA.  Here, I'm 
interested in the enqueue() routine, which Dmitry presented thus:

        void enqueue(node_t* node)
        {
                ASSERT(node);
                node->next_ = 0;
                node_t** prev = (node_t**)
                        _InterlockedExchange((long*)&tail_, (long)node);
                ASSERT(prev);
                // <--- the window of inconsistency is HERE (***)
                prev[0] = node;
        }

Let's assume the x86-64 ISA and translate this into assembly language:

        mov     node, prev              // for 'xchg' below
        movq    $0, NEXT(node)          // node->next = 0
        xchg    prev, (tail)            // prev = XCHG (&tail, node)
        // window of inconsistency
        mov     node, NEXT(prev)        // prev->next = node 

Dmitry observed that the "window of inconsistency" is one machine instruction 
in length.  We can see here that the one instruction is the final 'mov':  if 
the kernel preempts the enqueue() routine after the 'xchg' and before that 
final 'mov', the queue suffers the trauma that requires making the 
corresponding dequeue() routine so complicated.
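To make the window concrete, here is the same enqueue in portable C11 atomics. This is my own sketch, not Dmitry's code: the type names, the stub node, and the memory orderings are my assumptions.

```c
#include <stdatomic.h>
#include <stddef.h>

/* next_ is the first member, matching Dmitry's prev[0] = node trick. */
typedef struct node {
    struct node *_Atomic next_;
} node_t;

typedef struct queue {
    node_t *_Atomic tail_;   /* last node in the list */
    node_t stub_;            /* initial tail, so tail_ is never NULL */
} queue_t;

static void queue_init(queue_t *q)
{
    atomic_init(&q->stub_.next_, NULL);
    atomic_init(&q->tail_, &q->stub_);
}

static void enqueue(queue_t *q, node_t *node)
{
    atomic_store_explicit(&node->next_, NULL, memory_order_relaxed);
    node_t *prev = atomic_exchange_explicit(&q->tail_, node,
                                            memory_order_acq_rel);
    /* <--- the window of inconsistency is HERE: 'node' is already the
       tail, yet 'prev' has no successor link until the next store. */
    atomic_store_explicit(&prev->next_, node, memory_order_release);
}
```

A single-threaded walk of the list after two enqueues confirms the links end up as expected; the window only matters once preemption between the exchange and the final store is possible.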

What if we could tell the kernel NOT to preempt the routine in this window?  To 
put it another way, what if we could direct the kernel to let the routine 
"roll forward" through the window before it may be preempted?  
Something like this:

        mov     node, prev              // for 'xchg' below
        movq    $0, NEXT(node)          // node->next = 0
        xchg    prev, (tail)            // prev = XCHG (&tail, node)

.rollforward_enter

        mov     node, NEXT(prev)        // prev->next = node 

.rollforward_leave

Would this solve the problem?  If we are using "wired" (i.e., non-pageable) 
memory to hold all of the nodes, then I believe so.  But if we are using 
normally allocated memory, which can be paged out, then we face the potential 
problem that the write to NEXT(prev) could result in a page fault, which simply 
cannot be handled without an indeterminate delay (at least on Linux/x86 or 
xnu/x86).  Thus, rollforward alone does not suffice.

What if we read NEXT(prev) before the 'xchg' to ensure that the 'prev' node was 
paged in?  We could accomplish this by adding lines 3 and 4 below.  Of course, 
you will point out that the node read from TAIL(queue) in line 3 may not be the 
same node (implicitly) read from TAIL(queue) in line 5; another thread could 
have enqueued a new node in the meantime.  But that newly enqueued node will 
itself be paged in by virtue of the other thread's execution of line 2.  Like 
this:

1:      mov     node, prev              // for 'xchg' below
2:      movq    $0, NEXT(node)          // node->next = 0
3:      mov     TAIL(queue), scratch    // scratch = queue->tail
4:      mov     NEXT(scratch), scratch  // scratch = scratch->next (ignore result)
5:      xchg    prev, (tail)            // prev = XCHG (&tail, node)

.rollforward_enter

6:      mov     node, NEXT(prev)        // prev->next = node 

.rollforward_leave
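The priming reads of lines 3 and 4 look like this in my C11 sketch (again my own names and orderings, not Dmitry's). One caveat I should flag: in real code a compiler is within its rights to treat a discarded relaxed load casually, so a volatile access might be the safer way to force the touch; the atomic load is shown to stay parallel with the assembly.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *_Atomic next_; } node_t;
typedef struct queue { node_t *_Atomic tail_; node_t stub_; } queue_t;

static void queue_init(queue_t *q)
{
    atomic_init(&q->stub_.next_, NULL);
    atomic_init(&q->tail_, &q->stub_);
}

static void enqueue(queue_t *q, node_t *node)
{
    atomic_store_explicit(&node->next_, NULL,                          /* line 2 */
                          memory_order_relaxed);
    /* Priming reads: touch the current tail's next_ slot so its page is
       resident before the exchange.  The loaded value is discarded; the
       tail may change before line 5, but whichever thread installed the
       newer tail touched that node itself in its own line 2. */
    node_t *scratch = atomic_load_explicit(&q->tail_,                  /* line 3 */
                                           memory_order_relaxed);
    (void)atomic_load_explicit(&scratch->next_,                        /* line 4 */
                               memory_order_relaxed);
    node_t *prev = atomic_exchange_explicit(&q->tail_, node,           /* line 5 */
                                            memory_order_acq_rel);
    atomic_store_explicit(&prev->next_, node,                          /* line 6 */
                          memory_order_release);
}
```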

Doubtless you will now point out that one or both threads could be preempted 
between lines 2 and 5 and that during a sufficiently lengthy interruption, the 
operating system could page out the very 'prev' node that we had counted on 
being paged in as a result of having executed lines 2 through 4.  This brings 
us to another "what if?"

What if we could tell the kernel that if it preempts us between line 2 and line 
5, it should "roll back" the instruction pointer to line 2 when it returns 
control to the preempted thread?  That might result in the re-execution of 
lines 2 through 4 (or some subset thereof); however, like the re-execution of 
instructions in a CAS loop, that would not be problematic here.  More 
importantly, the hypothesized rollback would effectively ensure that when line 
6 is executed, it will have been executed only at the end of an 
uninterrupted sequence of instructions beginning with line 2.  With this 
guarantee, we can be certain that the 'prev' node returned by the 'xchg' in 
line 5 will be paged in, such that the write to NEXT(prev) in line 6 will NOT 
generate a page fault.  Like this:

1:      mov     node, prev              // for 'xchg' below

.rollback_enter

2:      movq    $0, NEXT(node)          // node->next = 0
3:      mov     TAIL(queue), scratch    // scratch = queue->tail
4:      mov     NEXT(scratch), scratch  // scratch = scratch->next (ignore result)
5:      xchg    prev, (tail)            // prev = XCHG (&tail, node)

.rollforward_enter

6:      mov     node, NEXT(prev)        // prev->next = node 

.rollforward_leave

Problem solved?  No more window of inconsistency?  Again, I believe so.
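For completeness, here is the whole scheme as one C11 sketch. The region markers are hypothetical: no kernel today exposes this interface (Linux's restartable sequences, rseq(2), come closest to the rollback half), so they are defined as no-ops and the routine remains testable as ordinary code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical markers, standing in for the .rollback_enter,
 * .rollforward_enter, and .rollforward_leave labels.  The names are
 * mine; no such kernel interface exists, so they expand to nothing. */
#define ROLLBACK_ENTER()    ((void)0) /* kernel would reset IP to here on preemption */
#define ROLLFORWARD_ENTER() ((void)0) /* kernel would defer preemption from here... */
#define ROLLFORWARD_LEAVE() ((void)0) /* ...until here */

typedef struct node { struct node *_Atomic next_; } node_t;
typedef struct queue { node_t *_Atomic tail_; node_t stub_; } queue_t;

static void queue_init(queue_t *q)
{
    atomic_init(&q->stub_.next_, NULL);
    atomic_init(&q->tail_, &q->stub_);
}

static void enqueue(queue_t *q, node_t *node)
{
    ROLLBACK_ENTER();
    atomic_store_explicit(&node->next_, NULL, memory_order_relaxed);
    /* Priming reads: fault in the current tail's next_ slot. */
    node_t *scratch = atomic_load_explicit(&q->tail_, memory_order_relaxed);
    (void)atomic_load_explicit(&scratch->next_, memory_order_relaxed);
    node_t *prev = atomic_exchange_explicit(&q->tail_, node,
                                            memory_order_acq_rel);
    ROLLFORWARD_ENTER();
    atomic_store_explicit(&prev->next_, node, memory_order_release);
    ROLLFORWARD_LEAVE();
}
```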

Of course, I don't expect you to agree without an empirical demonstration.  But 
I am interested in any theoretical flaws that you see.  Or any other comments 
that you might have.

Thanks,
Eric Grant

You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/2413C3A7-19FD-4630-80BB-BE72718A8314%40eagrant.com.
For more options, visit https://groups.google.com/groups/opt_out.
