Greg Ames wrote:

>Brian Pane wrote:
>
>>Relative to the current worker MPM design, the potential
>>advantages of this new design are:
>>  * Better cache utilization, because the stack makes it
>>    more likely that a recently active thread will serve
>>    the next request
>>  * An implementation that uses the new APR atomic API
>>    as a more scalable alternative to worker's global mutex
>>
>
>   * Fewer context switches?
>
>>worker_thread_info *atomic_pop(worker_thread_info **stack)
>>{
>>   worker_thread_info *prev, *node;
>>   do {
>>     node = *stack;
>>     if (!first) {
>>
>
>I assume you mean:
>       if (!node) {
>

oops, yes, that should say "node" instead of "first"

>>       return NULL;
>>     }
>>     prev = atomic_cas(stack, node->next, node);
>>   } while (prev != node->next);
>>   return node;
>>}
>>
>
>I think there's a subtle problem here.  I've seen it before on SMP boxes,
>otherwise it would slip right past me.
>
>The line
>
>    prev = atomic_cas(stack, node->next, node);
>
>can get pretty hairy.  
>
>Assume thread Z is the listener and enters atomic_pop and starts to evaluate
>this line. "node" is A when it starts, and A->next is B when "node->next" is
>evaluated.  Then an interrupt happens and thread Z takes a little nap (hence its
>name :)
>
>In the mean time, two other threads enter atomic_pop, and A and B are popped,
>leaving stack pointing to C.  In the context of this discussion, I guess the
>threads represented by A and B both get their turn as a listener.  Thread A gets
>a request for a tiny .gif, finishes the request and does an atomic_push and
>blocks.  Now stack points to A and A->next == C.  Thread B is off serving a huge file.
>
>Next thread Z wakes up from its nap.  It executes the guts of atomic_cas with
>the params it has already evaluated.  It succeeds on most platforms because
>"stack" points to A once again, so we swap B into "stack"  But node A's next
>field no longer points to B, its next field is now C.  Our data structure is now
>hosed.
>

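The interleaving Greg describes can be replayed single-threaded to watch the corruption happen.  This is only a sketch: `aba_node`, `aba_cas`, and `replay_aba` are illustrative names, not anything from the actual worker MPM code, and the CAS here is a plain stand-in with the same swap-if-equal, return-old-value behavior assumed of `atomic_cas` above.

```c
#include <stddef.h>

typedef struct aba_node { struct aba_node *next; } aba_node;

/* Stand-in for atomic_cas(loc, newval, expected): swap newval into
 * *loc if *loc == expected, and return the old value. */
static aba_node *aba_cas(aba_node **loc, aba_node *newval, aba_node *expected)
{
    aba_node *old = *loc;
    if (old == expected)
        *loc = newval;
    return old;
}

/* Returns 1 if thread Z's stale CAS "succeeds" and leaves the stack
 * head pointing at B (the hosed state), 0 otherwise. */
int replay_aba(void)
{
    static aba_node C, B, A;
    C.next = NULL; B.next = &C; A.next = &B;
    aba_node *stack = &A;              /* stack: A -> B -> C */

    /* Thread Z starts atomic_pop: reads node = A and node->next = B,
     * then is preempted before its CAS runs. */
    aba_node *z_node = stack;
    aba_node *z_next = z_node->next;

    /* Meanwhile, A and B are popped, and A is pushed back. */
    stack = stack->next;               /* pop A: stack -> B */
    stack = stack->next;               /* pop B: stack -> C */
    A.next = stack;                    /* push A: A->next = C */
    stack = &A;                        /* stack: A -> C */

    /* Z wakes and runs its CAS with the stale values.  It succeeds
     * because stack points at A again, but swaps in the stale B. */
    aba_node *prev = aba_cas(&stack, z_next, z_node);
    return prev == z_node && stack == &B;
}
```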
Thanks for catching that.  In this particular design, I think
we'll be safe because only the current listener thread is ever
allowed to pop from the stack.  But the workaround that you
describe below will help solve another problem in the design:

>I've seen this fixed most often with a monotonically increasing counter in an
>adjacent field to "stack".  This is incremented during atomic_pop (or _push). 
>The counter and pointer are atomic_cas'ed as a pair.  Then it's pretty easy to
>see that things have changed and the loop should start over.
>

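The counter-and-pointer workaround might be sketched like this, using C11 atomics for illustration rather than the APR atomic API.  The "pointer" is replaced by a 32-bit slot index into a fixed worker table so that index and generation counter together fit in one 64-bit CAS; every push and pop bumps the counter, so a stale CAS like thread Z's above would fail and restart the loop.  All names (`tagged_stack`, `tpush`, `tpop`) and the table size are illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NIL 0xFFFFFFFFu   /* "no slot": marks an empty stack */

typedef struct {
    uint32_t next[64];         /* per-slot stack link (size illustrative) */
    _Atomic uint64_t head;     /* low 32 bits: top index; high 32: counter */
} tagged_stack;

static uint64_t pack(uint32_t idx, uint32_t gen)
{ return ((uint64_t)gen << 32) | idx; }

void tinit(tagged_stack *s)
{ atomic_init(&s->head, pack(NIL, 0)); }

void tpush(tagged_stack *s, uint32_t idx)
{
    uint64_t old = atomic_load(&s->head), desired;
    do {
        s->next[idx] = (uint32_t)old;             /* link to current top */
        desired = pack(idx, (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&s->head, &old, desired));
}

/* Returns the popped slot index, or NIL if the stack is empty.  The
 * counter changes on every push/pop, so an interleaving that restores
 * the same top index still makes this CAS fail and retry. */
uint32_t tpop(tagged_stack *s)
{
    uint64_t old = atomic_load(&s->head), desired;
    uint32_t idx;
    do {
        idx = (uint32_t)old;
        if (idx == NIL)
            return NIL;
        desired = pack(s->next[idx], (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&s->head, &old, desired));
    return idx;
}
```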
I like this technique of CAS'ing two variables at once.  In the
leader/follower algorithm that I originally posted, there is a
race condition that can occur when all the workers are busy and
one of them finishes handling its connection at about the same
time that the current listener accepts a new connection:

worker becoming idle:                      listener:
   atomically checks current_listener;        waiting for connection
     finds that there's already a                   |
     listener and decides to push                   |
     self onto idle_workers stack                   |
   [context switch happens here]                    v
              |                                [connection arrives]
              |                                     |
              |                                     v
              |                                atomically checks the stack;
              |                                  but the stack is empty, so
              |                                  it sets current_listener=NULL
              |                                     |
              |                                     v
              |                               [processing the request]
              v
    [thread wakes up here]
              |
              v
    atomically pushes self onto stack


In this scenario, we end up with no listener, but there's
an idle thread in the stack that ought to have become the
new listener.  The problem is that the current_listener
update and the stack push need to happen atomically.

We could solve this problem by combining the current_listener
pointer and the top-of-stack pointer into a single 32-bit
int so that threads can do a CAS on both at once.  Both of
these pointers can be replaced by thread IDs, so we can
pack them both into a 32-bit value as long as there are
fewer than 2^16 worker threads.
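A sketch of that packing, again using C11 atomics for illustration: the listener's thread ID sits in the high 16 bits and the top-of-idle-stack ID in the low 16, so a worker going idle can either claim the listener role or push itself, in one CAS.  Everything here (`NONE`, `next_worker`, `worker_idle`, the initial state) is an illustrative assumption, not the actual MPM code, and it assumes fewer than 2^16 worker threads as described above.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NONE 0xFFFFu   /* "no thread" marker for both fields */

static uint16_t next_worker[NONE];        /* idle-stack links, indexed by thread ID */
static _Atomic uint32_t state = 0xFFFFFFFFu;  /* no listener, empty stack */

static uint32_t pack2(uint16_t listener, uint16_t top)
{ return ((uint32_t)listener << 16) | top; }

/* Called by a worker that just finished a connection.  In a single
 * CAS it either installs itself as listener (when there is none) or
 * pushes itself onto the idle stack, so the race in the diagram
 * above -- listener sees an empty stack while a worker is mid-push --
 * cannot occur.  Returns 1 if the caller became the listener. */
int worker_idle(uint16_t self)
{
    uint32_t old = atomic_load(&state), desired;
    int became_listener;
    do {
        uint16_t listener = (uint16_t)(old >> 16);
        uint16_t top = (uint16_t)old;
        if (listener == NONE) {
            desired = pack2(self, top);   /* become the listener */
            became_listener = 1;
        } else {
            next_worker[self] = top;      /* push self onto idle stack */
            desired = pack2(listener, self);
            became_listener = 0;
        }
    } while (!atomic_compare_exchange_weak(&state, &old, desired));
    return became_listener;
}
```

Note that this sketch folds in only the atomic pairing of the two fields; combining it with Greg's generation counter as well would need a wider (64-bit) CAS.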

--Brian

>RS/6000 and PowerPC have load_and_reserve and store_conditional which lets you
>get by without the counter.  In this scenario, thread Z's store_conditional
>would fail even though the value of "stack" is the same as when it started, so
>it knows enough to start over.  Perhaps other platforms have something similar.
>
>Greg
>


