Brian Pane wrote:

> Relative to the current worker MPM design, the potential
> advantages of this new design are:
>   * Better cache utilization, because the stack makes it
>     more likely that a recently active thread will serve
>     the next request
>   * An implementation that uses the new APR atomic API
>     as a more scalable alternative to worker's global mutex

   * Fewer context switches?

> worker_thread_info *atomic_pop(worker_thread_info *stack)
> {
>    worker_thread_info *prev, *node;
>    do {
>      node = *stack;
>      if (!first) {

I assume you mean:
       if (!node) {

>        return NULL;
>      }
>      prev = atomic_cas(stack, node->next, node);
>    } while (prev != node->next);
> }

I think there's a subtle problem here.  I've seen it before on SMP boxes,
otherwise it would slip right past me.

The line

    prev = atomic_cas(stack, node->next, node);

can get pretty hairy.  

Assume thread Z is the listener and enters atomic_pop and starts to evaluate
this line. "node" is A when it starts, and A->next is B when "node->next" is
evaluated.  Then an interrupt happens and thread Z takes a little nap (hence its
name :)

In the meantime, two other threads enter atomic_pop, and A and B are popped,
leaving stack pointing to C.  In the context of this discussion, I guess the
threads represented by A and B both get their turn as a listener.  Thread A gets
a request for a tiny .gif, finishes the request, does an atomic_push, and
blocks.  Now stack points to A and A->next == C.  Thread B is off serving a
huge file.

Next thread Z wakes up from its nap.  It executes the guts of atomic_cas with
the params it has already evaluated.  It succeeds on most platforms because
"stack" points to A once again, so we swap B into "stack".  But node A's next
field no longer points to B; it now points to C.  Our data structure is now
hosed.

I've seen this fixed most often with a monotonically increasing counter in an
adjacent field to "stack".  This is incremented during atomic_pop (or _push). 
The counter and pointer are atomic_cas'ed as a pair.  Then it's pretty easy to
see that things have changed and the loop should start over.
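A rough sketch of that counter idea, in C11 atomics rather than the APR API
(a hypothetical stand-in, not the proposed MPM code): instead of CAS'ing a bare
pointer, pack a node index and a version counter into one 64-bit word so they
are compared and swapped as a pair.  Using indices into a fixed pool (here a
made-up `nodes[]` array) keeps the whole thing in a single lock-free 64-bit
CAS, avoiding the double-word CAS a pointer+counter pair would need:

```c
#include <stdatomic.h>
#include <stdint.h>

#define NIL UINT32_MAX

typedef struct { uint32_t next; } tnode;

static tnode nodes[64];   /* hypothetical fixed pool of worker slots */

/* "head" packs the stack state into one 64-bit word:
 *   low 32 bits  = index of the top node (NIL when empty)
 *   high 32 bits = version counter, bumped on every successful CAS
 * Because the counter changes on every pop and push, thread Z's stale
 * CAS fails even if the top index has returned to its old value.      */
static _Atomic uint64_t head = (uint64_t)NIL;

static uint64_t pack(uint32_t idx, uint32_t ver)
{
    return ((uint64_t)ver << 32) | idx;
}

void tagged_push(uint32_t i)
{
    uint64_t oldv = atomic_load(&head);
    uint64_t newv;
    do {
        nodes[i].next = (uint32_t)oldv;            /* link under current top */
        newv = pack(i, (uint32_t)(oldv >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&head, &oldv, newv));
}

uint32_t tagged_pop(void)
{
    uint64_t oldv = atomic_load(&head);
    uint64_t newv;
    do {
        if ((uint32_t)oldv == NIL)
            return NIL;                            /* stack is empty */
        newv = pack(nodes[(uint32_t)oldv].next,
                    (uint32_t)(oldv >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&head, &oldv, newv));
    return (uint32_t)oldv;
}
```

Note that atomic_compare_exchange_weak reloads the current head into oldv on
failure, which is exactly the "start the loop over" behavior described above.
A 32-bit counter can wrap in theory, but an ABA hit then needs exactly 2^32
intervening operations between the load and the CAS, which is vanishingly
unlikely in practice.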

RS/6000 and PowerPC have load_and_reserve and store_conditional which lets you
get by without the counter.  In this scenario, thread Z's store_conditional
would fail even though the value of "stack" is the same as when it started, so
it knows enough to start over.  Perhaps other platforms have something similar.

Greg
