Re: Request for comments: leader/follower MPM design

2002-02-19 Thread Greg Ames

Brian Pane wrote:

 Thanks for catching that.  In this particular design, I think
 we'll be safe because only the current listener thread is ever
 allowed to pop from the stack. 

yeah, that occurred to me driving home yesterday.  If we can guarantee there is
only one popper at a time, that problem can't happen.  But if we want a more
general apr_atomic_pop which is safe to use with multiple simultaneous pops,
we'll have to fix it.  

 But the workaround that you
 describe below will help solve another problem in the design:
 
 I like this technique of CAS'ing two variables at once.  In the
 leader/follower algorithm that I originally posted, there is a
 race condition that can occur when all the workers are busy and
 one of them finishes handling its connection at about the same
 time that the current listener accepts a new connection:
 
 worker becoming idle:  listener:
atomically checks current_listener;waiting for connection
  finds that there's already a   |
  listener and decides to push   |
  self onto idle_workers stack   |
[context switch happens here]v
   |[connection arrives]
   | |
   | v
   |atomically checks the stack;
   |  but the stack is empty, so
   |  it sets
 current_listener=NULL
   | |
   | v
   |   [processing the request]
   v
 [thread wakes up here]
   |
   v
 atomically pushes self onto stack
 
 In this scenario, we end up with no listener, but there's
 an idle thread in the stack that ought to have become the
 new listener.  The problem is that the current_listener
 update and the stack push need to happen atomically.
 
 We could solve this problem by combining the current_listener
 pointer and the top-of-stack pointer into a single 32-bit
 int so that threads can do a CAS on both at once.  Both of
 these pointers can be replaced by thread IDs, so we can
 pack them both into a 32-bit value as long as there are
 fewer than 2^16 worker threads.

That should work.  I've also seen this type of problem solved with the
following:

* Check the other guy's stack
* If empty, add myself to my stack
* Check the other guy's stack again 
* If it's no longer empty, back out and start over. 

Both the producer and consumer have to do this.  In this case, think of
current_listener as being sort of a one deep stack.

I hope we will be able to do 64-bit CAS on most 32-bit platforms.  That would
simplify things greatly.

Greg



Re: Request for comments: leader/follower MPM design

2002-02-19 Thread Ian Holsman

Greg Ames wrote:
 Brian Pane wrote:
 
 
Thanks for catching that.  In this particular design, I think
we'll be safe because only the current listener thread is ever
allowed to pop from the stack. 

 
 yeah, that occurred to me driving home yesterday.  If we can guarantee there is
 only one popper at a time, that problem can't happen.  But if we want a more
 general apr_atomic_pop which is safe to use with multiple simultaneous pops,
 we'll have to fix it.  
 
 
But the workaround that you
describe below will help solve another problem in the design:

  
 
I like this technique of CAS'ing two variables at once.  In the
leader/follower algorithm that I originally posted, there is a
race condition that can occur when all the workers are busy and
one of them finishes handling its connection at about the same
time that the current listener accepts a new connection:

worker becoming idle:  listener:
   atomically checks current_listener;waiting for connection
 finds that there's already a   |
 listener and decides to push   |
 self onto idle_workers stack   |
   [context switch happens here]v
  |[connection arrives]
  | |
  | v
  |atomically checks the stack;
  |  but the stack is empty, so
  |  it sets
current_listener=NULL
  | |
  | v
  |   [processing the request]
  v
[thread wakes up here]
  |
  v
atomically pushes self onto stack

In this scenario, we end up with no listener, but there's
an idle thread in the stack that ought to have become the
new listener.  The problem is that the current_listener
update and the stack push need to happen atomically.

We could solve this problem by combining the current_listener
pointer and the top-of-stack pointer into a single 32-bit
int so that threads can do a CAS on both at once.  Both of
these pointers can be replaced by thread IDs, so we can
pack them both into a 32-bit value as long as there are
fewer than 2^16 worker threads.

 
 That should work.  I've also seen this type of problem solved with the
 following:
 
 * Check the other guy's stack
 * If empty, add myself to my stack
 * Check the other guy's stack again 
 * If it's no longer empty, back out and start over. 
 
 Both the producer and consumer have to do this.  In this case, think of
 current_listener as being sort of a one deep stack.
 
 I hope we will be able to do 64-bit CAS on most 32-bit platforms.  That would
 simplify things greatly.

linux is where the problem lies in general for CAS.
the have a cmpxchg function defined for most systems, which does a long 
(max).

I'll get the 32-bit versions uprunning first.. but I don't hold much 
hope of getting a apr-atomic-cas-64 going for all platforms..




 
 Greg
 
 






Re: Request for comments: leader/follower MPM design

2002-02-18 Thread Brian Pane

Greg Ames wrote:

Brian Pane wrote:

Relative to the current worker MPM design, the potential
advantages of this new design are:
  * Better cache utilization, because the stack makes it
more likely that a recently active thread will serve
the next request
  * An implementation that uses the new APR atomic API
as a more scalable alternative to worker's global mutex


   * Fewer context switches?

worker_thread_info *atomic_pop(worker_thread_info *stack)
{
   worker_thread_info *prev, *node;
   do {
 node = *stack;
 if (!first) {


I assume you mean:
   if (!node) {


oops, yes, that should say node instead of first

   return NULL;
 }
 prev = atomic_cas(stack, node-next, node);
   } while (prev != node-next);
}


I think there's a subtle problem here.  I've seen it before on SMP boxes,
otherwise it would slip right past me.

The line

rev = atomic_cas(stack, node-next, node);

can get pretty hairy.  

Assume thread Z is the listener and enters atomic_pop and starts to evaluate
this line. node is A when it starts, and A-next is B when node-next is
evaluated.  Then an interrupt happens and thread Z takes a little nap (hence its
name :)

In the mean time, two other threads enter atomic_pop, and A and B are popped,
leaving stack pointing to C.  In the context of this discussion, I guess the
threads represented by A and B both get their turn as a listener.  Thread A gets
a request for a tiny .gif, finishes the request and does an atomic_push and
blocks.  Now stack-A and A-next == C.  Thread B is off serving a huge file.

Next thread Z wakes up from its nap.  It executes the guts of atomic_cas with
the params it has already evaluated.  It succeeds on most platforms because
stack points to A once again, so we swap B into stack  But node A's next
field no longer points to B, its next field is now C.  Our data structure is now
hosed.


Thanks for catching that.  In this particular design, I think
we'll be safe because only the current listener thread is ever
allowed to pop from the stack.  But the workaround that you
describe below will help solve another problem in the design:

I've seen this fixed most often with a monotonically increasing counter in an
adjacent field to stack.  This is incremented during atomic_pop (or _push). 
The counter and pointer are atomic_cas'ed as a pair.  Then it's pretty easy to
see that things have changed and the loop should start over.


I like this technique of CAS'ing two variables at once.  In the
leader/follower algorithm that I originally posted, there is a
race condition that can occur when all the workers are busy and
one of them finishes handling its connection at about the same
time that the current listener accepts a new connection:

worker becoming idle:  listener:
   atomically checks current_listener;waiting for connection
 finds that there's already a   |
 listener and decides to push   |
 self onto idle_workers stack   |
   [context switch happens here]v
  |[connection arrives]
  | |
  | v
  |atomically checks the stack;
  |  but the stack is empty, so
  |  it sets 
current_listener=NULL
  | |
  | v
  |   [processing the request]
  v
[thread wakes up here]
  |
  v
atomically pushes self onto stack


In this scenario, we end up with no listener, but there's
an idle thread in the stack that ought to have become the
new listener.  The problem is that the current_listener
update and the stack push need to happen atomically.

We could solve this problem by combining the current_listener
pointer and the top-of-stack pointer into a single 32-bit
int so that threads can do a CAS on both at once.  Both of
these pointers can be replaced by thread IDs, so we can
pack them both into a 32-bit value as long as there are
fewer than 2^16 worker threads.

--Brian

RS/6000 and PowerPC have load_and_reserve and store_conditional which lets you
get by without the counter.  In this scenario, thread Z's store_conditional
would fail even though the value of stack is the same as when it started, so
it knows enough to start over.  Perhaps other platforms have something similar.

Greg