Re: Request for comments: leader/follower MPM design
Brian Pane wrote: Thanks for catching that. In this particular design, I think we'll be safe because only the current listener thread is ever allowed to pop from the stack. yeah, that occurred to me driving home yesterday. If we can guarantee there is only one popper at a time, that problem can't happen. But if we want a more general apr_atomic_pop which is safe to use with multiple simultaneous pops, we'll have to fix it. But the workaround that you describe below will help solve another problem in the design: I like this technique of CAS'ing two variables at once. In the leader/follower algorithm that I originally posted, there is a race condition that can occur when all the workers are busy and one of them finishes handling its connection at about the same time that the current listener accepts a new connection: worker becoming idle: listener: atomically checks current_listener;waiting for connection finds that there's already a | listener and decides to push | self onto idle_workers stack | [context switch happens here]v |[connection arrives] | | | v |atomically checks the stack; | but the stack is empty, so | it sets current_listener=NULL | | | v | [processing the request] v [thread wakes up here] | v atomically pushes self onto stack In this scenario, we end up with no listener, but there's an idle thread in the stack that ought to have become the new listener. The problem is that the current_listener update and the stack push need to happen atomically. We could solve this problem by combining the current_listener pointer and the top-of-stack pointer into a single 32-bit int so that threads can do a CAS on both at once. Both of these pointers can be replaced by thread IDs, so we can pack them both into a 32-bit value as long as there are fewer than 2^16 worker threads. That should work. I've also seen this type of problem solved with the following: * Check the other guy's stack * If empty, add myself to my stack * Check the other guy's stack again * If it's no longer empty, back out and start over. Both the producer and consumer have to do this. In this case, think of current_listener as being sort of a one deep stack. I hope we will be able to do 64-bit CAS on most 32-bit platforms. That would simplify things greatly. Greg
Re: Request for comments: leader/follower MPM design
Greg Ames wrote: Brian Pane wrote: Thanks for catching that. In this particular design, I think we'll be safe because only the current listener thread is ever allowed to pop from the stack. yeah, that occurred to me driving home yesterday. If we can guarantee there is only one popper at a time, that problem can't happen. But if we want a more general apr_atomic_pop which is safe to use with multiple simultaneous pops, we'll have to fix it. But the workaround that you describe below will help solve another problem in the design: I like this technique of CAS'ing two variables at once. In the leader/follower algorithm that I originally posted, there is a race condition that can occur when all the workers are busy and one of them finishes handling its connection at about the same time that the current listener accepts a new connection: worker becoming idle: listener: atomically checks current_listener;waiting for connection finds that there's already a | listener and decides to push | self onto idle_workers stack | [context switch happens here]v |[connection arrives] | | | v |atomically checks the stack; | but the stack is empty, so | it sets current_listener=NULL | | | v | [processing the request] v [thread wakes up here] | v atomically pushes self onto stack In this scenario, we end up with no listener, but there's an idle thread in the stack that ought to have become the new listener. The problem is that the current_listener update and the stack push need to happen atomically. We could solve this problem by combining the current_listener pointer and the top-of-stack pointer into a single 32-bit int so that threads can do a CAS on both at once. Both of these pointers can be replaced by thread IDs, so we can pack them both into a 32-bit value as long as there are fewer than 2^16 worker threads. That should work. I've also seen this type of problem solved with the following: * Check the other guy's stack * If empty, add myself to my stack * Check the other guy's stack again * If it's no longer empty, back out and start over. Both the producer and consumer have to do this. In this case, think of current_listener as being sort of a one deep stack. I hope we will be able to do 64-bit CAS on most 32-bit platforms. That would simplify things greatly. linux is where the problem lies in general for CAS. the have a cmpxchg function defined for most systems, which does a long (max). I'll get the 32-bit versions uprunning first.. but I don't hold much hope of getting a apr-atomic-cas-64 going for all platforms.. Greg
Re: Request for comments: leader/follower MPM design
Greg Ames wrote: Brian Pane wrote: Relative to the current worker MPM design, the potential advantages of this new design are: * Better cache utilization, because the stack makes it more likely that a recently active thread will serve the next request * An implementation that uses the new APR atomic API as a more scalable alternative to worker's global mutex * Fewer context switches? worker_thread_info *atomic_pop(worker_thread_info *stack) { worker_thread_info *prev, *node; do { node = *stack; if (!first) { I assume you mean: if (!node) { oops, yes, that should say node instead of first return NULL; } prev = atomic_cas(stack, node-next, node); } while (prev != node-next); } I think there's a subtle problem here. I've seen it before on SMP boxes, otherwise it would slip right past me. The line rev = atomic_cas(stack, node-next, node); can get pretty hairy. Assume thread Z is the listener and enters atomic_pop and starts to evaluate this line. node is A when it starts, and A-next is B when node-next is evaluated. Then an interrupt happens and thread Z takes a little nap (hence its name :) In the mean time, two other threads enter atomic_pop, and A and B are popped, leaving stack pointing to C. In the context of this discussion, I guess the threads represented by A and B both get their turn as a listener. Thread A gets a request for a tiny .gif, finishes the request and does an atomic_push and blocks. Now stack-A and A-next == C. Thread B is off serving a huge file. Next thread Z wakes up from its nap. It executes the guts of atomic_cas with the params it has already evaluated. It succeeds on most platforms because stack points to A once again, so we swap B into stack But node A's next field no longer points to B, its next field is now C. Our data structure is now hosed. Thanks for catching that. In this particular design, I think we'll be safe because only the current listener thread is ever allowed to pop from the stack. But the workaround that you describe below will help solve another problem in the design: I've seen this fixed most often with a monotonically increasing counter in an adjacent field to stack. This is incremented during atomic_pop (or _push). The counter and pointer are atomic_cas'ed as a pair. Then it's pretty easy to see that things have changed and the loop should start over. I like this technique of CAS'ing two variables at once. In the leader/follower algorithm that I originally posted, there is a race condition that can occur when all the workers are busy and one of them finishes handling its connection at about the same time that the current listener accepts a new connection: worker becoming idle: listener: atomically checks current_listener;waiting for connection finds that there's already a | listener and decides to push | self onto idle_workers stack | [context switch happens here]v |[connection arrives] | | | v |atomically checks the stack; | but the stack is empty, so | it sets current_listener=NULL | | | v | [processing the request] v [thread wakes up here] | v atomically pushes self onto stack In this scenario, we end up with no listener, but there's an idle thread in the stack that ought to have become the new listener. The problem is that the current_listener update and the stack push need to happen atomically. We could solve this problem by combining the current_listener pointer and the top-of-stack pointer into a single 32-bit int so that threads can do a CAS on both at once. Both of these pointers can be replaced by thread IDs, so we can pack them both into a 32-bit value as long as there are fewer than 2^16 worker threads. --Brian RS/6000 and PowerPC have load_and_reserve and store_conditional which lets you get by without the counter. In this scenario, thread Z's store_conditional would fail even though the value of stack is the same as when it started, so it knows enough to start over. Perhaps other platforms have something similar. Greg