Now that APR is beginning to support atomic compare-and-set (CAS) operations, I decided to revisit an old idea: use a "leader/follower" design to improve upon the efficiency of the worker MPM.
The following writeup describes my current design idea. It's somewhat
radical, particularly in the parts where it tries to detect and correct
race conditions rather than avoiding them. Thus I'd like to get a
second opinion or two from anyone who has an interest in concurrent
programming problems and enough time to scrutinize the pseudocode.

Thanks,
--Brian


Leader/follower MPM design

Overview:
--------
Each httpd child process has a fixed number of worker threads. There is
no dedicated listener thread. Instead, the workers take turns serving
as the listener. When the current listener accepts a connection, it
picks an idle worker thread to become the new listener. The idle
threads are stored in a stack; when a worker finishes processing a
connection, it pushes itself onto the stack.

Thanks to IanH for pointing out that this is an implementation of the
Leader/Followers design pattern:
http://www.cs.wustl.edu/~schmidt/PDF/lf.pdf

Relative to the current worker MPM design, the potential advantages of
this new design are:

  * Better cache utilization, because the stack makes it more likely
    that a recently active thread will serve the next request

  * An implementation that uses the new APR atomic API as a more
    scalable alternative to worker's global mutex

Per-worker data structure:
-------------------------

struct worker_thread_info {
    apr_thread_mutex_t *mutex;  /* normally locked by this worker,
                                 * except when it's waiting on the
                                 * condition variable */
    apr_thread_cond_t *condition_variable;
    struct worker_thread_info *next; /* used to chain threads together
                                      * in the idle worker stack
                                      * (see below) */
};

Global data structures:
----------------------

/* The worker that's currently serving as the listener:
 * can be in one of three states
 *  - pointing to some worker thread's info structure
 *  - NULL: there is no current listener
 *  - "pending" (points to a sentinel object with a unique address):
 *    the current listener has just accepted a connection and is in
 *    the process of selecting a new
 *    listener. (This state is used to let us detect and correct a
 *    rare race condition, rather than using a global mutex to
 *    prevent it.)
 */
static struct worker_thread_info *current_listener = NULL;

/* A stack holding idle worker threads. While in this stack,
 * a worker thread is blocked on its own condition variable.
 */
static struct worker_thread_info *idle_worker_stack = NULL;

Algorithms:
----------

worker_thread()
{
    lock(my_mutex); /* will be unlocked only by cond_wait calls */
    for (;;) {
        /* This thread must wait for its turn to become the listener:
         * If there is no listener right now, this thread can become
         * the new listener immediately. Otherwise, it must push
         * itself onto the stack of idle workers.
         */
        int become_listener = 0;
        do {
            /* If current_listener is NULL, replace it with a pointer
             * to this thread's worker_thread_info struct
             */
            last_listener = atomic_cas(current_listener, this, NULL);
            switch (last_listener) {
            case NULL:
                /* This thread is now the listener */
                become_listener = 1;
                break;
            case pending:
                /* Intentional race condition:
                 * The old listener is in the middle of choosing a new
                 * listener. To keep things simple, just spin and
                 * retry until the old listener is finished.
                 * Note: the 100us sleep isn't strictly necessary.
                 * It's just here to increase the # of CPU cycles
                 * available to the old listener so that it can get
                 * out of the pending state quickly.
                 */
                usleep(100);
                break;
            default:
                /* There already is a listener, so push this thread
                 * onto the idle stack and wait to be signaled
                 */
                atomic_push(&idle_worker_stack, this);
                cond_wait(this->condition_variable, this->mutex);
                become_listener = 1;
            }
        } while (!become_listener);

        accept_connection(...);

        /* After accepting the connection, check the idle stack in
         * hopes of finding another thread to take this thread's
         * place as the listener. But first, set current_listener to
         * pending to avoid a race condition with any workers that
         * might be trying to enter listener state at exactly the
         * same time.
         */
        (void)atomic_cas(current_listener, pending, this);

        new_listener = atomic_pop(&idle_worker_stack);
        if (new_listener == NULL) {
            /* There are no idle workers right now, so set
             * current_listener to NULL. The first worker to finish
             * handling its current connection will become the new
             * listener.
             */
            (void)atomic_cas(current_listener, NULL, pending);
        }
        else {
            /* Tell the first idle worker to become the new listener */
            (void)atomic_cas(current_listener, new_listener, pending);
            lock(new_listener->mutex);
            cond_signal(new_listener->condition_variable);
            unlock(new_listener->mutex);
        }

        process_connection(...);
    }
}

void atomic_push(worker_thread_info **stack, worker_thread_info *node)
{
    worker_thread_info *prev;
    do {
        node->next = *stack;  /* must be re-read on each retry */
        prev = atomic_cas(*stack, node, node->next);
    } while (prev != node->next);
}

worker_thread_info *atomic_pop(worker_thread_info **stack)
{
    worker_thread_info *prev, *node;
    do {
        node = *stack;
        if (node == NULL) {
            return NULL;
        }
        prev = atomic_cas(*stack, node->next, node);
    } while (prev != node);
    return node;
}