Now that APR is beginning to support atomic compare-and-set (CAS) operations, I decided to revisit an old idea: use a "leader/follower" design to improve upon the efficiency of the worker MPM.
The following writeup describes my current design idea. It's somewhat
radical, particularly in the parts where it tries to detect and correct
race conditions rather than avoiding them. Thus I'd like to get a
second opinion or two from anyone who has an interest in concurrent
programming problems and enough time to scrutinize the pseudocode.

Thanks,
--Brian


Leader/follower MPM design

Overview:
--------
Each httpd child process has a fixed number of worker threads. There is
no dedicated listener thread. Instead, the workers take turns serving
as the listener. When the current listener accepts a connection, it
picks an idle worker thread to become the new listener. The idle
threads are stored in a stack; when a worker finishes processing a
connection, it pushes itself onto the stack.

Thanks to IanH for pointing out that this is an implementation of the
Leader/Followers design pattern:
http://www.cs.wustl.edu/~schmidt/PDF/lf.pdf

Relative to the current worker MPM design, the potential advantages of
this new design are:

  * Better cache utilization, because the stack makes it more likely
    that a recently active thread will serve the next request

  * An implementation that uses the new APR atomic API as a more
    scalable alternative to worker's global mutex

Per-worker data structure:
-------------------------

struct worker_thread_info {
    apr_thread_mutex_t *mutex;  /* normally locked by this worker,
                                 * except when it's waiting on the
                                 * condition variable */
    apr_thread_cond_t *condition_variable;
    struct worker_thread_info *next; /* used to chain threads together
                                      * in the idle worker stack
                                      * (see below) */
};

Global data structures:
----------------------

/* The worker that's currently serving as the listener:
 * can be in one of three states
 *  - pointing to some worker thread's info structure
 *  - NULL: there is no current listener
 *  - "pending" (points to a sentinel object with a unique address):
 *    the current listener has just accepted a connection and is in
 *    the process of selecting a new
 *    listener. (This state is used to let us detect and correct a
 *    rare race condition, rather than using a global mutex to
 *    prevent it.)
 */
static struct worker_thread_info *current_listener = NULL;

/* A stack holding idle worker threads. While in this stack,
 * a worker thread is blocked on its own condition variable.
 */
static struct worker_thread_info *idle_worker_stack = NULL;

Algorithms:
----------

worker_thread()
{
    lock(my_mutex); /* will be unlocked only by cond_wait calls */
    for (;;) {
        /* This thread must wait for its turn to become the listener:
         * If there is no listener right now, this thread can become
         * the new listener immediately. Otherwise, it must push
         * itself onto the stack of idle workers.
         */
        int become_listener = 0;
        do {
            /* If current_listener is NULL, replace it with a pointer
             * to this thread's worker_thread_info struct
             */
            last_listener = atomic_cas(current_listener, this, NULL);
            switch (last_listener) {
            case NULL:
                /* This thread is now the listener */
                become_listener = 1;
                break;
            case pending:
                /* Intentional race condition:
                 * The old listener is in the middle of choosing a new
                 * listener. To keep things simple, just spin and
                 * retry until the old listener is finished.
                 * Note: the 100us sleep isn't strictly necessary.
                 * It's just here to increase the # of CPU cycles
                 * available to the old listener so that it can get
                 * out of the pending state quickly.
                 */
                usleep(100);
                break;
            default:
                /* There already is a listener, so push this thread
                 * onto the idle stack and wait to be signaled
                 */
                atomic_push(&idle_worker_stack, this);
                cond_wait(this->condition_variable, this->mutex);
                become_listener = 1;
            }
        } while (!become_listener);

        accept_connection(...);

        /* After accepting the connection, check the idle stack in
         * hopes of finding another thread to take this thread's
         * place as the listener. But first, set current_listener to
         * pending to avoid a race condition with any workers that
         * might be trying to enter listener state at exactly the
         * same time.
         */
        (void)atomic_cas(current_listener, pending, this);

        new_listener = atomic_pop(&idle_worker_stack);
        if (new_listener == NULL) {
            /* There are no idle workers right now, so set
             * current_listener to NULL. The first worker to finish
             * handling its current connection will become the new
             * listener.
             */
            (void)atomic_cas(current_listener, NULL, pending);
        }
        else {
            /* Tell the first idle worker to become the new listener */
            (void)atomic_cas(current_listener, new_listener, pending);
            lock(new_listener->mutex);
            cond_signal(new_listener->condition_variable);
            unlock(new_listener->mutex);
        }

        process_connection(...);
    }
}

void atomic_push(worker_thread_info **stack, worker_thread_info *node)
{
    worker_thread_info *prev;
    do {
        node->next = *stack;  /* must be re-read on each retry */
        prev = atomic_cas(*stack, node, node->next);
    } while (prev != node->next);
}

worker_thread_info *atomic_pop(worker_thread_info **stack)
{
    worker_thread_info *prev, *node;
    do {
        node = *stack;
        if (node == NULL) {
            return NULL;
        }
        prev = atomic_cas(*stack, node->next, node);
    } while (prev != node);
    return node;
}