(Patch and system details at bottom)

Hi all. I've root-caused and written a patch for the "children stuck on
futex" problem described by Sean Thorne in 2009 and by Max Barry (with
whom I work) in 2011.

The core of the problem is that modperl_tipool_putback_base only
broadcasts that there are more interpreters available when there were no
available interpreters prior to this putback. While this looks
reasonable, it can lose a wakeup when more than one thread is waiting.

Notation:
A: Acquire an interpreter
P: Putback an interpreter
B: Broadcast a free interpreter (really a signal)
W: Wait on condition tipool->available (for free interpreter)
(x,y): x is the number of interpreters in use at this point. y is the
number free.
The number at the beginning of a line is the thread number
Each line occurs within a single critical section (on mutex tipool->tiplock)

Expected behavior:
4 threads, 2 free interpreters
1: A (1,1)
2: A (2,0)
3: W
4: W
1: P (1,1) B
3: A (2,0)
2: P (1,1) B
4: A (2,0)
3: P (1,1) B
4: P (0,2) <-- No broadcast because there was an available interpreter
prior to this putback.

Broken behavior:
4 threads, 2 free interpreters
1: A (1,1)
2: A (2,0)
3: W
4: W
1: P (1,1) B
2: P (0,2) <-- No broadcast because there was an available interpreter
prior to this putback.
3: A (1,1)
3: P (0,2) <-- No broadcast because there was an available interpreter
prior to this putback.
(Broken)

Thread 4 is never signaled to pick up an interpreter, so it stays
blocked on the futex even though interpreters are free. Sooner or later,
Apache will tell this worker to die (due to MaxRequestsPerChild). The
parent thread then waits for the child threads to join, but one or more
child threads never wake up.

My proposed fix is to always broadcast the availability of an
interpreter, regardless of whether any were already free. With this
change, every test I could find to throw at it passes, and the deadlock
no longer occurs when reproducing the problem according to Max's
instructions (http://pastebin.com/YDbmq84w).
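
For anyone who wants to check the reasoning without a threaded build,
here is a minimal single-threaded sketch that replays the broken trace
under both policies. It is a toy model, not mod_perl code: pool_model,
the model_* helpers and replay_broken_trace are made-up names, and it
assumes the "broadcast" wakes exactly one waiter (as noted above, it is
really a signal).

```c
#include <assert.h>

/* Toy model of the interpreter pool. All names here are hypothetical. */
typedef struct {
    int max;      /* total interpreters in the pool */
    int in_use;   /* interpreters currently checked out */
    int waiting;  /* threads blocked on the condition */
    int woken;    /* threads signaled but not yet rescheduled */
} pool_model;

/* A: take an interpreter, or start waiting (W) if none is free. */
static void model_acquire(pool_model *p)
{
    if (p->in_use < p->max) {
        p->in_use++;
    } else {
        p->waiting++;
    }
}

/* B: a signal wakes at most one waiter. */
static void model_signal(pool_model *p)
{
    if (p->waiting > 0) {
        p->waiting--;
        p->woken++;
    }
}

/* P: return an interpreter. With always_signal == 0 this mirrors the
 * current condition: signal only if no interpreter was free before. */
static void model_putback(pool_model *p, int always_signal)
{
    assert(p->in_use > 0);
    p->in_use--;
    if (always_signal || p->in_use == p->max - 1) {
        model_signal(p);
    }
}

/* A previously signaled thread finally runs and takes an interpreter. */
static void model_resume(pool_model *p)
{
    assert(p->woken > 0 && p->in_use < p->max);
    p->woken--;
    p->in_use++;
}

/* Replays the "Broken behavior" trace; returns the number of threads
 * still waiting once every runnable thread has finished. */
int replay_broken_trace(int always_signal)
{
    pool_model p = { 2, 0, 0, 0 };

    model_acquire(&p);                 /* 1: A */
    model_acquire(&p);                 /* 2: A */
    model_acquire(&p);                 /* 3: W */
    model_acquire(&p);                 /* 4: W */
    model_putback(&p, always_signal);  /* 1: P, B */
    model_putback(&p, always_signal);  /* 2: P */
    model_resume(&p);                  /* 3: A */
    model_putback(&p, always_signal);  /* 3: P */

    /* Let any thread that was signaled run to completion. */
    while (p.woken > 0) {
        model_resume(&p);
        model_putback(&p, always_signal);
    }
    return p.waiting;
}
```

Calling replay_broken_trace(0) models the current condition and
replay_broken_trace(1) models the patched behavior. The return value is
the number of threads left waiting at the end of the trace.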

My System Details:
uname -a: Linux modperl 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11
03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Apache: Custom build of 2.2.20 with ubuntu patches
(http://packages.ubuntu.com/source/oneiric/apache2)
modperl: Custom build of 2.0.5 with ubuntu patches
(http://packages.ubuntu.com/source/oneiric/libapache2-mod-perl2)
Build process: Standard ubuntu build process with following flags set:
DEB_BUILD_OPTIONS="nostrip parallel=2 debug"
CFLAGS="-g -O2 -DMP_TRACE=1 -DPERL_DESTRUCT_LEVEL=2 -DMP_DEBUG=1
-UMP_USE_GTOP -I/usr/include/libgtop-2.0/ -I/usr/include/glib-2.0/
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include/"

Patch:
--- src/modules/perl/modperl_tipool.c.old       2012-03-03 19:43:57.112152297 -0800
+++ src/modules/perl/modperl_tipool.c   2012-03-03 04:28:31.000000000 -0800
@@ -328,9 +328,9 @@
     MP_TRACE_i(MP_FUNC, "0x%lx now available (%d in use, %d running)",
                (unsigned long)listp->data, tipool->in_use, tipool->size);

+    modperl_tipool_broadcast(tipool);
     if (tipool->in_use == (tipool->cfg->max - 1)) {
         /* hurry up, another thread may be blocking */
-        modperl_tipool_broadcast(tipool);
         modperl_tipool_unlock(tipool);
         return;
     }


Please let me know how best to get this checked in and out. As you might
imagine, this futex problem has been causing us quite a few headaches :-)

Greg Rubin   
