I had some time today, so I set up a QEMU VM to work on reproducing this, and I found some interesting things. First of all, I can reproduce it pretty consistently in the VM with this kernel (the -rt one), but I can't reproduce it at all with linux-image-3.2.0-4-amd64 (the default Wheezy one).
Also, I've discovered two ways to reproduce the problem. Both worked in the VM, but after I rebooted it the first one stopped working. Both are sequences of commands to enter in GDB after starting it with the mutex_test file that I attached earlier (`gdb mutex_test`).

Version 1 (this is roughly how I first found the problem, but sometimes it seems to miss the kernel bug, and then, I think, it just goes into an infinite loop in my buggy code):

    set follow-fork-mode child
    b mutex_test.cpp:43
    run
    record
    cont

Version 2 (this one seems to reproduce the kernel bug more reliably):

    set detach-on-fork off
    b main
    run
    record
    cont

I also figured out the bug in my code: I had the logic flipped on whether the fastpath atomic 0->TID succeeded, so it was calling into the kernel when it had already locked the futex in userspace.
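For reference, here is a minimal sketch of what the corrected fastpath logic looks like. This is not the code from mutex_test.cpp. It assumes the futex follows the Linux PI-futex convention (the word is 0 when unlocked and holds the owner's TID when locked, with FUTEX_LOCK_PI / FUTEX_UNLOCK_PI as the kernel slow path). The names PiMutex, futex_pi, current_tid and g_kernel_calls are hypothetical, made up for illustration.

```cpp
// Hypothetical PI-futex mutex sketch (Linux only); not the code from mutex_test.cpp.
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Counts trips into the kernel so the uncontended fast path can be checked.
static std::atomic<int> g_kernel_calls{0};

static uint32_t current_tid() {
    return static_cast<uint32_t>(syscall(SYS_gettid));
}

// Assumes std::atomic<uint32_t> has the same layout as a plain 32-bit word,
// which holds on mainstream Linux toolchains.
static long futex_pi(std::atomic<uint32_t>* word, int op) {
    g_kernel_calls.fetch_add(1, std::memory_order_relaxed);
    return syscall(SYS_futex, reinterpret_cast<uint32_t*>(word), op, 0,
                   nullptr, nullptr, 0);
}

class PiMutex {
public:
    void lock() {
        uint32_t expected = 0;
        // Fast path: atomically swap 0 -> our TID. If this SUCCEEDS we already
        // own the lock in userspace and must return without entering the kernel.
        if (word_.compare_exchange_strong(expected, current_tid(),
                                          std::memory_order_acquire)) {
            return;
        }
        // Slow path: the CAS FAILED, so another thread owns the lock. The
        // kernel queues us (with priority inheritance) until we own it.
        for (;;) {
            if (futex_pi(&word_, FUTEX_LOCK_PI) == 0) {
                std::atomic_thread_fence(std::memory_order_acquire);
                return;
            }
            // EAGAIN is transient; anything else (e.g. EDEADLK when we
            // already own the futex) indicates a bug.
            if (errno != EINTR && errno != EAGAIN) std::abort();
        }
    }

    void unlock() {
        uint32_t expected = current_tid();
        // Fast path: TID -> 0 only succeeds if the kernel has not set
        // FUTEX_WAITERS, i.e. nobody is blocked on the lock.
        if (word_.compare_exchange_strong(expected, 0,
                                          std::memory_order_release)) {
            return;
        }
        // Waiters are present: let the kernel hand ownership to one of them.
        std::atomic_thread_fence(std::memory_order_release);
        if (futex_pi(&word_, FUTEX_UNLOCK_PI) != 0) std::abort();
    }

    // Raw futex word, exposed only for inspection.
    uint32_t raw() const { return word_.load(); }

private:
    std::atomic<uint32_t> word_{0};
};
```

The important part is the early return when the CAS succeeds. With that condition inverted, an uncontended lock() would go straight into FUTEX_LOCK_PI on a futex the thread already owns, and a contended one would skip the kernel entirely.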

