I had some time today, so I set up a QEMU VM to work on reproducing this,
and I found some interesting things. First of all, I can reproduce it
pretty consistently in the VM with this kernel (the -rt one), but I can't
reproduce it at all with linux-image-3.2.0-4-amd64 (the default Wheezy one).

Also, I've found two ways to reproduce the problem. Both worked in the VM at
first, but after I rebooted it the first one stopped working. Each is a
sequence of commands to enter in GDB after starting it on the mutex_test file
I attached earlier (`gdb mutex_test`):

Version 1 (this is roughly how I first found the problem, but sometimes it
misses the kernel bug and, I think, just goes into an infinite loop in my
buggy code instead):
set follow-fork-mode child
b mutex_test.cpp:43
run
record
cont

Version 2 (This one seems to reproduce the kernel bug more reliably):
set detach-on-fork off
b main
run
record
cont

I also figured out the bug in my own code: I had the logic inverted on
whether the fastpath atomic 0->TID compare-and-swap had succeeded, so it was
calling into the kernel even when it had already locked the futex in
userspace.
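For reference, here is a minimal sketch of what the corrected fastpath/slowpath
logic looks like for a priority-inheritance futex lock. It is not the code from
mutex_test.cpp. The class PiMutex and the helper run_counter_demo are
hypothetical names for illustration, and it assumes Linux with
FUTEX_LOCK_PI/FUTEX_UNLOCK_PI and that std::atomic<uint32_t> has the same
layout as a plain uint32_t. The kernel is only entered when the 0->TID (or
TID->0) compare-and-swap fails. As far as I know, futex(2) documents EDEADLK
for FUTEX_LOCK_PI when the futex word is already locked by the caller, which
is exactly the situation the inverted check puts the kernel in.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>

// Hypothetical minimal PI mutex (a sketch, not the mutex_test.cpp code).
class PiMutex {
 public:
  void lock() {
    const uint32_t tid = current_tid();
    uint32_t expected = 0;
    // Fastpath: atomic 0 -> TID. If this SUCCEEDS we already own the lock
    // in userspace and must NOT call into the kernel.
    if (word_.compare_exchange_strong(expected, tid,
                                      std::memory_order_acquire)) {
      return;
    }
    // Slowpath: the CAS FAILED because someone else owns the word, so ask
    // the kernel to block us (with priority inheritance) until we own it.
    if (futex_op(FUTEX_LOCK_PI) != 0) {
      std::perror("FUTEX_LOCK_PI");
      std::abort();
    }
    // On success the kernel has stored our TID (possibly plus flag bits).
    assert((word_.load(std::memory_order_relaxed) & FUTEX_TID_MASK) == tid);
  }

  void unlock() {
    uint32_t expected = current_tid();
    // Fastpath: TID -> 0 only succeeds if no waiter flags are set.
    if (word_.compare_exchange_strong(expected, 0,
                                      std::memory_order_release)) {
      return;
    }
    // FUTEX_WAITERS is set: let the kernel hand ownership to a waiter.
    if (futex_op(FUTEX_UNLOCK_PI) != 0) {
      std::perror("FUTEX_UNLOCK_PI");
      std::abort();
    }
  }

 private:
  static uint32_t current_tid() {
    return static_cast<uint32_t>(syscall(SYS_gettid));
  }

  long futex_op(int op) {
    // Assumes std::atomic<uint32_t> is layout-compatible with uint32_t.
    return syscall(SYS_futex, reinterpret_cast<uint32_t*>(&word_), op, 0,
                   nullptr, nullptr, 0);
  }

  static_assert(sizeof(std::atomic<uint32_t>) == sizeof(uint32_t),
                "futex word must be 32 bits");
  std::atomic<uint32_t> word_{0};
};

// Hypothetical demo: two threads increment a shared counter under PiMutex,
// which exercises both the fastpath and (under contention) the kernel path.
long run_counter_demo(int iterations_per_thread) {
  PiMutex m;
  long counter = 0;
  auto worker = [&] {
    for (int i = 0; i < iterations_per_thread; ++i) {
      m.lock();
      ++counter;
      m.unlock();
    }
  };
  std::thread a(worker);
  std::thread b(worker);
  a.join();
  b.join();
  return counter;
}
```

If the lock is correct, run_counter_demo(n) returns exactly 2 * n; with the
check inverted, the uncontended case would issue FUTEX_LOCK_PI on a word the
caller already owns.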
