Hi,
I have looked at how the Linux kernel handles futexes and I see a potential problem. When Kaffe deadlocks in GCTest, it always hangs with one thread in pthread_mutex_unlock calling the futex syscall and the GC thread in pthread_mutex_lock calling the futex syscall. The first thread was interrupted during the syscall by the GC's signal when the world was suspended. So its stack is:
sigwait
<signal handler>
futex syscall
pthread_mutex_unlock

and the other one is:

futex syscall
pthread_mutex_lock
GC thread
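
For reference, the kind of suspend mechanism that produces the first stack can be sketched as below. This is only a minimal illustration and not Kaffe's actual code: the signal numbers (SIGUSR1 to suspend, SIGUSR2 to resume), the names suspend_handler, worker and suspend_demo, and the busy-loop worker are all assumptions. As far as I know, sigwait is not on the POSIX list of async-signal-safe functions, so calling it from a handler relies on it behaving well on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

/* Assumed signal choice: SIGUSR1 suspends a thread, SIGUSR2 resumes it. */
static atomic_int suspended;   /* 1 while the worker is parked in the handler */
static atomic_int stop;        /* tells the worker loop to exit */
static atomic_long counter;    /* progress made by the worker */

/* Suspend handler: parks the interrupted thread until SIGUSR2 arrives. */
static void suspend_handler(int signo)
{
    sigset_t resume;
    int got;

    (void)signo;
    sigemptyset(&resume);
    sigaddset(&resume, SIGUSR2);
    atomic_store(&suspended, 1);
    sigwait(&resume, &got);    /* the thread is frozen here */
    atomic_store(&suspended, 0);
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop))
        atomic_fetch_add(&counter, 1);
    return NULL;
}

/* Returns 0 if the worker made no progress while it was suspended. */
int suspend_demo(void)
{
    pthread_t t;
    sigset_t block;
    struct sigaction sa;
    long before, after;
    int rc;

    atomic_store(&suspended, 0);
    atomic_store(&stop, 0);

    /* SIGUSR2 must be blocked (the worker inherits the mask) so that
       sigwait can collect it instead of the default action running. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    pthread_sigmask(SIG_BLOCK, &block, NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = suspend_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_kill(t, SIGUSR1);                 /* suspend the worker */
    while (!atomic_load(&suspended))
        usleep(1000);

    before = atomic_load(&counter);
    usleep(50000);                            /* the "world" is stopped */
    after = atomic_load(&counter);

    pthread_kill(t, SIGUSR2);                 /* resume the worker */
    while (atomic_load(&suspended))
        usleep(1000);

    atomic_store(&stop, 1);
    pthread_join(t, NULL);
    return before == after ? 0 : 1;
}
```

Whatever the worker was doing when SIGUSR1 arrived stays exactly as it was until SIGUSR2 is sent, which is why a lock held at that moment stays held for the whole suspension.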
When I look into the kernel source tree I see this:
static int futex_wait(unsigned long uaddr, int val, unsigned long time)
{
	DECLARE_WAITQUEUE(wait, current);
	int ret, curval;
	struct futex_q q;

	down_read(&current->mm->mmap_sem);
The kernel then prepares the locking and releases mmap_sem.
And:
static int futex_wake(unsigned long uaddr, int nr_wake)
{
	union futex_key key;
	struct futex_hash_bucket *bh;
	struct list_head *head;
	struct futex_q *this, *next;
	int ret;

	down_read(&current->mm->mmap_sem);
The kernel then iterates over the waiters queued on the futex and wakes up the threads (up to nr_wake of them).
What may happen if the signal handler is called after down_read in futex_wake? In that case the GC thread would not be able to get through futex_wait and the application would deadlock, because the first thread is frozen in sigwait, presumably still holding mmap_sem.
So, if the analysis is correct, we have either a limitation of the kernel or a bug. The only point I am unsure about is whether a signal is allowed to interrupt a syscall right in the middle of futex_wake.
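
As a partial check from user space, the sketch below shows that a futex syscall which is sleeping in FUTEX_WAIT does get interrupted by a signal and fails with EINTR. It is only an illustration under assumptions: the function name futex_wait_errno_on_signal, the use of SIGALRM with a 50 ms timer, and the 5 second safety timeout are all made up for the example. It says nothing about whether futex_wake can be interrupted while it holds mmap_sem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the sleeping syscall. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Returns the errno left by a FUTEX_WAIT that a signal interrupts,
   or 0 if the wait unexpectedly succeeded. */
int futex_wait_errno_on_signal(void)
{
    static uint32_t word = 0;   /* futex word; nobody ever wakes it */
    struct sigaction sa;
    struct itimerval timer;
    struct timespec timeout = { .tv_sec = 5, .tv_nsec = 0 };  /* safety net */
    long rc;
    int ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;   /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    ok = sigaction(SIGALRM, &sa, NULL);
    assert(ok == 0);

    /* Deliver SIGALRM once, 50 ms from now. */
    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 50000;
    ok = setitimer(ITIMER_REAL, &timer, NULL);
    assert(ok == 0);
    (void)ok;

    /* Sleep on the futex word as long as it still contains 0. */
    rc = syscall(SYS_futex, &word, FUTEX_WAIT, 0, &timeout, NULL, 0);
    return rc == -1 ? errno : 0;
}
```

If the timer fires while the thread is asleep in the kernel, the call returns -1 with errno set to EINTR; if the signal never arrived, the safety timeout would give ETIMEDOUT instead.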
Regards,
Guilhem Lavaux.
