Hi,

I have checked how the Linux kernel handles futexes and I see a potential problem. When kaffe deadlocks with GCTest, it always hangs with one thread in pthread_mutex_unlock calling the futex syscall and the GC thread in pthread_mutex_lock calling the futex syscall. The first thread was interrupted during the syscall by the GC when the world was suspended. So its stack is:

sigwait
<signal handler>
futex syscall
pthread_mutex_unlock


and the other one is:

futex syscall
pthread_mutex_lock
GC thread
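To make the two stacks easier to picture, here is a minimal sketch of the suspend scheme they imply. It is not kaffe's actual code: the signal numbers, `heap_lock`, `mutator` and `stop_the_world_once` are all hypothetical names chosen for illustration. A mutator thread hammers a mutex, the "GC" side freezes it with a signal whose handler parks in sigwait, and then the GC side tries to take the same mutex. I use pthread_mutex_timedlock so that the sketch reports a stuck lock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical signal choice; kaffe may use different signals. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESUME  SIGUSR2

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t suspend_ack;          /* posted by the handler once the thread is parked */
static atomic_int stop_mutator;

/* Runs on top of whatever the mutator was doing, possibly a mutex call. */
static void suspend_handler(int sig)
{
    sigset_t resume;
    int got;

    (void)sig;
    sigemptyset(&resume);
    sigaddset(&resume, SIG_RESUME);
    sem_post(&suspend_ack);        /* tell the GC side we are stopped */
    sigwait(&resume, &got);        /* frozen here until SIG_RESUME arrives */
}

static void *mutator(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_mutator)) {
        pthread_mutex_lock(&heap_lock);
        pthread_mutex_unlock(&heap_lock);
    }
    return NULL;
}

/* Returns 0 if the GC side got the lock while the world was stopped,
 * ETIMEDOUT if the frozen mutator was holding it, -1 on setup failure. */
int stop_the_world_once(void)
{
    struct sigaction sa;
    sigset_t resume;
    struct timespec deadline;
    pthread_t t;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = suspend_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIG_SUSPEND, &sa, NULL) != 0)
        return -1;

    /* SIG_RESUME must stay blocked so that only sigwait consumes it. */
    sigemptyset(&resume);
    sigaddset(&resume, SIG_RESUME);
    if (pthread_sigmask(SIG_BLOCK, &resume, NULL) != 0)
        return -1;

    if (sem_init(&suspend_ack, 0, 0) != 0)
        return -1;
    atomic_store(&stop_mutator, 0);
    if (pthread_create(&t, NULL, mutator, NULL) != 0)
        return -1;

    /* Suspend the world (one thread here) and wait for the acknowledgement. */
    pthread_kill(t, SIG_SUSPEND);
    while (sem_wait(&suspend_ack) == -1 && errno == EINTR)
        ;

    /* GC work that needs the lock; bounded so the sketch cannot hang. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    rc = pthread_mutex_timedlock(&heap_lock, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(&heap_lock);
    assert(rc == 0 || rc == ETIMEDOUT);

    /* Resume the world and shut the mutator down. */
    atomic_store(&stop_mutator, 1);
    pthread_kill(t, SIG_RESUME);
    pthread_join(t, NULL);
    sem_destroy(&suspend_ack);
    return rc;
}
```

A result of ETIMEDOUT only shows the ordinary case where the mutator was frozen while owning the mutex; the sketch does not by itself reproduce the hang described here, where the frozen thread is inside pthread_mutex_unlock.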

When I look into the kernel source tree I see this:
static int futex_wait(unsigned long uaddr, int val, unsigned long time)
{
       DECLARE_WAITQUEUE(wait, current);
       int ret, curval;
       struct futex_q q;

       down_read(&current->mm->mmap_sem);

The kernel then prepares the locking and unlocks mmap_sem.

and

static int futex_wake(unsigned long uaddr, int nr_wake)
{
       union futex_key key;
       struct futex_hash_bucket *bh;
       struct list_head *head;
       struct futex_q *this, *next;
       int ret;

       down_read(&current->mm->mmap_sem);

The kernel then iterates over the semaphores and wakes up all threads.


What may happen if the signal handler is called after down_read in futex_wake? In that case we are not able to call futex_wait, and the application will deadlock because the first thread is frozen in sigwait.


So, if the analysis is correct, we have either a limitation of the kernel or a bug. The only point I am not sure about is whether a signal is allowed to interrupt a syscall right in the middle of futex_wake.
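As a partial data point, the blocking side can be checked from user space. The sketch below is Linux-specific and uses hypothetical names (`futex_word`, `waiter`, `futex_wait_errno_after_signal`). It puts a thread into a raw FUTEX_WAIT and sends it a signal whose handler is installed without SA_RESTART; futex(2) documents EINTR for that case. It says nothing about futex_wake, which is exactly the part I am unsure about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static int futex_word = 0;         /* never changes, so nobody wakes the waiter */
static atomic_int waiter_done;
static int waiter_errno;

static void noop_handler(int sig)
{
    (void)sig;
}

static void *waiter(void *arg)
{
    long rc;

    (void)arg;
    /* Block while futex_word == 0, with no timeout. */
    rc = syscall(SYS_futex, &futex_word, FUTEX_WAIT, 0, NULL, NULL, 0);
    waiter_errno = (rc == -1) ? errno : 0;
    atomic_store(&waiter_done, 1);
    return NULL;
}

/* Returns the errno seen by the waiter after it was signalled,
 * or -1 if the setup failed. */
int futex_wait_errno_after_signal(void)
{
    struct sigaction sa;
    struct timespec pause = { 0, 1000000 };   /* 1 ms between attempts */
    pthread_t t;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;  /* deliberately no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    atomic_store(&waiter_done, 0);
    if (pthread_create(&t, NULL, waiter, NULL) != 0)
        return -1;

    /* Keep signalling: an early signal may land before the thread
     * has actually entered the syscall. */
    while (!atomic_load(&waiter_done)) {
        pthread_kill(t, SIGUSR1);
        nanosleep(&pause, NULL);
    }
    pthread_join(t, NULL);
    assert(waiter_errno >= 0);
    return waiter_errno;
}
```

If the handler were installed with SA_RESTART, I would expect the wait to be restarted instead of returning, so the flag matters for this experiment.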


Regards,

Guilhem Lavaux.
