Hi,
Last night one of our threaded builds of the trunk hung while running
make check on the opal_condition test in test/threads/.
I then reran the test about 30-40 times and was able to reproduce the
hang only once. Looking at it in gdb, we get:
(gdb) info threads
3 Thread 1084229984 (LWP 8450) 0x0000002a95e3bba9 in sched_yield () from /lib64/tls/libc.so.6
2 Thread 1094719840 (LWP 8451) 0xffffffffff600012 in ?? ()
1 Thread 182904955328 (LWP 8430) 0x0000002a9567309b in pthread_join () from /lib64/tls/libpthread.so.0
(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 8451))]#0 0xffffffffff600012 in ?? ()
(gdb) bt
#0 0xffffffffff600012 in ?? ()
#1 0x0000000000000001 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 182904955328 (LWP 8430))]#0 0x0000002a9567309b in pthread_join () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x0000002a9567309b in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x0000002a95794a7d in opal_thread_join () from /san/homedirs/mpiteam/mtt-runs/odin/20071204-Nightly/pb_2/installs/Bp80/src/openmpi-1.3a1r16847/opal/.libs/libopen-pal.so.0
#2 0x0000000000401684 in main ()
(gdb) thread 3
[Switching to thread 3 (Thread 1084229984 (LWP 8450))]#0 0x0000002a95e3bba9 in sched_yield () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0000002a95e3bba9 in sched_yield () from /lib64/tls/libc.so.6
#1 0x0000000000401216 in thr1_run ()
#2 0x0000002a95672137 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x0000002a95e53113 in clone () from /lib64/tls/libc.so.6
(gdb)
I know this is not much to go on, but I have no idea what is going on.
There have been no changes in this area of the code for a long time.
Has anyone else seen something like this? Any ideas about the cause?
Thanks,
Tim