Hello folks. I'm digging into this issue and thought someone might remember something from those days.
The piece of software is a server based on AOL-3.4 code, with some proprietary modules added on top. The issue cannot be reproduced reliably, but every so often we find that one instance or another of the server (different installations, many machines) is no longer servicing requests, even though the task load is very low.

Upon inspecting a "locked" server (dumping core via an attached gdb), we found that the conn threads were waiting to join one another, like in a queue. I'm talking about the last sequence of code in NsConnThread, where each thread that exits joins the one that exited before it. The stack in those conn threads looks like this:

Thread 3 (Thread 1075616096 (LWP 9692)):
#0 0x0000003e7d508a7a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#1 0x00000000004dbb94 in Ns_CondWait (condPtr=0x61b548, mutexPtr=0x61b540) at pthread.c:577
#2 0x00000000004d9098 in Ns_ThreadJoin (threadPtr=0x401c90c0, argPtr=0x0) at thread.c:186
#3 0x000000000043f126 in JoinConnThread (threadPtr=0x401c90c0) at serv.c:1000
#4 0x000000000043ebba in NsConnThread (ignored=0x0) at serv.c:738
#5 0x00000000004d912d in NsThreadMain (arg=0x803af00) at thread.c:225
#6 0x0000003e7d506137 in start_thread () from /lib64/tls/libpthread.so.0
#7 0x0000003e7b9c7113 in clone () from /lib64/tls/libc.so.6

You can safely ignore the line numbers in the files; just follow the sequence of calls and you get the idea. The problem is that the first thread in line is waiting on the condition variable, but the thread it is supposed to join no longer exists (or so the core file indicates). Hence the deadlock.

I'm not necessarily implying that this is an issue in the nsd code (serv.c or anything else); it could be something else. But has anybody seen this kind of behavior before in AOLserver? Any hint would be helpful.
Thanks,
Vlad