Hello folks.

I'm digging into an issue and thought someone might remember
something from those days.

The piece of software is a server based on AOL-3.4 code, but with some
proprietary modules added on top.

The issue cannot be reproduced reliably, but every so often we find that
one instance or another of the server (different installations, many
machines) has stopped servicing requests, even though the load is very low.
Inspecting a "locked" server (dumping core via an attached gdb), we found
that the conn threads were waiting to join one another, as in a queue. I'm
talking about the last sequence of code in NsConnThread, where each thread
that exits joins the one that exited before it. The stack in those conn
threads looks like this:

Thread 3 (Thread 1075616096 (LWP 9692)):
#0  0x0000003e7d508a7a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#1  0x00000000004dbb94 in Ns_CondWait (condPtr=0x61b548, mutexPtr=0x61b540) at pthread.c:577
#2  0x00000000004d9098 in Ns_ThreadJoin (threadPtr=0x401c90c0, argPtr=0x0) at thread.c:186
#3  0x000000000043f126 in JoinConnThread (threadPtr=0x401c90c0) at serv.c:1000
#4  0x000000000043ebba in NsConnThread (ignored=0x0) at serv.c:738
#5  0x00000000004d912d in NsThreadMain (arg=0x803af00) at thread.c:225
#6  0x0000003e7d506137 in start_thread () from /lib64/tls/libpthread.so.0
#7  0x0000003e7b9c7113 in clone () from /lib64/tls/libc.so.6

You can safely ignore the line numbers in the files; just follow the
sequence of calls and you'll get the idea.
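
To make the pattern concrete, here is a minimal sketch of the "each exiting
thread joins the one that exited before it" chain. It uses plain pthreads
rather than the Ns_* wrappers, it is not the nsd code, and every name in it
(conn_thread, run_chain, last_exited and so on) is made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state; not the actual nsd structures. */
static pthread_mutex_t chain_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  chain_cond = PTHREAD_COND_INITIALIZER;
static pthread_t last_exited;  /* most recent thread to reach its exit path */
static int have_last;          /* 0 until some thread has registered */
static int registered;         /* threads that have reached the exit path */
static int joined;             /* joins completed by the threads themselves */

static void *conn_thread(void *ignored)
{
    pthread_t prev;
    int has_prev;

    (void) ignored;            /* request servicing would happen here */

    /* Exit path: take over the "last exited" slot, remember who held it. */
    pthread_mutex_lock(&chain_lock);
    prev = last_exited;
    has_prev = have_last;
    last_exited = pthread_self();
    have_last = 1;
    registered++;
    pthread_cond_signal(&chain_cond);
    pthread_mutex_unlock(&chain_lock);

    /* Join the predecessor; this blocks until that thread has terminated. */
    if (has_prev) {
        int rc = pthread_join(prev, NULL);
        assert(rc == 0);
        pthread_mutex_lock(&chain_lock);
        joined++;
        pthread_mutex_unlock(&chain_lock);
    }
    return NULL;
}

/* Starts nthreads (>= 1) threads and returns the total number of joins. */
int run_chain(int nthreads)
{
    pthread_t tid, last;
    int i, total;

    pthread_mutex_lock(&chain_lock);
    have_last = registered = joined = 0;
    pthread_mutex_unlock(&chain_lock);

    for (i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid, NULL, conn_thread, NULL);
        assert(rc == 0);
    }

    /* Wait until every thread is in the chain, then join the tail. */
    pthread_mutex_lock(&chain_lock);
    while (registered < nthreads)
        pthread_cond_wait(&chain_cond, &chain_lock);
    last = last_exited;
    pthread_mutex_unlock(&chain_lock);
    pthread_join(last, NULL);

    pthread_mutex_lock(&chain_lock);
    total = joined + 1;        /* + 1 for the join done just above */
    pthread_mutex_unlock(&chain_lock);
    return total;
}
```

In this sketch every thread is joined exactly once, so run_chain(n) returns n.
The point is that the chain only unwinds from the head: if the join at the
head never returns, every thread queued behind it stays blocked as well.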

The problem is that the first thread in line is waiting on the condition
variable, but the thread it is supposed to join no longer exists (or so the
core file indicates). Hence the deadlock.
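
The stack shows the join blocking in a condition wait, so the failure mode
can be sketched as follows. This assumes a join built on an "exited" flag
guarded by a mutex and a condition variable; JoinState, mark_exited and
join_with_timeout are hypothetical names, not the nsd API. If the target
thread goes away without ever setting the flag, an untimed wait never
returns, while a timed wait at least reports that the exit was never seen.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical per-thread join state; not the nsd Thread structure. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             exited;    /* set once by the exiting thread */
} JoinState;

void join_state_init(JoinState *js)
{
    int rc = pthread_mutex_init(&js->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&js->cond, NULL);
    assert(rc == 0);
    js->exited = 0;
}

/* What the exiting thread is expected to do as its last step. */
void mark_exited(JoinState *js)
{
    pthread_mutex_lock(&js->lock);
    js->exited = 1;
    pthread_cond_broadcast(&js->cond);
    pthread_mutex_unlock(&js->lock);
}

/* Returns 0 once the exit was observed, ETIMEDOUT if it never was. */
int join_with_timeout(JoinState *js, long ms)
{
    struct timespec deadline;
    int rc = 0;

    /* pthread_cond_timedwait takes an absolute deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&js->lock);
    while (!js->exited && rc == 0)
        rc = pthread_cond_timedwait(&js->cond, &js->lock, &deadline);
    if (js->exited)
        rc = 0;
    pthread_mutex_unlock(&js->lock);
    return rc;
}
```

With a JoinState whose flag is never set, join_with_timeout returns
ETIMEDOUT, which is the situation the first thread in line appears to be in;
after mark_exited has run it returns 0.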

 

I'm not necessarily implying that this is an issue in the nsd code (serv.c
or anything else); it could be something else. But has anybody seen this
kind of behavior before in AOLserver? Any hint would be helpful.

Thanks,

Vlad

 



