Okay, there is a slight bug here in the new cvs code.

What is happening is that sometimes, the driver thread and a connection
thread are creating threads at the same time. I've added a few log statements, 
which makes the bug show up less often, but I can easily reproduce it in a 
few minutes at most (7-100k requests).

Here is the instrumented proc, minus the code comments:

void
NsCreateConnThread(Pool *poolPtr, int joinThreads)
{
    ConnData *dataPtr;

    Ns_Log(Notice, "Creating Threads, joinThreads = %i", joinThreads);

    if (joinThreads) {

        NsJoinConnThreads();
    }

    dataPtr = ns_malloc(sizeof(ConnData));

    dataPtr->poolPtr = poolPtr;
    dataPtr->connPtr = NULL;
    Ns_ThreadCreate(NsConnThread, dataPtr, 0, &dataPtr->thread);

    Ns_Log(Notice, "Done Creating Threads");
}
 
Also I've added a log just before a thread is going to create a new thread to 
replace itself (from the NsConnThread)

        poolPtr->threads.current++;
        poolPtr->threads.idle++;
        Ns_Log(Notice, "About to exit and create new conn thread"); 
        Ns_MutexUnlock(&poolPtr->lock);
        NsCreateConnThread(poolPtr, 0);

The logs which lead up to fatal 11:

[24/Oct/2007:14:20:20.924295][26946.1126394176][-default:36551-] Notice: Added 
thread to join list for thread exit
[24/Oct/2007:14:20:20.924335][26946.1126394176][-default:36551-] Notice: 
exiting: exceeded max connections per thread
[24/Oct/2007:14:20:20.925328][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.925450][26946.1121110336][-nssock3:driver-] Notice: 
Creating Threads, joinThreads = 1
[24/Oct/2007:14:20:20.925508][26946.1121110336][-nssock3:driver-] Notice: Done 
Creating Threads
[24/Oct/2007:14:20:20.926428][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.927418][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.928339][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.929239][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.930144][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.930271][26946.1181346112][-default:36554-] Notice: Added 
thread to join list for thread exit
[24/Oct/2007:14:20:20.930293][26946.1181346112][-default:36554-] Notice: About 
to exit and create new conn thread
[24/Oct/2007:14:20:20.930301][26946.1181346112][-default:36554-] Notice: 
Creating Threads, joinThreads = 0
[24/Oct/2007:14:20:20.931191][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.932100][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.932986][26946.1121110336][-nssock3:driver-] Notice: 
Closing Sock in SockClose
[24/Oct/2007:14:20:20.933628][26946.1130621248][-default:36553-] Notice: Added 
thread to join list for thread exit
[24/Oct/2007:14:20:20.933650][26946.1130621248][-default:36553-] Notice: 
exiting: exceeded max connections per thread
[24/Oct/2007:14:20:20.936221][26946.1126394176][-default:36555-] Notice: Added 
thread to join list for thread exit
[24/Oct/2007:14:20:20.936242][26946.1126394176][-default:36555-] Notice: 
exiting: exceeded max connections per thread
[24/Oct/2007:14:20:20.937692][26946.1121110336][-nssock3:driver-] Notice: 
Creating Threads, joinThreads = 1
[24/Oct/2007:14:20:20.937722][26946.1121110336][-nssock3:driver-] Fatal: 
received fatal signal 11



Summary:
 thread 36554: adds itself to join list
 thread 36554: about to exit mutex and run NsCreateConnThread
 thread 36554: At top of NsCreateConnThread
 thread 36554: sleeping
 
 driver thred: closing a sock, three times
 
 thread 36553: adds itself to join list, then exits (Doesn't create thread)
 thread 36555: adds itself to join list, then exits (Doesn't create thread)

 driver thred: At top of NsCreateConnThread
 driver thred: receives signal 11.


Regardless of the actual cause of this bug, this is very illustrative of
how quickly action switches from one thread to another. 

I'm guessing that since the original code only had the driver creating threads,
there might be something unprotected, or maybe something is deleted, then 
assumed
to exist. 

I kind of stumbled upon this when trying to test a pool setup like this:
  
ns_pools set default -minthreads 1 -maxthreads 5 -maxconns 5 -timeout 10

Then I ran ab on a 28k static file (very fast to serve):

ab -n 120000 -c 120 http://192.168.1.102:8000/bgwrite/sns-thumb.jpg

With higher -maxconns I doubt this bug would show up in real life. 

tom jackson

On Wednesday 24 October 2007 10:10, Tom Jackson wrote:
> I've fixed the last bug in my code, but I haven't yet focused on the
> timeout issue as I'm benchmarking the current cvs code.


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> 
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

Reply via email to