Okay, there is a slight bug here in the new cvs code. What is happening is that sometimes, the driver thread and a connection thread are creating threads at the same time. I've added a few log statements, which makes the bug show up less often, but I can easily reproduce it in a few minutes at most (7-100k requests).
Here is the instrumented proc, minus the code comments: void NsCreateConnThread(Pool *poolPtr, int joinThreads) { ConnData *dataPtr; Ns_Log(Notice, "Creating Threads, joinThreads = %i", joinThreads); if (joinThreads) { NsJoinConnThreads(); } dataPtr = ns_malloc(sizeof(ConnData)); dataPtr->poolPtr = poolPtr; dataPtr->connPtr = NULL; Ns_ThreadCreate(NsConnThread, dataPtr, 0, &dataPtr->thread); Ns_Log(Notice, "Done Creating Threads"); } Also I've added a log just before a thread is going to create a new thread to replace itself (from the NsConnThread) poolPtr->threads.current++; poolPtr->threads.idle++; Ns_Log(Notice, "About to exit and create new conn thread"); Ns_MutexUnlock(&poolPtr->lock); NsCreateConnThread(poolPtr, 0); The logs which lead up to fatal 11: [24/Oct/2007:14:20:20.924295][26946.1126394176][-default:36551-] Notice: Added thread to join list for thread exit [24/Oct/2007:14:20:20.924335][26946.1126394176][-default:36551-] Notice: exiting: exceeded max connections per thread [24/Oct/2007:14:20:20.925328][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.925450][26946.1121110336][-nssock3:driver-] Notice: Creating Threads, joinThreads = 1 [24/Oct/2007:14:20:20.925508][26946.1121110336][-nssock3:driver-] Notice: Done Creating Threads [24/Oct/2007:14:20:20.926428][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.927418][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.928339][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.929239][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.930144][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.930271][26946.1181346112][-default:36554-] Notice: Added thread to join list for thread exit [24/Oct/2007:14:20:20.930293][26946.1181346112][-default:36554-] Notice: About to exit and create new conn thread [24/Oct/2007:14:20:20.930301][26946.1181346112][-default:36554-] Notice: Creating Threads, joinThreads = 0 [24/Oct/2007:14:20:20.931191][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.932100][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.932986][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose [24/Oct/2007:14:20:20.933628][26946.1130621248][-default:36553-] Notice: Added thread to join list for thread exit [24/Oct/2007:14:20:20.933650][26946.1130621248][-default:36553-] Notice: exiting: exceeded max connections per thread [24/Oct/2007:14:20:20.936221][26946.1126394176][-default:36555-] Notice: Added thread to join list for thread exit [24/Oct/2007:14:20:20.936242][26946.1126394176][-default:36555-] Notice: exiting: exceeded max connections per thread [24/Oct/2007:14:20:20.937692][26946.1121110336][-nssock3:driver-] Notice: Creating Threads, joinThreads = 1 [24/Oct/2007:14:20:20.937722][26946.1121110336][-nssock3:driver-] Fatal: received fatal signal 11 Summary: thread 36554: adds itself to join list thread 36554: about to exit mutex and run NsCreateConnThread thread 36554: At top of NsCreateConnThread thread 36554: sleeping driver thred: closing a sock, three times thread 36553: adds itself to join list, then exits (Doesn't create thread) thread 36555: adds itself to join list, then exits (Doesn't create thread) driver thred: At top of NsCreateConnThread driver thred: receives signal 11. Regardless of the actual cause of this bug, this is very illustrative of how quickly action switches from one thread to another. I'm guessing that since the original code only had the driver creating threads, there might be something unprotected, or maybe something is deleted, then assumed to exist. I kind of stumbled upon this when trying to test a pool setup like this: ns_pools set default -minthreads 1 -maxthreads 5 -maxconns 5 -timeout 10 Then I ran ab on a 28k static file (very fast to serve): ab -n 120000 -c 120 http://192.168.1.102:8000/bgwrite/sns-thumb.jpg With higher -maxconns I doubt this bug would show up in real life. tom jackson On Wednesday 24 October 2007 10:10, Tom Jackson wrote: > I've fixed the last bug in my code, but I haven't yet focused on the > timeout issue as I'm benchmarking the current cvs code. -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.