On Wednesday 24 October 2007 01:07, Gustaf Neumann wrote:
> concerning the code in cvs head: the server now creates worker
> threads when a
> - thread exits and more requests are queued, and no other
>   thread is idle, or when a
> - thread exits and the number of connection threads falls
>   below minthreads,
> in both cases unless shutting down. The handled thread exits
> are typically due to exceeding maxconns or the idle timeout.
>
> This is what my tests show. What are the problems? Or are you
> talking about the version in cvs before these changes? Of
> course, before these changes, minthreads was respected only at
> startup, and queued requests were not processed.
I've pointed out several times that I'm looking at the operation of
the queue from the point of view of performance (both economy and
responsiveness), not the specific numbers (min, max, timeout,
maxconns) being maintained. I'm also looking at the impact of
choosing different combinations of those numbers. So for instance,
if Andrew thinks it is a bug for minthreads to be violated, it is
best to not use a thread timeout; that fixes the issue instantly. I
don't think the operation of the queue has to hold your hand to keep
you from making bad decisions.

But let's address this a little more carefully before rushing in to
fix something which may logically appear to be a bug, but probably
isn't one. We also need to look at the solution, and I'll start with
yours. If minthreads is taken as a sacred number, then your solution
doesn't cut it: the thread exits and then another one is created in
its place. This is merely cosmetic, and it only appears correct if
you don't notice that the number actually went below minthreads. A
correct solution would prevent the violation of this lower limit,
not fix it up after it happens.

As an example, just look at the original code. How did threads
escape exit at startup if there is a timeout?

    if (poolPtr->threads.current <= poolPtr->threads.min) {
        timePtr = NULL;
    }
    ...

So there was already some thought given to preventing thread exit on
timeout when minthreads would be violated. The question is: why
isn't this code working? There are a number of bugs in this code,
and fixing things up at the end, after a thread has already exited,
doesn't remove the actual bugs.
An individual thread sits in a while loop waiting for a queued
request; this loop and the following check are currently:

    status = NS_OK;
    while (!poolPtr->shutdown
           && status == NS_OK
           && poolPtr->queue.wait.firstPtr == NULL) {
        /* nothing is queued, we wait for a queue entry */
        status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
    }
    if (poolPtr->queue.wait.firstPtr == NULL) {
        msg = "timeout waiting for connection";
        break;
    }

Status starts out as NS_OK, and the entire while loop is skipped if
there is already a waiting request, i.e. poolPtr->queue.wait.firstPtr
!= NULL. Once a wait is done, the conditions are checked again, and
the loop is exited if status becomes NS_TIMEOUT, if there is a
waiting request, or on shutdown.

The problem is that we made the decision to exit on timeout before
we knew whether exiting would violate the minthreads condition. We
could instead do the wait, then check whether exit would violate
that condition, and if so set status back to NS_OK and avoid an
unnecessary exit. If we move the timeout code inside the loop, we
also avoid repeating it anywhere else.

But there is another bug in the code: you can set the timeout to
<= 0. In this case, the code happily computes the timeout (as a real
future time), but the wait returns immediately. I've fixed this last
bug in my code, but I haven't yet focused on the timeout issue, as
I'm benchmarking the current cvs code.

Current CVS: running the latest code, I discovered the same issue
that my code had, but which has since been eliminated there. If
maxthreads is reduced [ns_pools set mypool -maxthreads x], the
number of threads doesn't decrease. It also doesn't go down based
upon load; instead it goes immediately to the original maxthreads
and stays there. I'll need to be more exact in my descriptions
later, but basically the queue operation goes to max, which is
better than it was before, when it was hard to get even up to min.
However, as an overall performance goal, threadpools should moderate
their behavior, since activity may move from one threadpool to
another. If everything stays at max, this isn't achieved.

The original problem is easy to describe now that I have done a
bunch of testing: build up a queue large enough that the remaining
threads will not be able to service it, then stop sending requests.
This is what was happening under Apache Bench. Send 1000 requests,
and as long as concurrency is a little bigger than the current
number of threads, things get stuck. It doesn't necessarily appear
to be a thread exit issue, and other problems show up as well. The
driver thread also does sock cleanup, which gets stuck. These stuck
socks are dropped connections; they just appear to be in a queue. In
fact, they are not waiting to be serviced, the sock is just waiting
for cleanup.

Another issue was the fact that the driver thread was not doing a
condBroadcast on thread create; if thread create is skipped, it just
does a condSignal. Apparently it is not a good idea to just do a
condSignal. I haven't checked what happens if this is changed, but
my queue already works very well; I haven't seen any stuck requests
that were not dropped (but I'm still looking).

The current code in cvs makes it difficult to view the operation of
the queue. The main problem is that threads.queued is a cumulative
total, a pointless value for this purpose. queue.wait.num has the
correct value, but it is unavailable to the ns_pools command (and
should be available). Without the ability to query this information
while benchmarking, it is difficult to tell what is going on.

The biggest problem is assuming that using Apache Bench to send 10
concurrent requests means that AOLserver is only queueing at most 10
requests at one time. This is not necessarily true: for one thing, a
request might come in faster than a thread can recycle. This is one
reason that the current CVS code pushes threads.current all the way
up to maxthreads.
But I have also been experimenting with [ns_conn channel] to send
the request to a worker thread to relieve the connection thread.
When this is used, queued requests can build up to much higher
values.

tom jackson

> My first solution was based on a strategy where a thread did not
> exit when there was more to do (more queued requests), although
> it should exit due to maxconns. However, a user specifying
> maxconns might have good reasons for doing so, therefore the
> value must be respected. Avoiding exit in this situation is not a
> viable option IMHO.
>
> It can be discussed whether the idle timeout should be obeyed,
> e.g. in situations where the number of threads is below
> minthreads. In the code in cvs the thread exits and is recreated
> as in other cases when it falls below minthreads. It would be
> more efficient to avoid the timeout in such situations, but one
> might end up with quite old worker threads, which are more
> sensitive to growth.
>
> -gustaf neumann

--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to
<[EMAIL PROTECTED]> with the body of "SIGNOFF AOLSERVER" in the
email message. You can leave the Subject: field of your email blank.