Tom Jackson wrote:
I've pointed out several times that I'm looking at the operation of the queue from the point of view of performance (both economy and responsiveness), not the specific numbers (min, max, timeout, maxconns) being maintained.
...
So for instance, if Andrew thinks it is a bug for minthreads to be violated, it is best to not use a thread timeout. That fixes the issue instantly.
Obeying minthreads helps to improve performance, since when all connection threads are gone, there will be a delay for thread creation upon a new request. I agree with Andrew that it is a bug if, as it was before, minthreads is only honored at server start and the threads disappear as soon as they time out.

If minthreads is taken as a sacred number, then your solution doesn't cut it: the thread exits and then another one is created. This is merely cosmetic, and it only appears correct if you don't notice that the number actually went below minthreads. A correct solution would prevent the violation of this lower limit, not fix it up after it happens.
I do not agree here. maxconns should be obeyed as well. If a thread has served the specified number of requests, it should exit. If the number of threads then drops below minthreads, a fresh thread should be created. Keeping the thread alive instead is not correct. If one does not like this behavior, one can specify maxconns=9999999999.

As an example, just look at the original code: how did threads avoid exiting at startup when a timeout is set?
  388  if (poolPtr->threads.current <= poolPtr->threads.min) {
  389       timePtr = NULL;
  390   } ...


So there was already some thought given to preventing thread exit on timeout if minthreads would be violated. So the question is: why isn't this code working?
Simply because it does not consider that maxconns cycles may lead to thread exits as well.
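To make the two exit paths explicit, here is a hypothetical sketch (not the actual AOLserver source; the struct and helper are simplified for illustration). The quoted check at line 388 disables only the timeout path; the maxconns path has no minthreads guard at all:

```c
#include <stdbool.h>

/* Hypothetical sketch: a connection thread has TWO ways out, an idle
 * timeout and a maxconns limit.  Setting timePtr = NULL at or below
 * minthreads guards only the timeout; the maxconns exit can still push
 * the pool below minthreads. */
typedef struct {
    int current;    /* threads currently in the pool */
    int min;        /* configured minthreads */
    int maxconns;   /* requests a thread serves before exiting; 0 = unlimited */
} PoolLimits;

static bool
ThreadShouldExit(const PoolLimits *poolPtr, bool timedOut, int connsServed)
{
    /* Timeout path: guarded, since at or below minthreads the wait is
     * indefinite and timedOut cannot become true there. */
    if (timedOut && poolPtr->current > poolPtr->min) {
        return true;
    }
    /* maxconns path: NOT guarded by any minthreads check. */
    if (poolPtr->maxconns > 0 && connsServed >= poolPtr->maxconns) {
        return true;
    }
    return false;
}
```

So even a pool sitting exactly at minthreads loses a thread the moment one of its members reaches maxconns, which is why recreating a fresh thread afterwards is part of honoring both limits.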
There are a number of bugs in this code, and fixing things up at the end, after a thread has exited, doesn't remove the actual bugs. An individual thread sits in a while loop waiting for a queued request; the loop and the following check currently read:

    status = NS_OK;
    while (!poolPtr->shutdown
           && status == NS_OK
           && poolPtr->queue.wait.firstPtr == NULL) {
        /* nothing is queued; wait for a queue entry */
        status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
    }

    if (poolPtr->queue.wait.firstPtr == NULL) {
        msg = "timeout waiting for connection";
        break;
    }

Status starts out as NS_OK, and the entire while loop is skipped if there is a waiting request, i.e. poolPtr->queue.wait.firstPtr != NULL.
This is fine, since the waiting request can be processed immediately without the cond wait.
Once the wait is done, the conditions are checked again; the loop is exited if status becomes NS_TIMEOUT, or if there is a waiting request (or a shutdown).

The problem is that the decision to exit on timeout is made before we know whether exiting would violate the minthreads condition. So we could do the wait, then check again whether we would violate this condition, and if so, reset status to NS_OK and avoid the unnecessary exit. If we move the timeout handling inside the loop, we also avoid repeating it anywhere else.
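Concretely, the proposal above could look like the following model. This is an illustrative sketch, not committed code: Ns_CondTimedWait is replaced by a stub, and the pool fields from the quoted snippets are flattened into a small struct.

```c
#include <stdbool.h>

/* Model of the proposed wait loop: re-check minthreads AFTER each timed
 * wait and reset status to NS_OK, so a timeout never triggers an exit
 * that would violate minthreads.  NS_OK/NS_TIMEOUT mirror the names in
 * the quoted code; the cond wait itself is mocked. */
enum { NS_OK = 0, NS_TIMEOUT = 1 };

typedef struct {
    int current;        /* current number of connection threads */
    int min;            /* configured minthreads */
    int queued;         /* stand-in for queue.wait.firstPtr != NULL */
    int waitsRemaining; /* mock: timed waits until a request arrives */
} PoolMock;

/* Mocked Ns_CondTimedWait: times out until a request "arrives". */
static int
MockTimedWait(PoolMock *poolPtr)
{
    if (poolPtr->waitsRemaining-- > 0) {
        return NS_TIMEOUT;
    }
    poolPtr->queued = 1;
    return NS_OK;
}

/* Returns true if the thread may exit on timeout, false if it got work. */
static bool
WaitForWork(PoolMock *poolPtr)
{
    int status = NS_OK;

    while (status == NS_OK && !poolPtr->queued) {
        status = MockTimedWait(poolPtr);
        if (status == NS_TIMEOUT && poolPtr->current <= poolPtr->min) {
            status = NS_OK;  /* exiting would violate minthreads: keep waiting */
        }
    }
    return !poolPtr->queued;
}
```

A thread at minthreads thus keeps waiting through any number of timeouts, while a thread above minthreads exits on the first one.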
This won't get rid of the exits due to maxconns. The code after your snippet assumes that it will only be reached when there is a request ready for processing.
This is what was happening under ApacheBench. Send 1000 requests; as long as the concurrency is a little higher than the current number of threads, things get stuck. It doesn't necessarily appear to be a thread-exit issue; other problems also show up. The driver thread also does sock cleanup, which gets stuck. These stuck socks are dropped connections; they just appear to be in a queue. In fact they are not waiting to be serviced, the sock is just waiting for cleanup.
I have noticed this as well, but have not tried to fix it. Sometimes the driver stops queueing requests, although some connection threads are idling around. Maybe this is the problem that Jeff Rogers had already fixed for 4.0?
Another issue was the fact that the driver thread was not doing a condBroadcast on thread creation, and if thread creation is skipped, it just does a condSignal. Apparently it is not a good idea to just do a condSignal.
There was another problem in the bookkeeping of idle threads etc., which I have fixed today (committed to CVS). The problem was that under heavy concurrency, NsQueueConn() did not know whether there are connection threads in a cond wait or not. A condSignal is the right thing: it does not make sense to broadcast and wake up multiple waiting threads for a single request.

I was not able to reproduce the crash that you reported in the other mail. However, with ab, small files, and pure AOLserver (without OpenACS), I could verify that some of the counters were wrong. The decision whether to create a thread in NsQueueConn() was based on a counter that was wrong, since it did not take into account that some more threads are already starting in the background. To address this issue, there are now two additional variables, namely starting (the number of currently starting threads) and waiting (the number of threads in a cond wait).
It would be great if you could test this code again in your environment.
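The queueing decision described above can be sketched as a small pure function. This is a hypothetical illustration, not the committed NsQueueConn(): the field names follow the description (starting, waiting), and the real code interleaves this logic with locking and the queue itself.

```c
/* Hypothetical sketch of the NsQueueConn() decision: wake exactly one
 * idle thread with condSignal when one is waiting; create a thread only
 * when none is waiting and none is already starting; otherwise just
 * leave the request in the queue. */
typedef struct {
    int current;   /* running connection threads */
    int max;       /* configured maxthreads */
    int starting;  /* threads created but not yet in the cond wait */
    int waiting;   /* threads currently blocked in the cond wait */
} ThreadCounters;

typedef enum { SIGNAL_ONE, CREATE_THREAD, QUEUE_ONLY } QueueAction;

static QueueAction
QueueConnAction(const ThreadCounters *t)
{
    if (t->waiting > 0) {
        /* An idle thread exists: a single condSignal suffices; a
         * broadcast would wake several threads for one request. */
        return SIGNAL_ONE;
    }
    if (t->starting == 0 && t->current < t->max) {
        /* No idle thread and none on the way: start one.  Without the
         * starting counter, this branch over-created threads. */
        return CREATE_THREAD;
    }
    /* A starting or busy thread will pick the request up later. */
    return QUEUE_ONLY;
}
```

The starting counter is what keeps the decision from firing once per queued request while the first new thread is still booting.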
The current code in CVS makes it difficult to view the operation of the queue. The main problem is that threads.queued is a cumulative total, a totally pointless value.
Well, for statistics it might make sense.
queue.wait.num has the correct value, but it is unavailable to the ns_pools command (and it should be available). Without the ability to query this information while benchmarking, it is difficult to tell what is going on.
It would be simple to return it via the pool commands, but that might already break some applications. But I do agree that one should return that value. Is there an objection from the community to returning the number of currently queued requests via "ns_pools get default"?
The biggest problem is assuming that using ApacheBench to send 10 concurrent requests means that AOLserver is only queueing 10 (at most) requests at one time.
There is no indication that anyone assumes this. The original problem with the high number of queued requests is most easily reproduced with slow requests and slowly starting threads. The numbers are controlled via limits (the defaults for maxrun and maxwait are 100).

-gustaf


--
AOLserver - http://www.aolserver.com/

