Tom Jackson wrote:
I've pointed out several times that I'm looking at the operation of the queue
from the point of view of performance (both economy and responsiveness), not
the specific numbers (min, max, timeout, maxconns) being maintained.
...
So for instance, if Andrew thinks it is a bug for minthreads to be violated,
it is best to not use a thread timeout. That fixes the issue instantly.
To obey minthreads helps to improve performance, since when all connection
threads are gone, there will be a delay for thread creation upon a new request.
I agree with Andrew that it is a bug if - as it was before - the minthreads are
only available at server start and disappear as soon as the threads time out.
If minthreads is taken as a sacred number, then your solution doesn't cut it.
The thread exits and then another one is created. This is merely cosmetic, and
it only appears correct if you don't notice that the number actually went
below minthreads.
If there was a correct solution, it would prevent the violation of this lower
limit, not fix it up after it happens.
I do not agree here. maxconns should be obeyed as well. If a thread has served
the specified number of items, it should exit. If the number of threads is
thereby reduced below minthreads, a fresh thread should be created. Keeping the
thread alive instead is not correct. If one does not like the behavior, one
could specify maxconns=9999999999.
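The exit-then-replace policy described above can be sketched as follows; the struct and function names here are made up for illustration and are not the actual nsd data structures:

```c
#include <assert.h>

/*
 * Illustrative pool state, not the real nsd structures.
 */
typedef struct {
    int current;    /* number of live connection threads */
    int min;        /* minthreads */
    int maxconns;   /* requests a thread may serve before exiting */
} PoolSketch;

/*
 * After serving a request, a thread checks whether it has reached
 * maxconns.  If it must exit and that would drop the pool below
 * minthreads, a replacement thread is requested so the lower bound
 * is restored immediately.
 */
static int
ConnThreadShouldExit(const PoolSketch *poolPtr, int nserved,
                     int *createReplacementPtr)
{
    *createReplacementPtr = 0;
    if (poolPtr->maxconns > 0 && nserved >= poolPtr->maxconns) {
        if (poolPtr->current - 1 < poolPtr->min) {
            *createReplacementPtr = 1;   /* keep minthreads satisfied */
        }
        return 1;                        /* maxconns reached: exit */
    }
    return 0;                            /* keep serving */
}
```

Setting maxconns to a very large value, as suggested, simply makes the first condition unreachable in practice.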
As an example, just look at the original code. How did threads escape exit at
startup if there is a timeout?
388    if (poolPtr->threads.current <= poolPtr->threads.min) {
389        timePtr = NULL;
390    } ...
So some thought was already given to preventing thread exit based upon a
timeout if minthreads would be violated. So the question is: why isn't this
code working?
Simply because it does not consider that maxconns cycles may lead to thread
exits as well.
There are a number of bugs in this code, and fixing it up at the end, after a
thread has exited doesn't remove the actual bugs.
An individual thread sits in a while loop waiting for a queued request; this
loop and the following check currently look like this:
    status = NS_OK;
    while (!poolPtr->shutdown
           && status == NS_OK
           && poolPtr->queue.wait.firstPtr == NULL) {
        /*
         * Nothing is queued; we wait for a queue entry.
         */
        status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
    }
    if (poolPtr->queue.wait.firstPtr == NULL) {
        msg = "timeout waiting for connection";
        break;
    }
Status starts out as NS_OK; the entire while loop is skipped if there is
already a waiting request, i.e. poolPtr->queue.wait.firstPtr != NULL.
This is fine, since the waiting request can be processed immediately without
the cond wait.
Once the wait is done, the conditions are checked again; the loop is exited
if status becomes NS_TIMEOUT, or if there is a waiting request (or shutdown).
The problem is that we made the decision to exit on timeout before we knew
whether exiting would violate the minthreads condition.
So we could do a wait, then check again whether we would violate this
condition and, if so, reset status to NS_OK, avoiding the unnecessary exit.
If we move the timeout code inside the loop, we also avoid repeating/executing
it anywhere else.
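The proposed change can be sketched like this, with the Ns_CondTimedWait call replaced by a scripted stub so the control flow can be tested on its own; everything apart from the NS_OK/NS_TIMEOUT names is an assumption, not the actual nsd code:

```c
#include <assert.h>

enum { NS_OK = 0, NS_TIMEOUT = 1 };

/*
 * Sketch of the proposed loop: the timeout check moves inside the
 * loop, and a timeout is turned back into NS_OK whenever exiting
 * would drop the pool below minthreads.  waitResults[] scripts what
 * each Ns_CondTimedWait call would return; queuedAfter says after
 * how many waits a request shows up (use a large value for "never").
 */
static int
WaitForQueueEntry(const int *waitResults, int nwaits, int queuedAfter,
                  int current, int min)
{
    int status = NS_OK;
    int queued = (queuedAfter <= 0);
    int i = 0;

    while (!queued && status == NS_OK && i < nwaits) {
        status = waitResults[i++];          /* Ns_CondTimedWait(...) */
        if (status == NS_TIMEOUT && current <= min) {
            status = NS_OK;                 /* exit would violate minthreads */
        }
        if (i >= queuedAfter) {
            queued = 1;                     /* a request arrived */
        }
    }
    return queued ? NS_OK : status;         /* NS_TIMEOUT means: may exit */
}
```

With this shape, a timeout can only terminate the wait when the thread is actually allowed to exit, so the minthreads check never has to be repeated after the loop.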
This won't get rid of the exits due to maxconns. The code after your snippet
assumes that it will be reached only when there is a request ready for
processing.
This is what was happening under Apache Bench. Send 1000 requests; as long as
the concurrency is a little bigger than the current number of threads, things
get stuck. It doesn't necessarily appear to be a thread-exit issue; other
problems also show up. The driver thread also does sock cleanup, which gets
stuck. These stuck socks are dropped connections; they just appear to be in a
queue. In fact, they are not waiting to be serviced, the sock is just waiting
for cleanup.
I have noticed this as well, but have not tried to fix it. Sometimes the
driver stops queueing requests, although some connection threads are idling
around. Maybe this is the problem that Jeff Rogers had already fixed for 4.0?
Another issue was the fact that the driver thread was not doing a
condBroadcast on thread create, and if thread create is skipped, it just does
a condSignal. Apparently it is not a good idea to just do a condSignal.
There was another problem in the book-keeping of idle etc., which I have fixed
today (committed to CVS). The problem was that under heavy concurrency,
NsQueueConn() did not know whether there are connection threads in a cond wait
or not. A condSignal is the right thing: it does not make sense to broadcast
and wake up multiple waiting threads for a single request.
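The condSignal point can be demonstrated with a small stand-alone pthread program (not nsd code, all names are made up): for a single queued request, one signal suffices, because exactly one waiter passes the predicate and consumes the request, while a broadcast would needlessly wake all of them:

```c
#include <assert.h>
#include <pthread.h>

#define NWAITERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int nqueued = 0;       /* requests waiting to be picked up */
static int shutdownFlag = 0;
static int nserved = 0;       /* requests actually consumed */

static void *
ConnThread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (nqueued == 0 && !shutdownFlag) {
            pthread_cond_wait(&cond, &lock);
        }
        if (nqueued > 0) {
            nqueued--;        /* this waiter takes the request */
            nserved++;
        } else {
            break;            /* shutdown and nothing queued */
        }
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int
QueueOneRequestAndShutdown(void)
{
    pthread_t tids[NWAITERS];
    int i;

    for (i = 0; i < NWAITERS; i++) {
        pthread_create(&tids[i], NULL, ConnThread, NULL);
    }
    pthread_mutex_lock(&lock);
    nqueued++;
    pthread_cond_signal(&cond);       /* one request, one wakeup */
    pthread_mutex_unlock(&lock);

    pthread_mutex_lock(&lock);
    shutdownFlag = 1;
    pthread_cond_broadcast(&cond);    /* shutdown wakes everybody */
    pthread_mutex_unlock(&lock);

    for (i = 0; i < NWAITERS; i++) {
        pthread_join(tids[i], NULL);
    }
    return nserved;                   /* exactly one request served */
}
```

Note that the broadcast is still the right tool for shutdown, where every waiter really does need to wake up.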
I was not able to recreate the crash that you have reported in the other mail.
However, with ab with small files and pure AOLserver (without OpenACS), I
could verify that some of the counters were wrong. The decision whether to
create a thread in NsQueueConn() was based on a counter that was wrong, since
it did not take into account that some more threads are already starting in
the background. To address this issue, there are now two additional variables,
namely starting (# of currently starting threads) and waiting (# of threads in
a cond wait).
It would be great if you could test this code again in your environment.
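A sketch of how the two counters could enter the thread-creation decision in NsQueueConn(); the struct and field names here are assumptions for illustration, not the actual nsd structures:

```c
#include <assert.h>

/*
 * Illustrative counters, not the real nsd structures.
 */
typedef struct {
    int current;    /* live connection threads */
    int starting;   /* threads created but not yet serving */
    int waiting;    /* threads blocked in the cond wait */
    int max;        /* maxthreads */
} ThreadCounters;

/*
 * Without "starting", a burst of queued requests could create more
 * threads than needed, because threads still booting were invisible
 * to the counter.  With it, a new thread is created only when nobody
 * is waiting and the live plus starting threads stay below maxthreads.
 */
static int
NeedNewThread(const ThreadCounters *p)
{
    return p->waiting == 0 && p->current + p->starting < p->max;
}
```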
The current code in CVS makes it difficult to view the operation of the queue.
The main problem is that threads.queued is a cumulative total, a totally
pointless value.
Well, for statistics, it might make sense.
queue.wait.num has the correct value, but it is unavailable to the ns_pools
command (and it should be available). Without the ability to query this
information while benchmarking, it is difficult to tell what is going on.
It would be simple to return it via the pool commands, but that might already
break some applications. But I do agree that one should return that value.
Is there an objection from the community to returning the number of currently
queued requests via "ns_pools get default"?
The biggest problem is assuming that using Apache Bench to send 10 concurrent
requests means that AOLserver is only queueing 10 (at most) requests at one
time.
There is no indication that anyone assumes this. The original problem with the
high number of queued requests is most easily reproduced with slow requests
and slowly starting threads. These numbers are controlled via the limits (the
defaults for maxrun and maxwait are 100).
-gustaf
--
AOLserver - http://www.aolserver.com/
To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]>
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject:
field of your email blank.