Tom Jackson wrote:
I've pointed out several times that I'm looking at the operation of the queue from the point of view of performance (both economy and responsiveness), not the specific numbers (min, max, timeout, maxconns) being maintained.
...
So for instance, if Andrew thinks it is a bug for minthreads to be violated, it is best to not use a thread timeout. That fixes the issue instantly.
Obeying minthreads helps to improve performance, since when all connection threads are gone, there will be a delay for thread creation upon a new request. I agree with Andrew that it is a bug if, as it was before, minthreads is only honored at server start and the threads disappear as soon as they time out.

If minthreads is taken as a sacred number, then your solution doesn't cut it: the thread exits and then another one is created. This is merely cosmetic, and it only appears correct if you don't notice that the number actually went below minthreads. A correct solution would prevent the violation of this lower limit, not fix it up after it happens.
I do not agree here. maxconns should be obeyed as well. If a thread has served the specified number of requests, it should exit. If the number of threads then drops below minthreads, a fresh thread should be created. Keeping the thread alive instead is not correct. If one does not like this behavior, one can specify maxconns=9999999999.

As an example, just look at the original code: how did threads avoid exiting at startup when a timeout is set?
  388  if (poolPtr->threads.current <= poolPtr->threads.min) {
  389       timePtr = NULL;
  390   } ...


So there was already some thought given to preventing thread exit on timeout if minthreads would be violated. So the question is: why isn't this code working?
Simply because it does not consider that maxconns cycles may lead to thread exits as well.
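To make the two exit paths explicit, here is a hypothetical sketch (not the actual AOLserver source; the struct and helper are simplified for illustration). The quoted check at line 388 disables only the timeout path; the maxconns path has no minthreads guard at all:

```c
#include <stdbool.h>

/* Hypothetical sketch: a connection thread has TWO ways out, an idle
 * timeout and a maxconns limit.  Setting timePtr = NULL at or below
 * minthreads guards only the timeout; the maxconns exit can still push
 * the pool below minthreads. */
typedef struct {
    int current;    /* threads currently in the pool */
    int min;        /* configured minthreads */
    int maxconns;   /* requests a thread serves before exiting; 0 = unlimited */
} PoolLimits;

static bool
ThreadShouldExit(const PoolLimits *poolPtr, bool timedOut, int connsServed)
{
    /* Timeout path: guarded, since at or below minthreads the wait is
     * indefinite and timedOut cannot become true there. */
    if (timedOut && poolPtr->current > poolPtr->min) {
        return true;
    }
    /* maxconns path: NOT guarded by any minthreads check. */
    if (poolPtr->maxconns > 0 && connsServed >= poolPtr->maxconns) {
        return true;
    }
    return false;
}
```

So even a pool sitting exactly at minthreads loses a thread the moment one of its members reaches maxconns, which is why recreating a fresh thread afterwards is part of honoring both limits.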
There are a number of bugs in this code, and fixing things up at the end, after a thread has exited, doesn't remove the actual bugs. An individual thread sits in a while loop waiting for a queued request; the loop and the following check currently read:

    status = NS_OK;
    while (!poolPtr->shutdown
           && status == NS_OK
           && poolPtr->queue.wait.firstPtr == NULL) {
        /* nothing is queued; wait for a queue entry */
        status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
    }

    if (poolPtr->queue.wait.firstPtr == NULL) {
        msg = "timeout waiting for connection";
        break;
    }

Status starts out as NS_OK, and the entire while loop is skipped if there is a waiting request, i.e. poolPtr->queue.wait.firstPtr != NULL.
This is fine, since the waiting request can be processed immediately without the cond wait.
Once the wait is done, the conditions are checked again; the loop is exited if status becomes NS_TIMEOUT, or if there is a waiting request (or a shutdown).

The problem is that the decision to exit on timeout is made before we know whether exiting would violate the minthreads condition. So we could do the wait, then check again whether we would violate this condition, and if so, reset status to NS_OK and avoid the unnecessary exit. If we move the timeout handling inside the loop, we also avoid repeating it anywhere else.
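Concretely, the proposal above could look like the following model. This is an illustrative sketch, not committed code: Ns_CondTimedWait is replaced by a stub, and the pool fields from the quoted snippets are flattened into a small struct.

```c
#include <stdbool.h>

/* Model of the proposed wait loop: re-check minthreads AFTER each timed
 * wait and reset status to NS_OK, so a timeout never triggers an exit
 * that would violate minthreads.  NS_OK/NS_TIMEOUT mirror the names in
 * the quoted code; the cond wait itself is mocked. */
enum { NS_OK = 0, NS_TIMEOUT = 1 };

typedef struct {
    int current;        /* current number of connection threads */
    int min;            /* configured minthreads */
    int queued;         /* stand-in for queue.wait.firstPtr != NULL */
    int waitsRemaining; /* mock: timed waits until a request arrives */
} PoolMock;

/* Mocked Ns_CondTimedWait: times out until a request "arrives". */
static int
MockTimedWait(PoolMock *poolPtr)
{
    if (poolPtr->waitsRemaining-- > 0) {
        return NS_TIMEOUT;
    }
    poolPtr->queued = 1;
    return NS_OK;
}

/* Returns true if the thread may exit on timeout, false if it got work. */
static bool
WaitForWork(PoolMock *poolPtr)
{
    int status = NS_OK;

    while (status == NS_OK && !poolPtr->queued) {
        status = MockTimedWait(poolPtr);
        if (status == NS_TIMEOUT && poolPtr->current <= poolPtr->min) {
            status = NS_OK;  /* exiting would violate minthreads: keep waiting */
        }
    }
    return !poolPtr->queued;
}
```

A thread at minthreads thus keeps waiting through any number of timeouts, while a thread above minthreads exits on the first one.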
This won't get rid of the exits due to maxconns. The code after your snippet assumes that it will only be reached when there is a request ready for processing.
This is what was happening under ApacheBench. Send 1000 requests; as long as the concurrency is a little higher than the current number of threads, things get stuck. It doesn't necessarily appear to be a thread-exit issue; other problems also show up. The driver thread also does sock cleanup, which gets stuck. These stuck socks are dropped connections; they just appear to be in a queue. In fact they are not waiting to be serviced, the sock is just waiting for cleanup.
I have noticed this as well, but have not tried to fix it. Sometimes the driver stops queueing requests, although some connection threads are idling around. Maybe this is the problem that Jeff Rogers had already fixed for 4.0?
Another issue was the fact that the driver thread was not doing a condBroadcast on thread creation, and if thread creation is skipped, it just does a condSignal. Apparently it is not a good idea to just do a condSignal.
There was another problem in the bookkeeping of idle threads etc., which I have fixed today (committed to CVS). The problem was that under heavy concurrency, NsQueueConn() did not know whether there are connection threads in a cond wait or not. A condSignal is the right thing: it does not make sense to broadcast and wake up multiple waiting threads for a single request.

I was not able to reproduce the crash that you reported in the other mail. However, with ab, small files, and pure AOLserver (without OpenACS), I could verify that some of the counters were wrong. The decision whether to create a thread in NsQueueConn() was based on a counter that was wrong, since it did not take into account that some more threads are already starting in the background. To address this issue, there are now two additional variables, namely starting (the number of currently starting threads) and waiting (the number of threads in a cond wait).
It would be great if you could test this code again in your environment.
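The queueing decision described above can be sketched as a small pure function. This is a hypothetical illustration, not the committed NsQueueConn(): the field names follow the description (starting, waiting), and the real code interleaves this logic with locking and the queue itself.

```c
/* Hypothetical sketch of the NsQueueConn() decision: wake exactly one
 * idle thread with condSignal when one is waiting; create a thread only
 * when none is waiting and none is already starting; otherwise just
 * leave the request in the queue. */
typedef struct {
    int current;   /* running connection threads */
    int max;       /* configured maxthreads */
    int starting;  /* threads created but not yet in the cond wait */
    int waiting;   /* threads currently blocked in the cond wait */
} ThreadCounters;

typedef enum { SIGNAL_ONE, CREATE_THREAD, QUEUE_ONLY } QueueAction;

static QueueAction
QueueConnAction(const ThreadCounters *t)
{
    if (t->waiting > 0) {
        /* An idle thread exists: a single condSignal suffices; a
         * broadcast would wake several threads for one request. */
        return SIGNAL_ONE;
    }
    if (t->starting == 0 && t->current < t->max) {
        /* No idle thread and none on the way: start one.  Without the
         * starting counter, this branch over-created threads. */
        return CREATE_THREAD;
    }
    /* A starting or busy thread will pick the request up later. */
    return QUEUE_ONLY;
}
```

The starting counter is what keeps the decision from firing once per queued request while the first new thread is still booting.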
The current code in CVS makes it difficult to view the operation of the queue. The main problem is that threads.queued is a cumulative total, a totally pointless value.
Well, for statistics it might make sense.
queue.wait.num has the correct value, but it is unavailable to the ns_pools command (and it should be available). Without the ability to query this information while benchmarking, it is difficult to tell what is going on.
It would be simple to return it via the pool commands, but that might already break some applications. But I do agree that one should return that value. Is there an objection from the community to returning the number of currently queued requests via "ns_pools get default"?
The biggest problem is assuming that using ApacheBench to send 10 concurrent requests means that AOLserver is only queueing 10 (at most) requests at one time.
There is no indication that anyone assumes this. The original problem with the high number of queued requests is most easily reproduced with slow requests and slowly starting threads. The numbers are controlled via limits (the defaults for maxrun and maxwait are 100).

-gustaf


--
AOLserver - http://www.aolserver.com/

