Johnny Lam writes:

Increasing the number of daemon processes does not totally eliminate the problem; it only makes it less likely to occur.

You trimmed the question in my last post that essentially asked if this same problem can occur with a default courier-authlib setup. In a default courier-authlib setup, if I configure X authdaemond processes to start, and then if X+1 connections arrive simultaneously, what is supposed to happen? If one of the connections will timeout and fail, then is there another solution aside from increasing the number of authdaemond processes started? I apologize in advance that I don't understand enough to read the source code to find the answer.

It's certainly possible, but the chances are minuscule. All POSIX systems keep a queue of incoming connections that are waiting to be accepted by a listening process. If a connection request comes in while all child processes are busy, it waits in that queue until a process becomes available.
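To illustrate that queueing behavior, here is a minimal Python sketch (not courier-authlib code) using a throwaway Unix-domain socket: the client's connect() completes and its request is buffered before any process has called accept(). The socket path, the request bytes and the function name are made up for illustration.

```python
import os
import socket
import tempfile


def connect_before_accept(backlog: int = 5) -> bytes:
    """Show that connect() succeeds while no process is sitting in accept().

    The kernel parks the pending connection in the listening socket's
    backlog queue and hands it over when accept() is finally called.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket path
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(backlog)  # nobody is accepting yet

        # The "busy" server has not called accept(), yet connect() completes
        # and the request bytes are buffered for whoever accepts later.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        client.sendall(b"AUTH request")

        # Later, a freed-up worker accepts the queued connection.
        conn, _ = server.accept()
        data = conn.recv(64)
        for s in (conn, client, server):
            s.close()
        return data


print(connect_before_accept())  # b'AUTH request'
```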

Under normal circumstances an average authentication request takes a fraction of a second to complete, and waiting connection requests will likely go through after only a brief delay.

In your case, a child process will NOT become available: all of the existing ones are trying to connect back to the authdaemon socket. None of them will free up until its authentication request completes, and none of those requests can complete until its own connection back goes through. Changing the number of processes only changes how many simultaneous connections must arrive within a predetermined amount of time before they wedge themselves.
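The wedge can be reproduced in a toy model. In this Python sketch, assumed purely for illustration, threads stand in for authentication requests and a semaphore stands in for the pool of idle authdaemond children; each request holds one slot and then needs a second slot for its nested lookup. The timeout is shortened from ten seconds to half a second, and the function name and parameters are hypothetical.

```python
import threading


def run_burst(workers: int, requests: int, timeout_s: float = 0.5) -> int:
    """Count how many simultaneous requests finish when each one needs a
    nested lookup served by the same pool of workers."""
    if not 1 <= requests <= workers:
        raise ValueError("this sketch models bursts of 1..workers requests")
    pool = threading.BoundedSemaphore(workers)  # idle child processes
    overlap = threading.Barrier(requests)       # make the requests overlap
    finished = threading.Barrier(requests)      # see note in request()
    lock = threading.Lock()
    completed = 0

    def request() -> None:
        nonlocal completed
        pool.acquire()                          # outer request occupies a child
        overlap.wait()                          # every request now holds a child
        if pool.acquire(timeout=timeout_s):     # nested reconnect needs another child
            pool.release()                      # nested lookup done
            with lock:
                completed += 1
        # Hold the outer child until every request has tried its lookup, so a
        # timed-out request cannot hand its child to a slower one (keeps the
        # demo deterministic).
        finished.wait()
        pool.release()                          # outer request frees its child

    threads = [threading.Thread(target=request) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


print(run_burst(5, 4))  # 4
print(run_burst(5, 5))  # 0
```

With one spare child the nested lookups are served one after another; once the burst equals the pool size, every nested lookup times out.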

In a normal setup, you're really only going to have problems if you're using something like LDAP or SQL and the back end goes down, so that existing connections stall, or if the actual database query stalls for some reason. Then the existing processes get stuck quickly, and new authentication requests begin to fail as well.

Let's run some ballpark calculations. Let's say an average authentication request takes 100 milliseconds (probably overkill), and in your case the first 90 milliseconds are spent doing whatever, and then you reconnect back to authdaemon for a fast lookup that takes the remaining 10 milliseconds.

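It becomes obvious that, with the default five processes, if five connection requests arrive within the first 90 milliseconds, all of them will fail: every process is busy waiting on its own connection back to authdaemon, and none is left to answer it.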

Now, let's take a regular 100-millisecond authentication request. I checked the code: authdaemon waits ten seconds for a connection request to go through before failing (and then it waits another ten seconds for a response after sending the request, but that's not relevant here).

So, a single authdaemon process can chew through ten requests per second, or a hundred requests in ten seconds. Five processes, therefore, will process five hundred authentication requests in ten seconds.
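The same arithmetic as a small Python helper. It is a sketch that assumes every request takes the same fixed time and ignores any limit on the kernel's listen queue; the function name is hypothetical.

```python
def stock_capacity(processes: int, service_ms: int, timeout_ms: int) -> tuple[float, int]:
    """Ballpark throughput and burst capacity of a stock authdaemond pool."""
    per_process = 1000 / service_ms                 # requests/second one child handles
    steady = processes * per_process                # requests/second the pool sustains
    burst = processes * (timeout_ms // service_ms)  # requests served inside one timeout window
    return steady, burst


print(stock_capacity(5, 100, 10_000))  # (50.0, 500)
```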

So, authdaemon will be able to handle a peak load of up to five hundred authentication requests arriving in the space of 100 milliseconds (versus fewer than five requests in 90 milliseconds with your custom authdaemon). None of them will fail because, eventually, a process will free up before the ten seconds run out.
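To compare the two setups side by side, here is a rough discrete-event model in Python. It is an illustrative sketch, not authdaemond's actual scheduling: it assumes FIFO acceptance, a fixed 100 ms per request (90 ms of work plus a 10 ms nested lookup in the custom case), the ten-second connect timeout and no limit on the listen queue, and it ignores the second ten-second response timeout. The function and its parameters are hypothetical.

```python
import heapq
from collections import deque


def simulate(workers, arrivals_ms, work_ms=100, lookup_ms=10,
             timeout_ms=10_000, reconnect=False):
    """Return (succeeded, failed) for outer requests arriving at arrivals_ms.

    With reconnect=True a request works for work_ms - lookup_ms, then needs
    a *second* worker for a lookup_ms nested request while still holding
    its first worker. A connection not accepted within timeout_ms fails.
    """
    events, seq = [], 0
    conns = []       # each entry: [arrival_ms, parent_id or None, state]
    queue = deque()  # connection ids waiting to be accepted
    free, ok, failed = workers, 0, 0

    def push(t, kind, cid):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, cid))
        seq += 1

    def enqueue(t, parent):
        conns.append([t, parent, "waiting"])
        cid = len(conns) - 1
        queue.append(cid)
        push(t + timeout_ms, "deadline", cid)

    def expire(cid):
        nonlocal free, failed
        conns[cid][2] = "timed_out"
        failed += 1
        if conns[cid][1] is not None:  # a stuck nested lookup fails its parent,
            free += 1                  # which then releases its worker

    def dispatch(now):
        nonlocal free
        while free and queue:
            cid = queue.popleft()
            arrival, parent, state = conns[cid]
            if state != "waiting":
                continue
            if arrival + timeout_ms <= now:  # client already gave up
                expire(cid)
                continue
            conns[cid][2] = "running"
            free -= 1
            if parent is not None:
                push(now + lookup_ms, "done", cid)
            elif reconnect:
                push(now + work_ms - lookup_ms, "reconnect", cid)
            else:
                push(now + work_ms, "done", cid)

    for t in arrivals_ms:
        push(t, "arrive", None)

    while events:
        now, _, kind, cid = heapq.heappop(events)
        if kind == "arrive":
            enqueue(now, None)
        elif kind == "reconnect":
            enqueue(now, cid)  # parent keeps its worker meanwhile
        elif kind == "deadline" and conns[cid][2] == "waiting":
            expire(cid)
        elif kind == "done":
            ok += 1
            # a nested lookup frees its own worker and its parent's
            free += 1 if conns[cid][1] is None else 2
        dispatch(now)
    return ok, failed


print(simulate(5, [i // 5 for i in range(500)]))  # stock: (500, 0)
print(simulate(5, [0] * 5, reconnect=True))      # custom, five at once: (0, 5)
print(simulate(5, [0] * 4, reconnect=True))      # custom, four at once: (4, 0)
```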

Of course, getting _another_ five hundred requests in the next 100 milliseconds isn't going to work, because in those 100 milliseconds only five requests from the first batch get processed, leaving nearly five hundred still in the queue.

So, if you finish working out the math, your computations will show that, with a 100 millisecond average response time, the default authdaemon configuration will be able to handle a steady load of fifty authentication requests per second (five processes at ten requests per second each), and temporarily accommodate a backlog of up to about five hundred queued requests. The backlog keeps growing as long as requests arrive faster than fifty a second, and begins to drain once the average rate falls below that.
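A per-second fluid approximation of that backlog, again a hypothetical Python sketch: it assumes five processes at 100 ms per request, treats the queue as a single number, and only flags when the tail of the queue would wait longer than the ten-second timeout rather than modeling which requests actually get dropped.

```python
def backlog_per_second(arrivals_per_s, processes=5, service_ms=100, timeout_ms=10_000):
    """Queued requests left at the end of each second, plus whether the
    request at the tail of the queue would still beat the timeout."""
    throughput = processes * 1000 // service_ms    # requests/second (50 with the defaults)
    max_backlog = throughput * timeout_ms // 1000  # deepest queue that still beats the timeout
    backlog, trace = 0, []
    for arrivals in arrivals_per_s:
        backlog = max(0, backlog + arrivals - throughput)  # grows above 50/s, drains below
        trace.append((backlog, backlog <= max_backlog))
    return trace


print(backlog_per_second([50, 50, 300, 200, 0, 0]))
# [(0, True), (0, True), (250, True), (400, True), (350, True), (300, True)]
```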

But with your custom authdaemon, as soon as you get five authentication requests within 100 milliseconds, everything will break. None of the other math makes any difference. You can make the situation a bit better by increasing the number of processes, but that gives only a small benefit and won't come anywhere near the ballpark of a stock authdaemon.

Here's a suggestion. In authdaemond.c, pre() is a static function. Try making it non-static, and invoking it directly from your custom module instead of reconnecting. That might do the trick.
