Re: mod_cgid and accept() loop

Amol Dev Sun, 18 Mar 2007 08:53:30 -0800

I did not notice any unusual activity in the access log or any problems in syslog 
during or before the time these errors were logged. It could well be a kernel 
issue. I will run this problem by the HP Apache support team.

The problem did not happen for a long time, and I am not sure what is triggering it. 
We might end up making a local modification in mod_cgid.c to check for 
ECONNABORTED before I can put the mod_cgid module back in. I just have to make sure 
the daemon will be relaunched and take on requests without problems if that 
happens.

Thanks,
Amol

----- Original Message ----
From: Jeff Trawick <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, March 18, 2007 6:05:33 AM
Subject: Re: mod_cgid and accept() loop

On 3/17/07, Amol Dev <[EMAIL PROTECTED]> wrote:
> After running the Apache-2.0.58 server with mod_cgid on HPUX B.11.23 PA for 3-4 
> days, all of a sudden I see the following errors in error_log.
>
> "[Fri Mar 16 07:23:53 2007] [error] (231)Software caused connection abort: 
> Error accepting on cgid socket"
>
> There were 18 million such entries in 30 minutes, which means the cgid daemon 
> was in an infinite loop.

        len = sizeof(unix_addr);
        sd2 = accept(sd, (struct sockaddr *)&unix_addr, &len);
        if (sd2 < 0) {
            if (errno != EINTR) {
                ap_log_error(APLOG_MARK, APLOG_ERR, errno,
                             (server_rec *)data,
                             "Error accepting on cgid socket");
            }
            continue;
        }

> Error '231' is ECONNABORTED, which is not handled by mod_cgid and puts the
> accept() into an infinite loop.

no, ECONNABORTED will generate a log message and go back into accept
and wait for a new connection; it takes an infinite number of such
connections (or kernel acting like there is) to create an infinite
loop there

perhaps the kernel is confused?  some unknown glitch caused a
connection to be aborted once, and kernel has left it on an internal
queue even after accept() is called?

> Not sure why this socket would be shutdown() by anything. But if it does get
> ECONNABORTED, how should mod_cgid handle it?

It handles it correctly today IMHO.

Without information on root cause of the kernel acting like there is
an endless number of aborted connections to the mod_cgid socket, I
wouldn't suggest any change to Apache.

> Should we handle this error by setting daemon_should_exit++? Does that
> respawn a new daemon without interruption?

You may wish to make a local modification to have the cgid process
exit if, for example, 10 consecutive calls to accept() return
-1/ECONNABORTED.
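
A rough sketch of what such a local modification could look like, kept
separate from the accept loop so it can be tested on its own. The helper
name, the counter, and the threshold macro are all hypothetical and not
part of mod_cgid; the value 10 is just the example figure mentioned above.

```c
#include <errno.h>

/* Hypothetical threshold; 10 is only the example figure from this thread. */
#define CGID_MAX_CONSECUTIVE_ABORTS 10

/*
 * Hypothetical helper, not part of mod_cgid.
 *
 * Call once after every accept() with its return value and the errno
 * observed right after the call. *consecutive_aborts must start at 0
 * and persist across calls.
 *
 * Returns 1 once CGID_MAX_CONSECUTIVE_ABORTS failures in a row were
 * ECONNABORTED, otherwise 0. A successful accept() or any other error
 * (EINTR, for instance) resets the streak.
 */
static int cgid_should_exit_after_accept(int accept_rv, int accept_errno,
                                         int *consecutive_aborts)
{
    if (accept_rv < 0 && accept_errno == ECONNABORTED) {
        *consecutive_aborts += 1;
        return *consecutive_aborts >= CGID_MAX_CONSECUTIVE_ABORTS;
    }

    /* Success or an unrelated error: the streak is broken. */
    *consecutive_aborts = 0;
    return 0;
}
```

In the loop quoted earlier, this would presumably be called right after
accept(), before the existing "if (sd2 < 0)" check, with the daemon
breaking out of the loop (or setting daemon_should_exit) when it returns 1.
Whether the parent then relaunches the daemon cleanly is something to
verify on the affected system rather than assume.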

You may first want to try to catch it happening again and use tusc to
see if child process(es) handling requests are repeatedly trying to
connect to mod_cgid's socket.  If they're not doing anything wrong,
see about applicable kernel patches.

If by chance you're using HP's Apache-based server and have support
for it, give them a call.  If anybody has heard of this before they
would likely be in the know.
