Re: mod_perl + Apache 1.3.28

Radu Greab Wed, 27 Aug 2003 16:32:13 +0000

I'm glad (OK, I've spent half a day on this) I haven't followed the
suggestion in Bricolage docs to use Apache 1.3.27, but went with
Apache 1.3.28. Today I stepped on the same issue like you have and I
have found the cause of it.


> I just ran into some very strange issues with connections to a
> PostgreSQL database (using DBD::Pg) getting unceremoniously dropped by
> my mod_perl application. I was getting error messages such as these:
> 
> [Tue Jul 22 13:21:44 2003] [error] [client 207.172.11.148] Unable to
> turn AutoCommit off: DBD driver has not implemented the AutoCommit
> attribute at /usr/local/lib/perl5/site_perl/5.8.0/Bric/Util/DBI.pm line
> 528, <GEN71> line 57.
> 
> Database handle destroyed without explicit disconnect at
> /usr/local/lib/perl5/site_perl/5.8.0/Bric/Util/DBI.pm line 1661.
> Database handle destroyed without explicit disconnect at
> /usr/local/lib/perl5/site_perl/5.8.0/Bric/Util/DBI.pm line 1661.
> commit ineffective with AutoCommit enabled at
> /usr/local/lib/perl5/site_perl/5.8.0/Bric/Util/DBI.pm line 571.
> Database handle destroyed without explicit disconnect at
> /usr/local/lib/perl5/site_perl/5.8.0/Bric/Util/DBI.pm line 1661.
> 
> The PostgreSQL log had these errors:
> 
> FATAL:  Socket command type H unknown
> FATAL:  Socket command type 0 unknown
> LOG:  pq_flush: send() failed: Broken pipe
> 
> After a great deal of experimentation and recompilation, I finally
> identified Apache 1.3.28 as the culprit. When I reverted to 1.3.27, the
> problem went away.
> 
> I had a look at the Apache changelog to see what might have caused this
> issue in Apache 1.3.28, and the most relevant item I found was this:
> 
>    *) Certain 3rd party modules would bypass the Apache API and not
>       invoke ap_cleanup_for_exec() before creating sub-processes.
>       To such a child process, Apache's file descriptors (lock
>       fd's, log files, sockets) were accessible, allowing them
>       direct access to Apache log file etc.  Where the OS allows,
>       we now add proactive close functions to prevent these file
>       descriptors from leaking to the child processes.
>       [Jim Jagielski, Martin Kraemer]

Yes, this is the change that causes the problem.

> Note that although I replicated this problem on FreeBSD and on Mac OS X
> (and had a report of the same issue from a Solaris user) with mod_perl
> 1.28 and 1.27, it disappeared when I ran Apache in single-process mode.
> I'm guessing that the problem is with either a) Apache (ack!); b) The
> way Apache binds mod_perl (I've got mod_perl 1.x statically compiled
> against Apache 1.3.x, of course); or c) Something to do with the way
> DBI or DBD::Pg does something. My guess is b, but wouldn't rule out a;
> c would be pretty strange.

Unfortunately it is item a.

Here is an edited and commented trace of an httpd child when it stalls and
the DBD::Pg problems are seen later on the error log:

- new connection accepted on file descriptor 3:

16:07:20.572866 accept(16, {sin_family=AF_INET, sin_port=htons(37416), 
sin_addr=inet_addr("10.0.0.2")}}, [16]) = 3
16:07:26.829293 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
16:07:26.829526 getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), 
sin_addr=inet_addr("10.0.0.2")}}, [16]) = 0
16:07:26.829833 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
16:07:26.830051 brk(0x97ef000)          = 0x97ef000
16:07:26.830484 read(3, "GET / HTTP/1.0\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux 
i686; en-US; rv:1.3) Gecko/20030312\r\nAccept: 
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1\r\nAccept-Language:
 en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate,compress;q=0.9\r\nAccept-Charset: 
ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 300\r\nCookie: 
BRICOLAGE=42ad03c41c7dd9e3c78e293b9db9b533\r\nVia: 1.1 yx.primIT.ro:3128 
(Squid/2.4.STABLE6)\r\nX-Forwarded-For: 127.0.0.1\r\nHost: 
yx.primit.ro:8080\r\nCache-Control: max-age=259200\r\nConnection: keep-alive\r\n\r\n", 
4096) = 594

- sending response:

16:07:26.966408 write(3, "HTTP/1.1 302 Found\r\nDate: Wed, 27 Aug 2003 13:07:26 
GMT\r\nServer: Apache/1.3.28 (Unix) mod_perl/1.28\r\nSet-Cookie: BRICOLAGE=Expired; 
path=/; expires=Tue, 26-Aug-2003 13:07:26 GMT\r\nSet-Cookie: 
BRICOLAGE=65d4b58545312108b53de34bf37b9cdb; path=/\r\nLocation: 
http://yx.primit.ro:8080/login\r\nConnection: close\r\nContent-Type: text/html; 
charset=iso-8859-1\r\n\r\n<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 
2.0//EN\">\n<HTML><HEAD>\n<TITLE>302 Found</TITLE>\n</HEAD><BODY>\n<H1>Found</H1>\nThe 
document has moved <A 
HREF=\"http://yx.primit.ro:8080/login\";>here</A>.<P>\n<HR>\n<ADDRESS>Apache/1.3.28 
Server at <A HREF=\"mailto:[EMAIL PROTECTED]">_default_</A> Port 
8080</ADDRESS>\n</BODY></HTML>\n", 665) = 665

- closing socket in http_main.c:lingering_close() with ap_bclose():

16:07:26.969367 shutdown(3, 1 /* send */) = 0
16:07:26.970772 select(4, [3], NULL, NULL, {2, 0}) = 1 (in [3], left {2, 0})
16:07:26.971476 read(3, "", 512)        = 0
16:07:26.971675 close(3)                = 0

- connecting to PostgreSQL (probably as a post-request cleanup), the
file descriptor returned by kernel is 3:

16:07:26.981715 socket(PF_UNIX, SOCK_STREAM, 0) = 3
16:07:26.982081 connect(3, {sin_family=AF_UNIX, path="/tmp/.s.PGSQL.5432"}, 20) = 0

- some SQL queries skipped

- here the Apache internal cleanup handlers run, closing again socket 3
in alloc.c:socket_cleanup(). Unfortunately they close DBD::Pg's
connection:

16:07:27.086828 rt_sigaction(SIGALRM, {0x80bb4d0, [], SA_RESTORER, 0x42029188}, 
{0x80bb4d0, [], SA_RESTORER|SA_INTERRUPT, 0x42029188}, 8) = 0
16:07:27.087563 close(3)                = 0
16:07:27.087792 time(NULL)              = 1061989647

- waiting and accepting a new request, again on socket 3:

16:07:27.088569 accept(16, {sin_family=AF_INET, sin_port=htons(37418), 
sin_addr=inet_addr("10.0.0.2")}}, [16]) = 3
16:07:27.117229 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
16:07:27.117424 getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), 
sin_addr=inet_addr("10.0.0.2")}}, [16]) = 0
16:07:27.117729 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
16:07:27.117984 read(3, "GET /login HTTP/1.0\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux 
i686; en-US; rv:1.3) Gecko/20030312\r\nAccept: 
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1\r\nAccept-Language:
 en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate,compress;q=0.9\r\nAccept-Charset: 
ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 300\r\nCookie: 
BRICOLAGE=65d4b58545312108b53de34bf37b9cdb\r\nVia: 1.1 yx.primIT.ro:3128 
(Squid/2.4.STABLE6)\r\nX-Forwarded-For: 127.0.0.1\r\nHost: 
yx.primit.ro:8080\r\nCache-Control: max-age=259200\r\nConnection: keep-alive\r\n\r\n", 
4096) = 599

- an SQL ping is probably launched before running a new SQL query to
construct the response. DBD::Pg thinks that it is still connected to
PostgreSQL on socket 3, but Apache stealed its socket:

16:07:27.132452 send(3, "Q \0", 3, 0)   = 3
16:07:27.133113 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_IGN}, 8) = 0

- DBD::Pg remains blocked on this select waiting for an answer from
PostgreSQL, but in reality it is blocked waiting for data from the web
browser:

16:07:27.133344 select(4, [3], [], [3], NULL <unfinished ...>


Based on the above I constructed a simple test case. A mod_perl handler
serves a request and in the post request cleanup phase it opens a file
and keeps it open because it intends to write to it for every following
request (it tries to simulate the cached connection to an expensive
resource, like a connection to PostgreSQL). The handler's code is
below. Configure an URL to access the handler and make two requests for
that URL. After the second request, in the case of a "good" Apache, you
should see in file /tmp/output a second line, "This should go to
/tmp/output.". In the case of a buggy Apache, /tmp/output will contain
only one line, "The first line.".

package TTT;
use Apache::Constants;
use IO::Handle;
our $fh;

sub handler {
    my $r = shift;
    $r->warn("Client file descriptors are: IN=",
             $r->connection->fileno(0),
             ", OUT=", $r->connection->fileno(1),
             ". \$fh file descriptor is: ", (defined $fh ? $fh->fileno : -1));
    $r->send_http_header('text/plain');
    if (defined $fh) {
        die "Can't write to /tmp/output: $!" unless print $fh "This should go to 
/tmp/output.\n";
        $r->print("Check /tmp/output.\n");
    } else {
        $r->print("Now repeat the request.\n");
        $r->register_cleanup(\&open_fh);
    }
    return OK;
}

sub open_fh {
    open $fh, ">/tmp/output" or die "Couldn't open /dev/null: $!";
    $fh->autoflush(1);
    print STDERR "\$fh opened with file descriptor: ", $fh->fileno, "\n";
    print $fh "The first line.\n";
}

1;


Basically the problem is caused by the recent change in Apache 1.3.28
where the connection socket is wrongly closed for the second
time. Between 1.3.27 and 1.3.28 some code was changed this way:

in http_main.c:
-       ap_note_cleanups_for_fd(ptrans, csd);
+       ap_note_cleanups_for_socket_ex(ptrans, csd, 1);

On UNIX systems, ap_bclose from buff.c, calls ap_pclosef() which closes
the file descriptors and "disarms" the cleanup for the just closed file
descriptor so that a close() is not called on an already closed by
Apache file descriptor. This worked OK up to and including 1.3.27
because the pairs matched: ap_note_cleanups_for_fd() when creating a
socket and ap_pclosef() when closing it. Now, in 1.3.28, there are wrong
pairs: ap_note_cleanups_for_socket() which adds a different type of
cleanup than for files, and still ap_pclosef() to close the sockets.

The patch below seems to fix the problem by making ap_bclose() use
ap_pclosesocket() on all platforms when the file descriptor is a
socket. I'm not however an Apache internals guru so take it with
caution. Also, maybe an Apache full audit regarding this issue would be
nice.

Should I open an Apache bug report or could somebody on this list,
closer to Apache team, pass quicker this issue up to them?


Thanks,
Radu Greab


--- apache_1.3.28.orig/src/main/buff.c  Mon Feb  3 19:13:20 2003
+++ apache_1.3.28/src/main/buff.c       Wed Aug 27 18:27:05 2003
@@ -1497,7 +1497,6 @@
        rc1 = ap_bflush(fb);
     else
        rc1 = 0;
-#if defined(WIN32) || defined(NETWARE) || defined(CYGWIN_WINSOCK) 
     if (fb->flags & B_SOCKET) {
        rc2 = ap_pclosesocket(fb->pool, fb->fd);
        if (fb->fd_in != fb->fd) {
@@ -1506,24 +1505,13 @@
        else {
            rc3 = 0;
        }
-    }
-#if !defined(NETWARE) && !defined(CYGWIN_WINSOCK) 
-    else if (fb->hFH != INVALID_HANDLE_VALUE) {
+    } else {
+#if defined(WIN32)
+    if (fb->hFH != INVALID_HANDLE_VALUE) {
         rc2 = ap_pcloseh(fb->pool, fb->hFH);
         rc3 = 0;
     }
-#endif
     else {
-#elif defined(BEOS)
-    if (fb->flags & B_SOCKET) {
-       rc2 = ap_pclosesocket(fb->pool, fb->fd);
-       if (fb->fd_in != fb->fd) {
-           rc3 = ap_pclosesocket(fb->pool, fb->fd_in);
-       }
-       else {
-           rc3 = 0;
-       }
-    } else {
 #endif
        rc2 = ap_pclosef(fb->pool, fb->fd);
        if (fb->fd_in != fb->fd) {
@@ -1532,7 +1520,8 @@
        else {
            rc3 = 0;
        }
-#if defined(WIN32) || defined (BEOS) || defined(NETWARE) || defined(CYGWIN_WINSOCK) 
+    }
+#if defined(WIN32)
     }
 #endif
 
--- apache_1.3.28.orig/src/main/http_main.c     Fri May 23 18:25:24 2003
+++ apache_1.3.28/src/main/http_main.c  Wed Aug 27 19:22:27 2003
@@ -6013,7 +6013,6 @@
         * in our buffers.  If possible, try to avoid a hard close until the
         * client has ACKed our FIN and/or has stopped sending us data.
         */
-       ap_kill_cleanups_for_socket(ptrans, csd);
 
 #ifdef NO_LINGCLOSE
        ap_bclose(conn_io);     /* just close it */

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: mod_perl + Apache 1.3.28

Reply via email to