On 20.01.2015 at 08:45, Ruediger Pluem wrote:
On 01/19/2015 11:40 PM, Rainer Jung wrote:
I noticed a hanging child process on our ASF server aurora.
It currently uses 2.4.11 (plus the post tag commit) and event MPM.
Most processes exiting due to MaxConnectionsPerChild get cleaned up after some
time, but this one doesn't. It has now been hanging for more than an hour.
I'll let it hang. In case anyone has a good question I can answer with gdb,
let me know.

It shows a strange connection view in the server status table:

PID     Connections     Threads Async connections
total   accepting       busy    idle    writing keep-alive      closing
93557   1       yes     0       0       0       0       0

So it has 1 connection, but 0s in all other columns.

The connection can be seen by lsof:

  FD     TYPE             DEVICE   SIZE/OFF   NODE NAME
txt     VREG     183,3400335528   36497117 275235 /x1/www/archive.apache.org/dist/cordova/cordova-3.4.0-src.zip
   9u    PIPE 0xfffffe061ecfab60      16384        ->0xfffffe061ecfacb8
  10u    PIPE 0xfffffe061ecfacb8          0        ->0xfffffe061ecfab60
  24u  KQUEUE 0xfffffe033071be00                   count=0, state=0x2
  41u    IPv4 0xfffffe01316243d0        0t0    TCP 127.0.0.1:35849->127.0.0.1:8050 (CLOSE_WAIT)
  83u    IPv4 0xfffffe0255d08b70        0t0    TCP 127.0.0.1:52023->127.0.0.1:8050 (CLOSE_WAIT)
108u    IPv4 0xfffffe09990eeb70        0t0    TCP 127.0.0.1:22532->127.0.0.1:8050 (CLOSE_WAIT)

This is the established connection:

110u    IPv4 0xfffffe0255ab4b70        0t0    TCP 192.87.106.229:http->179.206.174.192:65496 (ESTABLISHED)

And this is likely the file being served on that connection:

126r    VREG     183,3400335528   36497117 275235 /x1/www/archive.apache.org/dist/cordova/cordova-3.4.0-src.zip
156u    IPv4 0xfffffe048d0ff3d0        0t0    TCP 127.0.0.1:26685->127.0.0.1:8050 (CLOSE_WAIT)
229u    IPv4 0xfffffe0131d013d0        0t0    TCP 127.0.0.1:31538->127.0.0.1:8050 (CLOSE_WAIT)

netstat shows:

Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0  87650 192.87.106.229.80      179.206.174.192.65496 ESTABLISHED

So there are 87650 bytes in the Send-Q. Most likely the client hasn't ACKed
what we sent.

Isn't it weird that the connection remains in this state for an hour? I would
guess the OS tries to resend what's in the buffer, and if that doesn't get
ACKed it would eventually time out the TCP connection and assume the peer is
gone.
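
For anyone who wants to watch the Send-Q of such a connection over time, here is a minimal Python sketch that pulls the queue sizes out of one netstat line. It assumes the six-column BSD-style layout pasted above; the names `TcpEntry` and `parse_netstat_line` are made up for illustration and are not part of any tool. A non-zero Send-Q alone does not say whether the transfer is stalled, only that unacknowledged or unsent data is queued.

```python
from typing import NamedTuple


class TcpEntry(NamedTuple):
    """One TCP row of netstat output (hypothetical helper type)."""
    proto: str
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse_netstat_line(line: str) -> TcpEntry:
    """Split a netstat TCP row into its six whitespace-separated fields.

    Assumes the column order shown in the mail:
    Proto Recv-Q Send-Q Local-Address Foreign-Address (state)
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    proto, recv_q, send_q, local, foreign, state = fields
    return TcpEntry(proto, int(recv_q), int(send_q), local, foreign, state)


# The row quoted in the mail.
sample = ("tcp4       0  87650 192.87.106.229.80      "
          "179.206.174.192.65496 ESTABLISHED")
entry = parse_netstat_line(sample)
print(entry.send_q, entry.state)
```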

I was misled by the server-status table. I now have another child process with one open connection and all zeroes in the other server-status columns, and for that one I checked network and process activity. Indeed, ktrace and tcpdump both show that the process still sends some data every few seconds. The transfer just takes very long because the file being served is big and the transfer rate is low. Thanks for putting me on the right track. Over the next few days I will check a few more "hanging" children to see whether they are actually still serving files.
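
To get a feeling for how long such a download can keep a child alive, here is a rough back-of-the-envelope sketch in Python. It assumes the SIZE/OFF value 36497117 from the lsof output is the file size in bytes, and the rate of 5000 bytes per second is purely a made-up figure for illustration; I did not measure the actual rate.

```python
def transfer_seconds(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Time needed to send size_bytes at a constant rate, in seconds."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


FILE_SIZE = 36_497_117   # SIZE/OFF from lsof, assumed to be the file size
ASSUMED_RATE = 5_000     # bytes per second, hypothetical, not measured

seconds = transfer_seconds(FILE_SIZE, ASSUMED_RATE)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```

At that assumed rate the transfer takes roughly two hours, so a child that looks hung for more than an hour is plausible without anything being broken.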

So it all boils down to why the connection isn't shown in any of the state columns of the server-status table. It could also be interesting to limit how long a closing child keeps serving requests, because it keeps blocking a scoreboard entry, and aborting one or two requests every now and then might be better than running out of children.

Regards,

Rainer
