Re: [GENERAL] What could keep a connection / query alive?

2011-03-29 Thread Derrick Rice
On Tue, Mar 29, 2011 at 2:54 PM, Derrick Rice wrote:

>
>> Try trussing the backend process.  You may find it in a network IO wait
>> trying to send data to a client that is hung or over a socket that was
>> timed out by a firewall or network equipment.
>>
>> Such a condition will cause the backend to be unable to hear the
>> cancel.  The statement will still show as running in pg_stat_activity.
>>
>> SIGTERM on such a backend will probably also fall on deaf ears.
>
>
> I'm aware of that condition, which is exactly what the keepalive settings
> are supposed to detect.
>

So I spent some time reading Linux-2.6 TCP code and my previous statement is
downright wrong.  Keepalive is only in use when there is no data
unacknowledged and no data to send.  Retransmission timeouts are in use for
those other scenarios.

In any case, I would have expected a retransmission timeout.  My new
hypothesis based on output from `ss' is that a firewall, NAT, or VPN of my
users is putting the connection into persist mode (setting the window size
to 0) when the end point of the connection is unresponsive.  Furthermore, I
think that firewall is continuing to respond to the persist probes of my
machine until it finally decides that the end point is gone.  At which point
it might be ignoring future probes, starting the retransmission timeouts for
my machine.

So I'm not looking for any further help here, since this isn't a PostgreSQL
issue.  If I resolve the problem I'll let you all know just for
entertainment purposes :)

Thanks

Derrick


Re: [GENERAL] What could keep a connection / query alive?

2011-03-29 Thread Derrick Rice
On Tue, Mar 29, 2011 at 3:17 AM, Jerry Sievers wrote:

>
> > What can cause this?? Why would these tcp and statement timeout settings
> not terminate the backend?
>
> Try trussing the backend process.  You may find it in a network IO wait
> trying to send data to a client that is hung or over a socket that was
> timed out by a firewall or network equipment.
>
> Such a condition will cause the backend to be unable to hear the
> cancel.  The statement will still show as running in pg_stat_activity.
>
> SIGTERM on such a backend will probably also fall on deaf ears.


I'm aware of that condition, which is exactly what the keepalive settings
are supposed to detect.

# strace -p 32307
Process 32307 attached - interrupt to quit
send(6, "\252\0\17\0\0\0\01042810425\0\0\0\01010010333\0\0\0\27"..., 880, 0

The client is remote (not unix-domain socket) so I expect tcp_keepalive
settings to kill this connection after 32 minutes.  That's not happening.
Not sure where else to look.

Derrick


Re: [GENERAL] What could keep a connection / query alive?

2011-03-29 Thread Jerry Sievers
Derrick Rice  writes:

> I'm investigating (using 8.2) an instance of a database client
> connection remaining open in a single query well past statement
> timeout settings.? I understand that severed TCP connections can cause
> the backend to hang until the connection is closed, but our tcp
> keepalive settings should have recognized that condition after about
> 30 minutes.? The connection and backend didn't terminate for nearly 90
> minutes.
>
> What can cause this?? Why would these tcp and statement timeout settings not 
> terminate the backend?

Try trussing the backend process.  You may find it in a network IO wait
trying to send data to a client that is hung or over a socket that was
timed out by a firewall or network equipment.

Such a condition will cause the backend to be unable to hear the
cancel.  The statement will still show as running in pg_stat_activity.

SIGTERM on such a backend will probably also fall on deaf ears.

This has been my experience several times in an environment of
EnterpriseDB 8.2 systems on Solaris 10.  

YMMV
>

-- 
Jerry Sievers
e: gsiever...@comcast.net
p: 305.321.1144

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general