Re: [libpqxx-general] SIGPIPE received when connection is lost because the server is down

Jeroen T. Vermeulen Wed, 15 Nov 2006 23:08:39 -0800

On Thu, November 16, 2006 00:00, Leandro Lucarella wrote:

>> otherwise you shouldn't see SIGPIPE at all.  The way it normally works
>> is this:
>>
>> 1. Your backend goes down, dropping its end of the connecting socket.
>>
>> 2. The C API, libpq, gets an error return code on the next attempt to
>> use the socket and handles it by noting that the connection has died.
>
> No, the C API, libpq, does not use MSG_NOPIPE when send()ing and
> recv()ing (I've checked the source code), so when the other end of the
> connection goes down, a SIGPIPE signal is sent to the process.


But AFAICS the send/recv call should *also* fail once the signal has been
handled, returning -1 and setting errno to describe the situation.  All
libpq does is check for a negative return value and read errno.
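
To illustrate the point, here is a minimal sketch (not libpq code) of what
I mean.  It assumes a POSIX system and uses a pipe as a stand-in for the
client socket; the helper name errno_after_broken_write() is made up for
this example.  With SIGPIPE ignored, the write to the broken pipe fails
with a negative return value and errno says why.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: write to a pipe whose read
 * end has already been closed and return the errno that write() reports.
 * Returns 0 if the write unexpectedly succeeded, -1 if pipe() failed. */
int errno_after_broken_write(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Ignore SIGPIPE so the process survives the write below.
     * SIGPIPE is a valid, catchable signal, so this should not fail. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    assert(old != SIG_ERR);

    close(fds[0]);                      /* the "other end" goes away */

    ssize_t n = write(fds[1], "x", 1);  /* next attempt to use the channel */
    int err = (n < 0) ? errno : 0;      /* negative return: read errno */

    close(fds[1]);
    signal(SIGPIPE, old);               /* restore the previous disposition */
    return err;
}
```

On the systems I would expect this to run on, the helper returns EPIPE.
A real client would hit the same path through send() on its socket.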


>> I haven't tried killing the backend while a libpq/libpqxx client was
>> locally connected, so I haven't run across the SIGPIPE.
>
> I insist this has nothing to do with running locally or remotely (at
> least if "backend" is what I think it is, the postgresql server, but
> maybe I'm wrong, I'm new to postgresql).

Yes, "backend" is the server process.  It's been over a year since I last
looked into this particular bit of error handling in libpq, so I'm a bit
fuzzy on the details.  It's the E* error codes, not the SIG* signals that
matter here--and IIRC there are separate ones for broken Unix-domain
connections and broken TCP connections.  If that is the case, the signal
may be the same for both cases even if the errno codes are different.

Shock horror update: it looks like the fix for the libpq bug I mentioned
did not make it into CVS somehow!  Check pqReadData() and pqSendSome()
here:

http://developer.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/libpq/fe-misc.c?rev=1.130;content-type=text%2Fx-cvsweb-markup

The default way of handling errors there, apart from a series of known
error codes, is still to issue an error message but leave the connection
in "CONNECTION_OK" state.  Unless it's been handled elsewhere, that could
hide some types of connection error and possibly make libpqxx and libpq
itself go on trying for longer than necessary.
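
As a rough sketch of the alternative policy, assuming a POSIX errno set:
treat only a short list of codes as "try again" and everything else as a
dead connection.  This is not what fe-misc.c does, and the names
io_verdict and classify_io_error() are invented for the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical verdicts for a failed send()/recv(); not libpq types. */
enum io_verdict { IO_RETRY, IO_CONNECTION_LOST };

/* Classify the errno left behind by a failed socket call.  Only codes
 * known to be transient ask for a retry; anything unrecognised is taken
 * to mean the connection is gone, rather than leaving it marked as OK. */
enum io_verdict classify_io_error(int err)
{
    assert(err != 0);   /* caller passes the errno of a call that failed */

    if (err == EINTR)
        return IO_RETRY;            /* interrupted by a signal */
    if (err == EAGAIN || err == EWOULDBLOCK)
        return IO_RETRY;            /* non-blocking socket, nothing yet */

    /* EPIPE, ECONNRESET and any code we do not know about. */
    return IO_CONNECTION_LOST;
}
```

With a default like this, EPIPE and ECONNRESET both end up marking the
connection as lost, whichever of the two a given transport reports.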

See the first discussion here (I can't find the follow-up discussion that
I remember taking place):

http://www.nabble.com/libpq-and-connection-failures-tf123204.html#a339088



> Yes, that's what I plan to do, but I wanted to check if there is any
> more elegant solution, to tell libpqxx to tell libpq to use MSG_NOPIPE
> =) and/or to check if this is a known issue and to collect others'
> experience.

Possibly the core team felt that applications might want to see the
signals for themselves.  You may have to delve into the main postgres
mailing lists.  I did some digging myself, and I fear there may be a lot
more coming.  :-/


>> IIRC the bug was fixed in updates of all supported major versions around
>> the time 8.1 came out.
>
> I'm using postgresql 8.1 and libpqxx 2.6.8. Can I discard this
> possibility?

Now that I see that the fix is not in CVS as I thought, no.  :-(


>> Three recommendations: set SIGPIPE to SIG_IGN; ensure your libpq is up
>> to date; and if you still have the slow timeouts after that, mess with
>> your networking stack (very carefully of course) to make it give up
>> faster.
>
> So the keep-alive solution is discarded? I don't like to mess around
> with the general TCP configuration because postgres is not the only
> service on the machine and the other services don't need such short
> timeouts.

I could try to build some form of keepalive support, but I don't have much
time to work on it at the moment.  There does seem to be some keepalive
mechanism in libpq; I guess that using it would also require help from the
application.
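
In the meantime, an application could switch on TCP keepalive for just its
own connection, which avoids touching the machine-wide TCP settings.  A
minimal sketch, assuming a POSIX socket API: the helper names are made up,
and the assumption is that the application passes in the descriptor it
gets from libpq's PQsocket().  The probe timing still comes from the
system defaults; changing it per socket is platform-specific and not
shown here.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive probes on one socket only.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    assert(fd >= 0);    /* caller must pass a valid descriptor */

    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Hypothetical helper: report whether keepalive is on for this socket.
 * Returns 1 if enabled, 0 if disabled, -1 if the query failed. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

Something like enable_keepalive(PQsocket(conn)) right after connecting
would be the obvious place to call it, but I have not tried that myself.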


Jeroen


