On Wed, November 15, 2006 07:03, Leandro Lucarella wrote:

> 1) The first problem I found is that when I stop the postgresql server
> my program receives a SIGPIPE when doing, for example, a
> transaction.exec(). I was expecting an exception.

I gather you're running the backend and your program on the same machine,
otherwise you shouldn't see SIGPIPE at all.  The way it normally works is
this:

1. Your backend goes down, dropping its end of the connecting socket.

2. The C API, libpq, gets an error return code on the next attempt to use
the socket and handles it by noting that the connection has died.

3. When libpqxx sees this, it throws broken_connection.  It doesn't
involve itself with signals at all, which helps portability and reduces
the risk of interfering with your program.

So the place to look for detailed documentation on signal handling is the
libpq documentation.  But I guess this is an issue that the libpqxx docs
should at least mention.

I haven't tried killing the backend while a libpq/libpqxx client was
locally connected, so I haven't run across the SIGPIPE.  As long as you
don't let it terminate your program, however (just set it to SIG_IGN, for
example) you should get the exception you were expecting.
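
For what it's worth, here is a minimal sketch of what I mean by setting the
signal to SIG_IGN.  It uses nothing from libpq or libpqxx; the function names
are made up for illustration, and the pipe merely stands in for a socket whose
other end has gone away.  With SIGPIPE ignored, the failed write reports EPIPE
instead of killing the process, and that error is what the libraries can then
act on.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Ignore SIGPIPE process-wide, so that writing to a dead socket or pipe
// fails with EPIPE instead of terminating the program.
// Returns true if the disposition was changed successfully.
bool ignore_sigpipe()
{
    return std::signal(SIGPIPE, SIG_IGN) != SIG_ERR;
}

// Demonstration only (hypothetical helper): write to a pipe whose reading
// end is already closed.  Returns the errno of the failed write, or 0 if
// the write unexpectedly succeeded.
int errno_after_write_to_closed_pipe()
{
    int fds[2];
    const int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;                  // silence "unused" warnings under NDEBUG

    close(fds[0]);             // the "backend" goes away

    const char byte = 'x';
    errno = 0;
    const ssize_t written = write(fds[1], &byte, 1);
    const int err = (written < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

Call ignore_sigpipe() once, early in main(), before opening any connections.
Without that call, the write in the second function would terminate the
process with SIGPIPE.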


> 2) Another thing I don't like is a message is written to the standard
> output ("FATAL: terminating connection due to administrator command"),
> even when I'm a good boy and handle the SIGPIPE signal =). This is
> really not that important but I think it's not a good thing for a
> library to print messages to the standard output when you have the error
> codes to do whatever you feel is better to handle the error.

Actually, the message goes to the "notice processor," a callback function
for handling errors and warnings.  The default notice processor prints to
stderr, and libpqxx is not involved in the process.  You can set a
different notice processor if you like, however.


> 3) When the connection is lost because of a network problem, the libpqxx
> methods (like transaction.exec()) keep waiting way too long for the
> connection to be reestablished and then fails after a long time with
> SIGPIPE again (but without the "FATAL" error message).

This can be a symptom of two known problems:

1. There used to be a bug in libpq where only "broken pipe" was recognized
as a sign that the connection had died, even though timeouts are reported
with a separate error code.

This one was actually discovered as a result of another libpqxx user
running into the long timeout, so libpqxx could possibly be doing
something to make it worse.  There certainly is a lot of retry logic in
there.  On the other hand it was libpqxx's error handling that made it
possible to pinpoint the problem, so perhaps that is the reason it wasn't
fixed before.

IIRC the bug was fixed in updates of all supported major versions around
the time 8.1 came out.

2. Your OS may simply be taking a long time to give up on a network
connection.  There's nothing I can do about it, but you can.  See below.


> I know none of these problems are really from libpqxx: 1) is because
> libpq doesn't use MSG_NOSIGNAL flag when send()ing or recv()ing data with
> the socket (I know this is probably a feature, not a bug, but I think it
> would be great and much more C++-friendly if you could raise an
> exception instead of catching a signal).

It would, but libpqxx already throws the exception; all your program needs
to do is keep the signal from terminating it when it
arrives.  I think it's really up to the main program to decide what to do
about signals.  If every library it links in feels free to mess with
signal handling, where does it end?

So I think the best I can do about this is to document it.


> 2) is again libpq's fault, but
> is there any way to tell libpq to be quiet?

Sure.  Just create a nonnoticer object and pass an auto_ptr referencing it
to the connection's set_noticer() function.


> 3) I guess is just TCP's
> fault to be so badass and wait that long, but what about a keep-alive +
> TTL to try to figure out when the connection is lost in a shorter time
> (like the "connect_timeout" parameter in the connection string, which
> works only when connecting, but not when doing a query for example).

There are ways of changing how your kernel sees timeouts without messing
with the IP packets, but they'll be OS-dependent. See ip(7) and tcp(7).
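
As a rough sketch of the per-socket variant on Linux: the options below are
the ones described in tcp(7) and socket(7).  The helper names and the numbers
you pass in are purely illustrative.  You would need the connection's file
descriptor, which libpq exposes through PQsocket(); how you get at it from
libpqxx depends on the version, so treat that part as an assumption.  Also
note that keep-alive probes only help while the connection is idle, for
instance while waiting for a query result; data that is still unacknowledged
is governed by the kernel's retransmission timeouts instead.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper, Linux-specific: make the kernel probe an idle TCP
// connection and give up on it sooner than the system defaults allow.
//   idle_s:     seconds of idleness before the first keep-alive probe
//   interval_s: seconds between probes
//   probes:     unanswered probes before the connection is declared dead
// Returns true if every option was accepted.
bool enable_fast_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    assert(fd >= 0);
    const int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                      &idle_s, sizeof idle_s) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                      &interval_s, sizeof interval_s) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                      &probes, sizeof probes) == 0;
}

// Read one integer TCP-level option back; returns -1 on failure.
int tcp_option(int fd, int option)
{
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) != 0)
        return -1;
    return value;
}
```

With, say, enable_fast_keepalive(fd, 30, 5, 3), a silent peer would be
declared dead roughly 30 + 3 * 5 = 45 seconds after the connection goes idle.
The system-wide defaults can be changed instead through the sysctls listed in
tcp(7), which affects every program on the machine.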


> I'm open to suggestions, both workarounds for my code and enhancements
> to libpqxx/libpq.

Three recommendations: set SIGPIPE to SIG_IGN; ensure your libpq is up to
date; and if you still have the slow timeouts after that, mess with your
networking stack (very carefully of course) to make it give up faster.


Jeroen

