Jeroen T. Vermeulen wrote:
> On Wed, November 15, 2006 07:03, Leandro Lucarella wrote:
>
>> 1) The first problem I found is that when I stop the postgresql server
>> my program receives a SIGPIPE when doing, for example, a
>> transaction.exec(). I was expecting an exception.
>
> I gather you're running the backend and your program on the same machine,

Not always. I use it both on the same machine and on another one, and I
see the same problem in both cases.

> otherwise you shouldn't see SIGPIPE at all. The way it normally works is
> this:
>
> 1. Your backend goes down, dropping its end of the connecting socket.
>
> 2. The C API, libpq, gets an error return code on the next attempt to use
> the socket and handles it by noting that the connection has died.

No. The C API, libpq, does not pass MSG_NOSIGNAL when send()ing and
recv()ing (I've checked the source code), so when the other end of the
connection goes down, a SIGPIPE signal is sent to the process. The only
way I know of for libpq to return an error code when this happens is to
add the MSG_NOSIGNAL flag to the send() and recv() calls.

> 3. When libpqxx sees this, it throws broken_connection. It doesn't
> involve itself with signals at all, which helps portability and reduces
> the risk of interfering with your program.

Of course it is not libpqxx that raises the signal; it is the OS itself.

> So the place to look for detailed documentation on signal handling is the
> libpq documentation. But I guess this is an issue that the libpqxx docs
> should at least mention.

Agreed. I've checked the FAQ's Troubleshooting section[1] (question "Why
does my program crash when it fails to connect to the database?") and it
doesn't mention this issue.

[1] http://thaiopensource.org/development/libpqxx/wiki/FaqTroubleshooting

> I haven't tried killing the backend while a libpq/libpqxx client was
> locally connected, so I haven't run across the SIGPIPE.

I insist that this has nothing to do with running locally or remotely (at
least if "backend" is what I think it is, the postgresql server; maybe
I'm wrong, I'm new to postgresql).

> As long as you
> don't let it terminate your program, however (just set it to SIG_IGN, for
> example) you should get the exception you were expecting.
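
To illustrate the SIG_IGN suggestion just quoted, here is a minimal sketch
that does not use libpq or libpqxx at all. It assumes a POSIX system and
stands in for the backend with one end of a socketpair(); the helper name
errno_after_peer_close() is made up for this example. With SIGPIPE
ignored, the failed send() reports EPIPE instead of killing the process,
which is the kind of error return a library can turn into an exception.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of libpq or libpqxx): sends on a socket
// whose peer has already closed, with SIGPIPE ignored, and returns the
// errno of the failed send(), or 0 if the send unexpectedly succeeded.
int errno_after_peer_close() {
    // Ignore SIGPIPE for the whole process; a failed send() then
    // returns -1 and sets errno instead of terminating the program.
    signal(SIGPIPE, SIG_IGN);

    int fds[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    assert(rc == 0);
    (void)rc;

    close(fds[1]);  // simulate the server side going away

    const char msg[] = "SELECT 1";
    ssize_t n = send(fds[0], msg, sizeof msg, 0);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    return err;
}
```

In a real program the signal() call would go near the start of main(),
before the connection is opened, and the call to transaction.exec() would
be wrapped in a try block that catches broken_connection.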

Yes, that's what I plan to do, but I wanted to check whether there is a
more elegant solution (telling libpqxx to tell libpq to use MSG_NOSIGNAL
=) ), whether this is a known issue, and to collect others' experience.

>> 3) When the connection is lost because of a network problem, the libpqxx
>> methods (like transaction.exec()) keep waiting way too long for the
>> connection to be reestablished and then fail after a long time with
>> SIGPIPE again (but without the "FATAL" error message).
>
> This can be a symptom of two known problems:
>
> 1. There used to be a bug in libpq where only "broken pipe" was recognized
> as terminating a connection, but there's a separate error code for
> timeouts.
>
> This one was actually discovered as a result of another libpqxx user
> running into the long timeout, so libpqxx could possibly be doing
> something to make it worse. There certainly is a lot of retry logic in
> there. On the other hand it was libpqxx's error handling that made it
> possible to pinpoint the problem, so perhaps that is the reason it wasn't
> fixed before.
>
> IIRC the bug was fixed in updates of all supported major versions around
> the time 8.1 came out.

I'm using postgresql 8.1 and libpqxx 2.6.8. Can I rule out this
possibility?

> 2. Your OS may simply be taking a long time to give up on a network
> connection. There's nothing I can do about it, but you can. See below.

Well, there is something. A lot of programs use a keep-alive to test the
connection, bypassing the long TCP timeouts. It's a hack, I know, but it
is all an application layer can do with TCP =)

>> I know none of these problems are really from libpqxx: 1) is because
>> libpq doesn't use the MSG_NOSIGNAL flag when send()ing or recv()ing data
>> with the socket (I know this is probably a feature, not a bug, but I
>> think it would be great and much more C++-friendly if you could raise an
>> exception instead of catching a signal).
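
For reference, this is roughly what the MSG_NOSIGNAL idea in point 1)
looks like at the socket level. It is only a sketch under assumptions: it
is not libpq code, it assumes a system where send() accepts MSG_NOSIGNAL
(Linux does), and the helper name send_to_closed_peer_nosignal() is
invented for the example. SIGPIPE keeps its default disposition here, yet
the process survives because the flag suppresses the signal for that one
call.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not libpq code): sends on a socket whose peer has
// closed, passing MSG_NOSIGNAL so that no SIGPIPE is raised for this call.
// Returns the errno of the failed send(), or 0 if it unexpectedly succeeded.
int send_to_closed_peer_nosignal() {
    int fds[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    assert(rc == 0);
    (void)rc;

    close(fds[1]);  // simulate the server side going away

    const char msg[] = "SELECT 1";
    // No signal handler is installed and SIGPIPE is not ignored; the
    // flag alone turns the broken pipe into an ordinary error return.
    ssize_t n = send(fds[0], msg, sizeof msg, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    return err;
}
```

The difference from ignoring SIGPIPE is scope: the flag affects only the
calls that pass it, so the rest of the program keeps whatever signal
handling it already has.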
>
> It would, but libpqxx already throws the exception and all your program
> should need to do is stop the program from terminating when the signal
> arrives. I think it's really up to the main program to decide what to do
> about signals. If every library it links in feels free to mess with
> signal handling, where does it end?

It is not exactly "messing with signal handling", in the sense that you
don't even need to install a signal handler to avoid this; you just have
to add a flag to send() and recv(). The library provides a layer of
abstraction, and I don't care whether the connection is down because of a
SIGPIPE or something else; I don't even care whether the connection uses
TCP, a message queue or shared memory to talk to the server. All I care
about is that the connection is lost, and this should be reported with an
exception no matter what method is used to talk to the server.

But I know this is a hard topic to agree on, and anyway it is not a
libpqxx issue (or at least not one libpqxx could fix without support in
libpq).

> So I think the best I can do about this is to document it.

Fair enough.

>> 2) is again libpq's fault, but
>> is there any way to tell libpq to be quiet?
>
> Sure. Just create a nonnoticer object and pass an auto_ptr referencing it
> to the connection's set_noticer() function.

Great! Thanks.

>> 3) I guess is just TCP's
>> fault to be so badass and wait that long, but what about a keep-alive +
>> TTL to try to figure out when the connection is lost in a shorter time
>> (like the "connect_timeout" parameter in the connection string, which
>> works only when connecting, but not when doing a query for example).
>
> There are ways of changing how your kernel sees timeouts without messing
> with the IP packets, but they'll be OS-dependent. See ip(7) and tcp(7).
>
>
>> I'm open to suggestions, both workarounds for my code and enhancements
>> to libpqxx/libpq.
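
As a sketch of the keep-alive idea in 3), tcp(7) on Linux documents
per-socket options that change the keep-alive timing for one connection
only, without touching the system-wide settings. The helper name
enable_keepalive() and the values in the usage note are assumptions made
up for illustration, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are
Linux-specific; other systems name or scope these options differently.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: turns on TCP keep-alive for a single socket and
// sets its timing. Returns true only if every option was accepted.
//   idle_s     seconds of silence before the first probe is sent
//   interval_s seconds between probes
//   probes     unanswered probes before the connection is declared dead
bool enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                      &idle_s, sizeof idle_s) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                      &interval_s, sizeof interval_s) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                      &probes, sizeof probes) == 0;
}
```

With enable_keepalive(fd, 30, 5, 3), for example, an idle connection to a
dead peer would be given up after roughly 30 + 3 * 5 = 45 seconds. libpq
exposes the connection's descriptor through PQsocket(); whether and how
libpqxx passes it through is something to check against the version in
use.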
>
> Three recommendations: set SIGPIPE to SIG_IGN; ensure your libpq is up to
> date; and if you still have the slow timeouts after that, mess with your
> networking stack (very carefully of course) to make it give up faster.

So the keep-alive solution is discarded? I don't like to mess around with
the system-wide TCP configuration, because postgres is not the only
service on the machine and the other services don't need such short
timeouts.

Thanks for your time.

-- 
Leandro Lucarella
Integratech S.A.
4571-5252
_______________________________________________
Libpqxx-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
