Re: [opendbx] FW: [BUGS] BUG #5837: PQstatus() fails to report lost connection

Murray S. Kucherawy Tue, 25 Jan 2011 14:44:55 -0800

> -----Original Message-----
> From: Norbert Sendetzky [mailto:norb...@linuxnetworks.de]
> Sent: Tuesday, January 25, 2011 2:30 PM
> To: OpenDBX devel list
> Subject: Re: [opendbx] FW: [BUGS] BUG #5837: PQstatus() fails to report lost 
> connection
> 
> > This will require some changes to lib/backends/pgsql_basic.c and/or
> > the OpenDBX documentation, I'm afraid.
> 
> Could you explain to me in detail where the changes must happen? Then I
> will see to make an update soon.


The background: A connection is established, some queries are run and return 
successfully.  Then postgresql is deliberately restarted.

Here's what you're doing.  A call to odbx_query() after the restart returns 
normally (which is strange by itself...).  In odbx_result(), you:

- call PQgetResult(), it returns non-NULL
- call PQresultStatus(), it reports PGRES_FATAL_ERROR
- call PQstatus(), it returns CONNECTION_OK
- you return -1

Then I call odbx_error_type(), which remembers that the handle got 
PGRES_FATAL_ERROR but also got CONNECTION_OK, so it just returns 1 (no 
reconnect required).  The reason PQstatus() appears to give a false result is 
because the TCP part of the connection was fine (there was no I/O error; EOF 
hasn't been reached).  Interestingly enough if you ask for the error string 
matching the fatal error at this point, it does tell you that the connection 
has been reset by administrator action.

The problem now is that, since no reconnect is attempted, all future queries 
fail on that handle.

The issue is that PQstatus() relies on internal state that has not been updated 
to reflect that the connection is dead.  According to the libpq people, that's 
because you didn't repeat PQgetResult() until it returned NULL.  I guess on 
an administrative restart of the server, all connections are notified of this, 
so there's I/O pending for read on the socket at the client.  odbx_result() 
causes this message to be read, but EOF isn't reached yet so PQstatus() 
continues to show the connection as usable.  So apparently you have to call 
PQgetResult() again anyway, even though PQresultStatus() has indicated a 
fatal error.  I agree that this seems strange; I likened it to select() 
returning an indication that some descriptor is read-ready but also has an 
exception, and then stating that the user needs to call read() repeatedly until 
EOF just to get errno to tell you what happened.
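
To make the ordering issue concrete, here is a toy model in C of the behaviour 
described above.  It does not use libpq: model_conn, model_get_result() and 
drain_and_check() are hypothetical stand-ins for the connection, PQgetResult() 
and a "drain, then PQstatus()" helper.  The assumption baked in is the one 
reported by the libpq people, namely that the status only changes once the 
result queue has been read to the end.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for libpq's connection and result status values. */
typedef enum { MODEL_CONNECTION_OK, MODEL_CONNECTION_BAD } model_conn_status;
typedef enum { MODEL_TUPLES_OK, MODEL_FATAL_ERROR } model_result_status;

/* Hypothetical model of a connection, not a libpq structure. */
typedef struct {
    const model_result_status *queued; /* results waiting to be read */
    size_t queued_len;
    size_t next;                       /* index of the next unread result */
    int server_closed;                 /* nonzero if the server has gone away */
    model_conn_status status;          /* what a PQstatus()-like call reports */
} model_conn;

/* Models PQgetResult(): returns the next result, or NULL once drained.
 * Only the call that hits the end notices the closed connection. */
static const model_result_status *model_get_result(model_conn *c)
{
    assert(c != NULL);
    if (c->next < c->queued_len)
        return &c->queued[c->next++];
    if (c->server_closed)
        c->status = MODEL_CONNECTION_BAD;
    return NULL;
}

/* Drains every pending result, then reports the connection status.
 * Sets *saw_fatal to 1 if any drained result carried a fatal error. */
static model_conn_status drain_and_check(model_conn *c, int *saw_fatal)
{
    const model_result_status *r;

    assert(c != NULL && saw_fatal != NULL);
    *saw_fatal = 0;
    while ((r = model_get_result(c)) != NULL) {
        if (*r == MODEL_FATAL_ERROR)
            *saw_fatal = 1;
    }
    return c->status;
}
```

In this model, reading the single fatal result leaves the status at 
MODEL_CONNECTION_OK; only the extra call that returns NULL flips it to 
MODEL_CONNECTION_BAD, which is the pattern odbx_result() currently misses.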

So unless they have a change of heart and actually fix this, it sounds like 
you're going to have to call PQgetResult() repeatedly, caching all the possible 
results, on the first call to odbx_result(), and then pull them out one at a 
time when the user calls it again.  That's the only way PQstatus() will tell 
you the truth on a fatal error.
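
A minimal sketch of that caching idea, again without libpq so it stands alone: 
the first call drains the source completely, later calls hand back one cached 
item at a time.  result_cache, fetch_fn, cached_next() and the array_source 
helper are all hypothetical names; plain ints stand in for the result pointers, 
and the fixed CACHE_MAX bound is an arbitrary simplification.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 8 /* arbitrary bound; results beyond it stay unread here */

/* Hypothetical cache; ints stand in for result handles. */
typedef struct {
    int results[CACHE_MAX];
    size_t count;  /* number of cached results */
    size_t next;   /* index of the next result to hand out */
    int filled;    /* nonzero once the source has been drained */
} result_cache;

/* Source callback: stores the next result in *out and returns 1, or
 * returns 0 when nothing is left (modelling a NULL from PQgetResult()). */
typedef int (*fetch_fn)(void *ctx, int *out);

/* Returns 1 and stores a result in *out, or 0 at the end of the set.
 * The first call reads the source to its end before returning anything. */
static int cached_next(result_cache *cache, fetch_fn fetch, void *ctx, int *out)
{
    assert(cache != NULL && fetch != NULL && out != NULL);
    if (!cache->filled) {
        int r;
        while (cache->count < CACHE_MAX && fetch(ctx, &r))
            cache->results[cache->count++] = r;
        cache->filled = 1;
    }
    if (cache->next >= cache->count)
        return 0;
    *out = cache->results[cache->next++];
    return 1;
}

/* Example source backed by an array, used only to exercise the sketch. */
typedef struct {
    const int *items;
    size_t len;
    size_t pos;
} array_source;

static int array_fetch(void *ctx, int *out)
{
    array_source *src = ctx;
    if (src->pos >= src->len)
        return 0;
    *out = src->items[src->pos++];
    return 1;
}
```

The point of draining up front is that the connection state can be checked 
right after the first call, before the caller has seen the remaining results.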

Or, you could "pass the buck" and require your users to do the same thing as 
libpq, namely keep calling odbx_result() until the end of the result set is 
reached, so that you get the "true" PQstatus() value and then the user can use 
odbx_error_type() with correct results.

Neither solution is especially pretty.

> Thanks for your help

My pleasure.  Sorry for the bad news.   :-)

-MSK


