Sorry, I just found that my version won't work properly on Solaris. Attached is the corrected one. Sorry for creating so many messages :-)
On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
Attached is the patch in the way that I like it to be.

On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> And I don't like the idea of removing the PID file in the "start" function.
> The standard approach is to remove it after stopping the application.
> Otherwise it could lead to an attempt to start a second copy of the
> application.
>
> On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > I like the idea of the patch, but honestly I don't like how it's
> > implemented. It should call the "monitor" function (as Andrew suggested)
> > to check whether pgsql is up or down, instead of spreading the same code
> > all around the script. I'd like to review the idea and prepare another
> > patch if everybody agrees.
> >
> > On 2/23/07, Keisuke MORI <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > We have found several problems with the pgsql RA through our testing.
> > > It 'fails to failover' in some scenarios.
> > > I'm proposing a patch to fix them.
> > >
> > > Problem description:
> > >
> > > 1) The first 'monitor' may fail even if the postmaster was
> > >    successfully launched.
> > >
> > >    This is because 'start' of pgsql may return before the
> > >    postmaster is ready to answer a psql query issued by
> > >    'monitor', since it only checks for the existence of the
> > >    postmaster process. The postmaster can take a few minutes to
> > >    become ready to answer, particularly when it needs to recover
> > >    the database after a crash. Even when no recovery is necessary,
> > >    we observed that it sometimes fails in some of our test cases.
> > >
> > > 2) The postmaster fails to start up when a 'postmaster.pid' file
> > >    was left over from a previous crash.
> > >
> > > 3) 'stop' does not execute the fast mode shutdown effectively,
> > >    because it executes the immediate mode shutdown immediately
> > >    afterwards. The fast mode shutdown can take a few minutes to
> > >    complete while it flushes the database log.
> > >
> > >    This isn't a critical problem, but it may make the failover
> > >    take longer to complete (according to our database team). It
> > >    is preferable to wait for the fast mode shutdown to complete
> > >    for as long as possible.
> > >
> > >
> > > Proposals to fix:
> > >
> > > 1) In 'start', wait until the postmaster is ready to answer by
> > >    performing the same check that 'monitor' does.
> > >    The maximum time to wait for startup to complete can be
> > >    customized with an additional parameter, 'start_wait'.
> > >
> > > 2) Add cleanup code for 'postmaster.pid' on stop and before start.
> > >
> > > 3) In 'stop', wait until the postmaster completes the fast
> > >    mode shutdown.
> > >    The maximum time to wait for shutdown to complete can be
> > >    customized with an additional parameter, 'stop_wait'.
> > >
> > >
> > > The attached patch is for the latest -dev.
> > >
> > > Regards,
> > >
> > > Keisuke MORI
> > > NTT DATA Intellilink Corporation
> > >
> > >
> > > _______________________________________________________
> > > Linux-HA-Dev: [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > > Home Page: http://linux-ha.org/
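The three proposals in the quoted message (a bounded wait in 'start', stale 'postmaster.pid' cleanup, a bounded wait for the fast shutdown in 'stop') could be sketched in shell roughly as below. This is a hypothetical sketch, not the code from the attached patch: the helper names wait_for, process_gone and remove_stale_pidfile are invented for illustration, and inside the RA the check handed to wait_for would be its own monitor function, as Serge suggests. It removes 'postmaster.pid' only when the recorded process is gone, which is one way to reconcile proposal 2 with Serge's concern about starting a second copy.

```shell
#!/bin/sh
# Hypothetical helpers sketching the proposals above; NOT the code from
# pgsql.in.patch. Written without local, $(...) or $((...)) so it should
# also suit an old Bourne /bin/sh (e.g. Solaris), though that is untested.

# wait_for TIMEOUT CMD [ARGS...]
# Run CMD about once per second until it succeeds (return 0) or until
# TIMEOUT seconds have passed (return 1). Proposals 1 and 3 would pass
# start_wait / stop_wait as TIMEOUT.
wait_for() {
    _timeout=$1
    shift
    _elapsed=0
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$_elapsed" -ge "$_timeout" ]; then
            return 1
        fi
        sleep 1
        _elapsed=`expr "$_elapsed" + 1`
    done
}

# process_gone PID: succeed once no process with this PID exists.
# (kill -0 also fails with EPERM for another user's process; an RA
# running as root does not hit that case.)
process_gone() {
    if kill -0 "$1" 2>/dev/null; then
        return 1
    fi
    return 0
}

# remove_stale_pidfile PIDFILE
# Proposal 2, restricted to stale files: delete PIDFILE only when the
# PID on its first line is no longer running, so the PID file of a live
# postmaster is never removed. A file with no PID is left alone.
remove_stale_pidfile() {
    _pidfile=$1
    [ -f "$_pidfile" ] || return 0
    _pid=
    read _pid < "$_pidfile"
    if [ -n "$_pid" ] && process_gone "$_pid"; then
        rm -f "$_pidfile"
    fi
    return 0
}

# Hypothetical use in the RA (pgsql_monitor, PGDATA, start_wait and
# stop_wait stand for the RA's own monitor function and parameters):
#   remove_stale_pidfile "$PGDATA/postmaster.pid"
#   ... launch the postmaster ...
#   wait_for "$start_wait" pgsql_monitor || return "$OCF_ERR_GENERIC"
#
#   read pid < "$PGDATA/postmaster.pid"
#   pg_ctl stop -W -D "$PGDATA" -m fast
#   wait_for "$stop_wait" process_gone "$pid" ||
#       pg_ctl stop -W -D "$PGDATA" -m immediate
```

Routing both the start and stop paths through one timeout helper and one check keeps the readiness logic in a single place, which was the point of Serge's review.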
pgsql.in.patch
