And I don't like the idea of removing the PID file in the "start" function. The standard approach is to remove it after stopping the application. Otherwise it could lead to an attempt to start a second copy of the application.
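A minimal Python sketch of the ordering described above. It assumes the agent can read the PID from the first line of 'postmaster.pid', which is where PostgreSQL writes the postmaster's PID. The real agent is a shell script, and the function names here (read_pid, pid_is_running, cleanup_pid_after_stop) are hypothetical. The sketch only models the logic: the file is removed after stop, and only once the PID it records no longer names a live process. This way a running postmaster never loses its lock file.

```python
import os


def read_pid(pid_file: str):
    """Return the PID on the first line of pid_file, or None if the file
    is missing or does not start with a positive integer."""
    try:
        with open(pid_file) as f:
            pid = int(f.readline().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_running(pid: int) -> bool:
    """Probe with signal 0: nothing is delivered, but the call fails
    if no such process exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def cleanup_pid_after_stop(pid_file: str) -> bool:
    """Remove pid_file only if the PID it records is no longer alive.
    Returns True if the file is gone afterwards, False if it was kept."""
    pid = read_pid(pid_file)
    if pid is not None and pid_is_running(pid):
        # Still running: keep the file; it guards the data directory
        # against a second postmaster being started.
        return False
    try:
        os.remove(pid_file)
    except FileNotFoundError:
        pass
    return True
```

Probing with signal 0 is conservative. If the old PID has been reused by an unrelated process, the file is kept rather than deleted, so the agent errs toward refusing a second start.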
On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
I like the idea of the patch, but honestly I don't like how it's implemented. It should call the "monitor" function (as Andrew suggested) to check whether pgsql is up or down, instead of spreading the same code all around the script. I'd like to review the idea and prepare another patch if everybody agrees.

On 2/23/07, Keisuke MORI <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We have found several problems with the pgsql RA through our testing.
> It 'fails to failover' in some scenarios.
> I'm proposing a patch to fix them.
>
> Problem description:
>
> 1) The first 'monitor' may fail even if the postmaster was
> successfully launched.
>
> This is because 'start' of the pgsql RA may return before the
> postmaster is ready to answer a psql query issued by
> 'monitor', since it only checks for the existence of the postmaster
> process. The postmaster can take a few minutes to get ready
> to answer, particularly when it needs to recover the database
> after a crash. Even if no recovery is necessary, we observed
> that it sometimes fails in some of our test cases.
>
> 2) The postmaster fails to start up when a 'postmaster.pid' file
> was left over from a previous crash.
>
> 3) 'stop' does not perform the fast mode shutdown effectively,
> because it executes the immediate mode shutdown at the very
> next moment. The fast mode shutdown can take a few minutes
> to complete while it flushes the database log.
>
> This isn't a critical problem, but it may make the failover
> take longer to complete (according to our database team).
> It is preferable to wait for the fast mode shutdown to complete
> for as long as possible.
>
>
> Proposals to fix:
>
> 1) In 'start', wait until the postmaster is ready to answer by
> performing the same check that 'monitor' does.
> The maximum time to wait for startup to complete can be
> customized with an additional parameter 'start_wait'.
>
> 2) Add cleanup code for 'postmaster.pid' on stop and before starting.
>
> 3) In 'stop', wait until the postmaster completes the fast
> mode shutdown.
> The maximum time to wait for shutdown to complete can be
> customized with an additional parameter 'stop_wait'.
>
>
> The attached patch is for the latest -dev.
>
> Regards,
>
> Keisuke MORI
> NTT DATA Intellilink Corporation
>
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
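The start_wait and stop_wait proposals, combined with the suggestion to reuse 'monitor', amount to polling with a deadline. Below is a minimal Python sketch. It assumes the caller supplies hypothetical callables (launch, monitor, is_running, fast_shutdown, immediate_shutdown) that wrap whatever commands the shell script actually runs. The wait_until helper and its 1-second poll interval are illustrative choices, not part of the posted patch.

```python
import time
from typing import Callable

Check = Callable[[], bool]


def wait_until(check: Check, timeout: float, interval: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() every `interval` seconds; True as soon as it succeeds,
    False once `timeout` seconds have passed without success."""
    deadline = clock() + timeout
    while not check():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True


def start(launch: Callable[[], None], monitor: Check,
          start_wait: float, **wait_kw) -> bool:
    """Launch the postmaster, then reuse monitor() until it answers a query
    or start_wait seconds expire."""
    launch()
    return wait_until(monitor, start_wait, **wait_kw)


def stop(fast_shutdown: Callable[[], None],
         immediate_shutdown: Callable[[], None],
         is_running: Check, stop_wait: float, **wait_kw) -> str:
    """Request a fast shutdown and give it up to stop_wait seconds to finish;
    fall back to an immediate shutdown only if the process is still there."""
    fast_shutdown()
    if wait_until(lambda: not is_running(), stop_wait, **wait_kw):
        return "fast"
    immediate_shutdown()
    return "immediate"
```

In this sketch, stop checks whether the process still exists (is_running) rather than calling 'monitor'. A postmaster in fast shutdown may already refuse new connections while it is still flushing, so a failed query does not mean the shutdown has finished. The clock and sleep parameters exist only so the polling can be exercised without real delays.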
