Attached is the patch, implemented the way I would like it to be.
On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
And I don't like the idea of removing the PID file in the "start" function. The standard approach is to remove it after stopping the application. Otherwise it could lead to an attempt to start a second copy of the application.

On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> I like the idea of the patch, but honestly I don't like how it's
> implemented. It should call (as Andrew suggested) the "monitor" function
> to check whether pgsql is up or down instead of spreading the same code
> all around the script. I'd like to review the idea and prepare another
> patch if everybody agrees.
>
> On 2/23/07, Keisuke MORI <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > We have found several problems with the pgsql RA through our testing.
> > It fails to fail over in some scenarios.
> > I'm proposing a patch to fix them.
> >
> > Problem description:
> >
> > 1) The first 'monitor' may fail even if the postmaster was
> >    successfully launched.
> >
> >    This is because 'start' of pgsql may return before the
> >    postmaster is ready to answer a psql query issued by
> >    'monitor', since it only checks for the existence of the
> >    postmaster process. The postmaster can take a few minutes to
> >    become ready to answer, particularly when it needs to recover
> >    the database after a crash. Even when no recovery is necessary,
> >    we observed that it sometimes fails in some of our test cases.
> >
> > 2) The postmaster fails to start up when a 'postmaster.pid' file
> >    was left over from a previous crash.
> >
> > 3) 'stop' does not execute the fast mode shutdown effectively,
> >    because it executes the immediate mode shutdown at the very
> >    next moment. The fast mode shutdown can take a few minutes
> >    to complete while it flushes the database log.
> >
> >    This isn't a critical problem, but it may make the failover
> >    take longer to complete (according to our database team).
> >    It is preferable to wait for the fast mode shutdown to
> >    complete for as long as possible.
> >
> >
> > Proposals to fix:
> >
> > 1) In 'start', wait until the postmaster is ready to answer by
> >    performing the same check as 'monitor' does.
> >    The maximum time to wait for startup to complete can be
> >    customized by an additional parameter 'start_wait'.
> >
> > 2) Add cleanup code for 'postmaster.pid' on stop and before starting.
> >
> > 3) In 'stop', wait until the postmaster completes the fast
> >    mode shutdown.
> >    The maximum time to wait for shutdown to complete can be
> >    customized by an additional parameter 'stop_wait'.
> >
> >
> > The attached patch is for the latest -dev.
> >
> > Regards,
> >
> > Keisuke MORI
> > NTT DATA Intellilink Corporation
> >
> >
> > _______________________________________________________
> > Linux-HA-Dev: [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
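
For reference, the waiting behaviour discussed in proposals 1) and 3), combined with the suggestion to reuse the "monitor" check rather than duplicating it, could be sketched roughly as below. This is only an illustration and is not taken from the attached patch: the helper name wait_until and the function names in the usage comments (pgsql_monitor, pgsql_is_down) are hypothetical, and start_wait / stop_wait are assumed to hold a number of seconds.

```shell
#!/bin/sh
# Hypothetical helper, not part of the pgsql RA:
#   wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND about once per second until it succeeds or roughly
# TIMEOUT_SECONDS have passed. Returns 0 on success, 1 on timeout.
wait_until() {
    wu_timeout=$1
    shift
    wu_elapsed=0
    while ! "$@"; do
        if [ "$wu_elapsed" -ge "$wu_timeout" ]; then
            return 1    # the check never succeeded within the limit
        fi
        sleep 1
        wu_elapsed=$((wu_elapsed + 1))
    done
    return 0
}

# Assumed usage inside the resource agent (all names are hypothetical):
#
#   start: launch the postmaster, then reuse the monitor check
#       wait_until "$start_wait" pgsql_monitor || return 1
#
#   stop: request a fast mode shutdown, then wait before escalating
#       wait_until "$stop_wait" pgsql_is_down || <immediate mode shutdown>
```

With this shape, "start" only reports success once the same check that "monitor" uses has passed, and "stop" falls back to the immediate mode shutdown only after the fast mode shutdown has had stop_wait seconds to finish.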
pgsql.in.patch
Description: Binary data
