On 2/24/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On 2/23/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > On 2/23/07, Keisuke MORI <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > We have found several problems with the pgsql RA through our testing.
> > > It 'fails to failover' in some scenarios.
> > > I'm proposing a patch to fix them.
> > >
> > > Problem description:
> > >
> > > 1) The first 'monitor' may fail even if the postmaster was
> > >    successfully launched.
> > >
> > >    This is because 'start' of the pgsql RA may return before the
> > >    postmaster is ready to answer the psql query issued by
> > >    'monitor', since it only checks for the existence of the
> > >    postmaster process. The postmaster can take a few minutes to
> > >    get ready to answer, particularly when it needs to recover the
> > >    database after a crash. Even if no recovery is necessary, we
> > >    observed that it sometimes fails in some of our test cases.
> > >
> > > 2) The postmaster fails to start up when a 'postmaster.pid' file
> > >    was left over from the previous crash.
> > >
> > > 3) 'stop' does not execute the fast mode shutdown effectively,
> > >    because it executes the immediate mode shutdown at the very
> > >    next moment.  The fast mode shutdown can take a few minutes
> > >    to complete while it flushes the database log.
> > >
> > >    This isn't a critical problem, but it may make the failover
> > >    take longer to complete (according to our database team).
> > >    It is preferable to wait for the fast mode shutdown to
> > >    complete for as long as possible.
> > >
> > >
> > > Proposals to fix:
> > >
> > > 1) In 'start', wait until the postmaster is ready to answer,
> > >    using the same check that 'monitor' does.
> > >    The maximum time to wait for startup can be
> > >    customized by an additional parameter 'start_wait'.
> > >
> > > 2) Add cleanup code for 'postmaster.pid' on stop and before start.
> > >
> > > 3) In 'stop', wait until the postmaster completes the fast
> > >    mode shutdown.
> > >    The maximum time to wait for shutdown can be
> > >    customized by an additional parameter 'stop_wait'.
> > >
> > >
> > > The attached patch is for the latest -dev.
> >
> > I'd be more inclined to go with something like the patch below.
> >
> > The function of start_wait and stop_wait is just as easily achieved by
> > setting the action's timeout.  It's also harder to mess up (i.e. by
> > setting start_wait to longer than the start action's timeout).
> >
> > diff -r 959f2c429fc3 resources/OCF/pgsql.in
> > --- a/resources/OCF/pgsql.in    Fri Feb 23 10:59:12 2007 +0100
> > +++ b/resources/OCF/pgsql.in    Fri Feb 23 12:18:53 2007 +0100
> > @@ -197,15 +197,12 @@ pgsql_start() {
> >        return $OCF_ERR_GENERIC
> >     fi
> >
> > -    if ! pgsql_status
> > -    then
> > -       sleep 5
> > -       if ! pgsql_status
> > -       then
> > -           echo "ERROR: PostgreSQL is not running!"
> > -            return $OCF_ERR_GENERIC
> > -       fi
> > -    fi
> > +
> > +    active=0
> > +    while [ $active != 0 ]; do
> > +       pgsql_monitor
> > +       active=$?
> > +    done
>
> So if for some reason PostgreSQL fails to start we'll have an endless
> loop here. Am I right?

only until the action's timeout is reached and the LRM terminates the action

Actually it'll never get into that loop:

active=0
while [ $active != 0 ]; do

Do you see why? :-)
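
For the record: active starts at 0, so the test [ $active != 0 ] is
false on the first pass and the body never runs. The stop loop in the
same patch has the same shape (it starts at $OCF_NOT_RUNNING and tests
against $OCF_NOT_RUNNING). A minimal sketch of what was presumably
intended is below. The helper name wait_for_monitor, the POLL_INTERVAL
variable and the stubbed pgsql_monitor are assumptions for illustration
only and are not part of the patch; the real pgsql_monitor is the one in
the RA. It still relies on the action's timeout to end the wait if the
postmaster never reaches the expected state.

```shell
#!/bin/sh
# Standard OCF return codes used by the RA.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical stub standing in for the RA's pgsql_monitor:
# reports "not running" on the first two calls, then success.
calls=0
pgsql_monitor() {
    calls=$((calls + 1))
    [ "$calls" -ge 3 ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

# Poll pgsql_monitor until it returns the expected code ($1).
# The monitor is called before the first comparison, so the loop
# cannot be skipped by the initial value of a status variable.
# There is no internal deadline: the LRM's action timeout bounds it.
wait_for_monitor() {
    expected=$1
    while :; do
        pgsql_monitor
        [ $? -eq "$expected" ] && return $OCF_SUCCESS
        # Assumed pause between polls, to avoid a busy loop.
        sleep "${POLL_INTERVAL:-1}"
    done
}
```

pgsql_start would end with wait_for_monitor "$OCF_SUCCESS" and
pgsql_stop with wait_for_monitor "$OCF_NOT_RUNNING" before removing the
pid file.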



>
> >
> >     return $OCF_SUCCESS
> >  }
> > @@ -227,6 +224,13 @@ pgsql_stop() {
> >        runasowner "$PGCTL -D $PGDATA stop -m immediate > /dev/null 2>&1"
> >     fi
> >
> > +    active=$OCF_NOT_RUNNING
> > +    while [ $active != $OCF_NOT_RUNNING ]; do
> > +       pgsql_monitor
> > +       active=$?
> > +    done
>
> And here.
>
> > +
> > +    rm -f $PIDFILE
> >     return $OCF_SUCCESS
> >  }
> > _______________________________________________________
> > Linux-HA-Dev: [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
> >