Hi Serge,
This revision looks like it can solve the problem.
>pgsql_status() {
>    if [ -f $PIDFILE ]
>    then
>        PID=`head -n 1 $PIDFILE`
>        kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 \
>            | grep $PID >/dev/null 2>&1
>    else
>        : No pid file
>        false
>    fi
>}
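One detail to watch in the check above: grep matches substrings, so a PID file containing 222 would also match a postmaster running as 2222. Below is a minimal sketch of a stricter variant, not a tested patch. The names pid_in_list and pgsql_status_strict are hypothetical, and the sketch assumes that fuser prints only bare PIDs on stdout (file names and access codes go to stderr), as its man page describes.

```shell
# Hypothetical helper: succeeds only if PID $1 appears as a whole word in
# the whitespace-separated PID list $2 (no substring matches).
pid_in_list() {
    for p in $2; do
        [ "$p" = "$1" ] && return 0
    done
    return 1
}

# Hypothetical stricter status check. PIDFILE and OCF_RESKEY_pgdata are the
# same variables the resource agent already uses.
pgsql_status_strict() {
    [ -f "$PIDFILE" ] || return 1
    PID=$(head -n 1 "$PIDFILE")
    # A process with this PID must exist at all...
    kill -0 "$PID" >/dev/null 2>&1 || return 1
    # ...and it must be one of the processes using the data directory.
    # stderr is discarded so that only the bare PIDs remain.
    pid_in_list "$PID" "$(fuser "$OCF_RESKEY_pgdata" 2>/dev/null)"
}
```

With this, pid_in_list 222 " 2222 3333" fails while pid_in_list 2222 " 2222 3333" succeeds.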
I will test it and report back by the end of tomorrow.
Regards,
Hideo Yamauchi.
--- Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2008 at 9:13 PM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > On Wed, Feb 20, 2008 at 7:24 PM, HIDEO YAMAUCHI <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Serge,
> > >
> > > I tested the new patch.
> > >
> > > However, the problem is still there.
> > >
> > > I looked at the source of pg_ctl.
> > > The following function is called by the status check.
> > >
> > > ---------------------- pg_ctl.c (from PostgreSQL 8.3) ----------------------
> > > static bool
> > > postmaster_is_alive(pid_t pid)
> > > {
> > >     if (pid == getpid())
> > >         return false;
> > > #ifndef WIN32
> > >     if (pid == getppid())
> > >         return false;
> > > #endif
> > >     if (kill(pid, 0) == 0)
> > >         return true;
> > >     return false;
> > > }
> > > -----------------------------------------------------------
> > >
> > > In this source, the last step is a kill -0 check, and that alone seems
> > > to decide that the postmaster is alive.
> > >
> > > Therefore, a problem occurs in irregular cases such as the following.
> > >
> > > For example...
> > >
> > > 1) Place PGDATA on the shared device.
> > > 2) Start PostgreSQL on each node (Active/Active configuration).
> > >    node A --- Postmaster PID 2222 (pgsql1)
> > >    node B --- Postmaster PID 2222 (pgsql2)
> > > 3) Force a reboot of node A (reboot -nf).
> > > 4) The PID file of pgsql1 is left behind.
> > > 5) pgsql1 fails over to node B.
> > > 6) On node B, a process with the same PID already exists, so kill -0
> > >    succeeds against the stale PID file.
> > >
> > > So isn't it impossible to handle this with pg_ctl alone?
> > >
> >
> > That's possible. But honestly I don't know how to deal with such a
> > rather exceptional situation. Even a manual pg_ctl start would fail in
> > this case because pg_ctl would find that active process. The only
> > option to avoid this would be to use the "monitor" function (the one that
> > uses psql to connect to an instance) as the status function. In this
> > case the RA would always try to connect to the instance and presumably
> > report the right status.
> > What do you think? Should we accept this small risk, or should we
> > convert status into a real monitor?
> >
>
> Sorry, using "monitor" for "status" is impossible. If PostgreSQL is up
> on a virtual IP, monitor will always succeed no matter which
> node PostgreSQL is running on.
>
> There are some other options, like using fuser to check that the process
> with the given PID really holds the PG_DATA directory open, something like
> this:
>
> pgsql_status() {
>     if [ -f $PIDFILE ]
>     then
>         PID=`head -n 1 $PIDFILE`
>         kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 \
>             | grep $PID >/dev/null 2>&1
>     else
>         : No pid file
>         false
>     fi
> }
>
> Or using /proc/$PID/cwd to check that it's a symlink to the correct
> PG_DATA, but I'm not sure that it's worth it.
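
For reference, the /proc/$PID/cwd idea could be sketched roughly as follows. This is only an illustration under assumptions: it is Linux-specific, the helper name pid_cwd_is is hypothetical, and it relies on the postmaster's working directory being the data directory, as the suggestion above implies.

```shell
# Hypothetical helper (Linux only): succeeds if the working directory of
# process $1 resolves to the same physical path as directory $2.
# Note: sets the global variables cwd and want.
pid_cwd_is() {
    [ -n "$1" ] && [ -d "$2" ] || return 1
    # /proc/<pid>/cwd is a symlink to the process's current directory.
    cwd=$(readlink "/proc/$1/cwd" 2>/dev/null) || return 1
    # Resolve symlinks in the expected directory before comparing.
    want=$(cd "$2" 2>/dev/null && pwd -P) || return 1
    [ "$cwd" = "$want" ]
}
```

In a status function it would take the place of the fuser pipeline, for example: kill -0 $PID >/dev/null 2>&1 && pid_cwd_is "$PID" "$OCF_RESKEY_pgdata". Reading /proc/$PID/cwd of a process owned by another user may require root, which should be checked for the account the RA runs under.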
>
> --
> Serge Dubrouski.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>