On Wed, Feb 20, 2008 at 9:13 PM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2008 at 7:24 PM, HIDEO YAMAUCHI <[EMAIL PROTECTED]> wrote:
>
> > Hi Serge,
> >
> > I checked the new patch.
> >
> > But there is still a problem.
> >
> > I looked at the pg_ctl source.
> > The status check calls the following function.
> >
> > ---------------------- pg_ctl.c (from PostgreSQL 8.3) ----------------------
> > static bool
> > postmaster_is_alive(pid_t pid)
> > {
> >     if (pid == getpid())
> >         return false;
> > #ifndef WIN32
> >     if (pid == getppid())
> >         return false;
> > #endif
> >     if (kill(pid, 0) == 0)
> >         return true;
> >     return false;
> > }
> > -----------------------------------------------------------
> >
> > In the source, the last step is a kill -0 on the PID, and that alone
> > seems to be treated as confirmation.
> >
> > Therefore, a problem occurs in irregular cases like the following.
> >
> > For example...
> >
> > 1) Put PGDATA on the shared device.
> > 2) Start PostgreSQL on each node (Active/Active configuration).
> >    node A --- Postmaster PID 2222 (pgsql1)
> >    node B --- Postmaster PID 2222 (pgsql2)
> > 3) Reboot node A (reboot -nf).
> > 4) The PID file is left behind.
> > 5) Failover of pgsql1.
> > 6) On node B, there is already a process with the same PID.
> >
> > So isn't it impossible to handle this with pg_ctl alone?
> >
>
> That's possible. But honestly I don't know how to deal with such a
> rather exceptional situation. Even a manual pg_ctl start would fail in
> this case because pg_ctl would find that active process. The only
> option to avoid this would be using the "monitor" function (the one that
> uses psql to connect to an instance) as the status function. In this
> case the RA will always try to connect to the instance and supposedly
> report the right status.
> What do you think? Should we accept a small risk or should we
> convert that status into a real monitor?
>
Sorry, using "monitor" for "status" is impossible. If PostgreSQL is up
on a virtual IP, monitor will always succeed, no matter which node
PostgreSQL is running on.
There are some other options, like using fuser to check that a process
with a given PID really holds the PG_DATA directory open, something like
this:
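pgsql_status() {
    if [ -f "$PIDFILE" ]
    then
        # First line of the PID file is the postmaster PID.
        PID=`head -n 1 "$PIDFILE"`
        # Alive, and also listed by fuser as using the data directory.
        kill -0 "$PID" >/dev/null 2>&1 && fuser "$OCF_RESKEY_pgdata" 2>&1 |
            grep "$PID" >/dev/null 2>&1
    else
        : No pid file
        false
    fi
}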
Or using /proc/$PID/cwd to check that it's a symlink to the correct
PG_DATA, but I'm not sure that it's worth it.
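
A rough sketch of that /proc variant, in case it helps the discussion.
It assumes Linux, that the RA has permission to read /proc/$PID/cwd
(e.g. it runs as root), and that the postmaster's working directory is
its data directory. The helper name pid_runs_in_dir is made up for this
example and is not part of the RA:

```shell
# Hypothetical helper: succeed only if process $1 is alive and its
# current working directory resolves to directory $2.
# Assumes Linux /proc; the name and interface are illustrative only.
pid_runs_in_dir() {
    pid=$1
    dir=$2
    # Same liveness test pg_ctl uses; fails for a dead PID.
    kill -0 "$pid" >/dev/null 2>&1 || return 1
    # Resolve both paths physically so symlinked directories compare equal.
    actual=`cd "/proc/$pid/cwd" 2>/dev/null && pwd -P` || return 1
    wanted=`cd "$dir" 2>/dev/null && pwd -P` || return 1
    [ "$actual" = "$wanted" ]
}
```

In pgsql_status it could be called as
pid_runs_in_dir "$PID" "$OCF_RESKEY_pgdata" in place of the fuser
pipeline. An unrelated process that happens to reuse the PID would
normally have a different working directory, so the check would fail.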
--
Serge Dubrouski.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems