On Wed, Feb 20, 2008 at 9:13 PM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2008 at 7:24 PM, HIDEO YAMAUCHI <[EMAIL PROTECTED]> wrote:
>
> > Hi Serge,
>  >
>  >  I tested the new patch.
>  >
>  >  But the same problem is still there.
>  >
>  >  I looked at the pg_ctl source.
>  >  The status check calls the following function.
>  >
>  >  ----------------------pg_ctl.c (from PostgreSQL 8.3)---------------------------
>  >  static bool
>  >  postmaster_is_alive(pid_t pid)
>  >  {
>  >         if (pid == getpid())
>  >                 return false;
>  >  #ifndef WIN32
>  >         if (pid == getppid())
>  >                 return false;
>  >  #endif
>  >         if (kill(pid, 0) == 0)
>  >                 return true;
>  >         return false;
>  >  }
>  >  -----------------------------------------------------------
>  >
>  >  As the source shows, the check ultimately comes down to kill -0.
>  >
>  >  Therefore, a problem occurs in the following unusual case.
>  >
>  > For example...
>  >
>  >  1) Put PGDATA on the shared device.
>  >  2) Start PostgreSQL on each node (Active/Active configuration):
>  >   node A --- postmaster PID 2222 (pgsql1)
>  >   node B --- postmaster PID 2222 (pgsql2)
>  >  3) Reboot node A (reboot -nf).
>  >  4) The PID file is left behind.
>  >  5) pgsql1 fails over.
>  >  6) On node B, a process with the same PID already exists.
>  >
>  >  So isn't it impossible to handle this with pg_ctl alone?
>  >
>
>  That's possible. But honestly I don't know how to deal with such an
>  exceptional situation. Even a manual pg_ctl start would fail in
>  this case because pg_ctl would find that active process. The only
>  option to avoid this would be using the "monitor" function (the one that
>  uses psql to connect to an instance) as the status function. In this
>  case the RA will always try to connect to the instance and, supposedly,
>  report the right status.
>  What do you think? Should we accept a small risk, or should we
>  turn that status check into a real monitor?
>

Sorry, using "monitor" for "status" won't work. If PostgreSQL listens
on a virtual IP, monitor will always succeed, no matter which node
PostgreSQL is actually running on.

There are some other options, like using fuser to check that the process
with the given PID really holds the PG_DATA directory open, something
like this:

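pgsql_status() {
    if [ -f "$PIDFILE" ]
    then
        PID=`head -n 1 "$PIDFILE"`
        # Alive AND holding PGDATA open. fuser writes only PIDs to stdout
        # (everything else goes to stderr), so grep -w matches the whole
        # PID rather than a substring such as 222 inside 2222.
        kill -0 "$PID" >/dev/null 2>&1 &&
            fuser "$OCF_RESKEY_pgdata" 2>/dev/null | grep -qw "$PID"
    else
        : No pid file
        false
    fi
}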

Or we could check that /proc/$PID/cwd is a symlink to the correct
PG_DATA, but I'm not sure it's worth it.
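For comparison, the /proc/$PID/cwd variant could look roughly like the sketch below. It is Linux-only because it relies on /proc. It also assumes, as the idea itself does, that the postmaster's working directory is its data directory. pid_uses_datadir is a hypothetical helper name. PIDFILE and OCF_RESKEY_pgdata are the same variables the fuser version uses. Both paths are compared after symlink resolution, so a PGDATA reached through a symlink still matches.

```shell
#!/bin/sh
# Sketch only (Linux /proc). Assumption: the postmaster chdir()s into PGDATA.

# Hypothetical helper: pid_uses_datadir PID DIR
# Succeeds only if PID is alive and its cwd is DIR (physical paths compared).
pid_uses_datadir() {
    _pid=$1
    _datadir=$2
    kill -0 "$_pid" 2>/dev/null || return 1            # no such process
    _cwd=$(readlink "/proc/$_pid/cwd" 2>/dev/null) || return 1
    _want=$(cd "$_datadir" 2>/dev/null && pwd -P) || return 1
    [ "$_cwd" = "$_want" ]
}

# Variant of pgsql_status built on the helper instead of fuser.
pgsql_status() {
    [ -f "$PIDFILE" ] || return 1                      # no PID file
    PID=$(head -n 1 "$PIDFILE")
    pid_uses_datadir "$PID" "$OCF_RESKEY_pgdata"
}
```

In the Active/Active case above, node B's pgsql2 has PID 2222, but its working directory is pgsql2's own data directory. A stale pgsql1 PID file would therefore no longer make pgsql1 look running. Unlike the fuser check, this needs no external tool, but it only works where /proc exists.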

-- 
Serge Dubrouski.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
