Hi Serge,

This revision looks like it can solve the problem.

>pgsql_status() { 
>if [ -f $PIDFILE ] 
>then 
>PID=`head -n 1 $PIDFILE` 
>kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1 
>else 
>: No pid file 
>false 
>fi 
>} 

I will test it and report the result by the end of tomorrow.

Regards,
Hideo Yamauchi.


--- Serge Dubrouski <[EMAIL PROTECTED]> wrote:

> On Wed, Feb 20, 2008 at 9:13 PM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > On Wed, Feb 20, 2008 at 7:24 PM, HIDEO YAMAUCHI <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Serge,
> >  >
> >  >  I tested the new patch.
> >  >
> >  >  But the problem still remains.
> >  >
> >  >  I looked at the source of pg_ctl.
> >  >  The following function is called by the status check.
> >  >
> >  >  ----------------------pg_ctl.c (from PostgreSQL 8.3)---------------------------
> >  >  static bool
> >  >  postmaster_is_alive(pid_t pid)
> >  >  {
> >  >         if (pid == getpid())
> >  >                 return false;
> >  >  #ifndef WIN32
> >  >         if (pid == getppid())
> >  >                 return false;
> >  >  #endif
> >  >         if (kill(pid, 0) == 0)
> >  >                 return true;
> >  >         return false;
> >  >  }
> >  >  -----------------------------------------------------------
> >  >
> >  >  In the source, the final check is kill(pid, 0), and its result alone
> >  >  seems to decide whether the postmaster is considered alive.
> >  >
> >  >  Therefore, a problem occurs in irregular cases such as the following.
> >  >
> >  > For example...
> >  >
> >  >  1) Put PGDATA on a shared device.
> >  >  2) Start PostgreSQL on each node. (Active/Active configuration)
> >  >   node A --- Postmaster PID 2222. (pgsql1)
> >  >   node B --- Postmaster PID 2222. (pgsql2)
> >  >  3) Reboot node A. (reboot -nf)
> >  >  4) The PID file is left behind.
> >  >  5) Fail-over. (pgsql1)
> >  >  6) On node B, a process with the same PID already exists.
> >  >
> >  >  So isn't it impossible to handle this with pg_ctl alone?
> >  >
> >
> >  That's possible. But honestly I don't know how to deal with such a
> >  rather exceptional situation. Even a manual pg_ctl start would fail in
> >  this case because pg_ctl would find that active process. The only
> >  option to avoid this would be using the "monitor" function (the one that
> >  uses psql to connect to an instance) as the status function. In this
> >  case the RA will always try to connect to the instance and supposedly
> >  report the right status.
> >  What do you think? Should we accept the small risk or should we
> >  convert that status into a real monitor?
> >
> 
> Sorry, using "monitor" for "status" is impossible. If PostgreSQL is up
> on a virtual IP, monitor will always succeed no matter which node
> PostgreSQL is running on.
> 
> There are some other options, like using fuser to check that the process
> with the given PID really holds the PG_DATA directory open, something like
> this:
> 
> pgsql_status() {
>     if [ -f $PIDFILE  ]
>     then
>         PID=`head -n 1 $PIDFILE`
>         kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
>     else
>         : No pid file
>         false
>     fi
> }
> 
> Or using /proc/$PID/cwd to check that it's a symlink to the correct
> PG_DATA, but I'm not sure that it's worth it.
> 
> -- 
> Serge Dubrouski.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
> 

