On Wed, Feb 20, 2008 at 7:24 PM, HIDEO YAMAUCHI
<[EMAIL PROTECTED]> wrote:
> Hi Serge,
>
>  I checked the new patch.
>
>  But there is still a problem.
>
>  I looked at the source of pg_ctl.
>  The following function is called by the status check.
>
>  ----------------------pg_ctl.c(from PostgreSQL8.3)---------------------------
>  static bool
>  postmaster_is_alive(pid_t pid)
>  {
>         if (pid == getpid())
>                 return false;
>  #ifndef WIN32
>         if (pid == getppid())
>                 return false;
>  #endif
>         if (kill(pid, 0) == 0)
>                 return true;
>         return false;
>  }
>  -----------------------------------------------------------
>
>  In this source, the final check is kill(pid, 0), and that alone seems to decide the result.
>
>  Therefore, a problem occurs in irregular cases such as the following.
>
> For example...
>
>  1) Place PGDATA on the shared device.
>  2) PostgreSQL starts on each node. (Active/Active configuration)
>   node A --- Postmaster PID 2222. (pgsql1)
>   node B --- Postmaster PID 2222. (pgsql2)
>  3) Reboot node A. (reboot -nf)
>  4) The PID file is left behind.
>  5) Fail-over. (pgsql1)
>  6) On node B, there is a process with the same PID.
>
>  So isn't this impossible to detect with pg_ctl alone?
>

That's possible. But honestly I don't know how to deal with such a
rather exceptional situation. Even a manual pg_ctl start would fail in
this case, because pg_ctl would find that active process. The only
option to avoid this would be to use the "monitor" function (the one that
uses psql to connect to an instance) as the status function. In that
case the RA will always try to connect to the instance and will presumably
report the right status.
What do you think? Should we accept the small risk, or should we
convert status into a real monitor?
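
To make the limitation concrete: kill(pid, 0) only asks whether some
process with that PID exists on the local host, not whether it is the
postmaster that wrote the PID file. Below is a minimal sketch of that
probe in C. The names probe_pid and pid_state are hypothetical and are
not part of pg_ctl. Unlike postmaster_is_alive(), the sketch reports
EPERM separately instead of treating it as "not alive".

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type, introduced only for this illustration. */
enum pid_state
{
	PID_SIGNALABLE,		/* a process with this PID exists and we may signal it */
	PID_OTHER_USER,		/* a process exists, but we lack permission (EPERM) */
	PID_ABSENT			/* no such process, or another error */
};

/*
 * Hypothetical helper: the same probe as the last step of
 * postmaster_is_alive(). Signal 0 delivers nothing; kill() only runs
 * its existence and permission checks.
 */
enum pid_state
probe_pid(pid_t pid)
{
	if (kill(pid, 0) == 0)
		return PID_SIGNALABLE;
	if (errno == EPERM)
		return PID_OTHER_USER;
	return PID_ABSENT;
}
```

Calling probe_pid(getpid()) from any program returns PID_SIGNALABLE,
although that program is obviously not a postmaster. Node B gives the
same kind of answer for PID 2222 after the fail-over, which is why only
an actual connection attempt can tell the two instances apart.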

-- 
Serge Dubrouski.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
