On Wed, Feb 20, 2008 at 7:24 PM, HIDEO YAMAUCHI
<[EMAIL PROTECTED]> wrote:
> Hi Serge,
>
> I tested the new patch.
>
> However, the problem is still there.
>
> I looked at the pg_ctl source.
> The status check calls the following function.
>
> ----------------------pg_ctl.c(from PostgreSQL8.3)---------------------------
> static bool
> postmaster_is_alive(pid_t pid)
> {
>     if (pid == getpid())
>         return false;
> #ifndef WIN32
>     if (pid == getppid())
>         return false;
> #endif
>     if (kill(pid, 0) == 0)
>         return true;
>     return false;
> }
> -----------------------------------------------------------
>
> In this source, the final check is kill(pid, 0), which only confirms
> that some process with that PID exists.
>
> Therefore, a problem occurs in irregular cases such as the following.
>
> For example:
>
> 1) Put PGDATA on the shared device.
> 2) Start PostgreSQL on each node (Active/Active configuration).
>      node A --- Postmaster PID 2222 (pgsql1)
>      node B --- Postmaster PID 2222 (pgsql2)
> 3) Force a reboot of node A (reboot -nf).
> 4) The PID file is left behind.
> 5) pgsql1 fails over.
> 6) On node B, a process with the same PID already exists.
>
> So isn't a reliable check impossible with pg_ctl alone?
>
That's possible. But honestly I don't know how to deal with such a
rather exceptional situation. Even a manual pg_ctl start would fail in
this case because pg_ctl would find that active process. The only
option to avoid this would be to use the "monitor" function (the one that
uses psql to connect to an instance) as the status function. In this
case the RA will always try to connect to the instance and should
report the right status.
What do you think? Should we accept a small risk, or should we
convert that status into a real monitor?
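
To make the failure mode concrete, here is a minimal C sketch of why the
kill(pid, 0) test cannot tell a postmaster from any other process. It is
not code from pg_ctl or the RA: pid_exists(), spawn_unrelated_process()
and stop_process() are hypothetical names used only for illustration, and
the forked child simply stands in for whatever process happens to own the
stale PID on node B.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Same final test as postmaster_is_alive(): signal 0 delivers nothing,
 * it only reports whether a process with this PID exists and may be
 * signalled by the caller. It says nothing about what that process is.
 */
static bool pid_exists(pid_t pid)
{
    return kill(pid, 0) == 0;
}

/*
 * Hypothetical helper: start a process that is NOT a postmaster.
 * Returns the child's PID in the parent, or -1 if fork() failed.
 */
static pid_t spawn_unrelated_process(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        pause();    /* child: sleep until a signal arrives */
        _exit(0);
    }
    return pid;
}

/* Hypothetical helper: terminate the child and reap it. */
static void stop_process(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}
```

If a stale PID file happened to contain the PID returned by
spawn_unrelated_process(), pid_exists() would still report success, which
is the false positive described in step 6. A status check that actually
connects to the instance does not depend on the PID at all.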
--
Serge Dubrouski.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems