On 2008-02-27T20:39:13, Keisuke MORI <[EMAIL PROTECTED]> wrote:
Hi Keisuke-san,
thanks for your patch and contribution. On behalf of everyone, I have
to apologize for the late feedback.
I really appreciate the idea of monitoring processes directly, and
receiving async failure notifications to reduce fail-over times.
I have just discussed this with Dejan and Andrew, and we think that the
best path forward, which is unfortunately required before inclusion, is
the following:
- Make procd independent of Pacemaker. It should talk only to the RAs
and the LRM.
- RAs should "sign in" with it for the processes they want monitored,
instead of listing the processes in the procd configuration section
(which decouples it further from the CIB). The RAs could write a
record to /var/run/heartbeat/procd/<resource-id>, for example.
The RAs would add/remove the required processes on start/promote or
demote/stop. (So procd itself would not need to be master-slave.)
I'm afraid that having users manually specify process lists in the CIB
really is not workable - the users will not be able to get this
right.
- Instead of respawning procd, there should be a resource agent which
starts/stops (and monitors!) procd. You already have one, but why
doesn't it go into resources/OCF/?
- procd should talk to the LRM to insert a "fake" failed resource
action, which would then cause the CRM/PE to handle the resource as
failed and initiate recovery. (This is not currently possible with the
LRM client library; as an alternative, you could exec crm_resource -F,
which would also mean you no longer have a build-time dependency on
the CRM.)
- This would have the advantage of decoupling procd from pacemaker as
well as heartbeat. It could be included with the LRM/RA package build,
and possibly be useful with other cluster managers too.
I think all that would help simplify the code.
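
To make the "sign in" idea a bit more concrete, here is a rough sketch
of what the RA-side helper could look like. Everything in it is an
assumption for illustration only: the function names (procd_sign_in,
procd_sign_out, procd_record_path) and the record format (one PID per
line) are hypothetical and do not come from your patch. The directory
is passed in by the caller, so /var/run/heartbeat/procd is only the
example location mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers: names and record format are illustrative only. */

/* Build "<dir>/<rsc_id>" in buf. Returns 0 on success, or -1 if the id
 * is empty, contains '/', or the resulting path does not fit in buf. */
int procd_record_path(const char *dir, const char *rsc_id,
                      char *buf, size_t len)
{
    int n;

    assert(dir != NULL && rsc_id != NULL && buf != NULL);
    if (rsc_id[0] == '\0' || strchr(rsc_id, '/') != NULL)
        return -1;
    n = snprintf(buf, len, "%s/%s", dir, rsc_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Called by an RA on start/promote: write one PID per line. */
int procd_sign_in(const char *dir, const char *rsc_id,
                  const pid_t *pids, size_t npids)
{
    char path[4096];
    FILE *fp;
    size_t i;

    if (procd_record_path(dir, rsc_id, path, sizeof(path)) != 0)
        return -1;
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    for (i = 0; i < npids; i++)
        fprintf(fp, "%ld\n", (long)pids[i]);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Called by an RA on demote/stop: remove the record again. */
int procd_sign_out(const char *dir, const char *rsc_id)
{
    char path[4096];

    if (procd_record_path(dir, rsc_id, path, sizeof(path)) != 0)
        return -1;
    return unlink(path) == 0 ? 0 : -1;
}
```

A real implementation would probably write to a temporary file and
rename() it into place, so that procd never reads a half-written
record, and the RAs themselves would more likely do this from shell.
The point is only that the record is small and owned by the RA.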
> +#define RSCID_LEN 128 /* ref. include/lrm/lrm_api.h */
> +#define MAX_PID_LEN 256 /* ref. lrm/lrmd/lrmd.h */
> +#define MAX_LISTEN_NUM 10 /* ref. lib/clplumbing/ipcsocket.c */
If you're referencing definitions from other header files, please
include those headers directly instead of copying the values, so that
the definitions cannot diverge.
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde