On 2008-08-08T11:27:24, Satomi Taniguchi <[EMAIL PROTECTED]> wrote:
> I'm having trouble with STONITH being applied to a node that is in the
> middle of doing kdump.
> For example, when a kernel panic occurs, kdump is executed in the second
> kernel on a node.
> But it is killed by STONITH before kdump finishes, and consequently
> nothing is dumped.
> I know that waiting for kdump to finish increases the failover time.
> But this is a serious problem for failure analysis.
Yes, this problem exists; we can't really delay the fail-over until the
dump has completed.
> So, I intend to make a STONITH plugin which checks whether a target
> node is doing kdump.
Right. A node which is actively doing kdump can be considered "fenced",
and thus the STONITH requirements are satisfied.
> It is meant to be used in a group together with a usual sniper STONITH plugin.
> If the target node is doing kdump, the plugin considers that STONITH has
> succeeded.
>
> First, what do you think about this idea?
> Your comments and suggestions are really appreciated.
This makes sense.
There is one missing bit though; a node not doing kdump needs to be
STONITH'ed; so, failure of the kdump-stonith plugin should "escalate" to
the next plugin. I'm not sure the current STONITH subsystem can handle
this.
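
As an illustration only (this is not stonithd code, and nothing here claims
the current subsystem works this way), the escalation could be sketched in
Python as below. The names fence_with_escalation and make_kdump_check are
made up for the example, and a "plugin" is assumed to be a plain callable
that returns True when the node can be considered fenced.

```python
from typing import Callable, Sequence, Set

# Hypothetical plugin type: takes a node name and returns True if the
# node can be considered fenced, False otherwise.
FencePlugin = Callable[[str], bool]


def fence_with_escalation(node: str, plugins: Sequence[FencePlugin]) -> bool:
    """Try each plugin in order and stop at the first one that succeeds."""
    for plugin in plugins:
        try:
            if plugin(node):
                return True
        except Exception:
            # A crashing plugin is treated like a failed one: escalate.
            continue
    # Every plugin failed, so the node is not known to be fenced.
    return False


def make_kdump_check(nodes_in_kdump: Set[str]) -> FencePlugin:
    """Stand-in for a real check; the set is supplied by the caller."""
    def check(node: str) -> bool:
        return node in nodes_in_kdump
    return check
```

With a list such as [kdump_check, power_reset], a node that is dumping is
reported as fenced without being reset, while any other node falls through
to the real reset plugin.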
> Second, I would like to hear your opinion about the following.
> I think a timeout setting is necessary for STONITH plugins.
>
> This is what I noticed while developing the plugin above,
> tengine and parent-stonithd each have their timeout settings
> based on "cluster-delay" and "default-action-timeout",
> but child-stonithd doesn't have its own.
> So, a user has no way to set explicitly how long a STONITH plugin may
> take.
> Increasing the values of "cluster-delay" and "default-action-timeout"
> may allow a plugin to take longer, but that is not what they are meant
> for, and the side effects are large.
Yes, being able to somehow specify a per-plugin "fence" timeout would be
useful. The "start" and "stop" timeouts can be set, but not the actual
stonith ops ...
I think it would make sense if they could be specified as regular
operations in the CIB, and then would be passed to stonithd somehow.
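
To make the idea of a per-plugin fence timeout concrete, here is a minimal
Python sketch. It assumes the plugin operation is run as an external
command; run_fence_op and its parameters are hypothetical names for the
example and say nothing about how stonithd or the CIB would actually carry
the value.

```python
import subprocess
import sys
from typing import Sequence


def run_fence_op(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run one fencing command with its own time limit.

    Returns True only if the command exits with status 0 within
    timeout_s seconds.
    """
    try:
        result = subprocess.run(list(cmd), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so a plugin
        # that hangs cannot outlive its own timeout.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in command; a real setup would call the plugin here.
    print(run_fence_op([sys.executable, "-c", "pass"], timeout_s=10.0))
```

The point of the design is that each operation gets its own limit, so a
slow plugin can be given more time without touching the cluster-wide
settings.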
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/