Re: [Linux-ha-dev] To avoid STONITH for a node which is doing kdump

Satomi TANIGUCHI Fri, 19 Sep 2008 00:01:46 -0700

Hi Andrew,

Thanks a lot for your reply!



Andrew Beekhof wrote:

On Thu, Sep 11, 2008 at 07:09, Satomi Taniguchi
<[EMAIL PROTECTED]> wrote:

Hi Lars and Andrew,

I have been thinking about how to tell tengine how much it should lengthen
its timeout without telling it the STONITH resources' ids.

My idea is the following.

(1) add stonith op in <operations>.
   For example:
   <clone id="clnStonith">
     [...snip...]
     <group id="grpStonith">
       <primitive id="cgpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck">
         [...snip...]
         <operations>
           <op name="stonith" interval="0" id="cgpStonith-kdumpcheck-stonith" timeout="60s"/>
         </operations>
       </primitive>
       <primitive id="cgpStonith-ssh" class="stonith" type="external/ssh">
         [...snip...]
         <operations>
           <op name="stonith" interval="0" id="cgpStonith-ssh-stonith" timeout="20s"/>
         </operations>
       </primitive>
     </group>
   </clone>

(2) add 3 items to the action graph.
   i) CRM_meta_plugin_num: the number of STONITH plugins running in the
      cluster.
  ii) CRM_meta_stonith_plugin_dataset: each STONITH plugin's id and timeout.
      The format is "resource_id=timeout_value(ms)", and the delimiter
      is " ".
 iii) CRM_meta_total_plugin_timeout: the sum of all STONITH plugins'
      timeout values.
   For example:
   <crm_event id="22" operation="stonith" operation_key="stonith"
              on_node="node1" on_node_uuid="c064967c-147b-4a28-a3f8-a3f23d637edd">
     <attributes CRM_meta_on_node="node1"
                 CRM_meta_on_node_uuid="ebe5a7cb-608e-4df1-b2c1-5955c5083c2a"
                 CRM_meta_plugin_num="4"
                 CRM_meta_stonith_action="reboot"
                 CRM_meta_stonith_plugin_dataset="cgpStonith-kdumpcheck:0=60000 cgpStonith-ssh:0=20000 cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000"
                 CRM_meta_total_plugin_timeout="160000"
                 crm_feature_set="3.0" />
   </crm_event>

(3) in tengine, lengthen transition_timeout based on
   CRM_meta_total_plugin_timeout.
   tengine doesn't need to know which STONITH device is going to be used.
   In addition, also lengthen the timeout value that it passes to stonithd.

(4) in stonithd, parse CRM_meta_stonith_plugin_dataset, using
   CRM_meta_plugin_num, when it performs a fence operation.
   Then set a timeout for the plugin it is about to execute
   with SetTrackedProcTimeouts(), as lrmd does.
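
To make the format in (2) and the parsing in (4) a little more concrete,
here is a minimal Python sketch. It is only an illustration, not the
prototype's code: the function name parse_plugin_dataset is hypothetical,
and the attribute values are copied from the example above.

```python
def parse_plugin_dataset(dataset, plugin_num):
    """Parse space-delimited "resource_id=timeout_ms" entries into a dict.

    Hypothetical helper that mirrors the proposed attribute format;
    it is not taken from stonithd.
    """
    timeouts = {}
    for entry in dataset.split():
        # Split on the last "=" so ids such as "cgpStonith-ssh:0" stay intact.
        resource_id, _, value = entry.rpartition("=")
        if not resource_id:
            raise ValueError("malformed entry: %r" % entry)
        timeouts[resource_id] = int(value)
    # CRM_meta_plugin_num serves as a consistency check on the entry count.
    if len(timeouts) != plugin_num:
        raise ValueError(
            "expected %d plugins, got %d" % (plugin_num, len(timeouts))
        )
    return timeouts


dataset = (
    "cgpStonith-kdumpcheck:0=60000 cgpStonith-ssh:0=20000 "
    "cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000"
)
timeouts = parse_plugin_dataset(dataset, plugin_num=4)

# Corresponds to CRM_meta_total_plugin_timeout in the example.
total_timeout = sum(timeouts.values())
print(total_timeout)
```

With the example values, the sum is 160000 ms, which agrees with
CRM_meta_total_plugin_timeout="160000" above.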



Honestly, I would like to get the information only for the STONITH plugins
running on the node that is going to perform the STONITH operation.
But I have no idea how to get it in pengine.

I implemented a prototype, and it seems to work well.
I would like to hear your opinions.


Personally I think this is unnecessarily complicated.

I'm sure what you have works well, but would favor a single
stonith-timeout configuration option which it is up to the admin to
set appropriately (passed to the TE in the same way as cluster-delay).

Is "stonith-timeout" a configuration option per cluster?
(In other words, is it written in <cluster_property_set>?)

I think it is better to be able to set a timeout value for each STONITH plugin,
just as we can set one for each resource,
because each STONITH device has its own characteristics.


In my opinion, this would be sufficient for most scenarios^ and the
chance of it being configured correctly is much higher.
It also requires a whole lot less CPU to figure out what value to use ;-)

Yes.
Lower CPU usage is one of the most important matters for customers!



^ Clusters with multiple types of devices can simply pick the highest
timeout and clusters with cascaded stonith setups (is that even
possible at the moment?) just add all the timeouts all together.

I'm confused.
With this sentence, it seems that "stonith-timeout" is an option per plugin...

A cascaded stonith setup can be realized at the moment by putting two or more
plugins in a group.
As far as I have confirmed, if the first plugin in a group fails,
the second one is executed.
And if the first one succeeds, the second one is _not_ executed.
If this is unexpected behavior, please let me know the correct one.
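
The group behavior I observed, and the two ways of choosing a single timeout
(the highest value, or the sum for a cascaded setup), can be sketched in
Python as follows. This is only an illustration under assumptions: run_group
and the callables standing in for plugins are hypothetical, and the timeouts
are the per-plugin values from the earlier example.

```python
def run_group(plugins):
    """Try each plugin in order and stop at the first success.

    `plugins` is a list of (name, callable) pairs; each callable returns
    True on success. Hypothetical stand-in for a stonith group.
    """
    executed = []
    for name, fence in plugins:
        executed.append(name)
        if fence():
            return True, executed
    return False, executed


# Per-plugin timeouts in ms, taken from the example configuration.
timeouts = {"cgpStonith-kdumpcheck": 60000, "cgpStonith-ssh": 20000}

# Multiple device types, only one used per fencing: the slowest bounds the wait.
highest = max(timeouts.values())
# Cascaded devices: in the worst case every plugin runs until its timeout.
cascaded = sum(timeouts.values())

ok, executed = run_group([
    ("cgpStonith-kdumpcheck", lambda: False),  # first plugin fails
    ("cgpStonith-ssh", lambda: True),          # so the second one is tried
])
```

For a group like the one above, a cluster-wide value would therefore have to
cover the sum of both timeouts, not only the larger one.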

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

