On Thu, Sep 11, 2008 at 07:09, Satomi Taniguchi
<[EMAIL PROTECTED]> wrote:
> Hi Lars and Andrew,
>
> I have been thinking about how to tell tengine how much it should lengthen
> its timeout without passing it the STONITH resources' ids.
>
> My idea is the following.
>
> (1) add stonith op in <operations>.
>    For example:
>    <clone id="clnStonith">
>      [...snip...]
>      <group id="grpStonith">
>        <primitive id="cgpStonith-kdumpcheck" class="stonith"
>                   type="external/kdumpcheck">
>          [...snip...]
>          <operations>
>            <op name="stonith" interval="0"
>                id="cgpStonith-kdumpcheck-stonith" timeout="60s"/>
>          </operations>
>        </primitive>
>        <primitive id="cgpStonith-ssh" class="stonith" type="external/ssh">
>          [...snip...]
>          <operations>
>            <op name="stonith" interval="0" id="cgpStonith-ssh-stonith"
>                timeout="20s"/>
>          </operations>
>        </primitive>
>      </group>
>    </clone>
>
> (2) add 3 items to the action graph.
>    i) CRM_meta_plugin_num: the number of STONITH plugins running in the
>       cluster.
>   ii) CRM_meta_stonith_plugin_dataset: each STONITH plugin's id and
>       timeout. The format is "resource_id=timeout_value(ms)", and the
>       delimiter is " ".
>  iii) CRM_meta_total_plugin_timeout: the sum of all STONITH plugins'
>       timeout values.
>    For example:
>    <crm_event id="22" operation="stonith" operation_key="stonith"
>               on_node="node1"
>               on_node_uuid="c064967c-147b-4a28-a3f8-a3f23d637edd">
>      <attributes CRM_meta_on_node="node1"
>                  CRM_meta_on_node_uuid="ebe5a7cb-608e-4df1-b2c1-5955c5083c2a"
>                  CRM_meta_plugin_num="4"
>                  CRM_meta_stonith_action="reboot"
>                  CRM_meta_stonith_plugin_dataset="cgpStonith-kdumpcheck:0=60000 cgpStonith-ssh:0=20000 cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000"
>                  CRM_meta_total_plugin_timeout="160000"
>                  crm_feature_set="3.0" />
>    </crm_event>
>
> (3) in tengine, lengthen transition_timeout based on
>    CRM_meta_total_plugin_timeout.
>    tengine doesn't need to know which STONITH device is going to be used.
>    In addition, also lengthen the timeout value that it passes to stonithd.
>
> (4) in stonithd, parse CRM_meta_stonith_plugin_dataset, using
>    CRM_meta_plugin_num, when it performs a fence operation,
>    and set a timeout for the plugin it is about to execute
>    with SetTrackedProcTimeouts(), as lrmd does.
>
>
>
> Honestly, I would prefer to get only the information for the STONITH plugins
> running on the node that is going to perform the STONITH operation,
> but I don't know how to get that in pengine.
>
> I implemented a prototype, and it seems to work well.
> I would like to hear your opinions.

Personally I think this is unnecessarily complicated.

I'm sure what you have works well, but I would favor a single
stonith-timeout configuration option that the admin is responsible for
setting appropriately (passed to the TE in the same way as cluster-delay).

In my opinion, this would be sufficient for most scenarios^ and the
chance of it being configured correctly is much higher.
It also requires a whole lot less CPU to figure out what value to use ;-)


^ Clusters with multiple types of devices can simply pick the highest
timeout, and clusters with cascaded stonith setups (is that even
possible at the moment?) can just add all the timeouts together.
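
As a rough illustration of the footnote (a sketch only, not code from
either poster), the snippet below parses the dataset string from the
quoted example and derives the two candidate values for a single
stonith-timeout: the highest per-plugin timeout, or the sum of all of
them for a cascaded setup. The function names parse_plugin_dataset and
suggested_stonith_timeout are hypothetical and exist only for this
example.

```python
def parse_plugin_dataset(dataset):
    """Parse space-delimited 'resource_id=timeout_ms' entries into a dict."""
    timeouts_ms = {}
    for entry in dataset.split():
        # Split on the last '=' so the resource id is kept whole.
        resource_id, _, value = entry.rpartition("=")
        timeouts_ms[resource_id] = int(value)
    return timeouts_ms


def suggested_stonith_timeout(timeouts_ms, cascaded=False):
    """Return the sum of all timeouts if cascaded, else the highest one."""
    values = list(timeouts_ms.values())
    return sum(values) if cascaded else max(values)


# Dataset copied from the quoted crm_event example.
dataset = (
    "cgpStonith-kdumpcheck:0=60000 cgpStonith-ssh:0=20000 "
    "cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000"
)
timeouts = parse_plugin_dataset(dataset)

print(len(timeouts))                                       # 4
print(suggested_stonith_timeout(timeouts))                 # 60000
print(suggested_stonith_timeout(timeouts, cascaded=True))  # 160000
```

The count and the sum agree with CRM_meta_plugin_num="4" and
CRM_meta_total_plugin_timeout="160000" in the quoted example.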
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
