On Thu, Sep 11, 2008 at 07:09, Satomi Taniguchi <[EMAIL PROTECTED]> wrote:
> Hi Lars and Andrew,
>
> I considered about the way to tell tengine how long it should lengthen
> timeout without telling STONITH resources' ids.
>
> My idea is the following.
>
> (1) add stonith op in <operations>.
>     For example:
>     <clone id="clnStonith">
>       [...snip...]
>       <group id="grpStonith">
>         <primitive id="cgpStonith-kdumpcheck" class="stonith"
>                    type="external/kdumpcheck">
>           [...snip...]
>           <operations>
>             <op name="stonith" interval="0"
>                 id="cgpStonith-kdumpcheck-stonith" timeout="60s"/>
>           </operations>
>         </primitive>
>         <primitive id="cgpStonith-ssh" class="stonith" type="external/ssh">
>           [...snip...]
>           <operations>
>             <op name="stonith" interval="0" id="cgpStonith-ssh-stonith"
>                 timeout="20s"/>
>           </operations>
>         </primitive>
>       </group>
>     </clone>
>
> (2) add 3 items in action graph.
>     i)   CRM_meta_plugin_num: the number of STONITH plugin running in the
>          cluster.
>     ii)  CRM_meta_stonith_plugin_dataset: the information of each STONITH
>          plugin's id and timeout. The format is
>          "resource_id=timeout_value(ms)", and delimiter is " ".
>     iii) CRM_meta_total_plugin_timeout: the sum total of all STONITH
>          plugins' timeout values.
>     For example:
>     <crm_event id="22" operation="stonith" operation_key="stonith"
>                on_node="node1"
>                on_node_uuid="c064967c-147b-4a28-a3f8-a3f23d637edd">
>       <attributes CRM_meta_on_node="node1"
>                   CRM_meta_on_node_uuid="ebe5a7cb-608e-4df1-b2c1-5955c5083c2a"
>                   CRM_meta_plugin_num="4"
>                   CRM_meta_stonith_action="reboot"
>                   CRM_meta_stonith_plugin_dataset="cgpStonith-kdumpcheck:0=60000
> cgpStonith-ssh:0=20000 cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000"
>                   CRM_meta_total_plugin_timeout="160000"
>                   crm_feature_set="3.0" />
>     </crm_event>
>
> (3) in tengine, lengthen its transition_timeout based on
>     CRM_meta_total_plugin_timeout.
>     It doesn't need to know which STONITH device is going to be used.
>     In addition, also lengthen timeout value which it notifies to stonithd.
>
> (4) in stonithd, analyze CRM_meta_stonith_plugin_dataset with making use of
>     CRM_meta_plugin_num when it does fence operation.
>     And set timeout function for the plugin which it is going to execute
>     by SetTrackedProcTimeouts() as if lrmd does.
>
> Honestly, I want to get the information of STONITH plugins which is running
> on the node that it is going to do STONITH operation.
> But I have no idea to get it in pengine.
>
> I implemented a prototype, and it seems to work well.
> I would like to hear your opinions.
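
To make the format in (2) concrete, here is a minimal Python sketch of how a consumer might parse CRM_meta_stonith_plugin_dataset and cross-check it against CRM_meta_plugin_num and CRM_meta_total_plugin_timeout. It is not the prototype's code, which is not shown in this thread, and the function name parse_plugin_dataset is hypothetical. The sample string is the one from the quoted crm_event.

```python
def parse_plugin_dataset(dataset: str) -> dict[str, int]:
    """Parse space-separated 'resource_id=timeout_ms' entries.

    Hypothetical helper: the format is taken from the quoted proposal,
    the function itself is only an illustration.
    """
    timeouts: dict[str, int] = {}
    for item in dataset.split():
        # Split on the last '=' so the resource id is kept whole.
        resource_id, _, value = item.rpartition("=")
        if not resource_id:
            raise ValueError(f"malformed dataset entry: {item!r}")
        timeouts[resource_id] = int(value)
    return timeouts


# Sample value copied from the crm_event in the quoted message.
dataset = (
    "cgpStonith-kdumpcheck:0=60000 cgpStonith-ssh:0=20000 "
    "cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000"
)

timeouts = parse_plugin_dataset(dataset)
plugin_num = len(timeouts)                 # compare with CRM_meta_plugin_num
total_timeout_ms = sum(timeouts.values())  # compare with CRM_meta_total_plugin_timeout
```

With the sample above, the entry count and the sum agree with the CRM_meta_plugin_num="4" and CRM_meta_total_plugin_timeout="160000" values in the quoted event.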

Personally, I think this is unnecessarily complicated.

I'm sure what you have works well, but I would favor a single
stonith-timeout configuration option, which is up to the admin to set
appropriately (passed to the TE in the same way as cluster-delay).

In my opinion, this would be sufficient for most scenarios^ and the
chance of it being configured correctly is much higher.
It also requires a whole lot less CPU to figure out what value to use ;-)

^ Clusters with multiple types of devices can simply pick the highest
timeout, and clusters with cascaded stonith setups (is that even possible
at the moment?) can just add all the timeouts together.

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
