Hi Andrew,

Thanks a lot for your reply!
Andrew Beekhof wrote:
On Thu, Sep 11, 2008 at 07:09, Satomi Taniguchi <[EMAIL PROTECTED]> wrote:

Hi Lars and Andrew,

I have been thinking about how to tell tengine how much it should lengthen its timeout without telling it the STONITH resources' ids. My idea is the following.

(1) Add a stonith op in <operations>. For example:

    <clone id="clnStonith">
      [...snip...]
      <group id="grpStonith">
        <primitive id="cgpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck">
          [...snip...]
          <operations>
            <op name="stonith" interval="0" id="cgpStonith-kdumpcheck-stonith" timeout="60s"/>
          </operations>
        </primitive>
        <primitive id="cgpStonith-ssh" class="stonith" type="external/ssh">
          [...snip...]
          <operations>
            <op name="stonith" interval="0" id="cgpStonith-ssh-stonith" timeout="20s"/>
          </operations>
        </primitive>
      </group>
    </clone>

(2) Add 3 items to the action graph.
  i) CRM_meta_plugin_num: the number of STONITH plugins running in the cluster.
  ii) CRM_meta_stonith_plugin_dataset: each STONITH plugin's id and timeout. The format is "resource_id=timeout_value(ms)", and the delimiter is " ".
  iii) CRM_meta_total_plugin_timeout: the sum of all STONITH plugins' timeout values.

For example:

    <crm_event id="22" operation="stonith" operation_key="stonith"
               on_node="node1" on_node_uuid="c064967c-147b-4a28-a3f8-a3f23d637edd">
      <attributes CRM_meta_on_node="node1"
                  CRM_meta_on_node_uuid="ebe5a7cb-608e-4df1-b2c1-5955c5083c2a"
                  CRM_meta_plugin_num="4"
                  CRM_meta_stonith_action="reboot"
                  CRM_meta_stonith_plugin_dataset="cgpStonith-kdumpcheck:0=60000 cgpStonith-ssh:0=20000 cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000"
                  CRM_meta_total_plugin_timeout="160000"
                  crm_feature_set="3.0" />
    </crm_event>

(3) In tengine, lengthen its transition_timeout based on CRM_meta_total_plugin_timeout. It doesn't need to know which STONITH device is going to be used. In addition, also lengthen the timeout value that it passes to stonithd.

(4) In stonithd, parse CRM_meta_stonith_plugin_dataset, making use of CRM_meta_plugin_num, when it performs a fence operation, and set a timeout for the plugin it is about to execute with SetTrackedProcTimeouts(), as lrmd does.

Honestly, I would like to get the information about the STONITH plugins running on the node that is going to perform the STONITH operation, but I have no idea how to get it in pengine.

I implemented a prototype, and it seems to work well. I would like to hear your opinions.

Personally I think this is unnecessarily complicated. I'm sure what you have works well, but I would favor a single stonith-timeout configuration option which it is up to the admin to set appropriately (passed to the TE in the same way as cluster-delay).
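To make the dataset format from step (2) concrete, here is a minimal Python sketch of the parsing and consistency check that step (4) describes. It is an illustration only, not the prototype's code (stonithd itself is written in C); the function name parse_plugin_dataset is hypothetical, and the sample string is the one from the crm_event example.

```python
def parse_plugin_dataset(dataset):
    """Parse space-delimited "resource_id=timeout_ms" pairs into a dict.

    Hypothetical helper: mirrors the proposed format of
    CRM_meta_stonith_plugin_dataset, nothing more.
    """
    timeouts = {}
    for item in dataset.split():
        # Split on the last "=" so the resource id is kept whole.
        resource_id, _, value = item.rpartition("=")
        if not resource_id:
            raise ValueError("malformed entry: %r" % item)
        timeouts[resource_id] = int(value)
    return timeouts


# Sample value taken from the crm_event example in the proposal.
dataset = ("cgpStonith-kdumpcheck:0=60000 cgpStonith-ssh:0=20000 "
           "cgpStonith-kdumpcheck:1=60000 cgpStonith-ssh:1=20000")

timeouts = parse_plugin_dataset(dataset)
plugin_num = len(timeouts)              # compare with CRM_meta_plugin_num
total_timeout = sum(timeouts.values())  # compare with CRM_meta_total_plugin_timeout
```

For the sample above, the entry count and the sum agree with the CRM_meta_plugin_num and CRM_meta_total_plugin_timeout values shown in the example (4 and 160000).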
Is "stonith-timeout" a cluster-wide configuration option? (In other words, is it set in <cluster_property_set>?) I think it would be better to be able to set a timeout value for each STONITH plugin, just as we can for each resource, because each STONITH device has its own characteristics.
In my opinion, this would be sufficient for most scenarios^ and the chance of it being configured correctly is much higher. It also requires a whole lot less CPU to figure out what value to use ;-)
Yes, lower CPU usage is one of the most important concerns for customers!
^ Clusters with multiple types of devices can simply pick the highest timeout, and clusters with cascaded stonith setups (is that even possible at the moment?) can just add all the timeouts together.
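The rule of thumb in this footnote can be written down as a small Python sketch. The helper name suggested_stonith_timeout and its cascaded flag are hypothetical, and the sample values are the 60s and 20s timeouts from the earlier example; this is only an aid for an admin choosing a single value, not anything the cluster computes.

```python
def suggested_stonith_timeout(timeouts_ms, cascaded=False):
    """Suggest one cluster-wide timeout from per-device timeouts (ms).

    Hypothetical helper: highest value for independent devices,
    sum of all values when devices are tried one after another.
    """
    if not timeouts_ms:
        raise ValueError("at least one device timeout is required")
    return sum(timeouts_ms) if cascaded else max(timeouts_ms)


device_timeouts = [60000, 20000]  # kdumpcheck and ssh from the example

independent = suggested_stonith_timeout(device_timeouts)
cascade = suggested_stonith_timeout(device_timeouts, cascaded=True)
```

With these sample values the independent case gives 60000 ms and the cascaded case gives 80000 ms.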
I'm confused. From this sentence, it sounds as if "stonith-timeout" is a per-plugin option...

A cascaded stonith setup can be realized at the moment by putting two or more plugins in a group. As far as I have confirmed, if the first plugin in a group fails, the second one is executed, and if the first one succeeds, the second one is _not_ executed. If this is not the expected behavior, please let me know what the correct one is.
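The group behavior I observed can be modeled with the short Python sketch below. It is only a model of what I described above, not stonithd's actual logic; fence_with_group and the two stand-in plugin functions are hypothetical names.

```python
def fence_with_group(plugins, target):
    """Try each (name, fence_fn) in order and stop at the first success.

    Returns (succeeded, names_of_plugins_attempted).
    Hypothetical model of the observed group behavior.
    """
    attempted = []
    for name, fence_fn in plugins:
        attempted.append(name)
        if fence_fn(target):
            return True, attempted
    return False, attempted


def always_fail(target):
    # Stand-in for a plugin whose fence operation fails.
    return False


def always_succeed(target):
    # Stand-in for a plugin whose fence operation succeeds.
    return True


# First plugin fails, so the second one is executed.
ok_a, tried_a = fence_with_group(
    [("cgpStonith-kdumpcheck", always_fail), ("cgpStonith-ssh", always_succeed)],
    "node1")

# First plugin succeeds, so the second one is not executed.
ok_b, tried_b = fence_with_group(
    [("cgpStonith-kdumpcheck", always_succeed), ("cgpStonith-ssh", always_succeed)],
    "node1")
```

In this model the worst case is that every plugin in the group runs until its own timeout, which is why the timeouts of a cascaded setup add up.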
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
