Hi Priyanka,
What I wrote in this message consists of pieces of my memory and
suppositions. So, please verify the contents of this message yourself.
> now i hope only one node among the other 3 nodes will be chosen to stonith the
> errant node (please correct me if it does not happen this way). my question
> is, which node among the other 3 nodes will be chosen for stonithing the errant
> node? i mean to ask, how does this election happen?
case1:
If the errant node isn't the DC node, one of the remaining 3 sane nodes must be the DC.
The DC tries to shoot the errant node by itself if it has stonith-plugin
instances that are configured to shoot the errant node and are running normally.
These instances are chosen one by one to shoot the errant node. If one of
them succeeds, the attempt is over. This case matches what you hope for.
case2:
If the errant node isn't the DC node and the DC node doesn't have any usable
stonith-plugin instance to shoot the errant node, the DC node asks
the other 2 sane non-DC nodes to shoot the errant node. It is possible
that both of those nodes have usable stonith-plugin instances and
both succeed in shooting the errant node. That means this case may go against
your hope.
case3:
If the DC node crashes for some reason, such as a kernel
panic, the remaining 3 sane nodes elect one of themselves as the new DC node.
If the new DC has usable stonith-plugin instances to shoot the old DC node,
see case1; otherwise see case2. Whether your hope is fulfilled or not
depends on which case applies.
case4:
If an operation on a resource instance fails, "fencing" is set
in the corresponding "on_fail" attribute, and the failure happens on the
DC node, the DC node asks the other 3 nodes to shoot it.
As in case2, it is possible that more than one node shoots the DC node.
In addition, a cluster that loses its DC forgets that
it has already succeeded in fencing. Soon the other 3 nodes notice that their
DC has disappeared, but they can't know how it happened.
That means case3 follows. In this case, the outcome must go
against your hope.
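
The flow of the cases above can be sketched roughly as follows. This is only a toy model of my description, not Heartbeat code. The names choose_shooters, usable, and sane_nodes are made up for illustration, and the model assumes that a node never uses an instance aimed at itself.

```python
def choose_shooters(dc, errant, sane_nodes, usable):
    """Return the nodes that would try to shoot `errant` in this toy model.

    `usable` maps a node name to the set of targets for which that node has
    a working stonith-plugin instance (a hypothetical data structure).
    """
    # case1: the DC shoots by itself when it has a usable instance.
    # An instance aimed at the DC itself is never used.
    if errant != dc and errant in usable.get(dc, set()):
        return [dc]
    # case2 / case4: otherwise the DC asks the other sane nodes, and every
    # node with a usable instance may shoot, so several shooters are possible.
    return [n for n in sane_nodes
            if n != dc and errant in usable.get(n, set())]


sane = ["node02", "node03", "node04"]

# case1: the DC (node02) has a usable instance for node01.
usable_all = {n: {"node01"} for n in sane}
print(choose_shooters("node02", "node01", sane, usable_all))

# case2: the DC has no usable instance, so node03 and node04 are both asked.
usable_no_dc = {"node03": {"node01"}, "node04": {"node01"}}
print(choose_shooters("node02", "node01", sane, usable_no_dc))
```

In the first call only the DC shoots. In the second call both non-DC nodes are candidates, which is the situation where more than one node may shoot the errant node.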
Priyanka Ranjan wrote:
Thanks Takenaka,
ok, i have another question. suppose in this 4-node cluster, 1 node
fails, and as we know, stonith for that node is running on all the other 3
nodes (as you said, 3x4 instances run in a cluster).
now i hope only one node among the other 3 nodes will be chosen to stonith the
errant node (please correct me if it does not happen this way). my question
is, which node among the other 3 nodes will be chosen for stonithing the errant
node? i mean to ask, how does this election happen?
Regards,
On Mon, Mar 9, 2009 at 2:58 PM, Takenaka Kazuhiro <[email protected]> wrote:
Hello Priyanka.
Note that what I wrote below is my supposition.
Actually, I have never tried any stonith configuration for a
cluster with more than 3 nodes. If someone who knows stonith well
finds a misunderstanding on my part, please point it out.
I would very much appreciate it.
suppose i am using ilo stonith where one stonith can't shoot more than one
node, so i will have to create 4 stoniths for 4 nodes.
...
now i want to know how partition B will be able to fence partition A,
as the stoniths required for stonithing partition A are not available in
partition B; they are running in partition A (on node1 and node2).
4 stonith-plugin instances in a cluster are not enough
for your situation. You must run 3 instances on every node,
like this (fig.a):
node01 : node02, node03, node04
node02 : node01, node03, node04
node03 : node01, node02, node04
node04 : node01, node02, node03
The term to the left of ":" on each line is a shooter node name.
The term to the right of ":" is a list of target nodes.
Each pair of a shooter node and a target node requires one
stonith-plugin instance. So, in this case, 3x4 = 12 instances
run in the cluster.
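
The layout of fig.a can be written down as a small table to check the count and the split-brain question. This is just an illustrative sketch. The dictionary fig_a and the helper can_fence are hypothetical names, not part of Heartbeat.

```python
nodes = ["node01", "node02", "node03", "node04"]

# fig.a: every node runs one stonith-plugin instance per *other* node.
fig_a = {shooter: [t for t in nodes if t != shooter] for shooter in nodes}

# One instance per (shooter, target) pair.
instance_count = sum(len(targets) for targets in fig_a.values())


def can_fence(partition, target, layout):
    """True if some node in `partition` has an instance aimed at `target`."""
    return any(target in layout[shooter] for shooter in partition)


# Partition B (node03, node04) must be able to fence both nodes of partition A.
partition_b = ["node03", "node04"]
b_can_fence_a = all(can_fence(partition_b, t, fig_a)
                    for t in ["node01", "node02"])

print(instance_count)
print(b_can_fence_a)
```

With this layout every partition, whatever its members, holds an instance for each node outside it, which is why 4 instances in total are not enough.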
Alternatively, you can use 4 clone resources.
Every clone resource runs one stonith-plugin instance on each node,
like this (fig.b):
node01 : node01, node02, node03, node04
node02 : node01, node02, node03, node04
node03 : node01, node02, node03, node04
node04 : node01, node02, node03, node04
In this case, 4x4 = 16 instances run in the cluster.
On every node, there is an instance which shoots the node on which
it runs. But the stonith mechanism of Heartbeat doesn't use it when
a fencing operation is triggered. So, these stonith-plugin instances work
the same as in the case of fig.a when fencing.
As just described, constructing a cib.xml with clones is easier than
the way of fig.a, but it requires needless suicide instances.
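
To make the comparison concrete, here is a small sketch of the clone layout. The names fig_b and effective are made up for illustration, and the rule that self-targeting instances are ignored is taken from my supposition above.

```python
nodes = ["node01", "node02", "node03", "node04"]

# fig.b: 4 clone resources, each running one instance on every node,
# so every node holds an instance for every node, including itself.
fig_b = {shooter: list(nodes) for shooter in nodes}
total = sum(len(targets) for targets in fig_b.values())

# Drop the instances that target the node they run on (the "suicide" ones).
effective = {s: [t for t in targets if t != s] for s, targets in fig_b.items()}
suicide = total - sum(len(targets) for targets in effective.values())

print(total)
print(suicide)
print(effective["node01"])
```

After the suicide instances are dropped, the remaining pairs are exactly those of fig.a.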
In both ways, any single stonith-device failure, even if
it is just a cable problem, breaks multiple stonith-plugin instances,
because each stonith device is monitored by an instance on
every node.
After the failure is fixed, every stalled stonith-plugin instance must be
restarted with the Heartbeat commands to make the cluster sane again.
If the cluster consists of many nodes, that will be very bothersome.
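
The effect of one device failure can be counted with the same kind of table. This is only a sketch and assumes one stonith device per target node, as with the iLO setup discussed in this thread. The helper broken_instances is a hypothetical name.

```python
nodes = ["node01", "node02", "node03", "node04"]
fig_a = {s: [t for t in nodes if t != s] for s in nodes}  # 3 instances per node
fig_b = {s: list(nodes) for s in nodes}                   # clone layout


def broken_instances(layout, failed_target):
    """List the (shooter, target) instances that depend on the failed device."""
    return [(shooter, target)
            for shooter, targets in layout.items()
            for target in targets
            if target == failed_target]


# Suppose the device that shoots node01 loses its cable.
broken_a = broken_instances(fig_a, "node01")
broken_b = broken_instances(fig_b, "node01")
print(len(broken_a))
print(len(broken_b))
```

In a cluster of N nodes, one device failure stalls N-1 instances in the fig.a layout and N instances in the clone layout, and each of them has to be resumed by hand.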
Priyanka Ranjan wrote:
just want to clear up one thing here. suppose i have a four-node cluster;
then do i need to configure and run stonith-node1 (which keeps info about
node1) on all the other three nodes (i.e. node2, node3 and node4)?
Not necessary. Running it on any one of the other nodes is
enough.
suppose i am using ilo stonith where one stonith can't shoot more than one
node, so i will have to create 4 stoniths for 4 nodes.
Let's say stonith-node1 is running on node2 and stonith-node2 is running on
node1, and we get a network failure which results in node1 and node2 being one
partition A and node3 and node4 the other partition B.
now i want to know how partition B will be able to fence partition A,
as the stoniths required for stonithing partition A are not available in
partition B; they are running in partition A (on node1 and node2).
Thanks a lot in advance for your help,
Regards,
Priyanka.
Thanks,
Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
--
Takenaka Kazuhiro <[email protected]>