Hello all, While running some tests with SC 3.2, I ran into some situations that are, at the very least, interesting.
1- First I shut down both nodes and then tried to boot just "one" node. To my surprise, the node did not boot. The message was about the "other node is unreachable through this path". I waited more than an hour to see whether it was a timeout or something similar, and then booted the other node too. After that, the cluster came online again.

2- In the other case, I just cut the power to one node, and to my surprise (again), the other node crashed too (it rebooted). I was thinking, "Now what? The node will not boot because of case (1) above"... but I was wrong: this time the node booted OK.

The environment is a two-node Sun Cluster 3.2 with just "one" cluster interconnect interface. Testing "evacuate" or "switch" just works. The problem is when I try to simulate "real" failures.

So, the questions are:
a) Is the behavior in case (1) expected? How can I fix that in a real-world scenario?
b) And what about case (2)?
c) With the above configuration, what can and can't I expect in failover/switchback scenarios? I mean, which failures are covered by such a configuration? How many servers can crash, and is there an order to respect (for shutdown)...?

I know all of this is probably obvious to you, and I think there is an explanation for it all... but I just want to know so that I am aware of it.

Thanks for your time!
Leal.
-- This message posted from opensolaris.org
