On Wed, Oct 8, 2008 at 10:19 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> On Tue, Oct 07, 2008 at 04:14:39PM -0500, Paras pradhan wrote:
> > On Mon, Oct 6, 2008 at 2:53 PM, Paras pradhan <[EMAIL PROTECTED]> wrote:
> >
> > > On Sun, Oct 5, 2008 at 3:54 AM, Daniel Asplund <[EMAIL PROTECTED]> wrote:
> > >
> > >> > Hey all:
> > >> >
> > >> > It seems like my question is related to ha, drbd and xen. Hence
> > >> > posting to all of them at once.
> > >> > I have two nodes set up with xen 3.0.3, drbd82 and heartbeat 2 under
> > >> > centos 5.2. As I was testing this cluster for high availability, I
> > >> > noticed some issues:
> > >> >
> > >> > 1) domA is running under node1. When I manually shut down node1,
> > >> > sometimes it is migrated automatically to node2 and sometimes it is
> > >> > restarted on node2. Why is this happening?
> > >> > 2) domA is running under node1. When I pull out the network cable,
> > >> > domA is restarted on node2 with no problem. But when node1 comes
> > >> > back, domA is not migrated to node1, and if I do 'xm list' on node1,
> > >> > I see "migrating-domain". This is complicating everything.
> > >> >
> > >>
> > >> 1) Most likely live migration fails for some reason and therefore
> > >> domA is restarted on node2. It could be a timer issue or a problem
> > >> with the release of resources. You should be able to see something
> > >> in the logs during shutdown on node1.
> > >>
> > >> 2) Heartbeat on node1 will sense an error and try to migrate domA to
> > >> node2 when node1 is up again. But node2 has already started domA, so
> > >> you basically have domA running on both nodes. To avoid split-brain
> > >> situations like this you should really use a STONITH device that can
> > >> reboot the other node. A hardware device connected via serial cable
> > >> is most secure, but a cheaper alternative is a soft STONITH device
> > >> that can reboot the other node via SSH or telnet. You probably need
> > >> to tweak heartbeat as well to allow it to do further checks, for
> > >> example testing connectivity to your gateway.
> > >
> > > Yes, it seems I need STONITH. At least for now I want to use stonith
> > > ssh for testing purposes. One thing I am confused about: how do I
> > > configure stonith, and what is the typical practice? In the above
> > > scenario, should node1 be rebooted, or node2?
> > >
> > > What I did: on node1, I added "stonith_host * ssh node2" to ha.cf, and
> > > on node2, "stonith_host * ssh node1". But this is not working.
> > >
> > > Is that the way to configure stonith? I have checked linux-ha.org and
> > > Google, but the confusion persists.
> > >
> > > What I want is: if there is a network outage on node1, it should be
> > > automatically rebooted or shut down, with all domUs migrated to node2.
> > >
> > >> Do you have two NICs in both nodes or are you running DRBD, HA and
> > >> data traffic over the same NIC?
> > >
> > > Daniel, yes, I have 2 NICs in both nodes.
> > >
> > >> Regards, Daniel
> > >> http://www.asplund.nu/xencluster.html
> > >
> > > Thanks
> > > Paras.
> >
> > One more thing, as I am testing my HA cluster.
> >
> > I think I have a satisfactory HA cluster setup which I am planning to
> > put in production. But I think I am still far from that.
> >
> > I need some advice... how do I test this cluster?
> >
> > A few scenarios I have tested:
> >
> > 1) Stop the heartbeat daemons --> Working fine
> >
> > 2) Reboot and shut down nodes --> Working fine
> >
> > 3) Pull out the network cable or run 'service network stop' --> Working,
> > but the split brain needs to be taken care of manually. Which solution is
> > ideal: stonith meatware, stonith suicide or stonith ssh?
>
> A real stonith device. Suicide may be of use in some setups, but
> I wouldn't recommend it in general. No ssh in production.

If I pull out the network cable on a node, is it possible for that node to be powered off using suicide or ssh? It seems so, but it is not working. Any tips on this?

> >
> > 4) Any other tips on how to test the cluster?
>
> Disk full.
>
> We need a list of things which may fail in interesting ways.
>
> Thanks,
>
> Dejan
>
> >
> > Thanks
> > Paras.
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems

Thanks
Paras.
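Regarding the stonith_host question in this thread, a minimal sketch of an ssh-based fencing resource follows, assuming heartbeat 2.1.x running with the CRM enabled ("crm yes" in ha.cf). As far as we understand, the stonith_host lines in ha.cf are only used in heartbeat's v1 (haresources) mode, so a CRM cluster expects fencing to be configured as a stonith resource in the CIB instead. The snippet follows the old clone-based "DoFencing" pattern; the ids and the 5s/20s timings are arbitrary choices, node1/node2 are the node names used in this thread, and the element names should be checked against the crm.dtd shipped with your release. As Dejan says, ssh fencing is for testing only: it needs working root ssh to the victim, so it cannot fence a node whose network cable has been pulled, which is exactly the failure in scenario 3.

```xml
<!-- Sketch only: heartbeat 2.1.x CIB syntax assumed; goes in <resources>. -->
<!-- ids and timings are arbitrary/hypothetical; node names from this thread. -->
<clone id="DoFencing">
  <instance_attributes id="DoFencing_ia">
    <attributes>
      <!-- run two copies of the fencing agent, at most one per node -->
      <nvpair id="DoFencing_clone_max" name="clone_max" value="2"/>
      <nvpair id="DoFencing_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive id="child_DoFencing" class="stonith" type="ssh">
    <operations>
      <!-- periodically check that the stonith agent itself is usable -->
      <op id="child_DoFencing_monitor" name="monitor" interval="5s" timeout="20s"/>
    </operations>
    <instance_attributes id="child_DoFencing_ia">
      <attributes>
        <!-- hostlist: the nodes this ssh agent may reset -->
        <nvpair id="child_DoFencing_hostlist" name="hostlist" value="node1 node2"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>
```

Such a fragment could be loaded with something like "cibadmin -C -o resources -x fencing.xml" (fencing.xml being a hypothetical file holding the snippet), and STONITH also has to be switched on through the stonith-enabled cluster option (spelled stonith_enabled in some older releases). A real power switch would replace type="ssh" with its own plugin and parameters.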
