On Tue, Oct 07, 2008 at 04:14:39PM -0500, Paras pradhan wrote:
> On Mon, Oct 6, 2008 at 2:53 PM, Paras pradhan <[EMAIL PROTECTED]> wrote:
> 
> >
> >
> > On Sun, Oct 5, 2008 at 3:54 AM, Daniel Asplund <[EMAIL PROTECTED]> wrote:
> >
> >> >
> >> > Hey all:
> >> >
> >> > It seems my question relates to HA, DRBD and Xen, hence I am posting
> >> > to all of them at once.
> >> > I have two nodes set up with Xen 3.0.3, drbd82 and heartbeat 2 under
> >> > CentOS 5.2. While testing this cluster for high availability, I noticed
> >> > some issues:
> >> >
> >> > 1) domA is running on node1. When I manually shut down node1, sometimes
> >> > domA is migrated automatically to node2 and sometimes it is restarted on
> >> > node2. Why does this happen?
> >> > 2) domA is running on node1. When I pull out the network cable, domA is
> >> > restarted on node2 with no problem. But when node1 comes back, domA is
> >> > not migrated back to node1, and if I run 'xm list' on node1, I see
> >> > "migrating-domain". This complicates everything.
> >> >
> >>
> >> 1) Most likely live migration fails for some reason, so domA is
> >> restarted on node2 instead. It could be a timer issue or a problem with
> >> releasing resources. You should be able to see something in the logs
> >> from node1 during shutdown.
> >>
> >> 2) When node1 is up again, heartbeat on node1 will sense an error and
> >> try to migrate domA to node2. But node2 has already started domA, so
> >> you basically have domA running on both nodes. To avoid split-brain
> >> situations like this you should really use a STONITH device that can
> >> reboot the other node. A hardware device connected via serial cable is
> >> the most secure, but a cheaper alternative is a software stonith device
> >> that can reboot the other node via SSH or telnet. You probably need to
> >> tweak heartbeat as well so it can do further checks, for example
> >> testing connectivity to your gateway.
> >
> >
> > Yes, it seems I need STONITH. At least for now I want to use stonith ssh
> > for testing purposes. One thing I am confused about is how to configure
> > stonith and what the typical practice is. In the above scenario, should
> > node1 be rebooted, or node2?
> >
> > What I did: on node1, I added "stonith_host * ssh node2" to ha.cf, and
> > on node2, "stonith_host * ssh node1". But this is not working.
> >
> > Is that the right way to configure stonith? I have checked linux-ha.org
> > and Google, but I am still confused.
> >
> >
> > What I want is that if node1 has a network outage, it is automatically
> > rebooted or shut down, and all domUs are migrated to node2.
> >
> >>
> >> Do you have two NICs in both nodes, or are you running DRBD, HA and
> >> data traffic over the same NIC?
> >
> >
> > Daniel, yes, I have 2 NICs in both nodes.
> >
> >
> >>
> >> Regards, Daniel
> >> http://www.asplund.nu/xencluster.html
> >>
> >
> > Thanks
> > Paras.
> >
> 
> One more thing, as I am testing my HA cluster.
> 
> I think I have a satisfactory HA cluster setup, which I am planning to put
> into production. But I think I am still far from it.
> 
> I need some advice: how do I test this cluster?
> 
> A few scenarios I have tested:
> 
> 1) Stop the heartbeat daemons --> working fine
> 
> 2) Reboot and shut down nodes --> working fine
> 
> 3) Pull out the network cable, or run 'service network stop' --> working, but
> the split brain needs to be handled manually. Which solution is ideal:
> stonith meatware, stonith suicide or stonith ssh?

A real stonith device. Suicide may be of use in some setups, but
I wouldn't recommend it in general. No ssh in production.
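
A toy model of the split-brain case in scenario 3 may help show why fencing matters. It is a minimal sketch, not heartbeat's actual logic. The function `decide` and the `Action` values are hypothetical, and the sketch assumes each node can tell whether it still hears its peer and whether it can reach the gateway (the connectivity check Daniel suggested earlier in the thread).

```python
from enum import Enum


class Action(Enum):
    # Hypothetical outcomes for one node; these are not heartbeat's real states.
    KEEP_RUNNING = "keep running its own resources"
    FENCE_PEER_AND_TAKE_OVER = "fence the peer, then start its resources"
    STAND_DOWN = "stop resources; this node is probably the isolated one"


def decide(peer_alive: bool, gateway_reachable: bool) -> Action:
    """Toy decision rule for a two-node cluster (hypothetical, illustrative)."""
    if peer_alive:
        # Heartbeats still arrive, so no failover is needed.
        return Action.KEEP_RUNNING
    if gateway_reachable:
        # The peer is silent but this node still sees the network: assume the
        # peer failed. It must be fenced (e.g. power-cycled by a STONITH device)
        # before its domUs are started here, or both nodes may run them at once.
        return Action.FENCE_PEER_AND_TAKE_OVER
    # Neither the peer nor the gateway answers: our own link is likely down.
    return Action.STAND_DOWN


if __name__ == "__main__":
    # Scenario 3: node1's cable is pulled. node1 sees neither peer nor gateway;
    # node2 loses its peer but still reaches the gateway.
    print("node1:", decide(peer_alive=False, gateway_reachable=False).value)
    print("node2:", decide(peer_alive=False, gateway_reachable=True).value)
```

Without the gateway check, both nodes would see only `peer_alive=False` and each would start domA, which is the double start Daniel describes. The check does not replace fencing: a node that is hung rather than disconnected cannot be relied on to stand down by itself, which is why a real stonith device is still the safer choice.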

> 4) Any other tips on how to test the cluster?

Disk full.

We need a list of things which may fail in interesting ways.
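
One way to start such a list is a simple test matrix that pairs each fault to inject with the outcome the cluster should show. The sketch below is a minimal, hypothetical illustration. The faults are taken from this thread, while the `Scenario` type, the `expected` wording and the helper `untested` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    fault: str     # what to break
    how: str       # how to inject it
    expected: str  # what the cluster should do (hypothetical wording)


# Failure scenarios mentioned in this thread.
MATRIX = [
    Scenario("heartbeat stopped", "stop the heartbeat daemon on node1",
             "resources fail over to node2"),
    Scenario("node reboot/shutdown", "reboot or shut down node1",
             "domUs migrate or restart on node2"),
    Scenario("network loss", "pull node1's cable or 'service network stop'",
             "node1 is fenced; domA runs on node2 only"),
    Scenario("disk full", "fill a filesystem the cluster depends on",
             "the failure is detected and handled, not silently ignored"),
]


def untested(results: dict[str, bool]) -> list[str]:
    """Return faults with no recorded passing result (hypothetical helper)."""
    return [s.fault for s in MATRIX if not results.get(s.fault, False)]


if __name__ == "__main__":
    # Results reported in this thread: scenarios 1 and 2 work, scenario 3
    # still needs manual split-brain cleanup, and disk full is untried.
    reported = {"heartbeat stopped": True, "node reboot/shutdown": True,
                "network loss": False}
    for fault in untested(reported):
        print("still to pass:", fault)
```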

Thanks,

Dejan

> 
> Thanks
> Paras.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems