Hi,

On Mon, Dec 17, 2007 at 11:12:55AM +0100, Miguel Araujo wrote:
> Dejan Muhamedagic wrote:
> >Hi,
> >
> >On Fri, Dec 14, 2007 at 01:29:51PM +0100, Miguel Araujo wrote:
> >  
> >>Hello HA list!
> >>
> >>I have spent 2 or 3 days on the IRC channel asking some questions and 
> >>probably abusing your patience a little bit ;) which wasn't my intention 
> >>at all. I came to the HA world about 6 days ago and I have been gobbling 
> >>    
> >
> >Welcome!
> >
> >  
> Thank you!
> >>up documentation. First, I would like to say that I find the 
> >>documentation at the HA site scattered and hard to follow. However, the 
> >>wiki is easier to read, and it is easier to find what you are looking for.
> >>
> >>That said, I would like to understand heartbeat in depth and, if 
> >>possible, contribute in the future to the documentation or to patching 
> >>the software. I came across HA because I'm working on a virtualization 
> >>project. Basically, what I have to do is set up 4 machines with Xen 3.1 
> >>using block devices exported from a SAN over iSCSI.
> >>    
> >
> >There's a brand new iSCSI ocf RA.
> >  
> I'm not interested in doing high availability on iSCSI. Anyway, thanks 
> for the info.

Your shared storage is on iSCSI disks, right? In that case, it would
be safer to include the iSCSI resource in the cluster, because
otherwise the disks are accessible on all nodes at once and may be
mounted on the wrong node by accident.
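In a heartbeat 2 (CRM) setup that is just an ordinary resource. As a minimal CIB sketch, assuming the new ocf:heartbeat:iscsi RA and placeholder portal/target values, you could group the iSCSI login with the rest of a domU's storage so that a LUN is only attached on the node that actually runs that domU:

```xml
<group id="grp-vm1-disk">
  <primitive id="iscsi-vm1" class="ocf" provider="heartbeat" type="iscsi">
    <instance_attributes id="iscsi-vm1-ia">
      <attributes>
        <!-- placeholder portal/target values; substitute your SAN's real ones -->
        <nvpair id="iscsi-vm1-portal" name="portal" value="10.0.0.1:3260"/>
        <nvpair id="iscsi-vm1-target" name="target" value="iqn.2007-12.com.example:vm1"/>
      </attributes>
    </instance_attributes>
  </primitive>
</group>
```

This is only a sketch; check the RA's metadata for the exact parameter names in your version.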

> >>That's already done, but now I have been asked to make the service 
> >>highly available. This is how I found heartbeat and started reading 
> >>about this proven software, backed by your many years of experience in 
> >>the field, that so many people rely on.
> >>
> >>What I want to do is to monitor domUs so that if they fail I can move 
> >>them to other nodes of my 4-node cluster. On IRC people have already 
> >>explained to me some of the issues I didn't understand before, such as 
> >>that you do not monitor the whole dom0; you monitor resources (domUs). 
> >>After that I would like to build a stacked cluster monitoring the 
> >>services that the domUs run, but this may take ages.
> >>    
> >
> >Recently the Xen ocf RA saw some improvements. In particular,
> >there's a way to hook scripts in to monitor resources within the
> >DomU, which may allow you to keep heartbeat running only in the
> >Dom0. See the recent discussion on Xen and the bugzilla entry.
> >Perhaps you could help with testing it.
> >
> >  
> I have been reading it.
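
For reference, hooking such a script in is a matter of the Xen RA's
monitor_scripts parameter. A minimal heartbeat 2 CIB sketch, with
placeholder paths for the domU config and for a script that checks the
services inside the domU, could look like:

```xml
<primitive id="xen-vm1" class="ocf" provider="heartbeat" type="Xen">
  <instance_attributes id="xen-vm1-ia">
    <attributes>
      <!-- placeholder paths; point these at your real domU config
           and at your in-domU check script -->
      <nvpair id="xen-vm1-xmfile" name="xmfile" value="/etc/xen/vm1.cfg"/>
      <nvpair id="xen-vm1-monscr" name="monitor_scripts" value="/usr/local/bin/check-vm1"/>
    </attributes>
  </instance_attributes>
  <operations>
    <op id="xen-vm1-mon" name="monitor" interval="30s" timeout="60s"/>
  </operations>
</primitive>
```

Again just a sketch; verify the parameter names against the RA metadata
in the version you are running.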
> >>I have a 2-node cluster with a SAN for testing and having fun, so don't 
> >>worry about guarantees. Dominik Klein kindly passed me his CIB file for 
> >>Xen, and I understand almost all of it. The problem now is that I 
> >>would like to add fencing to the cluster. Here come the questions:
> >>
> >>1.- Are fencing and STONITH different technologies that let you avoid 
> >>split-brain? Or are they just two different concepts, with fencing 
> >>achieved the STONITH way?
> >>    
> >
> >Fencing is a term, STONITH a technology. STONITH makes fencing
> >possible.
> >
> >  
> OK, for the first time I understand the real difference.
> >>2.- How does STONITH know how to differentiate between a communication 
> >>network failure and a crash on the node?
> >>    
> >
> >It can't. BTW, STONITH is just a way to reset the node. Other
> >components (pengine in particular) decide when to reset the node.
> >
> >  
> So, what happens if the STONITH device can't do its work and then the 
> supposedly dead node comes back to life and screws up the resource?

The cluster won't proceed with takeover until it has made sure
that the node was reset. In other words, if your STONITH device
doesn't work properly, the clustered services won't be
particularly available in the event of a failure.
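
For completeness: the STONITH device itself is configured as a cluster
resource, typically cloned so that every node can shoot every other. A
heartbeat 2 CIB sketch, assuming an HP iLO device and the external/riloe
plugin (substitute whatever stonith -L lists for your hardware):

```xml
<clone id="fencing-clone">
  <primitive id="st-ilo" class="stonith" type="external/riloe">
    <instance_attributes id="st-ilo-ia">
      <attributes>
        <!-- parameter names vary by plugin; the nvpair below is a
             placeholder -- query the plugin for its real parameters -->
        <nvpair id="st-ilo-hosts" name="hostlist" value="node1 node2"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>
```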

> >>I mean if the node's network 
> >>fails, how can the STONITH device kill it?
> >>    
> >
> >There are STONITH devices (mainly UPSes) which are controlled over a
> >serial line. Another class is the lights-out style of device, such as
> >iLO (HP) or RSA (IBM).
> >
> >  
> >>3.- As my SAN is not like a ServeRAID (it doesn't do resource 
> >>self-fencing, so to speak), I would like to run a different fencing 
> >>script for every node of my cluster, since each node initially has 
> >>different block devices mounted (one for every virtual machine). I know 
> >>how to block a node's access to the SAN using its CLI; can it be done? 
> >>Could you pass me some example CIB files?
> >>    
> >
> >CIB files won't help here. Managing access is not so easy to
> >implement. There's some code around for that, I believe. Don't
> >know about its state though. Look for "shared disk access" or
> >similar in the dev list archives.
> >
> >  
> OK. So if I'm not mistaken there is no way at the moment to do what I 
> want using heartbeat. If I can't fence my nodes this way, and can't buy 
> special hardware either, heartbeat is useless for my purpose, isn't it?

High availability does require some investment. However, saying
that heartbeat is useless for this purpose is not right. If the
purpose is to offer high availability, then heartbeat can
certainly help. But, just like any other cluster software, it
requires the means to do its job.
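
That said, since you already know how to cut a node off from the SAN
with its CLI, one avenue worth exploring is wrapping that CLI in an
"external" STONITH plugin: the stonith daemon drives such plugins
through a small subcommand interface (gethosts, reset, off, on,
status). A rough sketch, where san_cli, its deny/allow/ping verbs, and
the node names are all placeholders for whatever your SAN actually
provides:

```shell
#!/bin/sh
# Rough sketch of an external STONITH plugin that fences a node by
# revoking its SAN access.  The stonith daemon passes the subcommand
# in $1 (and the victim node in $2 where applicable); configuration
# parameters arrive as environment variables.
#
# NOTE: san_cli and its deny/allow/ping verbs are assumptions --
# substitute your SAN vendor's real CLI and commands.

hostlist=${hostlist:-"node1 node2"}          # nodes this plugin can fence
san_cli=${san_cli:-/usr/local/bin/san_cli}   # path to the SAN's CLI tool

san_fence() {
    case "$1" in
    gethosts)
        # List the nodes this device is able to fence
        echo $hostlist
        ;;
    reset|off)
        # Cut the victim node ($2) off from its LUNs
        "$san_cli" deny --host "$2" || exit 1
        ;;
    on)
        # Restore the node's access to its LUNs
        "$san_cli" allow --host "$2" || exit 1
        ;;
    status)
        # Is the fencing device (the SAN management interface) reachable?
        "$san_cli" ping >/dev/null 2>&1
        ;;
    getconfignames)
        # Parameters this plugin expects from the cluster configuration
        echo "hostlist san_cli"
        ;;
    *)
        exit 1
        ;;
    esac
}

if [ $# -gt 0 ]; then
    san_fence "$@"
fi
```

Whether revoking LUN access alone is sufficient protection for your
setup is something you would have to think through carefully; this only
shows the shape of the plugin interface.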

> >>4.- Would you mind listing some STONITH devices available on the market?
> >>    
> >
> >Take a look at the output of stonith -L. Those are supported.
> >
> >Thanks,
> >  
> No, thank you! I was wondering whether you know of any way I can solve 
> my fencing problem using heartbeat or other HA software. I'm starting 
> to run out of time for testing.
> I can already see myself writing scripts to do HA my own way, although 
> they would just be a specific solution that would need to be reworked 
> for every situation, which is neither flexible nor scalable.

Hand-made cluster software still won't help if you don't have
proper protection for your data. If that were possible without it,
it would already be in Heartbeat. If you have shared storage, you
need fencing.

Thanks,

Dejan

> Regards and thanks for your patience.
> 
>          Miguel Araujo
> >Dejan
> >
> >  
> >>Finally, I want to thank you for all your time and effort. It's very 
> >>likely you will receive more replies from me asking more questions like 
> >>these; thanks in advance.
> >>
> >>Regards,
> >>
> >>         Miguel Araujo
> >>
> >>
> >>_______________________________________________
> >>Linux-HA mailing list
> >>[email protected]
> >>http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>See also: http://linux-ha.org/ReportingProblems
> >>    
> >
> >  
> 
