Dejan Muhamedagic wrote:
Hi,

On Mon, Dec 17, 2007 at 11:12:55AM +0100, Miguel Araujo wrote:
Dejan Muhamedagic wrote:
Hi,

On Fri, Dec 14, 2007 at 01:29:51PM +0100, Miguel Araujo wrote:
Hello HA list!

I have spent 2 or 3 days on the IRC channel asking some questions and probably abusing your patience a little bit ;) which wasn't my intention at all. I got into the HA world about 6 days ago and I have been gobbling
Welcome!

Thank you!
up documentation. First, I would like to say that I find the documentation at the HA site dispersed and hard to follow. However, the wiki is easier to read and to find what you are looking for.

That said, I would like to understand heartbeat in depth and, in the future, collaborate on the documentation or on patching the software if possible. I came across HA because I'm working on a virtualization project. Basically, what I have to do is set up 4 machines with Xen 3.1 using block devices exported from a SAN over iSCSI.
There's a brand new iSCSI ocf RA.
I'm not interested in doing high availability for iSCSI itself. Anyway, thanks for the info.

Your shared storage is on iSCSI disks, right? In that case, it'd
be more secure to include the iSCSI resource in the cluster,
because otherwise the disks are going to be accessible on all nodes
and may be mounted by accident on the wrong node.
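For illustration only, a rough sketch of what such a resource might look like in the heartbeat 2.x CIB, loaded with cibadmin. The portal address and target IQN are made-up placeholders; check the iscsi RA metadata for the exact parameter names:

  # Sketch: add a hypothetical ocf:heartbeat:iscsi primitive to the CIB
  cibadmin -C -o resources -X '
  <primitive id="iscsi_vm1" class="ocf" provider="heartbeat" type="iscsi">
   <instance_attributes id="iscsi_vm1_ia">
    <attributes>
     <nvpair id="iscsi_vm1_portal" name="portal" value="10.0.0.1:3260"/>
     <nvpair id="iscsi_vm1_target" name="target" value="iqn.2007-12.com.example:vm1"/>
    </attributes>
   </instance_attributes>
  </primitive>'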


Yes, it is. I didn't know that adding iSCSI as a resource to the cluster would make the disks available only to the proper node. Does it need special configuration?

That's already done, but now I have been asked to make the service highly available. This is how I found heartbeat and started reading about this well-tested software, with many years of experience in the field, that so many people rely on.

What I want to do is monitor the domUs so that if they fail I can move them to other nodes of my 4-node cluster. On IRC they have already explained to me some of the issues I didn't understand before, such as that you cannot monitor the whole dom0; you monitor resources (domUs). After doing that I would like to build a stacked cluster monitoring the services that the domUs run, but this may take ages.
Recently the Xen ocf RA saw some improvements. In particular,
there's a way to hook scripts to monitor resources within the
DomU, which may allow you to keep heartbeat running in the
Dom0. See the recent discussion on Xen and the bugzilla entry.
Perhaps you could help test it.
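As a very rough sketch only (the domU name, config file path, and monitor script below are made up, and the monitor_scripts parameter assumes the improved RA from that discussion), a Xen resource could look something like this:

  # Sketch: a hypothetical Xen domU as a cluster resource
  cibadmin -C -o resources -X '
  <primitive id="vm1" class="ocf" provider="heartbeat" type="Xen">
   <instance_attributes id="vm1_ia">
    <attributes>
     <nvpair id="vm1_xmfile" name="xmfile" value="/etc/xen/vm1.cfg"/>
     <nvpair id="vm1_mon" name="monitor_scripts" value="/usr/local/bin/check_vm1"/>
    </attributes>
   </instance_attributes>
   <operations>
    <op id="vm1_monitor" name="monitor" interval="30s" timeout="60s"/>
   </operations>
  </primitive>'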

I have been reading it.
I have a 2-node cluster with a SAN for testing and having fun, so don't worry about warranties. Dominik Klein kindly passed me his CIB file for Xen, and I understand almost all of it. The problem now is that I would like to add fencing to the cluster. Here come the questions:

1.- Are fencing and STONITH different technologies that let you avoid split-brain? Or are they just two different concepts, both achieved the STONITH way?
Fencing is a term, STONITH a technology. STONITH makes fencing
possible.

OK, for the first time I understand the real difference.
2.- How does STONITH differentiate between a communication network failure and a crash of the node?
It can't. BTW, STONITH is just a way to reset the node. Other
components (pengine in particular) decide when to reset the node.

So, what happens if the STONITH device can't do its work and then the supposedly dead node comes back to life and screws up the resource?

The cluster won't proceed with takeover until it has made sure that
the node was reset. In other words, if your STONITH device
doesn't work properly, the clustered services won't be
particularly available in case of a failure.


OK, I'm starting to see why people attach such importance to it.

I mean, if the node's network fails, how can the STONITH device kill it?
There are STONITH devices (mainly UPSes) which are controlled over
serial. Another class is the lights-out style of device, such as
iLO (HP) or RSA (IBM).

3.- As my SAN is not like a ServeRAID (it doesn't do resource self-fencing, so to speak), I would like to run a different fencing script for every node of my cluster, since each node initially has different block devices mounted (one for every virtual machine). I know how to block a node's access to the SAN using its CLI. Can it be done? Could you pass me some example CIB files?
CIB files won't help here. Managing access is not so easy to
implement. There's some code around for that, I believe. Don't
know about its state though. Look for "shared disk access" or
similar in the dev list archives.

OK. So if I'm not mistaken, there is no way at the moment to do what I want using heartbeat. If I can't fence my nodes this way, and can't buy special hardware either, heartbeat is useless for my purpose, isn't it?

High availability does require some investment. However, saying
that heartbeat is useless for this purpose is not right. If the
purpose is to offer high availability, then heartbeat can
certainly help. But, just like any other cluster software, it does
require the means to do its job.


I understand. However, I still think there could be per-node scripting for when a node fails. At least some way to say: if this node fails, execute this script on the node that is going to take over, and that script would call the SAN CLI saying, "now I'm the only one who has access". To me this seems like a good feature for any high-availability software, because I already have hardware that can make the fencing work; wouldn't it be a good idea to use it somehow?
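As far as I understand, heartbeat's "external" STONITH plugin interface is close to what I mean: a shell script in the external plugin directory that gets called with the operation as its first argument and the target host as the second. A very rough sketch, where san_cli and its subcommands are invented stand-ins for my SAN's real CLI (and cutting SAN access only fences the storage, it doesn't actually reset the node):

  #!/bin/sh
  # Sketch of an external STONITH plugin wrapping a SAN CLI.
  # heartbeat invokes it as: <plugin> <operation> [<hostname>]
  # "san_cli" and its subcommands are hypothetical placeholders.
  case "$1" in
    gethosts)        echo "node1 node2 node3 node4" ;;
    reset|off)       san_cli deny-access "$2" ;;  # cut the node off the SAN
    on)              san_cli grant-access "$2" ;;
    status)          san_cli ping ;;              # is the SAN reachable?
    getconfignames)  exit 0 ;;                    # no parameters in this sketch
    getinfo-devname) echo "SAN CLI fencing sketch" ;;
    *)               exit 1 ;;
  esac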

4.- Would you mind listing some STONITH devices available on the market?
Take a look at the output of stonith -L. Those are supported.
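For example (the exact list depends on how your heartbeat was built):

  # list the STONITH plugin types this installation supports
  stonith -L

  # show the parameters a given plugin expects, e.g. external/riloe
  stonith -t external/riloe -n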

Thanks,
No, thank you! I was wondering if you know of any way I can solve my fencing problem using heartbeat or other HA software. I'm starting to run out of time for testing. I can already see myself programming some scripts to do HA my way, although they would just be a specific solution that would have to be reworked for every situation, which is neither flexible nor scalable.

Hand-made cluster software still won't help if you don't have
proper protection for your data. If it were possible, it would
already be in heartbeat. If you have shared storage, then you
need fencing.


I wasn't thinking about approaching the problem as a cluster. Let me give you a simplified 2-node example. I've got a machineA which is running a process, and its file system comes from a SAN. Now machineB is just there to back up machineA if it fails. So I write a simple script that runs in the background on machineB:

fails=0
while [ "$fails" -lt 5 ]; do
    ping -c1 -w2 machineA >/dev/null && fails=0 || fails=$((fails+1))
    sleep 5
done
san_cli take-control                 # hypothetical: grab SAN access via its CLI
/etc/init.d/critical-service start   # start machineA's critical HA process

Thank you again. I hope to finally understand the issue.

         Miguel Araujo

Thanks,

Dejan

Regards and thanks for your patience.

         Miguel Araujo
Dejan

Finally, I want to thank you for all your time and effort. It's very likely you will receive more replies from me asking more of these questions. Thanks in advance.

Regards,

        Miguel Araujo



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
