Hello Eric,
On 2013-06-27 17:35, Robinson, Eric wrote:
-Original Message-
I don't understand why resources try to start on the wrong node (and of course fail).
Pacemaker 1.0.7 ... looking at the Changelog of Pacemaker 1.0 at
https://github.com/ClusterLabs/pacemaker-1.0
I don't understand why resources try to start on the wrong node (and of course
fail).
My nodes are ha05 and ha06.
ha05 is master/primary and all resources are running on it.
If I run...
crm resource stop p_MySQL_185
..the resource stops fine. Then if I run...
crm resource start p_MySQL_185
We are installing corosync and pacemaker on a brand new RHEL 6.3 cluster today.
When we installed using yum, here are the versions that were pulled down from the repos.
pacemaker-libs-1.1.9-1512.el6.x86_64
pacemaker-1.1.9-1512.el6.x86_64
corosync-1.4.3-26.2.x86_64
I've asked this question on the list before and never received a good answer,
so here goes again. I've also read the Pacemaker documentation, but I just
cannot seem to get this.
I have a drbd resource, p_drbd0.
I have a resource group, g_clust01, which consists of a filesystem
(p_fs_clust01)
In the simplest terms, we currently have resources:
A = drbd
B = filesystem
C = cluster IP
D thru J = mysql instances.
Resource group G1 consists of resources B through J, in that order, and is
dependent on resource A.
This fails over fine, but it has the serious disadvantage that if you stop
meta ordered=false
Wouldn't that make it so we could not be sure that the filesystem and cluster IP start before the MySQL instances?
Or you could take the MySQL instances out of the group and make them each individually dependent on drbd/filesystem with a colocation/order
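A rough sketch of that second approach in crm shell syntax, with illustrative resource names (only the filesystem and IP stay grouped; each MySQL instance gets its own constraint pair against the group):

group g_core p_fs_clust01 p_ip_clust01
colocation col_mysql_185 inf: p_MySQL_185 g_core
order ord_mysql_185 inf: g_core p_MySQL_185
# repeat the colocation/order pair for each MySQL primitive;
# stopping one instance then no longer stops the instances "after" it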
We have this configuration:
NodeA is located in DataCenterA. NodeB is located in (geographically separate)
DataCenterB.
DataCenterA is connected to DataCenterB through 4 redundant gigabit links (two
physically separate Corosync rings).
Both nodes reach the Internet through (geographically
With Pacemaker 1.1.8 and drbd 8.4.2, we are observing that when the primary
node is put into standby mode ('crm node standby') the drbd resource on the
secondary node refuses to be promoted because it is in a WFConnection state. Is
this normal and by design? I don't recall seeing this behavior
If the promote of DRBD on one node cannot be done, this might be because the demote on the other node cannot be achieved.
Do you mount a FS? If so, force it: umount -fl /mountpoint
Double check (cat /proc/drbd) that the DRBD resource is really secondary on the demoted node.
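In shell terms, the checks being suggested amount to something like this (mount point and DRBD resource name are placeholders):

umount -fl /mountpoint   # force a lazy unmount if something is still holding the filesystem
cat /proc/drbd           # confirm the connection state and that this node really is Secondary
drbdadm role r0          # or query the role of a single resource directly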
This is with no
Okay, I think I have some new information on this problem.
First, upgrading to drbd 8.4.2 did not help.
I believe the problem is that when I do 'crm node offline' Pacemaker is fully
stopping the drbd service. This causes drbd on the secondary to go into a
WFConnection state. It refuses to
On 12/05/2012 12:05 PM, Robinson, Eric wrote:
I believe the problem is that when I do 'crm node offline' Pacemaker
is fully stopping the drbd service. This causes drbd on the
secondary to go into a WFConnection state. It refuses to
promote to primary in that state.
Probably. I was thinking drbd was losing packets and thus falling back to WFC rather than pacemaker ordering a full stop.
Gotcha. Well, I think it is demonstrably the case that it is losing packets
because the service is stopped.
you could probably find the stop action in the RA and replace it with (e.g.) logger 'AIE ***I did not want this***' and then see what gets logged.
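In other words, a debugging hack along these lines inside the resource agent (not a fix, just a way to see who is calling stop):

stop() {
    # temporarily do nothing except log the call
    logger 'AIE ***I did not want this***'
    return 0   # report success so the caller's behaviour shows up in the logs
}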
Well, that worked, in the sense that the resource now fails over. I replaced
the start and stop actions in the RA with logger
02.12.2012 00:34, Robinson, Eric wrote:
Try to set 'target-role=Started' in both of them.
Okay, but how does that address the problem of error code
11 from drbdadm?
Well, you have an error promoting resources. 11 is EAGAIN, usually meaning you did not demote the other
I am not sure if that will really help you - but in my cluster (OK, an older pacemaker version) I have the following to define a master/slave resource:
primitive rsc_sap_HA0_ASCS00
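The definition above is cut off; for comparison, a typical DRBD master/slave pair in crm syntax looks roughly like this (names and intervals are illustrative):

primitive p_drbd0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
ms ms_drbd0 p_drbd0 \
    meta master-max="1" master-node-max="1" \
         clone-max="2" clone-node-max="1" notify="true"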
Bump... does anyone have some insight on this? Google is not turning up
anything useful.
I posted about this a couple of weeks ago but didn't get a response.
Our newest cluster will not fail over master/slave drbd resources. It works fine manually using drbdadm from a shell prompt, but when we try it using 'crm node standby' and letting the cluster manage the resource, crm_mon just
Well, I happen to be working on a new landing page which partially addresses these points.
I'll see if I can get most of them covered.
That would be awesome. The overall feel that I have is that of making my way
down a country road in the rain. I often feel as though I have missed an
clusterlabs.org/doc is as good as I can do for docs.
I try to keep it up-to-date and version specific (so that documenting corosync 2.x doesn't obliterate the cman/plugin stuff).
Packages are mostly in the hands of the distros though.
Building the entire stack (and keeping it up-to-date)
Should I be using pcs or crmsh?
Neither one seems to work quite right.
What doesn't work? I think that at this point in time, it'd be easier to get crmsh going/fixed with pcmk 1.1.8. It's probably just some path somewhere. If really nothing works, you *must* use LCMC, the Pacemaker GUI.
That's what makes the open-source ecosystem so vibrant. :)
I suspect that you are saying that with tongue somewhat in cheek. :-) Speaking
as someone who just wants to get a job done, I do find the rockiness of the
terrain discouraging. Just when I was getting used to crmsh, I hear that
It is actually worse than that: for as long as I can remember, RH has included a trap for young players where if you edit /etc/hosts all sorts of interesting things may happen after the next reboot. Or rpm update.
Depending on your choice of editor and phase of the moon.
None of my RH6 machines
I totally agree. I try to use HA setups in production
environments but I only do 2 or so a year and meanwhile I
have a complete zoo of versions, tools, shells etc.
I was trying fairly hard not to say something like that for fear of alienating
helpful members of the list, but I have to
I can think of 3 tooling changes:
- ptest/crm_simulate
- hb_report/crm_report
- standalone crmsh
That's not /too/ bad in 4 years.
Fair enough. From my perspective, I was thinking of the fact that our first
cluster was rolled out in 2006, back when the documentation was all about paul
bump.
Could someone please review the logs in the links below and tell me what the
heck is going on with this cluster? I've never encountered anything like this
before. Basically, corosync thinks the cluster is healthy but Pacemaker won't
elect a DC!
Lars,
Did you not see my other mail?
Regards,
Lars
Gosh, no, I don't see another one from you about this in the list. I can't
imagine what might have happened to it. Can you resend it?
--Eric
Hi Lars, found your email...
Um, are you setting a nodeid in corosync.conf?
Because I see this:
Nov 09 09:07:25 [2609] ha09a.mycharts.md crmd: crit: crm_get_peer: Node ha09a.mycharts.md and ha09a share the same cluster node id '973777088'!
This is not
Lars,
I'd probably strip everything except the short names out of
/etc/HOSTNAME and /etc/hosts, though it may be sufficient to
make sure the short names come first.
I think something changed with regard to hostname handling in 1.1.8.
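For example, /etc/hosts entries with the short names first would look something like this (addresses are purely illustrative):

192.168.10.58   ha09a   ha09a.mycharts.md
192.168.10.59   ha09b   ha09b.mycharts.md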
It looks like you were right about 1.1.8 handling
Should I be using pcs or crmsh?
Neither one seems to work quite right.
Here are the packages I have installed on my RHEL 6.3 servers...
[root@ha09a ~]# rpm -qa | egrep 'pacem|coro|crmsh|pcs' | sort
corosync-1.4.1-7.el6_3.1.x86_64
corosynclib-1.4.1-7.el6_3.1.x86_64
crmsh-1.2.1-45.2.x86_64
The official management tool is/will be pcs. That said, crm
has been around for a while, so it might be more complete/stable.
I know that, personally, I will be learning pcs.
digimer
I tried using pcs but I ran into a roadblock right away. The Clusters from
Scratch document refers to
Hi Andrew,
would love to see the logs from ha09b
Below are links to a clean set of logs from nodes ha09a and ha09b. The
procedure I followed to collect the logs was:
1. Ensure pacemakerd and corosync are stopped on both nodes.
2. Remove corosync.log on both nodes.
3. Start corosync on ha09a.
Andrew,
Um, are you setting a nodeid in corosync.conf?
Because I see this:
Nov 09 09:07:25 [2609] ha09a.mycharts.md crmd: crit: crm_get_peer: Node ha09a.mycharts.md and ha09a share the same cluster node id '973777088'!
Which could easily explain why the cluster
Andrew,
I updated to 1.1.8 from the clusterlabs-next repo. Now I am
back to the problem where no DC gets elected...
Last updated: Thu Nov 8 10:10:06 2012
Last change: Fri Nov 2 17:16:29 2012
Current DC: NONE
0 Nodes configured, unknown expected votes
0 Resources configured.
On Thu, Nov 8, 2012 at 8:31 AM, Robinson, Eric eric.robin...@psmnv.com wrote:
Okay, I'll look forward to seeing those. Should I use 1.1.7
until then or just wait?
They should be there now.
Excellent, I will get them.
What is the not ideal part?
Well the reason we released 1.1.8
Andrew,
On Thu, Nov 8, 2012 at 8:31 AM, Robinson, Eric
eric.robin...@psmnv.com wrote:
Okay, I'll look forward to seeing those. Should I use 1.1.7
until then or just wait?
They should be there now.
What is the not ideal part?
Well the reason we released 1.1.8 is because we fixed
I tried to build 1.1.8-237 and ran into too many dependency
problems. I uninstalled everything and reinstalled using
version 1.1.7 rpms, and now I finally get the expected...
Last updated: Mon Nov 5 09:08:04 2012
Last change: Mon Nov 5 09:07:55 2012 via crmd on
Basically you're hitting a bug in 1.1.8.
I made the "is node X alive?" check much stricter, but there are certain timing windows which produce a false positive.
You should be fine after applying the following two patches (or building from HEAD):
+ Andrew Beekhof (4 weeks ago) b87494d:
Is this where I can get the latest Pacemaker source, complete with the patches that fix the timing window/false positives issue that Andrew mentioned on the 'cib_replace failed' thread?
https://github.com/ClusterLabs/pacemaker/tarball/master
One has to wonder if the cause of the problem is your systems are bogged down by iowait resulting from all that logging and are e.g. dropping packets.
--
Dimitri Maziuk
rsyslog rate limiting is normal behavior. The messages to the corosync.log are
not being dropped and the system if
This is just crazy as heck. The rings are fine and both nodes have joined the
cluster, but when I start Pacemaker no DC ever gets elected.
[root@ha09b corosync]# corosync-cfgtool -s
Printing ring status.
Local node ID 990554304
RING ID 0
id = 192.168.10.59
status = ring 0
That was still the official version in git at the time. See below.
Perhaps try the official upstream release of 1.1.8 for RHEL-6?
http://www.clusterlabs.org/rpm-next/
Will do.
Here is what we have installed...
[root@ha09a log]# rpm -qa | egrep 'pacem|coros'
pacemaker-1.1.8-0.901.eedc0cc.git.el6.x86_64
That's an interesting version you have there. Where did you get it from?
When I tried to remove it, it said...
On 10/31/2012 04:59 PM, Robinson, Eric wrote:
Nobody has any thoughts on why my 2-node cluster has no DC?
As I mentioned, corosync-cfgtool -s shows the ring active
with no faults.
Do you have /etc/corosync/service.d/pcmk? And does it look
exactly as the example given here?
http
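The URL is truncated above; for reference, the pcmk service file from the corosync 1.x editions of Clusters from Scratch is essentially:

# /etc/corosync/service.d/pcmk
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}

With ver: 1, pacemakerd is started separately after corosync, which matches the startup procedure described elsewhere in this thread.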
That probably means that someone (i.e., you ;-) needs to dig more into the logs of corosync and pacemaker. There's bound to be a clue there; pacemaker is many things, but definitely not shy when it comes to logging.
Regards,
Lars
Yes, that's what it probably means. :-)
For what it
Okay, the two node names are ha09a and ha09b. Starting clean with all services
turned off.
This is what I get in /var/log/corosync.log on ha09a when I start corosync...
Oct 31 10:22:43 corosync [MAIN ] Corosync Cluster Engine ('1.4.3'): started
and ready to provide service.
Oct 31 10:22:43
Bringing up a brand new cluster with corosync 1.4.3 and pacemaker 1.1.8.
Configuration fails right away. My first configuration command times out with
the error...
[root@ha09a ~]# crm configure property stonith-enabled=false
Call cib_replace failed (-62): Timer expired
null
ERROR: could not
On Mon, Oct 29, 2012 at 8:09 PM, Robinson, Eric
eric.robin...@psmnv.com wrote:
It's been replaced in RHEL by pcs. You can still use crm (or any other manager), but it's not maintained by the pacemaker devs.
Madi
I'll use whatever I can get my hands on. Neither crm nor
pcs came
: to be checked (chkconfig etc.)
Alain
Verified. Services are only started by Pacemaker, nothing starts on boot
(except Pacemaker and Corosync).
From: Robinson, Eric eric.robin...@psmnv.com
To: linux-ha@lists.linux-ha.org
Date: 07/05/2012 20:27
Subject: [Linux-HA] We Rebooted
Hi guys, we rebooted a standby node of a healthy cluster and suddenly all the
resources on the primary cluster restarted. What's up with that? Before
rebooting the standby node, we did the normal stuff to verify that all was well.
crm_mon showed all nodes online, in their expected roles, with
We have two geographically separate data centers connected by 4 x
Gigabit links (in 2 trunks). Our HA clusters are distributed between the
data centers, with each node of a 2-node cluster in a separate data
center. (In the case of our 3-node clusters, 2 nodes are in one data
center and the 3rd
I have a geographically dispersed (stretch) cluster, where one node is
in data center A and the other node is in data center B. I have done
everything possible to ensure link redundancy between the cluster nodes.
Each node has 4 x gigabit links connected to 4 different sets of
switches and routers
Should I be concerned that the standby node of a 2-node cluster is
logging these messages about every 15 seconds? This cluster has been up
and running apparently fine for a year.
Nov 12 17:53:59 ha05 crm_attribute: [2420]: info: Invoked: crm_attribute
-N ha05.mycharts.md -n master-p_DRBD:1 -l
As Florian mentioned, there's the debug option, but I don't think it is going to help. What may help is to take a look at the network traffic, but you'd need really good sight ;-)
Thanks,
You're right, it didn't help. What helped was going back to the Linux
bonding documentation,
cluster. Whew.
Eric,
Please file a bug against the scripts then.
Thank you.
Regards,
Tristan
I imagine since I heard about it from Florian, who heard about it in an
IRC chat, somebody must be way ahead of me on that.
--Eric
-11-06 09:13, Robinson, Eric wrote:
Two little problems with my new cluster.
1. When I put the primary node in standby, the resources fail over to the other node just fine. When I put the primary back online, the resources automatically fail back, but DRBD on the standby node goes
That's quite definitely a misconfiguration. Please create a
CIB dump
with cibadmin -Q, make that available on an HTTP server somewhere
(might as well be pastebin or similar), and share the URL here.
Done.
www.psmnv.com/downloads/cibadmin.dump
2. When I do drbdadm up, I get the
A couple of things are fixed. The ring FAULTY messages were caused by
genuine network communication failures (go figure) which in turn had two
root causes. One was my error and the other was Red Hat's. Although I
have set up bonding many times before, on these servers I had
BONDING_OPS instead of
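For anyone hitting the same thing: on RHEL the bonding parameters belong in BONDING_OPTS in the interface config; a minimal sketch (mode and values are illustrative, not the ones from this cluster):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=active-backup miimon=100"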
We are unable to find the cause of the 'ringid FAULTY adminisrtative intervention required' messages on our newest cluster. Is there someone in this list who knows corosync really well and who we could hire on a consulting basis? Frankly, we're desperate.
--Eric
We keep getting 'ringid FAULTY adminisrtative intervention required' but
there is nothing in the logs that indicates why it reached this
decision. Is there a way to enable more detailed debugging so I can see
why it is disabling the ring?
--Eric
I have two rings configured. Everything looks fine until I bring up the second node. Then the second ring on both nodes reports:
status = Marking seqid 21 ringid 1 interface 198.51.100.55 FAULTY - adminisrtative intervention required.
The rings are on different
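For context, a two-ring setup of the kind being described is configured in corosync.conf roughly like this (networks and multicast addresses are illustrative):

totem {
    version: 2
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.10.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 198.51.100.0
        mcastaddr: 239.255.2.1
        mcastport: 5405
    }
}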
logging {
debug: off
}
How about changing that to on?
Yes, I feel silly. Thanks.
--Eric
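For completeness, a fuller logging stanza with debug enabled looks roughly like this (options per the corosync.conf man page; paths may differ):

logging {
    to_syslog: yes
    to_logfile: yes
    logfile: /var/log/corosync.log
    timestamp: on
    debug: on
}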
Can anyone see from this debug log why the ring is being marked faulty?
The FAULTY message is the last line.
Nov 03 16:57:01 corosync [MAIN ] Corosync Cluster Engine ('1.2.3'):
started and ready to provide service.
Nov 03 16:57:01 corosync [MAIN ] Corosync built-in features: nss rdma
Nov 03
After the euphoria of fixing my problem where selinux breaks ring
initialization (thanks all) I thought I would have smooth sailing. No
such luck.
I have two rings configured. Everything looks fine until I bring up the
second node. Then the second ring on both nodes reports:
status = Marking
We are using redundant rings at two different speeds in SLES11 SP1. Until we upgraded corosync from 1.4 to some later 1.4 version, the slower ring was marked faulty every time there was significant configuration exchange on the rings. Now a ring still gets faulty from time to time, but it does
I can't get a cluster up on RHEL6. First I tried pacemaker+corosync, but
corosync complains...
Could not get the ring status, the error is: 6
..and I cannot connect to the cluster.
So then I tried pacemaker+heartbeat, only to learn that pacemaker no
longer supports the heartbeat cluster
Florian's suggestion sounds like a good start for you. After
that, try firewalls and selinux.
Well, sheesh, it was SELinux. Write that one down, folks: SELinux causes the error 6 problem when initializing the ring. And this is all because the RHEL 6 installer does not ask whether SELinux should
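For anyone else tripping over this, the standard checks/workarounds are:

getenforce      # shows Enforcing / Permissive / Disabled
setenforce 0    # switch to permissive for the current boot
# for a permanent change, set SELINUX=permissive (or disabled) in /etc/selinux/config and reboot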
I just installed and configured corosync-1.2.3-21.el6_0.1.x86_64 on RHEL6. At startup, the corosync log appears to be complete except for the line 'A processor joined or left the membership and a new membership was formed.' I cannot connect to the cluster, and corosync-cfgtool states:
Local node
To: 'General Linux-HA mailing list' linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] Escaping Dependencies in Resource Groups
Robinson, Eric wrote on 2011-09-29:
We have a 3-node cluster running about 200 instances of MySQL. The way
we have our resource groups set up, the dependency stack looks like
this:
Cluster_IP
Filesystem
MySQL_001
MySQL_002
MySQL_003
MySQL_004
...
crm_mon on my system displays a lot of failed actions, I guess because the init script for the resource is not fully LSB compliant?
In any case, the resources seem to work okay and fail over okay.
How can I get rid of all those failed actions?
crm_mon output follows...
Last
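The usual answer (offered here as a pointer, not necessarily what was done in this thread) is to fix the agent's LSB compliance and then clear the failure history with a cleanup, e.g.:

crm resource cleanup p_MySQL_185        # crmsh; the resource name is just an example
crm_resource --cleanup -r p_MySQL_185   # equivalent low-level command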
Greetings! We have a few Corosync+PaceMaker+DRBD clusters and a couple
older Heartbeat+DRBD clusters. Our infrastructure is currently located
in a single facility. We have the opportunity to establish a DR site in
another data center over a high bandwidth connection (2-4Gbps). I am
thinking of
I woke up this morning and discovered that one of my clusters had failed over during the night. Everything was working fine, but I wanted to know what happened. From reading the logs, it looks like the primary node ftp02 gave up its resources to the secondary node ftp01, which had become
From: Mia Lueng
Subject: [Linux-HA] How to monitor the nic link status
Hi:
I have configured a
FYI -- additional info.
1. I did an 'unmove' for resources g_clust04 and g_clust05, which
removed the 'location cli-prefer' statements from the crm config. The
resources stayed where they were.
2. I changed resource-stickiness to 200.
3. I performed the power plug pull test on node ha07b again.
I performed a power-plug-pull test on my newest cluster and it failed
over as expected. However, when I restored power to the failed node,
the resources failed back. I don't understand why this happened since I
have resource stickiness set.
There are three nodes in the cluster. I pulled the plug
Nope, this was created by a move/migrate command. Somebody forgot unmove/unmigrate.
Actually, it wasn't. We pulled the plug on node ha07b, watched the
resource failover, and then plugged it back in and watched it fail back.
We didn't issue a move command that I recall.
I've gone through something similar, and this could be the case as well: the combined score for the group is higher than the resource-stickiness value; setting it to a much higher value (or even inf:) would do the job.
I'll give that a try!
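A sketch of what that change looks like in crm syntax (whether to use a large number or INFINITY is the judgment call above):

crm configure rsc_defaults resource-stickiness=INFINITY
# or per group:
crm resource meta g_clust04 set resource-stickiness INFINITY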
I'm not sure if this list or the DRBD list is the right one to ask this.
Is it possible to deploy a 3-node CRM-based cluster where:
-- nodes A and C share resource R1 on /dev/drbd0
-- nodes B and C share resource R2 on /dev/drbd1
-- resource constraints prevent R1 from
That way, if something happens to switched network #1,
Corosync can still track node status through switched
net #2.
Once this configuration is built, I can use Pacemaker
with resource constraints to ensure that resource R1
can only run on SERVER_A or SERVER_C (usually A) and
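In crm syntax, constraints of that shape would look roughly like this (scores and names are placeholders):

location loc_R1_prefer_A R1 200: SERVER_A
location loc_R1_allow_C  R1 100: SERVER_C
location loc_R1_never_B  R1 -inf: SERVER_B
# and the mirror image for R2 on SERVER_B/SERVER_C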
But now could someone please elaborate on Dejan Muhamedagic's original comment that started the thread? What does 'redundant rings are still not there' mean? Is a three-node cluster an unreliable setup because Corosync and/or Pacemaker are not really ready for that?
not Pacemaker - Heartbeat uses the term 'communication path' instead. And yes, it has been able to support more than two-node clusters since version 2.
Take a look at http://oss.linbit.com/drbd-mc/ It's a nifty Java application which can help you to create your initial cluster configuration in no time.
Heartbeat is not deprecated, it is still supported by the Linbit folks. (Many thanks to them.) But if you need clvmd or GFS2, you would have to use corosync, for example.
It may not be deprecated per se, but there is no getting-started guide on the ClusterLabs site for using Pacemaker+Heartbeat
But don't let me stop you from using corosync; you can still build your particular cluster with the same amount of hardware.
The only thing that would stop me from using corosync is the thought
that it is somehow unreliable or not there yet, scary as that sounds.
The cluster would be
Looks like you are mixing up physical connections
and Corosync rings.
I should not have mentioned DRBD at all as it confuses the question.
Let me try it this way:
How do I build a three-node Corosync cluster with redundant heartbeat
paths? I don't trust the switched network or the Ethernet
A 3-node cluster is much easier to say than to configure, apparently. :-)
It really isn't :)
Encouraged by your 'it really isn't', I now press forward. :-)
Based on what I'm hearing, this is what I think I have learned...
It is possible to build a 3-node cluster with redundant heartbeat