[Linux-HA] IP-Address problem with cluster
Dear members,

I have a problem with our cluster. We have two nodes; each has its own IP address, and there is one additional IP for the cluster. All our resources run on node-1, and node-2 is our standby. But if I ping any other machine from node-1, it uses the cluster IP instead of its own IP. Do you know how to fix that?

Thank you for your help!

--
Auszubildender Fachinformatiker für Systemintegration
RWTH Aachen
Lehrstuhl für Integrierte Analogschaltungen
Raum 24C 313
Walter-Schottky-Haus
Sommerfeldstr. 24
D-52074 Aachen
www.ias.rwth-aachen.de
Email: daniel.thielk...@ias.rwth-aachen.de
Phone: +49-(0)241-80-27771
FAX: +49-(0)241-80-627771

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] IP-Address problem with cluster
On Thu, Jun 26, 2014 at 01:13:22PM +0200, Daniel Thielking wrote:
> Dear members! I have a problem with our cluster. We have two nodes; each
> has its own IP address, and there is one additional IP for the cluster.
> All our resources run on node-1, and node-2 is our standby. But if I ping
> any other machine from node-1, it uses the cluster IP instead of its own
> IP. Do you know how to fix that?

That is typically because of source-address selection on a route, or SNAT. Check:

  iptables-save | grep SNAT

and

  ip route show
  ip route get $dest_ip    (look for "src ...")

In many cases you *want* outgoing packets to come from the service IP address. You could use ip route to make them use the node address to contact the other nodes, but use the service address to contact clients.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
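[Editor's note] The `ip route get` check above has a programmatic counterpart: connecting a UDP socket makes the kernel run its route lookup and source-address selection without sending any packet, so a script can see which local address outbound traffic to a given destination would use. A minimal sketch (the helper name and the destination addresses are illustrative, not from the thread):

```python
import socket

def source_ip_for(dest, port=9):
    """Ask the kernel which local address it would use to reach dest.

    connect() on a UDP socket triggers route lookup and source-address
    selection, but sends nothing on the wire.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((dest, port))
        return s.getsockname()[0]
    finally:
        s.close()

# Compare what is chosen for different destinations, e.g. a LAN peer
# vs. an external host; loopback destinations always pick 127.0.0.1.
print(source_ip_for("127.0.0.1"))
```

If this returns the cluster (service) IP for a destination where you expected the node's own address, a per-route `src` hint (`ip route replace <net> dev <if> src <node_ip>`) changes the selection for that destination only, which matches the "node address for peers, service address for clients" split suggested above.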
Re: [Linux-HA] unable to recover from split-brain in a two-node cluster
On Tue, Jun 24, 2014 at 08:48:03AM -0700, f...@vmware.com wrote:
> Hi Lars,
> Thanks for pointing out the patch. It is not in the heartbeat version on
> the system (it is using Heartbeat-3-0-7e3a82377fa8). I'll try that out.
>
> As for ccm_testclient, the system has stripped out unnecessary files that
> won't be used during normal operation, including gcc. So ccm_testclient
> complains "gcc not found"

Uh? Why would it think it needs gcc? Can you copy the exact message, please?

> and I cannot test it on that system. cl_status listnodes shows both nodes
> on both systems; cl_status nodestatus shows both are active, though.
Re: [Linux-HA] heartbeat 3.0.3 crashes if there are networking/multicast issues (ERROR: lowseq cannnot be greater than ackseq)
On Tue, Jun 24, 2014 at 11:20:48PM +0300, Pasi Kärkkäinen wrote:
> Hello!
>
> I've been seeing heartbeat cluster problems in Linux-based Vyatta and more
> recent VyOS networking/router appliances. These are currently based on
> Debian Squeeze, and thus are using:
>
> Package: heartbeat
> Version: 1:3.0.3-2

Please use 3.0.5:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/37f57a36a2dd.tar.bz2

> VyOS bug report: http://bugzilla.vyos.net/show_bug.cgi?id=244
>
> The problem is that when there are (unexpected) networking problems causing
> multicast issues, which disrupt the inter-cluster communication, the
> heartbeat processes die on the cluster nodes, which is bad, right? I assume
> heartbeat should never die, especially not because of temporary networking
> issues. I've also seen heartbeat dying because of temporary network
> maintenance breaks.
>
> Basically, first I see messages like this:
>
> Jun 23 17:55:02 vyos03 heartbeat: [4119]: WARN: node vyos01: is dead
> Jun 23 17:59:23 vyos03 heartbeat: [4119]: CRIT: Cluster node vyos01 returning after partition.
> Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Deadtime value may be too small.
> Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Late heartbeat: Node vyos01: interval 273580 ms
> Jun 23 17:59:23 vyos03 harc[4961]: info: Running /etc/ha.d//rc.d/status status
> Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Releasing resource group: vyos01 IPaddr2-vyatta::10.0.0.10/24/eth1
> Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Running /etc/ha.d/resource.d/IPaddr2-vyatta 10.0.0.10/24/eth1 stop
> Jun 23 17:59:26 vyos03 heartbeat: [4119]: WARN: 1 lost packet(s) for [vyos01] [421:423]
> Jun 23 17:59:39 vyos03 heartbeat: [4119]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> Jun 23 17:59:40 vyos03 harc[5102]: info: Running /etc/ha.d//rc.d/status status
>
> which seems normal in the case of a networking problem.
> But then later:
>
> Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (494 messages in queue)
> Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (495 messages in queue)
> Jun 23 19:31:23 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (496 messages in queue)
> Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (497 messages in queue)
> Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (498 messages in queue)
> Jun 23 19:31:25 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (499 messages in queue)
> Jun 23 19:31:26 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (500 messages in queue)
> Jun 23 19:31:42 vyos03 heartbeat: last message repeated 25 times
>
> The hist queue size keeps increasing, and when it reaches 500 messages,
> bad things start happening:
>
> Jun 23 19:31:43 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (500 messages in queue)
> Jun 23 19:31:49 vyos03 heartbeat: last message repeated 9 times
> Jun 23 19:31:49 vyos03 heartbeat: [10921]: ERROR: lowseq cannnot be greater than ackseq
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown: Master Control process died.
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10921 with SIGTERM
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10924 with SIGTERM
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10925 with SIGTERM
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
>
> At this point clustering has failed, because the heartbeat
> services/processes aren't running anymore.
>
> Has anyone else seen this?

It has been fixed years ago ...

> It seems the bug gets triggered at 500 messages in the hist queue; then I
> always see "ERROR: lowseq cannnot be greater than ackseq" and heartbeat dies.
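[Editor's note] Independent of the heartbeat version, it helps to confirm that multicast traffic actually flows before blaming the daemon. The send/join/receive mechanics can be exercised with a short Python sketch; the group, port, and the use of the loopback interface here are illustrative assumptions so the example is self-contained. On a real two-node cluster you would use the interface configured in ha.cf and run the receiver half on the peer node:

```python
import socket

GROUP = "239.255.42.99"   # illustrative admin-scoped test group
PORT = 15007              # illustrative port
IFACE = "127.0.0.1"       # loopback for a single-host demo; use the node's
                          # cluster interface address on a real test

# Receiver: bind the group port and join the group on the chosen interface.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton(IFACE)
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(5.0)

# Sender: force egress via the same interface and allow local loopback of
# the multicast packet so the single-host demo sees its own probe.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(IFACE))
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
tx.sendto(b"ha-probe", (GROUP, PORT))

data, sender = rx.recvfrom(1024)
print(data.decode(), "from", sender[0])
tx.close()
rx.close()
```

If probes stop arriving while the network is degraded, that corroborates the multicast outage the logs above hint at; heartbeat's crash on recovery is then the separate bug fixed in 3.0.5.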
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Linux-HA] unable to recover from split-brain in a two-node cluster
I don't know what ccm_testclient I was running. I'm pretty sure it was a shell script, and it was complaining "gcc not found". I rebuilt heartbeat and upgraded pacemaker; now ccm_testclient is a binary and I can run it without problems.

-Kaiwei

----- Original Message -----
From: Lars Ellenberg lars.ellenb...@linbit.com
To: linux-ha@lists.linux-ha.org
Sent: Thursday, June 26, 2014 4:24:29 AM
Subject: Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

On Tue, Jun 24, 2014 at 08:48:03AM -0700, f...@vmware.com wrote:
> Hi Lars,
> Thanks for pointing out the patch. It is not in the heartbeat version on
> the system (it is using Heartbeat-3-0-7e3a82377fa8). I'll try that out.
>
> As for ccm_testclient, the system has stripped out unnecessary files that
> won't be used during normal operation, including gcc. So ccm_testclient
> complains "gcc not found"

Uh? Why would it think it needs gcc? Can you copy the exact message, please?

> and I cannot test it on that system. cl_status listnodes shows both nodes
> on both systems; cl_status nodestatus shows both are active, though.