Hi,

On Mon, Nov 09, 2009 at 12:18:12PM -0600, Carlos Chacón Ch wrote:
> Hello guys
> 
> I got this weird issue.
> Two nodes have been working fine with auto_failback ON, but a few days ago I
> tried auto_failback OFF
> and realized it did not work. When doing the manual failback, the
> IP (10.1.1.4) that has to move to node01 is lost. I mean there is no bond0:0
> on node01 when there should be, but the IP is not on node02 either. Every server
> has its bond IP, but the HA IP is lost. I basically have to restart heartbeat
> several times to get the IP on either of the nodes.
> 
> The weird thing is that the HA IP (10.1.1.4) works fine when failback is
> automatic, but when configured for manual failback the HA IP is lost.

Strange indeed.

> OS: Red Hat Enterprise 5.0
> Kernel: 2.6.18-8.el5
> HeartBeat version: heartbeat-2.1.4-6.el5 - Install using RPM packages.
> 
> ha.cf Conf.
> 
> logfile /var/log/ha-log
> logfacility local0
> keepalive 3
> deadtime 10
> udp bond0
> udpport 695
> auto_failback off
> node node01
> node node02
> 
> haresources Conf
> node01 x.x.x.x. HA_http
> 
> These are the logs from the day I tried the OFF configuration for auto_failback.
> 
> What could be causing this issue?
> 
> Logs node01
> http://karlochacon.googlepages.com/node01.txt
> 
> Logs node02
> http://karlochacon.googlepages.com/node02.txt

The logs are a mess: there are several shutdowns on both nodes, so
it's not possible to figure out what's going on. These problems
do show up many times, though:

Oct 29 20:37:17 node02 ResourceManager[20430]: ERROR: Return code 1 from /etc/init.d/HA_http
Oct 29 20:37:17 node02 ResourceManager[20430]: CRIT: Giving up resources due to failure of HA_http
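
One way to narrow down the "Return code 1" errors might be to run the
init script by hand and look at what each action returns. Below is a
minimal sketch in Python, assuming (as the ResourceManager messages
suggest) that a non-zero return code from the script is what heartbeat
treats as a failure. The helper names run_action and failed_actions are
made up for illustration and are not part of heartbeat.

```python
import subprocess


def run_action(script, action):
    """Run `script action` and return its exit code (hypothetical helper)."""
    # subprocess.call waits for the command to finish and returns its exit code.
    return subprocess.call([script, action])


def failed_actions(script, actions):
    """Return the (action, return code) pairs that did not return 0."""
    failures = []
    for action in actions:
        rc = run_action(script, action)
        if rc != 0:
            failures.append((action, rc))
    return failures
```

For example, failed_actions("/etc/init.d/HA_http", ["start", "start",
"stop", "stop"]), run as root on a node where heartbeat itself is
stopped, would list every action that returned non-zero. Repeating
start and stop is deliberate: as far as I know, a script that returns
non-zero when asked to start an already running service, or to stop an
already stopped one, is a common source of this kind of error.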

Nodes regularly go dead some 10 seconds after the other node
starts or stops resources. Is your network sane? Does the IP
resource somehow make the nodes lose each other?

Oct 29 19:49:18 node01 IPaddr[15369]: INFO:  Running OK
Oct 29 19:49:18 node01 ResourceManager[15342]: info: Running /etc/init.d/HA_http  start
Oct 29 19:49:29 node01 heartbeat: [14132]: WARN: node node02: is dead
Oct 29 19:49:29 node01 heartbeat: [14132]: info: Dead node node02 gave up resources.
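
To check whether this pattern holds across the whole log, something
like the following could measure the time between each ResourceManager
"Running" line and the next "is dead" warning. It is only a sketch: the
function names are made up, the regular expression is fitted to the
excerpts above rather than to any documented log format, and the year
is an assumption because the log lines do not carry one (only the
differences between timestamps matter here).

```python
import re
from datetime import datetime

# Syslog-style prefix as seen in the excerpts: "Oct 29 19:49:18 node01 <rest>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def parse(line, year=2009):
    """Split a log line into (timestamp, host, message); None if it does not match.

    The year is an assumption: the log lines themselves do not contain one.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime("%d %s" % (year, m.group(1)), "%Y %b %d %H:%M:%S")
    return stamp, m.group(2), m.group(3)


def dead_gaps(lines):
    """Seconds from the latest ResourceManager 'Running' line to each 'is dead' warning."""
    gaps = []
    last_action = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue  # wrapped or otherwise unparseable line
        stamp, _host, msg = parsed
        if "ResourceManager" in msg and "Running" in msg:
            last_action = stamp
        elif "is dead" in msg and last_action is not None:
            gaps.append((stamp - last_action).total_seconds())
    return gaps


sample = [
    "Oct 29 19:49:18 node01 IPaddr[15369]: INFO:  Running OK",
    "Oct 29 19:49:18 node01 ResourceManager[15342]: info: Running /etc/init.d/HA_http  start",
    "Oct 29 19:49:29 node01 heartbeat: [14132]: WARN: node node02: is dead",
]
print(dead_gaps(sample))  # [11.0]
```

For the excerpt above the gap is 11 seconds, which is close to the
deadtime of 10 in the posted ha.cf. If most gaps in the full logs look
like that, it would suggest that heartbeats stop getting through at
about the moment the resources are started or stopped.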

Thanks,

Dejan





> thanks a lot guys
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
