Hi,

Somehow, IP failover occurs right after I start the heartbeat service on both 
the master and slave servers. Here are my specs:

Master
OS: CentOS 5.3
Hostname: master
Public IP: 123.456.789.123  (This is the shared IP as well)
Private IP: 12.345.678.123

Slave
OS: CentOS 5.3
Hostname: slave
Public IP: 123.456.789.456
Private IP: 12.345.678.456


The slave always seems to be active after the service starts, because I always 
see the virtual interface eth0:1 on it; the master never creates a virtual 
interface, maybe because it is already using the cluster IP? This is what I get 
in the log on the master:

Sep 23 16:12:58 Master heartbeat: [5494]: WARN: node slave: is dead
Sep 23 16:12:58 Master heartbeat: [5494]: info: Comm_now_up(): updating status to active
Sep 23 16:12:58 Master heartbeat: [5494]: info: Local status now set to: 'active'
Sep 23 16:12:58 Master heartbeat: [5494]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,496)
Sep 23 16:12:58 Master heartbeat: [5494]: WARN: No STONITH device configured.
Sep 23 16:12:58 Master heartbeat: [5494]: WARN: Shared disks are not protected.
Sep 23 16:12:58 Master heartbeat: [5494]: info: Resources being acquired from slave.
Sep 23 16:12:58 Master heartbeat: [5503]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498  gid 496 (pid 5503)
Sep 23 16:12:58 Master harc[5504]: info: Running /etc/ha.d/rc.d/status status
Sep 23 16:12:58 Master mach_down[5559]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Sep 23 16:12:58 Master mach_down[5559]: info: mach_down takeover complete for node slave.
Sep 23 16:12:58 Master heartbeat: [5494]: info: mach_down takeover complete.
Sep 23 16:12:58 Master heartbeat: [5494]: info: Initial resource acquisition complete (mach_down)
Sep 23 16:12:58 Master IPaddr[5580]: INFO:  Resource is stopped
Sep 23 16:12:58 Master heartbeat: [5505]: info: Local Resource acquisition completed.
Sep 23 16:12:58 Master harc[5643]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Sep 23 16:12:58 Master ip-request-resp[5643]: received ip-request-resp 123.456.789.123  OK yes
Sep 23 16:12:58 Master ResourceManager[5670]: info: Acquiring resource group: master 123.456.789.123
Sep 23 16:12:58 Master IPaddr[5700]: INFO:  Resource is stopped
Sep 23 16:12:58 Master ResourceManager[5670]: info: Running /etc/ha.d/resource.d/IPaddr 123.456.789.123  start
Sep 23 16:12:58 Master IPaddr[5767]: INFO: Using calculated nic for 123.456.789.123  : eth0
Sep 23 16:12:58 Master IPaddr[5767]: INFO: Using calculated netmask for 123.456.789.123  : 255.255.255.0
Sep 23 16:12:58 Master IPaddr[5767]: INFO: eval ifconfig eth0:0 123.456.789.123  netmask 255.255.255.0 broadcast 123.456.789.255
Sep 23 16:12:58 Master IPaddr[5767]: ERROR: Could not add 123.456.789.123  to eth0: 255
Sep 23 16:12:58 Master IPaddr[5751]: ERROR:  Unknown error: 255
Sep 23 16:12:58 Master ResourceManager[5670]: ERROR: Return code 1 from /etc/ha.d/resource.d/IPaddr
Sep 23 16:12:58 Master ResourceManager[5670]: CRIT: Giving up resources due to failure of 123.456.789.123
Sep 23 16:12:58 Master ResourceManager[5670]: info: Releasing resource group: master 123.456.789.123
Sep 23 16:12:58 Master ResourceManager[5670]: info: Running /etc/ha.d/resource.d/IPaddr 123.456.789.123  stop
Sep 23 16:12:59 Master IPaddr[5891]: INFO:  Success
Sep 23 16:13:08 Master heartbeat: [5494]: info: Local Resource acquisition completed. (none)
Sep 23 16:13:08 Master heartbeat: [5494]: info: local resource transition completed.
Sep 23 16:13:29 Master hb_standby[5934]: Going standby [foreign].
Sep 23 16:13:29 Master heartbeat: [5494]: info: master wants to go standby [foreign]
Sep 23 16:13:39 Master heartbeat: [5494]: WARN: No reply to standby request.  Standby request cancelled.
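
To pick the failure chain out of a log like the one above, a small filter that 
keeps only the WARN, ERROR and CRIT records may help. This is a minimal Python 
sketch, not a heartbeat tool: the function name `problems` and the regular 
expression are hypothetical, and it assumes each record sits on a single line 
in the shape "<timestamp> <host> <source>: <LEVEL>: <message>". The sample 
records are copied from the log in this post.

```python
import re

# Assumed record shape: "<Mon DD HH:MM:SS> <host> <source>: <LEVEL>: <message>".
# Only the upper-case WARN/ERROR/CRIT levels seen in the log are matched.
RECORD = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s[\d:]+)\s(?P<host>\S+)\s(?P<source>.+?):\s+"
    r"(?P<level>WARN|ERROR|CRIT):\s+(?P<message>.*)$"
)


def problems(lines):
    """Return (level, source, message) for every WARN/ERROR/CRIT record."""
    found = []
    for line in lines:
        match = RECORD.match(line.strip())
        if match:
            found.append((match["level"], match["source"], match["message"]))
    return found


# Records copied from the log above, each joined onto one line.
sample = [
    "Sep 23 16:12:58 Master heartbeat: [5494]: WARN: node slave: is dead",
    "Sep 23 16:12:58 Master IPaddr[5767]: INFO: Using calculated nic for 123.456.789.123  : eth0",
    "Sep 23 16:12:58 Master IPaddr[5767]: ERROR: Could not add 123.456.789.123  to eth0: 255",
    "Sep 23 16:12:58 Master ResourceManager[5670]: CRIT: Giving up resources due to failure of 123.456.789.123",
]

for level, source, message in problems(sample):
    print(f"{level:5} {source}: {message}")
```

On these records the filter drops the INFO line and keeps the dead-node 
warning, the IPaddr error and the ResourceManager CRIT, in log order.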


After some time the master recovers the cluster IP, but then heartbeat stops 
working. Maybe I need an additional IP for the whole cluster? Is it required 
to have 3 IPs for this? Please let me know what could be wrong.
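
The "additional IP" question can be stated as a simple check: is the address 
used as the cluster IP also an address that one of the nodes already holds as 
its own? Below is a minimal Python sketch of that check, not anything 
heartbeat itself provides. The function name `find_ip_conflicts` and the 
dictionary layout are hypothetical, the addresses are the placeholder values 
from the specs above (compared as plain strings, since they are not valid IPv4 
literals), and the second node is keyed as "slave" because that is what the 
log calls it. Whether such an overlap is the cause of the IPaddr error is an 
assumption, not something the log confirms.

```python
# Hypothetical helper, not part of heartbeat: report every node address that
# is identical to the address configured as the cluster (shared) IP.


def find_ip_conflicts(cluster_ip, nodes):
    """Return (hostname, role) pairs whose own address equals cluster_ip.

    Addresses are compared as plain strings because the values in the post
    are anonymized placeholders rather than valid IPv4 literals.
    """
    conflicts = []
    for hostname, addresses in nodes.items():
        for role, address in addresses.items():
            if address == cluster_ip:
                conflicts.append((hostname, role))
    return conflicts


# Values copied from the specs in this post.
nodes = {
    "master": {"public": "123.456.789.123", "private": "12.345.678.123"},
    "slave": {"public": "123.456.789.456", "private": "12.345.678.456"},
}
cluster_ip = "123.456.789.123"

for hostname, role in find_ip_conflicts(cluster_ip, nodes):
    print(f"cluster IP {cluster_ip} is already the {role} address of {hostname}")
```

With the specs as posted, the check reports the master's public address, which 
is the overlap described above. A third, otherwise unused address would make 
the check come back empty.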


-Ken

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems