Hi,

Somehow IP failover occurs right after I start the heartbeat service on both the master and slave servers. Here are my specs:

Master
OS: centos 5.3
Hostname: master
Public IP: 123.456.789.123 (This is the shared IP as well)
Private IP: 12.345.678.123

Slave
OS: centos 5.3
Hostname: master
Public IP: 123.456.789.456
Private IP: 12.345.678.456

The Slave seems to be always active when starting the service because I always see the virtual interface eth0:1; the master never creates a virtual interface, maybe because it's already using the cluster IP?

This is what I get in the log:

Sep 23 16:12:58 Master heartbeat: [5494]: WARN: node slave: is dead
Sep 23 16:12:58 Master heartbeat: [5494]: info: Comm_now_up(): updating status to active
Sep 23 16:12:58 Master heartbeat: [5494]: info: Local status now set to: 'active'
Sep 23 16:12:58 Master heartbeat: [5494]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,496)
Sep 23 16:12:58 Master heartbeat: [5494]: WARN: No STONITH device configured.
Sep 23 16:12:58 Master heartbeat: [5494]: WARN: Shared disks are not protected.
Sep 23 16:12:58 Master heartbeat: [5494]: info: Resources being acquired from slave.
Sep 23 16:12:58 Master heartbeat: [5503]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498 gid 496 (pid 5503)
Sep 23 16:12:58 Master harc[5504]: info: Running /etc/ha.d/rc.d/status status
Sep 23 16:12:58 Master mach_down[5559]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Sep 23 16:12:58 Master mach_down[5559]: info: mach_down takeover complete for node slave.
Sep 23 16:12:58 Master heartbeat: [5494]: info: mach_down takeover complete.
Sep 23 16:12:58 Master heartbeat: [5494]: info: Initial resource acquisition complete (mach_down)
Sep 23 16:12:58 Master IPaddr[5580]: INFO: Resource is stopped
Sep 23 16:12:58 Master heartbeat: [5505]: info: Local Resource acquisition completed.
Sep 23 16:12:58 Master harc[5643]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Sep 23 16:12:58 Master ip-request-resp[5643]: received ip-request-resp 123.456.789.123 OK yes
Sep 23 16:12:58 Master ResourceManager[5670]: info: Acquiring resource group: master 123.456.789.123
Sep 23 16:12:58 Master IPaddr[5700]: INFO: Resource is stopped
Sep 23 16:12:58 Master ResourceManager[5670]: info: Running /etc/ha.d/resource.d/IPaddr 123.456.789.123 start
Sep 23 16:12:58 Master IPaddr[5767]: INFO: Using calculated nic for 123.456.789.123 : eth0
Sep 23 16:12:58 Master IPaddr[5767]: INFO: Using calculated netmask for 123.456.789.123 : 255.255.255.0
Sep 23 16:12:58 Master IPaddr[5767]: INFO: eval ifconfig eth0:0 123.456.789.123 netmask 255.255.255.0 broadcast 123.456.789.255
Sep 23 16:12:58 Master IPaddr[5767]: ERROR: Could not add 123.456.789.123 to eth0: 255
Sep 23 16:12:58 Master IPaddr[5751]: ERROR: Unknown error: 255
Sep 23 16:12:58 Master ResourceManager[5670]: ERROR: Return code 1 from /etc/ha.d/resource.d/IPaddr
Sep 23 16:12:58 Master ResourceManager[5670]: CRIT: Giving up resources due to failure of 123.456.789.123
Sep 23 16:12:58 Master ResourceManager[5670]: info: Releasing resource group: master 123.456.789.123
Sep 23 16:12:58 Master ResourceManager[5670]: info: Running /etc/ha.d/resource.d/IPaddr 123.456.789.123 stop
Sep 23 16:12:59 Master IPaddr[5891]: INFO: Success
Sep 23 16:13:08 Master heartbeat: [5494]: info: Local Resource acquisition completed. (none)
Sep 23 16:13:08 Master heartbeat: [5494]: info: local resource transition completed.
Sep 23 16:13:29 Master hb_standby[5934]: Going standby [foreign].
Sep 23 16:13:29 Master heartbeat: [5494]: info: master wants to go standby [foreign]
Sep 23 16:13:39 Master heartbeat: [5494]: WARN: No reply to standby request. Standby request cancelled.

After some time the master recovers the cluster IP but then heartbeat would stop working; maybe I need an additional IP for the whole cluster?
Is it required to have 3 IPs for this? Please let me know what could be wrong...

-Ken
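
For anyone reading a long heartbeat log like the one in this post, it can help to pull out only the ERROR and CRIT lines first. The following is a rough sketch in Python, not part of heartbeat itself: the function name `find_problems`, the `SAMPLE` variable, and the regular expression are assumptions made for illustration, and the pattern is only guessed from the line layout seen in the log in this post.

```python
import re

# Assumed line layout, inferred from the log in this post:
#   "<Mon DD HH:MM:SS> <host> <tag>: [<pid>]: <LEVEL>: <message>"
# The "[<pid>]: " part is optional (IPaddr[5767] carries the pid in the tag).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+"   # timestamp, e.g. "Sep 23 16:12:58"
    r"(?P<host>\S+)\s+"                      # host name as logged
    r"(?P<tag>[^\s:]+):\s+"                  # program tag, e.g. "heartbeat"
    r"(?:\[\d+\]:\s+)?"                      # optional separate "[pid]:"
    r"(?P<level>[A-Za-z]+):\s+"              # severity, e.g. "ERROR"
    r"(?P<msg>.*)$"                          # rest of the line
)


def find_problems(log_text, levels=("ERROR", "CRIT")):
    """Return (timestamp, tag, level, message) for lines at the given levels.

    Lines that do not fit the assumed layout are skipped silently.
    """
    problems = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level").upper() in levels:
            problems.append(
                (match.group("ts"), match.group("tag"),
                 match.group("level"), match.group("msg"))
            )
    return problems


# A few lines copied from the log in this post.
SAMPLE = """\
Sep 23 16:12:58 Master heartbeat: [5494]: WARN: node slave: is dead
Sep 23 16:12:58 Master IPaddr[5767]: ERROR: Could not add 123.456.789.123 to eth0: 255
Sep 23 16:12:58 Master ResourceManager[5670]: CRIT: Giving up resources due to failure of 123.456.789.123
Sep 23 16:12:59 Master IPaddr[5891]: INFO: Success
"""

if __name__ == "__main__":
    for ts, tag, level, msg in find_problems(SAMPLE):
        print(f"{ts} {tag} {level}: {msg}")
```

Run against the full log, this would narrow things down to the IPaddr and ResourceManager failures around adding 123.456.789.123 to eth0, which is where the takeover gives up.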
