On Tue, Aug 10, 2010 at 3:25 PM, David Lang <[email protected]> wrote:
> could you re-post the files (log files, ha.cf and haresources from each box)
>

Log file from pfs-srv3:

Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: other_holds_resources: 0
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Received shutdown notice from 'pfs-srv4'.
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Resources being acquired from pfs-srv4.
Aug 10 17:08:28 pfs-srv3 heartbeat: [1528]: info: acquire local HA resources (standby).
Aug 10 17:08:28 pfs-srv3 heartbeat: [1528]: info: go_standby: who: 2 resource set: local
Aug 10 17:08:28 pfs-srv3 heartbeat: [1528]: info: go_standby: (query/action): (ourkeys/takegroup)
Aug 10 17:08:28 pfs-srv3 ResourceManager[1558]: [1577]: info: Acquiring resource group: pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24 nfs-kernel-server smbd
Aug 10 17:08:28 pfs-srv3 heartbeat: [1529]: info: 1 local resources from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
Aug 10 17:08:28 pfs-srv3 heartbeat: [1529]: info: Local Resource acquisition completed.
Aug 10 17:08:28 pfs-srv3 heartbeat: [1529]: info: FIFO message [type resource] written rc=79
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Managed req_our_resources(ask) process 1529 exited with return code 0.
Aug 10 17:08:28 pfs-srv3 Filesystem[1619]: [1658]: INFO: Resource is stopped
Aug 10 17:08:28 pfs-srv3 ResourceManager[1558]: [1673]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /pfs ext3 start
Aug 10 17:08:28 pfs-srv3 Filesystem[1682]: [1710]: INFO: Running start for /dev/drbd0 on /pfs
Aug 10 17:08:28 pfs-srv3 Filesystem[1676]: [1727]: INFO: Success
Aug 10 17:08:28 pfs-srv3 IPaddr[1741]: [1770]: INFO: Resource is stopped
Aug 10 17:08:28 pfs-srv3 ResourceManager[1558]: [1787]: info: Running /etc/ha.d/resource.d/IPaddr 10.1.8.45/24 start
Aug 10 17:08:28 pfs-srv3 IPaddr[1811]: [1836]: INFO: Using calculated nic for 10.1.8.45: eth0
Aug 10 17:08:28 pfs-srv3 IPaddr[1811]: [1842]: INFO: Using calculated netmask for 10.1.8.45: 255.255.255.0
Aug 10 17:08:28 pfs-srv3 IPaddr[1811]: [1866]: INFO: eval ifconfig eth0:0 10.1.8.45 netmask 255.255.255.0 broadcast 10.1.8.255
Aug 10 17:08:28 pfs-srv3 IPaddr[1790]: [1887]: INFO: Success
Aug 10 17:08:28 pfs-srv3 ResourceManager[1558]: [1907]: info: Running /etc/init.d/nfs-kernel-server start
Aug 10 17:08:28 pfs-srv3 ResourceManager[1558]: [1967]: info: Running /etc/init.d/smbd start
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: WARN: Shutdown delayed until current resource activity finishes.
Aug 10 17:08:28 pfs-srv3 heartbeat: [1528]: info: local HA resource acquisition completed (standby).
Aug 10 17:08:28 pfs-srv3 heartbeat: [1528]: info: FIFO message [type ask_resources] written rc=47
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Standby resource acquisition done [all].
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: AnnounceTakeover(local 1, foreign 1, reason 'auto_failback' (1))
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: New standby state: 0
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Managed go_standby process 1528 exited with return code 0.
Aug 10 17:08:28 pfs-srv3 harc[1982]: [1990]: info: Running /etc/ha.d//rc.d/status status
Aug 10 17:08:28 pfs-srv3 mach_down[1995]: [2015]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Aug 10 17:08:28 pfs-srv3 mach_down[1995]: [2020]: info: mach_down takeover complete for node pfs-srv4.
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: mach_down takeover complete.
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: AnnounceTakeover(local 1, foreign 1, reason 'mach_down' (1))
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Managed status process 1982 exited with return code 0.
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: hb_giveup_resources(): current status: active
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Heartbeat shutdown in progress. (1216)
Aug 10 17:08:28 pfs-srv3 heartbeat: [2021]: info: Giving up all HA resources.
Aug 10 17:08:28 pfs-srv3 ResourceManager[2035]: [2046]: info: Releasing resource group: pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24 nfs-kernel-server smbd
Aug 10 17:08:28 pfs-srv3 ResourceManager[2035]: [2057]: info: Running /etc/init.d/smbd stop
Aug 10 17:08:29 pfs-srv3 ResourceManager[2035]: [2080]: info: Running /etc/init.d/nfs-kernel-server stop
Aug 10 17:08:29 pfs-srv3 ResourceManager[2035]: [2107]: info: Running /etc/ha.d/resource.d/IPaddr 10.1.8.45/24 stop
Aug 10 17:08:29 pfs-srv3 IPaddr[2131]: [2146]: INFO: ifconfig eth0:0 down
Aug 10 17:08:29 pfs-srv3 IPaddr[2110]: [2150]: INFO: Success
Aug 10 17:08:29 pfs-srv3 ResourceManager[2035]: [2167]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /pfs ext3 stop
Aug 10 17:08:29 pfs-srv3 Filesystem[2176]: [2204]: INFO: Running stop for /dev/drbd0 on /pfs
Aug 10 17:08:29 pfs-srv3 Filesystem[2176]: [2219]: INFO: Trying to unmount /pfs
Aug 10 17:08:29 pfs-srv3 Filesystem[2176]: [2227]: INFO: unmounted /pfs successfully
Aug 10 17:08:29 pfs-srv3 Filesystem[2170]: [2234]: INFO: Success
Aug 10 17:08:29 pfs-srv3 ResourceManager[2035]: [2251]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
Aug 10 17:08:29 pfs-srv3 heartbeat: [2021]: info: All HA resources relinquished.
Aug 10 17:08:29 pfs-srv3 heartbeat: [2021]: info: FIFO message [type shutdone] written rc=27
Aug 10 17:08:31 pfs-srv3 heartbeat: [1216]: info: killing HBFIFO process 1255 with signal 15
Aug 10 17:08:31 pfs-srv3 heartbeat: [1216]: info: killing HBWRITE process 1256 with signal 15
Aug 10 17:08:31 pfs-srv3 heartbeat: [1216]: info: killing HBREAD process 1257 with signal 15
Aug 10 17:08:31 pfs-srv3 heartbeat: [1216]: info: Core process 1255 exited. 3 remaining
Aug 10 17:08:31 pfs-srv3 heartbeat: [1216]: info: Core process 1257 exited. 2 remaining
Aug 10 17:08:31 pfs-srv3 heartbeat: [1216]: info: Core process 1256 exited. 1 remaining
Aug 10 17:08:31 pfs-srv3 heartbeat: [1216]: info: pfs-srv3 Heartbeat shutdown complete.
Aug 10 17:08:32 pfs-srv3 logd: [979]: info: logd_term_write_action: received SIGTERM
Aug 10 17:08:33 pfs-srv3 logd: [979]: info: Exiting write process

Log file from pfs-srv4:

Aug 10 17:08:28 pfs-srv4 heartbeat: [1168]: info: Heartbeat shutdown in progress. (1168)
Aug 10 17:08:28 pfs-srv4 heartbeat: [1340]: info: Giving up all HA resources.
Aug 10 17:08:28 pfs-srv4 ResourceManager[1354]: [1365]: info: Releasing resource group: pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24 nfs-kernel-server smbd
Aug 10 17:08:28 pfs-srv4 ResourceManager[1354]: [1376]: info: Running /etc/init.d/smbd stop
Aug 10 17:08:28 pfs-srv4 ResourceManager[1354]: [1395]: info: Running /etc/init.d/nfs-kernel-server stop
Aug 10 17:08:28 pfs-srv4 ResourceManager[1354]: [1421]: info: Running /etc/ha.d/resource.d/IPaddr 10.1.8.45/24 stop
Aug 10 17:08:28 pfs-srv4 IPaddr[1424]: [1453]: INFO: Success
Aug 10 17:08:28 pfs-srv4 ResourceManager[1354]: [1470]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /pfs ext3 stop
Aug 10 17:08:28 pfs-srv4 Filesystem[1479]: [1507]: INFO: Running stop for /dev/drbd0 on /pfs
Aug 10 17:08:28 pfs-srv4 Filesystem[1473]: [1519]: INFO: Success
Aug 10 17:08:28 pfs-srv4 ResourceManager[1354]: [1536]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
Aug 10 17:08:28 pfs-srv4 heartbeat: [1340]: info: All HA resources relinquished.
Aug 10 17:08:28 pfs-srv4 heartbeat: [1340]: info: FIFO message [type shutdone] written rc=27
Aug 10 17:08:28 pfs-srv4 heartbeat: [1168]: info: other_holds_resources: 3
Aug 10 17:08:29 pfs-srv4 heartbeat: [1168]: WARN: 1 lost packet(s) for [pfs-srv3] [2631:2633]
Aug 10 17:08:29 pfs-srv4 heartbeat: [1168]: info: other_holds_resources: 3
Aug 10 17:08:29 pfs-srv4 heartbeat: [1168]: info: No pkts missing from pfs-srv3!
Aug 10 17:08:29 pfs-srv4 heartbeat: [1168]: info: other_holds_resources: 3
Aug 10 17:08:29 pfs-srv4 heartbeat: [1168]: info: other_holds_resources: 0
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: Received shutdown notice from 'pfs-srv3'.
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: Resource takeover cancelled - shutdown in progress.
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: killing HBREAD process 1196 with signal 15
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: killing HBFIFO process 1194 with signal 15
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: killing HBWRITE process 1195 with signal 15
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: Core process 1194 exited. 3 remaining
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: Core process 1195 exited. 2 remaining
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: Core process 1196 exited. 1 remaining
Aug 10 17:08:30 pfs-srv4 heartbeat: [1168]: info: pfs-srv4 Heartbeat shutdown complete.
Aug 10 17:08:31 pfs-srv4 logd: [1002]: info: logd_term_write_action: received SIGTERM
Aug 10 17:08:31 pfs-srv4 logd: [1002]: info: Exiting write process

haresources file from both:

r...@pfs-srv3:~# cat /etc/ha.d/haresources
pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24 nfs-kernel-server smbd

r...@pfs-srv4:~# cat /etc/ha.d/haresources
pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24 nfs-kernel-server smbd

ha.cf from both:

r...@pfs-srv3:~# cat /etc/ha.d/ha.cf
use_logd on
udpport 12694
keepalive 1
warntime 15
deadtime 20
debug 1
initdead 180
bcast eth1
node pfs-srv3
node pfs-srv4
auto_failback on
crm off

r...@pfs-srv4:~# cat /etc/ha.d/ha.cf
use_logd on
udpport 12694
keepalive 1
warntime 15
deadtime 20
debug 1
initdead 180
bcast eth1
node pfs-srv3
node pfs-srv4
auto_failback on
crm off

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
