Re: [Linux-HA] Strange failover

Dejan Muhamedagic Wed, 10 Dec 2008 11:22:38 -0800

Hi,

With a bit of delay ;-)


On Tue, Oct 28, 2008 at 10:45:49AM -0400, David Miller wrote:
> Hi All;
>
> I've looked at the log files until I'm blind and I've no idea why a simple 
> two node HA pair failed over.
>
> The setup is straightforward; two boxes with external RAID storage setup 
> for failover NFS services.
>
> Saturday morning the backup server had a problem with the scsi bus, 
> renegotiated to 40 MB/s, and remounted root in read only mode.  It 
> effectively went offline for a couple of minutes where ipfail and heartbeat 
> couldn't talk to the primary.  When it couldn't see the primary it decided 
> it was to take over, which makes sense.

It may make sense to the computers, but you'd probably hate to
lose data: use stonith!

> What doesn't make sense is that a couple of minutes later, when the scsi 
> bit was sorted out and the two servers talked to each other again, is that 
> they decided to failover to the backup.
>
> There was never anything wrong with the primary server.
>
> Any ideas what would have induced the failover?
>
> Relevant bits:
>
> Heartbeat v  2.0.7-2 installed via debian packages.  Debian etch, updated a 
> couple of weeks ago.  2.6.18-6-686 kernel.
>
> haresources has only this resource group:
>
> nfs1  IPaddr::192.168.120.230/24/eth0 drbddisk::all 
> Filesystem::/dev/drbd0::/nfs_exports::ext3 
> Filesystem::/dev/drbd1::/llnw_backups::ext3 nfs-kernel-server 
> start_dnotify.sh
>
> ha.cfg:
>
> logfacility     local0
> keepalive 4

You could heartbeat more often. Also, set warntime to twice
keepalive.

> deadtime 30

Increase deadtime to 60.

> bcast   eth0 eth1
> auto_failback off
> node nfs1
> node nfs2
> respawn hacluster /usr/lib/heartbeat/ipfail
> ping 192.168.120.1
>
>
> Logfile from nfs1 (primary server):
>
> Oct 25 06:43:16 nfs1 heartbeat: [9320]: WARN: node nfs2: is dead
> Oct 25 06:43:16 nfs1 heartbeat: [9320]: info: Dead node nfs2 gave up 
> resources.
> Oct 25 06:43:16 nfs1 heartbeat: [9320]: info: Link nfs2:eth0 dead.
> Oct 25 06:43:16 nfs1 heartbeat: [9320]: info: Link nfs2:eth1 dead.
> Oct 25 06:43:16 nfs1 ipfail: [9342]: info: Status update: Node nfs2 now has 
> status dead
> Oct 25 06:43:18 nfs1 ipfail: [9342]: info: NS: We are still alive!
> Oct 25 06:43:18 nfs1 ipfail: [9342]: info: Link Status update: Link 
> nfs2/eth0 now has status dead
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: info: For information on cluster 
> partitions, See URL: http://linux-ha.org/SplitBrain
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: WARN: Deadtime value may be too 
> small.
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: info: See FAQ for information on 
> tuning deadtime.
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: info: URL: 
> http://linux-ha.org/FAQ#heavy_load
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: info: Link nfs2:eth1 up.
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: WARN: Late heartbeat: Node nfs2: 
> interval 33030 ms
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: info: Status update for node nfs2: 
> status active
> Oct 25 06:43:19 nfs1 heartbeat: [9320]: info: Link nfs2:eth0 up.
> Oct 25 06:43:19 nfs1 harc[6029]: info: Running /etc/ha.d/rc.d/status status
> Oct 25 06:43:19 nfs1 ipfail: [9342]: info: Asking other side for ping node 
> count.
> Oct 25 06:43:19 nfs1 ipfail: [9342]: info: Checking remote count of ping 
> nodes.
> Oct 25 06:43:19 nfs1 ipfail: [9342]: info: Link Status update: Link 
> nfs2/eth1 now has status dead
> Oct 25 06:43:21 nfs1 ipfail: [9342]: info: Asking other side for ping node 
> count.
> Oct 25 06:43:21 nfs1 ipfail: [9342]: info: Checking remote count of ping 
> nodes.
> Oct 25 06:43:21 nfs1 ipfail: [9342]: info: Link Status update: Link 
> nfs2/eth1 now has status up
> Oct 25 06:43:21 nfs1 ipfail: [9342]: info: Status update: Node nfs2 now has 
> status active
> Oct 25 06:43:21 nfs1 ipfail: [9342]: info: Link Status update: Link 
> nfs2/eth0 now has status up
> Oct 25 06:43:21 nfs1 heartbeat: [9320]: info: Heartbeat shutdown in 
> progress. (9320)
> Oct 25 06:43:21 nfs1 heartbeat: [6039]: info: Giving up all HA resources.
> Oct 25 06:43:21 nfs1 ResourceManager[6049]: info: Releasing resource group: 
> nfs1 IPaddr::192.168.120.230/24/eth0 drbddisk::all 
> Filesystem::/dev/drbd0::/nfs_exports::ext3 
> Filesystem::/dev/drbd1::/llnw_backups::ext3 nfs-kernel-server 
> start_dnotify.sh
> Oct 25 06:43:21 nfs1 ResourceManager[6049]: info: Running 
> /etc/ha.d/resource.d/start_dnotify.sh  stop
> Oct 25 06:43:22 nfs1 ResourceManager[6049]: info: Running 
> /etc/init.d/nfs-kernel-server  stop
> Oct 25 06:43:22 nfs1 ipfail: [9342]: info: No giveup timer to abort.
> Oct 25 06:43:22 nfs1 kernel: nfsd: last server has exited
> Oct 25 06:43:22 nfs1 ResourceManager[6049]: info: Running 
> /etc/ha.d/resource.d/Filesystem /dev/drbd1 /llnw_backups ext3 stop
> Oct 25 06:43:22 nfs1 Filesystem[6203]: INFO: Running stop for /dev/drbd1 on 
> /llnw_backups
> Oct 25 06:43:22 nfs1 Filesystem[6203]: INFO: Trying to unmount /llnw_backups
> Oct 25 06:43:22 nfs1 Filesystem[6203]: INFO: unmounted /llnw_backups 
> successfully
> Oct 25 06:43:22 nfs1 Filesystem[6139]: INFO: Filesystem Success
> Oct 25 06:43:22 nfs1 ResourceManager[6049]: info: Running 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /nfs_exports ext3 stop
> Oct 25 06:43:22 nfs1 Filesystem[6329]: INFO: Running stop for /dev/drbd0 on 
> /nfs_exports
> Oct 25 06:43:22 nfs1 Filesystem[6329]: INFO: Trying to unmount /nfs_exports
> Oct 25 06:43:22 nfs1 Filesystem[6329]: INFO: Some processes on /nfs_exports 
> were signalled
> Oct 25 06:43:23 nfs1 ipfail: [9342]: info: No giveup timer to abort.
> Oct 25 06:43:25 nfs1 Filesystem[6329]: INFO: unmounted /nfs_exports 
> successfully
> Oct 25 06:43:25 nfs1 Filesystem[6265]: INFO: Filesystem Success
> Oct 25 06:43:25 nfs1 ResourceManager[6049]: info: Running 
> /etc/ha.d/resource.d/drbddisk all stop
> Oct 25 06:43:26 nfs1 kernel: drbd0: Primary/Secondary --> 
> Secondary/Secondary
> Oct 25 06:43:26 nfs1 kernel: drbd1: Primary/Secondary --> 
> Secondary/Secondary
> Oct 25 06:43:26 nfs1 ResourceManager[6049]: info: Running 
> /etc/ha.d/resource.d/IPaddr 192.168.120.230/24/eth0 stop
> Oct 25 06:43:27 nfs1 IPaddr[6525]: INFO: /sbin/route -n del -host 
> 192.168.120.230
> Oct 25 06:43:27 nfs1 IPaddr[6525]: INFO: /sbin/ifconfig eth0:0 
> 192.168.120.230 down
> Oct 25 06:43:27 nfs1 IPaddr[6525]: INFO: IP Address 192.168.120.230 released
> Oct 25 06:43:27 nfs1 IPaddr[6443]: INFO: IPaddr Success
> Oct 25 06:43:27 nfs1 heartbeat: [6039]: info: All HA resources relinquished.
> Oct 25 06:43:28 nfs1 kernel: drbd0: Secondary/Secondary --> 
> Secondary/Primary
> Oct 25 06:43:28 nfs1 kernel: drbd1: Secondary/Secondary --> 
> Secondary/Primary
> Oct 25 06:43:28 nfs1 heartbeat: [9320]: info: killing 
> /usr/lib/heartbeat/ipfail process group 9342 with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: killing HBWRITE process 9328 
> with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: killing HBREAD process 9329 
> with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: killing HBFIFO process 9323 
> with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: killing HBWRITE process 9324 
> with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: killing HBREAD process 9325 
> with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: killing HBWRITE process 9326 
> with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: killing HBREAD process 9327 
> with signal 15
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Core process 9328 exited. 7 
> remaining
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Core process 9323 exited. 6 
> remaining
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Core process 9329 exited. 5 
> remaining
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Core process 9325 exited. 4 
> remaining
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Core process 9326 exited. 3 
> remaining
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Core process 9327 exited. 2 
> remaining
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Core process 9324 exited. 1 
> remaining
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: nfs1 Heartbeat shutdown 
> complete.
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Heartbeat restart triggered.
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Restarting heartbeat.
> Oct 25 06:43:30 nfs1 heartbeat: [9320]: info: Performing heartbeat restart 
> exec.
> Oct 25 06:43:48 nfs1 syslogd 1.4.1#18: restart.
> Oct 25 06:44:01 nfs1 heartbeat: [9320]: WARN: Core dumps could be lost if 
> multiple dumps occur
> Oct 25 06:44:01 nfs1 heartbeat: [9320]: WARN: Consider setting 
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum 
> supportability
> Oct 25 06:44:01 nfs1 heartbeat: [9320]: WARN: Logging daemon is disabled 
> --enabling logging daemon is recommended
> Oct 25 06:44:01 nfs1 heartbeat: [9320]: info: **************************
> [finishes restarting heartbeat]

This node decided to reboot after realizing that there was a
split brain condition. That's a (very simple) recovery procedure.
The other node never figured out that there was a split brain due
to lack of cpu attention/hw problems. It, the secondary node,
just noticed that the primary node was leaving.

In short: heartbeat reacted as it should have.

If you experience often hardware induced heartbeat delays,
consider increasing deadtime, or, even better, fixing/replacing
that SCSI controller.

Sorry for such a late reply.

Thanks,

Dejan

> ===============================
>
> Log file from nfs2:
>
> Oct 25 06:42:00 nfs2 kernel: mptscsih: ioc0: attempting task abort! 
> (sc=df8111c0)
> Oct 25 06:42:00 nfs2 kernel: sd 0:0:1:0:
> Oct 25 06:42:00 nfs2 kernel:         command: Write(10): 2a 00 28 6c 02 8f 
> 00 00 30 00
> Oct 25 06:42:02 nfs2 kernel: mptbase: Initiating ioc0 recovery
> Oct 25 06:42:08 nfs2 kernel: mptscsih: ioc0: task abort: SUCCESS 
> (sc=df8111c0)
> Oct 25 06:42:08 nfs2 kernel:  target0:0:0: Beginning Domain Validation
> Oct 25 06:42:09 nfs2 kernel:  target0:0:0: Domain Validation skipping write 
> tests
> Oct 25 06:42:09 nfs2 kernel:  target0:0:0: Ending Domain Validation
> Oct 25 06:42:09 nfs2 kernel:  target0:0:0: FAST-40 SCSI 40.0 MB/s ST (25 
> ns, offset 127)
> Oct 25 06:42:09 nfs2 kernel:  target0:0:1: Beginning Domain Validation
> Oct 25 06:42:09 nfs2 kernel:  target0:0:1: Domain Validation skipping write 
> tests
> Oct 25 06:42:09 nfs2 kernel:  target0:0:1: Ending Domain Validation
> Oct 25 06:42:09 nfs2 kernel:  target0:0:1: FAST-40 SCSI 40.0 MB/s ST (25 
> ns, offset 127)
> Oct 25 06:43:19 nfs2 kernel: mptscsih: ioc0: attempting task abort! 
> (sc=eb523980)
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: G_CH_dispatch_int: Dispatch 
> function for read child took too long to execute: 31980 ms (> 50 ms) 
> (GSource: 0x80fbbb8)
> Oct 25 06:43:19 nfs2 kernel: sd 0:0:1:0:
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: Gmain_timeout_dispatch: 
> Dispatch function for send local status was delayed 29200 ms (> 2010 ms) 
> before being called (GSource: 0x80fbeb0)
> Oct 25 06:43:19 nfs2 kernel:         command: Write(10): 2a 00 00 01 b9 17 
> 00 00 28 00
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: info: Gmain_timeout_dispatch: 
> started at 1751846343 should have started at 1751843423
> Oct 25 06:43:19 nfs2 kernel: mptbase: Initiating ioc0 recovery
> Oct 25 06:43:19 nfs2 kernel: mptscsih: ioc0: task abort: SUCCESS 
> (sc=eb523980)
> Oct 25 06:43:19 nfs2 kernel:  target0:0:0: Beginning Domain Validation
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: Late heartbeat: Node nfs2: 
> interval 33200 ms
> Oct 25 06:43:19 nfs2 kernel:  target0:0:0: Domain Validation skipping write 
> tests
> Oct 25 06:43:19 nfs2 kernel:  target0:0:0: Ending Domain Validation
> Oct 25 06:43:19 nfs2 kernel:  target0:0:0: FAST-40 SCSI 40.0 MB/s ST (25 
> ns, offset 127)
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: Late heartbeat: Node nfs1: 
> interval 33980 ms
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: Late heartbeat: Node 
> 192.168.120.1: interval 32980 ms
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: Gmain_timeout_dispatch: 
> Dispatch function for check for signals was delayed 31670 ms (> 2010 ms) 
> before being called (GSource: 0x80fc1b8)
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: info: Gmain_timeout_dispatch: 
> started at 1751846344 should have started at 1751843177
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: Gmain_timeout_dispatch: 
> Dispatch function for update msgfree count was delayed 31990 ms (> 30000 
> ms) before being called (GSource: 0x80fc288)
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: info: Gmain_timeout_dispatch: 
> started at 1751846344 should have started at 1751843145
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: WARN: Gmain_timeout_dispatch: 
> Dispatch function for client audit was delayed 27510 ms (> 5000 ms) before 
> being called (GSource: 0x80fc0e8)
> Oct 25 06:43:19 nfs2 heartbeat: [22707]: info: Gmain_timeout_dispatch: 
> started at 1751846344 should have started at 1751843593
> Oct 25 06:43:19 nfs2 kernel:  target0:0:1: Beginning Domain Validation
> Oct 25 06:43:19 nfs2 kernel:  target0:0:1: Domain Validation skipping write 
> tests
> Oct 25 06:43:19 nfs2 kernel:  target0:0:1: Ending Domain Validation
> Oct 25 06:43:19 nfs2 kernel:  target0:0:1: FAST-40 SCSI 40.0 MB/s ST (25 
> ns, offset 127)
> Oct 25 06:43:21 nfs2 ipfail: [22731]: info: Ping node count is balanced.
> Oct 25 06:43:22 nfs2 ipfail: [22731]: info: Ping node count is balanced.
> Oct 25 06:43:26 nfs2 kernel: drbd0: Secondary/Primary --> 
> Secondary/Secondary
> Oct 25 06:43:26 nfs2 kernel: drbd1: Secondary/Primary --> 
> Secondary/Secondary
> Oct 25 06:43:27 nfs2 heartbeat: [22707]: info: Received shutdown notice 
> from 'nfs1'.
> Oct 25 06:43:27 nfs2 heartbeat: [22707]: info: Resources being acquired 
> from nfs1.
> Oct 25 06:43:27 nfs2 heartbeat: [32715]: info: acquire all HA resources 
> (standby).
> Oct 25 06:43:28 nfs2 heartbeat: [32716]: info: No local resources 
> [/usr/lib/heartbeat/ResourceManager listkeys nfs2] to acquire.
> Oct 25 06:43:28 nfs2 ResourceManager[32735]: info: Acquiring resource 
> group: nfs1 IPaddr::192.168.120.230/24/eth0 drbddisk::all 
> Filesystem::/dev/drbd0::/nfs_exports::ext3 
> Filesystem::/dev/drbd1::/llnw_backups::ext3 nfs-kernel-server 
> start_dnotify.sh
> Oct 25 06:43:28 nfs2 IPaddr[32759]: INFO: IPaddr Resource is stopped
> Oct 25 06:43:28 nfs2 ResourceManager[32735]: info: Running 
> /etc/ha.d/resource.d/IPaddr 192.168.120.230/24/eth0 start
> Oct 25 06:43:28 nfs2 IPaddr[491]: INFO: eval /sbin/ifconfig eth0:0 
> 192.168.120.230 netmask 255.255.255.0 broadcast 192.168.120.255
> Oct 25 06:43:28 nfs2 IPaddr[491]: INFO: Sending Gratuitous Arp for 
> 192.168.120.230 on eth0:0 [eth0]
> Oct 25 06:43:28 nfs2 IPaddr[491]: INFO: /usr/lib/heartbeat/send_arp -i 500 
> -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.120.230 eth0 
> 192.168.120.230 auto 192.168.120.230 ffffffffffff
> Oct 25 06:43:28 nfs2 IPaddr[409]: INFO: IPaddr Success
> Oct 25 06:43:28 nfs2 ResourceManager[32735]: info: Running 
> /etc/ha.d/resource.d/drbddisk all start
> Oct 25 06:43:28 nfs2 kernel: drbd0: Secondary/Secondary --> 
> Primary/Secondary
> Oct 25 06:43:28 nfs2 kernel: drbd1: Secondary/Secondary --> 
> Primary/Secondary
> Oct 25 06:43:28 nfs2 Filesystem[697]: INFO: Running status for /dev/drbd0 
> on /nfs_exports
> Oct 25 06:43:28 nfs2 Filesystem[697]: INFO: /nfs_exports is unmounted 
> (stopped)
> Oct 25 06:43:28 nfs2 Filesystem[633]: INFO: Filesystem Resource is stopped
> Oct 25 06:43:28 nfs2 ResourceManager[32735]: info: Running 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /nfs_exports ext3 start
> Oct 25 06:43:28 nfs2 Filesystem[812]: INFO: Running start for /dev/drbd0 on 
> /nfs_exports
> Oct 25 06:43:28 nfs2 kernel: kjournald starting.  Commit interval 5 seconds
> Oct 25 06:43:28 nfs2 kernel: EXT3 FS on drbd0, internal journal
> Oct 25 06:43:28 nfs2 kernel: EXT3-fs: mounted filesystem with ordered data 
> mode.
> Oct 25 06:43:28 nfs2 Filesystem[743]: INFO: Filesystem Success
> Oct 25 06:43:28 nfs2 Filesystem[921]: INFO: Running status for /dev/drbd1 
> on /llnw_backups
> Oct 25 06:43:28 nfs2 Filesystem[921]: INFO: /llnw_backups is unmounted 
> (stopped)
> Oct 25 06:43:28 nfs2 Filesystem[853]: INFO: Filesystem Resource is stopped
> Oct 25 06:43:28 nfs2 ResourceManager[32735]: info: Running 
> /etc/ha.d/resource.d/Filesystem /dev/drbd1 /llnw_backups ext3 start
> Oct 25 06:43:28 nfs2 Filesystem[1030]: INFO: Running start for /dev/drbd1 
> on /llnw_backups
> Oct 25 06:43:28 nfs2 kernel: kjournald starting.  Commit interval 5 seconds
> Oct 25 06:43:28 nfs2 kernel: EXT3 FS on drbd1, internal journal
> Oct 25 06:43:28 nfs2 kernel: EXT3-fs: mounted filesystem with ordered data 
> mode.
> Oct 25 06:43:28 nfs2 Filesystem[966]: INFO: Filesystem Success
> Oct 25 06:43:28 nfs2 ResourceManager[32735]: info: Running 
> /etc/init.d/nfs-kernel-server  start
> Oct 25 06:43:28 nfs2 kernel: NFSD: Using /var/lib/nfs/v4recovery as the 
> NFSv4 state recovery directory
> Oct 25 06:43:28 nfs2 kernel: NFSD: starting 90-second grace period
> Oct 25 06:43:28 nfs2 ResourceManager[32735]: info: Running 
> /etc/ha.d/resource.d/start_dnotify.sh  start
> Oct 25 06:43:32 nfs2 heartbeat: [32715]: info: all HA resource acquisition 
> completed (standby).
> Oct 25 06:43:32 nfs2 heartbeat: [22707]: info: Standby resource acquisition 
> done [all].
> Oct 25 06:43:32 nfs2 harc[1324]: info: Running /etc/ha.d/rc.d/status status
> Oct 25 06:43:32 nfs2 mach_down[1334]: info: Taking over resource group 
> IPaddr::192.168.120.230/24/eth0
> Oct 25 06:43:32 nfs2 ResourceManager[1354]: info: Acquiring resource group: 
> nfs1 IPaddr::192.168.120.230/24/eth0 drbddisk::all 
> Filesystem::/dev/drbd0::/nfs_exports::ext3 
> Filesystem::/dev/drbd1::/llnw_backups::ext3 nfs-kernel-server 
> start_dnotify.sh
> Oct 25 06:43:32 nfs2 IPaddr[1378]: INFO: IPaddr Running OK
> Oct 25 06:43:32 nfs2 ResourceManager[1354]: info: Running 
> /etc/ha.d/resource.d/drbddisk all start
> Oct 25 06:43:32 nfs2 Filesystem[1601]: INFO: Running status for /dev/drbd0 
> on /nfs_exports
> Oct 25 06:43:32 nfs2 Filesystem[1601]: INFO: /nfs_exports is mounted 
> (running)
> Oct 25 06:43:32 nfs2 Filesystem[1537]: INFO: Filesystem Running OK
> Oct 25 06:43:33 nfs2 Filesystem[1700]: INFO: Running status for /dev/drbd1 
> on /llnw_backups
> Oct 25 06:43:33 nfs2 Filesystem[1700]: INFO: /llnw_backups is mounted 
> (running)
> Oct 25 06:43:33 nfs2 Filesystem[1636]: INFO: Filesystem Running OK
> Oct 25 06:43:33 nfs2 ResourceManager[1354]: info: Running 
> /etc/ha.d/resource.d/start_dnotify.sh  start
> Oct 25 06:43:33 nfs2 mach_down[1334]: info: /usr/lib/heartbeat/mach_down: 
> nice_failback: foreign resources acquired
> Oct 25 06:43:33 nfs2 mach_down[1334]: info: mach_down takeover complete for 
> node nfs1.
> Oct 25 06:43:33 nfs2 heartbeat: [22707]: info: mach_down takeover complete.
>
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Strange failover

Reply via email to