Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Jan 24, 2008 at 09:39:05AM +1300, Steve Wray wrote:
>> Well I posted my config and I've tried various things and tested this
>> setup... and it still behaves incorrectly: going primary in the event
>> of a complete loss of network connectivity.
>>
>> I mean... it's an NFS server... a *network* filesystem. If it can't
>> connect to the network *at* *all*, it makes no sense for it to become
>> the primary NFS server...
>>
>> I'd really appreciate some comment on what may be wrong in the config
>> files that I've posted. If there's any further info that I need to
>> post, please mention it.
>
> Did you check if ipfail is running? If not, then you have to
> check the user in the respawn line. Otherwise, please post the
> logs.

Thanks for your reply!

ipfail is running, the user in the respawn line is correct.
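For reference, here is a minimal sketch of what the ipfail-related lines in /etc/ha.d/ha.cf typically look like (the ping address below is a placeholder, not taken from my actual config -- ipfail only has something to measure if at least one ping node is defined):

```
# /etc/ha.d/ha.cf -- illustrative excerpt only
# ipfail must be respawned as the cluster user (usually hacluster)
respawn hacluster /usr/lib/heartbeat/ipfail

# ping node(s) ipfail uses to judge outside connectivity;
# 10.10.10.254 is a placeholder -- a router/gateway IP is typical
ping 10.10.10.254

auto_failback off
node drbd-test-1
node drbd-test-2
```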

I just ran a test, failing the network interface on the non-primary node. Here are the logs from this test run, from the 'failed' node only.


ipfail determines that "We are dead", and heartbeat then decides to take over as primary.

Could this be a problem with "/etc/ha.d/rc.d/status status"?

Jan 25 08:38:55 drbd-test-2 heartbeat[2468]: WARN: node 10.10.10.1: is dead
Jan 25 08:38:55 drbd-test-2 heartbeat[2468]: info: Link drbd-test-1:eth0 dead.
Jan 25 08:38:55 drbd-test-2 heartbeat[2468]: info: Link 10.10.10.1:10.10.10.1 dead.
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: Status update: Node 10.10.10.1 now has status dead
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: NS: We are dead. :<
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: Link Status update: Link drbd-test-1/eth0 now has status dead
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: We are dead. :<
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: Asking other side for ping node count.
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: Link Status update: Link 10.10.10.1/10.10.10.1 now has status dead
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: We are dead. :<
Jan 25 08:38:55 drbd-test-2 ipfail[2478]: info: Asking other side for ping node count.
Jan 25 08:38:55 drbd-test-2 heartbeat: info: Running /etc/ha.d/rc.d/status status
Jan 25 08:38:56 drbd-test-2 kernel: drbd0: drbd0_asender [2519]: cstate Connected --> NetworkFailure
Jan 25 08:38:56 drbd-test-2 kernel: drbd0: asender terminated
Jan 25 08:38:56 drbd-test-2 kernel: drbd0: drbd0_receiver [2514]: cstate NetworkFailure --> BrokenPipe
Jan 25 08:38:56 drbd-test-2 kernel: drbd0: worker terminated
Jan 25 08:38:56 drbd-test-2 kernel: drbd0: drbd0_receiver [2514]: cstate BrokenPipe --> Unconnected
Jan 25 08:38:56 drbd-test-2 kernel: drbd0: Connection lost.
Jan 25 08:38:56 drbd-test-2 kernel: drbd0: drbd0_receiver [2514]: cstate Unconnected --> WFConnection
Jan 25 08:39:20 drbd-test-2 heartbeat[2468]: WARN: node drbd-test-1: is dead
Jan 25 08:39:20 drbd-test-2 heartbeat[2468]: WARN: No STONITH device configured.
Jan 25 08:39:20 drbd-test-2 heartbeat[2468]: WARN: Shared disks are not protected.
Jan 25 08:39:20 drbd-test-2 heartbeat[2468]: info: Resources being acquired from drbd-test-1.
Jan 25 08:39:20 drbd-test-2 ipfail[2478]: info: Status update: Node drbd-test-1 now has status dead
Jan 25 08:39:20 drbd-test-2 ipfail[2478]: info: NS: We are dead. :<
Jan 25 08:39:20 drbd-test-2 heartbeat: info: Running /etc/ha.d/rc.d/status status
Jan 25 08:39:20 drbd-test-2 heartbeat[6426]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys drbd-test-2] to acquire.
Jan 25 08:39:20 drbd-test-2 heartbeat: info: Taking over resource group drbddisk::drbdtest
Jan 25 08:39:20 drbd-test-2 heartbeat: info: Acquiring resource group: drbd-test-1 drbddisk::drbdtest Filesystem::/dev/drbd0::/data::ext3 killnfsd nfs-common nfs-kernel-server Delay::20::0 IPaddr::10.10.2.28/16/eth0
Jan 25 08:39:21 drbd-test-2 heartbeat: info: Running /etc/ha.d/resource.d/drbddisk drbdtest start
Jan 25 08:39:21 drbd-test-2 kernel: drbd0: Secondary/Unknown --> Primary/Unknown
Jan 25 08:39:21 drbd-test-2 heartbeat: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
Jan 25 08:39:21 drbd-test-2 kernel: kjournald starting.  Commit interval 5 seconds
Jan 25 08:39:21 drbd-test-2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jan 25 08:39:21 drbd-test-2 kernel: EXT3 FS on drbd0, internal journal
Jan 25 08:39:21 drbd-test-2 kernel: EXT3-fs: recovery complete.
Jan 25 08:39:21 drbd-test-2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 25 08:39:21 drbd-test-2 heartbeat: info: Running /etc/ha.d/resource.d/killnfsd  start
Jan 25 08:39:22 drbd-test-2 heartbeat: info: Running /etc/init.d/nfs-common  start
Jan 25 08:39:22 drbd-test-2 heartbeat: info: Running /etc/init.d/nfs-kernel-server  start
Jan 25 08:39:22 drbd-test-2 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Jan 25 08:39:22 drbd-test-2 kernel: NFSD: starting 90-second grace period
Jan 25 08:39:22 drbd-test-2 heartbeat: info: Running /etc/ha.d/resource.d/Delay 20 0 start
Jan 25 08:39:42 drbd-test-2 heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 10.10.2.28/16/eth0 start
Jan 25 08:39:42 drbd-test-2 heartbeat: info: /sbin/ifconfig eth0:0 10.10.2.28 netmask 255.255.0.0 broadcast 10.10.255.255
Jan 25 08:39:42 drbd-test-2 heartbeat: info: Sending Gratuitous Arp for 10.10.2.28 on eth0:0 [eth0]
Jan 25 08:39:42 drbd-test-2 heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.10.2.28 eth0 10.10.2.28 auto 10.10.2.28 ffffffffffff
Jan 25 08:39:42 drbd-test-2 heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jan 25 08:39:42 drbd-test-2 heartbeat[2468]: info: mach_down takeover complete.
Jan 25 08:39:42 drbd-test-2 heartbeat: info: mach_down takeover complete for node drbd-test-1.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
