Hi Linux HA users,
We have set up a two-node cluster running DRBD, Heartbeat and NFS.
Every now and again NFS on the cluster suddenly crashes and users can no longer
read from or write to the shared folders. When it gets into this state, running
rpcinfo -p against the floating IP shows that the NFS daemons are no longer
registered with the portmapper, as shown here:
----------------WHEN EVERYTHING IS FINE---------------------------
[someServer]# rpcinfo -p X.X.X.X (Float IP)
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    744  status
    100024    1   tcp    747  status
    100011    1   udp    808  rquotad
    100011    2   udp    808  rquotad
    100011    1   tcp    811  rquotad
    100011    2   tcp    811  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp  58872  nlockmgr
    100021    3   udp  58872  nlockmgr
    100021    4   udp  58872  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100021    1   tcp  36993  nlockmgr
    100021    3   tcp  36993  nlockmgr
    100021    4   tcp  36993  nlockmgr
    100005    1   udp    847  mountd
    100005    1   tcp    850  mountd
    100005    2   udp    847  mountd
    100005    2   tcp    850  mountd
    100005    3   udp    847  mountd
    100005    3   tcp    850  mountd
------------------------------------------------------------------------------------------
Then when it fails:
------------------------------------------------------------------------------------------
[someServer]# rpcinfo -p X.X.X.X (Float IP)
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    694  status
    100024    1   tcp    697  status
------------------------------------------------------------------------------------------
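To tell the two states apart without eyeballing the listing, the comparison above can be scripted. What follows is a minimal sketch in Python. It assumes the column layout shown above (program, version, protocol, port, service name). The REQUIRED set and the function names are hypothetical choices for illustration and are not part of any existing tool.

```python
"""Minimal sketch: flag NFS-related RPC programs missing from `rpcinfo -p` output.

Assumption: the REQUIRED set below is illustrative only; adjust it to whatever
your NFS server is expected to register with the portmapper.
"""

# Hypothetical set of program names this NFS setup is expected to register.
REQUIRED = {"portmapper", "status", "nfs", "mountd", "nlockmgr"}


def registered_programs(rpcinfo_output: str) -> set:
    """Return the service names found in `rpcinfo -p` output."""
    names = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service
        # (the header row starts with "program", so it is skipped).
        if len(fields) >= 5 and fields[0].isdigit():
            names.add(fields[4])
    return names


def missing_programs(rpcinfo_output: str) -> set:
    """Return required programs that are not registered with the portmapper."""
    return REQUIRED - registered_programs(rpcinfo_output)


if __name__ == "__main__":
    # The "failed" listing from this post, pasted in as sample input.
    failed = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    694  status
    100024    1   tcp    697  status
"""
    print(sorted(missing_programs(failed)))  # ['mountd', 'nfs', 'nlockmgr']
```

When given the failed listing, the sketch reports mountd, nfs and nlockmgr as missing, which matches what disappears between the two outputs.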
I have tried restarting the daemons and every other process I can think of, but
unless I reboot, NFS does not come back up.
My system setup is as follows:
2x 64-bit CentOS 5, bare-bones standard install without Dialup Networking support.
DRBD 8.3 (drbd83 package) with the kernel module
---------------------ha.cf------------------
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 5
initdead 20
bcast eth1
udpport 694
auto_failback on
node storage1.clusterfarm.net.au
node storage2.clusterfarm.net.au
---------------------------------------------
-------------HARESOURCES-----------------
storage1.clusterfarm.net.au IPaddr::[FLOATIP]/24/eth1 drbddisk::repdata Filesystem::/dev/drbd0::/storage::ext3 portmap nfslock nfs rpcidmapd
-----------------------------------------------------
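To show the order in which Heartbeat brings these resources up and down, the haresources entry above can be parsed with a short script. This is a minimal sketch in Python. It assumes Heartbeat v1 semantics: the first field is the preferred node, each following field is a resource written as script::arg1::arg2..., and resources are started left to right and stopped right to left. The names parse_haresources, start_order and stop_order are hypothetical helpers.

```python
"""Minimal sketch: the order in which Heartbeat v1 acts on a haresources line.

Assumes Heartbeat v1 semantics: first field = preferred node, each following
field = resource written as script::arg1::arg2..., resources are started
left to right on takeover and stopped right to left on release.
"""

from typing import List, Tuple

Resource = Tuple[str, List[str]]  # (resource script, arguments)


def parse_haresources(line: str) -> Tuple[str, List[Resource]]:
    """Split one haresources line into the node name and its resources."""
    fields = line.split()
    node = fields[0]
    resources: List[Resource] = []
    for field in fields[1:]:
        script, *args = field.split("::")  # "::" separates script and arguments
        resources.append((script, args))
    return node, resources


def start_order(line: str) -> List[str]:
    """Resource scripts in the order they are started (left to right)."""
    _, resources = parse_haresources(line)
    return [script for script, _ in resources]


def stop_order(line: str) -> List[str]:
    """Resource scripts in the order they are stopped (right to left)."""
    return list(reversed(start_order(line)))


if __name__ == "__main__":
    line = ("storage1.clusterfarm.net.au IPaddr::[FLOATIP]/24/eth1 "
            "drbddisk::repdata Filesystem::/dev/drbd0::/storage::ext3 "
            "portmap nfslock nfs rpcidmapd")
    print("start:", " -> ".join(start_order(line)))
    print("stop: ", " -> ".join(stop_order(line)))
```

With this configuration, the sketch shows portmap, nfslock, nfs and rpcidmapd starting only after the IP, the DRBD disk and the /storage filesystem, and stopping before them.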
----------DRBD CONF---------------
global { usage-count yes; }
common { syncer { rate 100M; al-extents 257; } }
resource repdata {
  protocol C;
  handlers { pri-on-incon-degr "halt -f"; }
  disk { on-io-error detach; }
  startup { degr-wfc-timeout 60; wfc-timeout 60; }
  on storage1.clusterfarm.net.au {
    address X.X.X.X:7788;
    device /dev/drbd0;
    disk /dev/sda6;
    meta-disk internal;
  }
  on storage2.clusterfarm.net.au {
    address X.X.X.X:7788;
    device /dev/drbd0;
    disk /dev/sda6;
    meta-disk internal;
  }
}
------------------------------------------------------
If anyone can shed some light on this issue, I would be MOST appreciative!
Thank you in advance,
Karl Kloppenborg
Head of Development
Phone: 1300 884 839 (AU Only - Business Hours)
Website: AU http://www.crucial.com.au | US http://www.crucialp.com
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems