Hi, could you show the configuration you use via crm configure show?
Christoph

On 15.04.2011 12:02, Caspar Smit wrote:
> Hi all,
>
> I'm also testing highly available NFS over TCP, and I'd like to share my
> findings from a whole day of testing, along with some very interesting
> conclusions!
>
> Note: I sent these findings to Florian Haas of Linbit (maintainer of the
> exportfs RA) and he noted that the exportfs RA is meant to be used in
> active/active setups like Rasca's, not in active/passive setups (which is
> what I am testing at the moment).
>
> First of all, I started with a fresh install of every node and rebooted
> the NFS client machine.
>
> While starting the first test I noticed the failover actually DID work, so
> I started to investigate further. After a few more failovers I was stuck
> in a situation where I had a stale mount on the client.
>
> These first tests were all done using the migrate command.
>
> I rebooted everything again and started the second batch of tests (now
> with the node standby command). I noticed that this way of failing over
> survived far more failovers. Only when I started using the NFS mount
> during a failover, by writing something to it, did I notice that the time
> it took to survive the failover increased considerably.
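For reference, the two failover methods compared here can be triggered from the crm shell roughly as follows (resource and node names are placeholders, not taken from the thread):

```
# Method 1: move the resource away (leaves a location constraint behind)
crm resource migrate <resource> node2
crm resource unmigrate <resource>   # clear the constraint afterwards

# Method 2: put the active node into standby, then bring it back online
crm node standby node1
crm node online node1
```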
>
> I dug deeper and started to monitor the NFS TCP connections using netstat,
> and wrote down the results:
>
> node1 = active node
> node2 = passive node
>
> node1 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> tcp   0   0 192.168.0.30:nfs   192.168.0.10:767   ESTABLISHED
> udp   0   0 *:nfs              *:*
>
> node2 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> udp   0   0 *:nfs              *:*
>
> I did a failover (migrate resource) from node1 -> node2:
>
> node1 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> tcp   0   0 192.168.0.30:nfs   192.168.0.10:767   ESTABLISHED
> udp   0   0 *:nfs              *:*
>
> node2 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> tcp   0   0 192.168.0.30:nfs   192.168.0.10:767   ESTABLISHED
> udp   0   0 *:nfs              *:*
>
> Running the nfs-kernel-server LSB script as a clone keeps TCP sessions
> ESTABLISHED on the passive node for about 10 minutes after a failover.
> After that the state changes to FIN_WAIT1, which lasts about another 4
> minutes.
>
> During the time the session is in ESTABLISHED or FIN_WAIT1 (about 14
> minutes in total) it is not possible to migrate the resource back, as this
> results in a stale mount.
>
> Then I started testing with node standby failovers and saw the following:
>
> node1 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> tcp   0   0 192.168.0.30:nfs   192.168.0.10:767   ESTABLISHED
> udp   0   0 *:nfs              *:*
>
> node2 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> udp   0   0 *:nfs              *:*
>
> I did a failover (node standby) from node1 -> node2:
>
> node1 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> tcp   0   0 192.168.0.30:nfs   192.168.0.10:767   TIME_WAIT
> udp   0   0 *:nfs              *:*
>
> node2 netstat:
> tcp   0   0 *:nfs              *:*                LISTEN
> tcp   0   0 192.168.0.30:nfs   192.168.0.10:767   ESTABLISHED
> udp   0   0 *:nfs              *:*
>
> The session on node1 immediately changes to TIME_WAIT, which lasts a bit
> shorter (around 2 minutes) than the FIN_WAIT1 seen with the migrate
> command.
>
> It is still not possible to fail back during the TIME_WAIT state, but
> after around 2 minutes the session is restored and doesn't become stale.
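The bookkeeping Caspar did by hand can be scripted. A minimal sketch: the netstat output is hard-coded here so the parsing is reproducible (the second client address, 192.168.0.11, is invented for the example); on a live node you would pipe in `netstat -t` instead:

```shell
#!/bin/sh
# Print "client state" pairs for every NFS TCP session found in
# netstat-style output. On a real node:  netstat -t | awk '...'
sample='tcp        0      0 192.168.0.30:nfs        192.168.0.10:767        ESTABLISHED
tcp        0      0 192.168.0.30:nfs        192.168.0.11:892        TIME_WAIT'

# Field 4 is the local address, field 5 the peer, field 6 the TCP state.
printf '%s\n' "$sample" |
  awk '$4 ~ /:nfs$/ { split($5, peer, ":"); print peer[1], $6 }'
```

This prints one line per NFS session, e.g. `192.168.0.10 ESTABLISHED`; wrapping the live variant in `watch -n1` gives a running view of the session states during a failover.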
>
> I concluded that the stopping and starting of nfs-kernel-server (which
> happens only when doing node standby) is the main difference here.
>
> So I started testing with nfs-kernel-server not as a cloned resource but
> as a normal resource (so that it gets stopped and started during
> failover).
>
> After a failover, the TCP state on the passive node sometimes remained
> ESTABLISHED and sometimes became TIME_WAIT.
>
> I noticed that if I didn't use the NFS mount during the failover the
> state became TIME_WAIT, and if I did use the mount it remained
> ESTABLISHED.
>
> So it had to do with nfs-kernel-server not shutting down all connections
> on a stop command. I checked the /etc/init.d/nfs-kernel-server LSB script
> and saw that the stop command uses signal 2 (SIGINT) to stop all nfsd
> instances. I noticed that when a session is active, the nfsd instance is
> not stopped. So I changed the signal to 9 (SIGKILL), and then all nfsd
> instances were killed on a stop command.
>
> Conclusion:
>
> - *Using nfs-kernel-server as a cloned resource prevents quick failovers
>   (< 15 minutes) if you use NFS over TCP.* Using it as a normal resource
>   stops and starts the nfsd instances that hold the TCP connections.
> - For this to work in active/passive mode, the nfs-kernel-server init
>   script needs to be changed: the stop command must use signal 9 (SIGKILL)
>   instead of signal 2 (SIGINT) to kill all nfsd instances.
>
> Kind regards,
>
> Caspar Smit
>
> 2011/4/14 Alessandro Iurlano <[email protected]>
>
>> Thanks a lot, Rasca.
>> Using your configuration I was able to set up the active/active NFS
>> server. I had to use the UDP protocol for NFS to work; with TCP, the NFS
>> clients would occasionally hang. With UDP it seems to work well, without
>> any need for rmtab file replication/synchronization.
>>
>> Now I'm trying to go a little further by using the OCFS2 cluster
>> filesystem with a dual-primary DRBD configuration.
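The signal 2 vs. signal 9 behaviour Caspar describes can be demonstrated with an ordinary process that ignores SIGINT, analogous to an nfsd instance that does not stop on signal 2 while a TCP session is active. This is an illustrative sketch only, not the init-script change itself:

```shell
#!/bin/sh
# A process that ignores SIGINT survives "kill -2" but not "kill -9".
sh -c 'trap "" INT; sleep 30' &
pid=$!
sleep 1                        # give the child time to install the trap

kill -2 "$pid"                 # signal 2 (SIGINT): ignored by the child
sleep 1
kill -0 "$pid" 2>/dev/null && echo "still running after signal 2"

kill -9 "$pid"                 # signal 9 (SIGKILL): cannot be caught or ignored
wait "$pid" 2>/dev/null
kill -0 "$pid" 2>/dev/null || echo "gone after signal 9"
```

The script should report that the process is still running after signal 2 but gone after signal 9, which is exactly why switching the init script's stop action to SIGKILL reliably tears down the nfsd TCP connections.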
>> The goal is to be able to share the same directory from both nodes while
>> still having failover to a single node.
>> With the current configuration, the cluster comes up and every service
>> runs as expected. But when I unplug the network cable of a node, the
>> exportfs processes hang on the remaining active node, and I can't see
>> why. Any suggestions?
>>
>> This is my current configuration:
>> http://nopaste.voric.com/paste.php?f=sxub6z
>>
>> Thanks!
>> Alessandro
>>
>> On Mon, Apr 4, 2011 at 11:23 AM, RaSca <[email protected]> wrote:
>>> On Sat, 02 Apr 2011 19:04:08 CET, Alessandro Iurlano wrote:
>>>>
>>>> On Fri, Apr 1, 2011 at 11:34 AM, RaSca <[email protected]> wrote:
>>>>>>
>>>>>> Then I tried to find a way to keep just the rmtab file synchronized
>>>>>> on both nodes. I cannot find a way to have Pacemaker do this for me.
>>>>>> Is there one?
>>>>>
>>>>> As far as I know, all those operations are handled by the exportfs RA.
>>>>
>>>> I believe this was true until the backup part was removed. See the git
>>>> commit below.
>>>
>>> So, for some reason this is not needed anymore, but I don't think it
>>> should create problems; surely the RA maintainer has done all the
>>> necessary tests.
>>>
>>>> I checked the boot order, and indeed I was doing it the wrong way.
>>>> After I fixed it, a couple of tests worked right away, but the client
>>>> hung again when I brought the cluster back to both nodes online.
>>>> Could you post your working configuration?
>>>> Thanks,
>>>> Alessandro
>>>
>>> Here it is. Note that I'm using DRBD instead of shared storage
>>> (basically each DRBD device is a stand-alone export that can reside
>>> independently on either node):
>>>
>>> node ubuntu-nodo1
>>> node ubuntu-nodo2
>>> primitive drbd0 ocf:linbit:drbd \
>>>     params drbd_resource="r0" \
>>>     op monitor interval="20s" timeout="40s"
>>> primitive drbd1 ocf:linbit:drbd \
>>>     params drbd_resource="r1" \
>>>     op monitor interval="20s" timeout="40s"
>>> primitive nfs-kernel-server lsb:nfs-kernel-server \
>>>     op monitor interval="10s" timeout="30s"
>>> primitive ping ocf:pacemaker:ping \
>>>     params host_list="172.16.0.1" multiplier="100" name="ping" \
>>>     op monitor interval="20s" timeout="60s" \
>>>     op start interval="0" timeout="60s"
>>> primitive portmap lsb:portmap \
>>>     op monitor interval="10s" timeout="30s"
>>> primitive share-a-exportfs ocf:heartbeat:exportfs \
>>>     params directory="/share-a" clientspec="172.16.0.0/24" \
>>>       options="rw,async,no_subtree_check,no_root_squash" fsid="1" \
>>>     op monitor interval="10s" timeout="30s" \
>>>     op start interval="0" timeout="40s" \
>>>     op stop interval="0" timeout="40s"
>>> primitive share-a-fs ocf:heartbeat:Filesystem \
>>>     params device="/dev/drbd0" directory="/share-a" fstype="ext3" \
>>>       options="noatime" fast_stop="no" \
>>>     op monitor interval="20s" timeout="40s" \
>>>     op start interval="0" timeout="60s" \
>>>     op stop interval="0" timeout="60s"
>>> primitive share-a-ip ocf:heartbeat:IPaddr2 \
>>>     params ip="172.16.0.63" nic="eth0" \
>>>     op monitor interval="20s" timeout="40s"
>>> primitive share-b-exportfs ocf:heartbeat:exportfs \
>>>     params directory="/share-b" clientspec="172.16.0.0/24" \
>>>       options="rw,no_root_squash" fsid="2" \
>>>     op monitor interval="10s" timeout="30s" \
>>>     op start interval="0" timeout="40s" \
>>>     op stop interval="0" timeout="40s"
>>> primitive share-b-fs ocf:heartbeat:Filesystem \
>>>     params device="/dev/drbd1" directory="/share-b" fstype="ext3" \
>>>       options="noatime" fast_stop="no" \
>>>     op monitor interval="20s" timeout="40s" \
>>>     op start interval="0" timeout="60s" \
>>>     op stop interval="0" timeout="60s"
>>> primitive share-b-ip ocf:heartbeat:IPaddr2 \
>>>     params ip="172.16.0.64" nic="eth0" \
>>>     op monitor interval="20s" timeout="40s"
>>> primitive statd lsb:statd \
>>>     op monitor interval="10s" timeout="30s"
>>> group nfs portmap statd nfs-kernel-server
>>> group share-a share-a-fs share-a-exportfs share-a-ip
>>> group share-b share-b-fs share-b-exportfs share-b-ip
>>> ms ms_drbd0 drbd0 \
>>>     meta master-max="1" master-node-max="1" clone-max="2" \
>>>       clone-node-max="1" notify="true"
>>> ms ms_drbd1 drbd1 \
>>>     meta master-max="1" master-node-max="1" clone-max="2" \
>>>       clone-node-max="1" notify="true" target-role="Started"
>>> clone nfs_clone nfs \
>>>     meta globally-unique="false"
>>> clone ping_clone ping \
>>>     meta globally-unique="false"
>>> location share-a_on_connected_node share-a \
>>>     rule $id="share-a_on_connected_node-rule" -inf: not_defined ping or ping lte 0
>>> location share-b_on_connected_node share-b \
>>>     rule $id="share-b_on_connected_node-rule" -inf: not_defined ping or ping lte 0
>>> colocation share-a_on_ms_drbd0 inf: share-a ms_drbd0:Master
>>> colocation share-b_on_ms_drbd1 inf: share-b ms_drbd1:Master
>>> order share-a_after_ms_drbd0 inf: ms_drbd0:promote share-a:start
>>> order share-b_after_ms_drbd1 inf: ms_drbd1:promote share-b:start
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>>>     cluster-infrastructure="openais" \
>>>     expected-quorum-votes="2" \
>>>     no-quorum-policy="ignore" \
>>>     stonith-enabled="false" \
>>>     last-lrm-refresh="1301915944"
>>>
>>> Note that I've grouped all the NFS server daemons (portmap, nfs-common
>>> and nfs-kernel-server) in the cloned group nfs_clone.
>>>
>>> --
>>> RaSca
>>> Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
>>> (Nothing is impossible to understand, if you explain it well!)
>>> [email protected] >>> http://www.miamammausalinux.org >>> >>> >> _______________________________________________ >> Linux-HA mailing list >> [email protected] >> http://lists.linux-ha.org/mailman/listinfo/linux-ha >> See also: http://linux-ha.org/ReportingProblems >> > _______________________________________________ > Linux-HA mailing list > [email protected] > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
