Hi,

On Mon, Apr 12, 2010 at 05:03:41PM +0300, luben karavelov wrote:
> I have tested the exportfs RA (the version from the Mercurial repo) with
> an NFSv4 server.
> 
> I have some considerations though.
> 
> 1. It uses grep -P (Perl regexes). This feature of grep is marked as
> experimental and is not compiled in on some Linux distributions. I propose
> replacing the only occurrence of "grep -P ..." with an equivalent
> "grep -E ...":
> 
> showmount -e | grep -E \
>     "^${OCF_RESKEY_directory}[[:space:]]+${OCF_RESKEY_clientspec}$"

Yes, that should be changed.
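
Something along these lines should do (untested sketch; the function name
is only illustrative, the variables are the RA parameters quoted above):

    is_exported() {
            # check whether the directory is currently exported
            # to the given client spec
            showmount -e | grep -E -q \
                "^${OCF_RESKEY_directory}[[:space:]]+${OCF_RESKEY_clientspec}$"
    }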

> 2. It seems that /var/lib/nfs/rmtab is not used in NFSv4, so I simply
> deleted the backup/restore procedures. Maybe this could become a
> configuration option, or a runtime check for version 4 that disables
> these procedures.

Are you sure it's not used? Isn't that implementation dependent?
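
If it really turns out not to be needed on a v4-only server, a guarded
runtime check could look roughly like this (just a sketch, untested; rmtab
is maintained by mountd for NFSv2/v3 clients, and backup_rmtab here stands
for the existing backup procedure):

    # back up rmtab unless the server is known to serve only NFSv4
    if grep -Eq '[+][23]' /proc/fs/nfsd/versions 2>/dev/null ||
            [ ! -r /proc/fs/nfsd/versions ]; then
            backup_rmtab
    fi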

> By itself the exportfs RA worked as expected, though there was some
> unexpected interaction between the NFSv4 server and the underlying FS. If
> some NFSv4 clients keep a file open on an exported directory, then even
> after I completely stop nfsd there is some timeout before I can umount the
> underlying FS on the server (XFS here). In the meantime I keep getting a
> "device busy" error.

Open files are likely to happen; perhaps there is some NFS
parameter to reduce the timeout.
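
For NFSv4 the relevant knob might be the lease time; something like the
following, set before nfsd is started, should make stale client state
expire sooner (untested, the path may vary between kernels; the default is
typically 90 seconds):

    echo 10 > /proc/fs/nfsd/nfsv4leasetime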

> The interaction of this misfeature of the NFSv4 server with Pacemaker is
> pretty nasty: if you migrate such a resource group to another node, it
> cannot stop properly on the current node, so the cluster hangs without
> providing the service. If you have configured preferred nodes for
> different NFS exports (services) and node fencing (in order to avoid data
> corruption), it gets a lot worse.
> 
> Example of a possible scenario.
> Setup: a cluster of 2 nodes (node0, node1). You have 2 devices replicated
> by DRBD across the nodes (drbd0, drbd1), each configured as a master/slave
> (ms) resource. On top of each device you colocate a resource group of
> Filesystem, exportfs and IPaddr. You set a preferred location for drbd0 to
> run on node0 and drbd1 to run on node1. You set up a stonith device in
> order to shut down a misbehaving node appropriately.
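
In crm shell terms that setup would look roughly like the following (only a
sketch; resource names, addresses, devices and mount points are made up,
and only the drbd0 half is shown):

    primitive drbd0 ocf:linbit:drbd \
            params drbd_resource="r0"
    ms ms-drbd0 drbd0 \
            meta master-max="1" clone-max="2" notify="true"
    primitive fs0 ocf:heartbeat:Filesystem \
            params device="/dev/drbd0" directory="/srv/nfs0" fstype="xfs"
    primitive export0 ocf:heartbeat:exportfs \
            params directory="/srv/nfs0" clientspec="192.168.1.0/24" \
                    options="rw" fsid="1"
    primitive ip0 ocf:heartbeat:IPaddr params ip="192.168.1.100"
    group rg0 fs0 export0 ip0
    colocation rg0-on-drbd0 inf: rg0 ms-drbd0:Master
    order rg0-after-drbd0 inf: ms-drbd0:promote rg0:start
    location prefer-node0 ms-drbd0 100: node0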
> 
> On this setup you try to migrate drbd0 from node0 to node1 (crm node
> standby). The resource group fails to stop because the Filesystem RA fails
> to umount the busy filesystem, so node1 shoots down node0 in order to
> bring the service back. Now both volumes and their associated RGs run on
> node1. When node0 comes back online, drbd0 is scheduled for migration to
> it (location preference). Stopping the RG on node1 fails again, so node1
> is shot down by node0. Now all volumes and RGs run on node0. When node1
> restarts, the cluster manager tries to migrate drbd1 to node1 (location
> preference). It fails, and so on ... the cluster keeps shooting itself
> automatically.

Probably not desirable but expected.

> What I have done: in the Filesystem RA I have increased the sleep interval
> in the "stop" op. I have also configured a 4-minute timeout for the stop
> op. Maybe this sleep value could be made configurable: I can imagine other
> scenarios where tweaking it would be useful.

Four minutes is quite long. Why would you want to increase the
sleep interval? To reduce the number of logged messages?
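
The 4-minute timeout part can at least be expressed directly on the stop
operation, e.g. (sketch, the resource definition is shortened):

    primitive fs0 ocf:heartbeat:Filesystem \
            params device="/dev/drbd0" directory="/srv/nfs0" fstype="xfs" \
            op stop interval="0" timeout="240s"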

Thanks,

Dejan

> With these tweaks I get NFSv4 client failover and an active/active nfsd
> setup.
> 
> 
> Thanks for the great work
> 
> Luben
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
