On 3/26/2010 at 01:19 AM, Ben Timby <[email protected]> wrote:
> Tim, I replied in line below so I could address each of your comments.
Likewise...

> On Wed, Mar 24, 2010 at 11:55 PM, Tim Serong <[email protected]> wrote:
> > On 3/25/2010 at 05:59 AM, Ben Timby <[email protected]> wrote:
> >> Attached is a resource agent that I call exportfs.
> >>
> >> Rather than starting/stopping NFS, it uses exportfs to add/remove
> >> individual exports.
> >
> > Awesome, as Florian said :)
> >
> > A couple of random comments...
> >
> >> It also takes care to use cluster-wide unique fsid parameters for each
> >> export. It ensures that this fsid is migrated with the resource.
> >
> > An alternative to automating this would be to just push the burden of
> > fsid assignment to the sysadmin (have the RA return $OCF_ERR_CONFIGURED
> > if no fsid was explicitly specified). Makes the code slightly simpler
> > at the expense of some small administrative effort :)
>
> I just provided a patch for this change. I was on the fence about whether
> to do it this way or not, but ultimately decided to make the script as
> turnkey as possible. I am happy either way :-).

Cool :)

> > Now for a little potential nastiness... I did some work in this area
> > a year or two ago, and at the time, we ran into some curious edge cases.
> > Hopefully things have moved on a little since then in NFS-land (I was
> > using SLES 10 SP2, from memory), but for reference, have a look at:
> >
> > http://marc.info/?l=linux-nfs&m=123175640421702&w=2
> >
> > This describes an edge case where (depending on what the clients are
> > doing), it's possible that running "exportfs -i" to export one directory
> > will result in an interruption of service to an unrelated exported
> > directory on the same node.
>
> I think you are advocating additional testing; I address that below...

Yes. But I should probably explicitly state that the additional testing
I'm advocating is focused on testing NFS in an HA environment, i.e. these
issues (assuming they still exist) need to be resolved somewhere in the
NFS server, and are not specific to your RA.
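[For reference, the validate-and-fail alternative Tim describes above could
be sketched roughly as follows. The function name, message text, and the use
of a plain echo for logging are illustrative assumptions following general
OCF resource-agent conventions; this is not code from the actual exportfs RA.]

```shell
# Sketch only: refuse to start if the admin did not set an explicit fsid,
# instead of auto-assigning a cluster-wide unique one in the RA.
# OCF_RESKEY_fsid and the numeric return codes follow OCF conventions.
OCF_SUCCESS=0
OCF_ERR_CONFIGURED=6

exportfs_validate_all() {
    if [ -z "$OCF_RESKEY_fsid" ]; then
        # No fsid given: fail configuration rather than guessing one.
        echo "exportfs: required parameter fsid is not set" >&2
        return $OCF_ERR_CONFIGURED
    fi
    return $OCF_SUCCESS
}
```

The trade-off is exactly as stated above: slightly simpler RA code, at the
cost of the sysadmin having to keep fsids unique across the cluster by hand.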
It's just that you don't hit them until you try to do active/active,
rather than active/passive (i.e. starting/stopping the entire NFS server).

> > There's also a problem whereby you almost certainly can't rely on the
> > return code from exportfs actually telling you the directory was exported
> > successfully. The only reason exportfs will fail is if you pass invalid
> > options, and it's possible that exportfs will return before the export
> > has actually appeared in /var/lib/nfs/etab (exportfs says "please kernel,
> > export this when you get a chance, kthxbye").
>
> Are you suggesting that we poll the etab file to make sure our export
> appears before calling the operation a success?

Something like that :)  I actually don't know what the best way to do
this is. Looping a few times, polling the file and then sleeping for one
second on each iteration, is probably reasonable. In the normal case
it'll succeed instantly anyway, with almost zero impact on start time,
but it'll catch the (probably freakishly unlikely) case where the
filesystem can't be exported for some reason.

You could also parse the output of "showmount -e", which should tell you
what the server thinks it's exporting.

Which reminds me... There was another issue at one point with the NFS
server checking the mtime of /var/lib/nfs/etab to determine what to
export. Thus if "exportfs -i" was run more than once per second, the
file could be out of sync with what the server thought it was exporting.
Again, this is something that needs to be bashed on a bit, but it's
really a Linux kernel NFS server thing.

> > We ran into these issues because we were doing failover testing while
> > the system was under heavy load (continuous write of several GB, followed
> > by reading the same data back for verification), while failing over
> > multiple NFS exports from node to node. You probably won't ever hit them
> > unless the system is being severely hammered...
> > But I'd still recommend further testing along these lines, out of
> > sheer paranoia.
>
> I will definitely do some testing. If I understand your statement,
> then the following scenario will help to determine if this is a
> problem for me or not.
>
> 1. Bring up cluster in active-active mode, both nodes online.
> 2. Start $ dd if=/dev/zero of=/path/to/fs0/bigfile bs=1GB count=10 on client
> 3. Fail over resource fs1.
> 4. Make sure the addition of fs1 to the node handling fs0 does not
>    cause disruption.
>
> ... and then ...
>
> 1. Bring up cluster in active-active mode, both nodes online.
> 2. Start $ dd if=/path/to/fs0/bigfile of=/dev/null bs=1GB count=10 on client
> 3. Fail over resource fs1.
> 4. Make sure the addition of fs1 to the node handling fs0 does not
>    cause disruption.

Yep, that's the sort of test. I'll see if I can find out anything else
useful about the tools we were using at the time (not sure if they ever
got publicly released, unfortunately :-/)

Regards,

Tim

--
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
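[Editor's note: the etab-polling loop discussed earlier in the thread might
look something like the sketch below. The function name, retry count, and
the overridable ETAB variable are illustrative assumptions, not code from
the actual exportfs RA.]

```shell
# Sketch: after running "exportfs -i", poll the export table until the
# directory actually appears, instead of trusting exportfs's exit code.
# ETAB is overridable here purely for illustration and testing.
ETAB=${ETAB:-/var/lib/nfs/etab}

wait_for_export() {
    # $1 = directory we expect to see exported
    dir=$1
    tries=0
    while [ "$tries" -lt 10 ]; do
        # etab lines begin with the export path followed by whitespace.
        if grep -q "^${dir}[[:space:]]" "$ETAB" 2>/dev/null; then
            return 0    # export really showed up
        fi
        sleep 1
        tries=$((tries + 1))
    done
    return 1    # never appeared; the start should be treated as failed
}
```

A belt-and-braces variant could additionally check the output of
"showmount -e localhost" for the same path, since that reflects what the
server believes it is exporting.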
