On 3/26/2010 at 02:47 AM, Ben Timby <[email protected]> wrote:
> On Thu, Mar 25, 2010 at 10:54 AM, Tim Serong <[email protected]> wrote:
> >> > Now for a little potential nastiness... I did some work in this area
> >> > a year or two ago, and at the time, we ran into some curious edge cases.
> >> > Hopefully things have moved on a little since then in NFS-land (I was
> >> > using SLES 10 SP2, from memory), but for reference, have a look at:
> >> >
> >> > http://marc.info/?l=linux-nfs&m=123175640421702&w=2
> >> >
> >> > This describes an edge case where (depending on what the clients are
> >> > doing), it's possible that running "exportfs -i" to export one directory
> >> > will result in an interruption of service to an unrelated exported
> >> > directory on the same node.
> >>
> >> I think you are advocating additional testing, I address that below...
> >
> > Yes. But, I should probably explicitly state that the additional testing
> > I'm advocating is focused on testing NFS in an HA environment, i.e. these
> > issues (assuming they still exist) need to be resolved somewhere in the
> > NFS server, and are not specific to your RA. It's just that you don't hit
> > them until you try to do active/active, rather than active/passive (i.e.
> > start/stop entire NFS server).
>
> Actually, reading through that post, the testing I suggested is close,
> but not quite. The problem was explicitly caused by write buffers from
> the client in the 32K range, as these were small enough to send a lot
> of them in a short amount of time, but large enough to be dropped by
> the NFS server rather than deferred. This was the crux of the problem.
> I am not sure how to get 32K writes, besides...
>
> dd if=/dev/zero of=/path/to/fs0/smallfile bs=32K count=1024
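[Editor's note: one caveat with dd from /dev/zero is that a read-back of
zeros proves nothing about corruption. A minimal sketch of the
write-pattern-then-verify idea (plain Python, hypothetical file path; this
is an illustration of the technique, not the actual genstream/checkstream
tools mentioned later in this thread):]

```python
import os

BLOCK = 32 * 1024   # 32 KiB, matching the write size under test
COUNT = 1024        # number of blocks, as in the dd example above

def write_pattern(path):
    """Write COUNT blocks, each stamped with its own block number,
    so a dropped, duplicated, or corrupted block is detectable."""
    with open(path, "wb") as f:
        for i in range(COUNT):
            # 8-byte big-endian block number, repeated to fill the block
            f.write(i.to_bytes(8, "big") * (BLOCK // 8))

def verify_pattern(path):
    """Return the list of block numbers whose contents don't match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(COUNT):
            expected = i.to_bytes(8, "big") * (BLOCK // 8)
            if f.read(BLOCK) != expected:
                bad.append(i)
    return bad

if __name__ == "__main__":
    # Point this at a file on the NFS mount when testing a real server;
    # /tmp here is only for illustration.
    path = "/tmp/nfs-pattern-test"
    write_pattern(path)
    print("corrupt blocks:", verify_pattern(path))
    os.remove(path)
```

[Run the write against the NFS-mounted path during failover, then verify
from the client afterwards; any non-empty result pinpoints which 32 KiB
requests were lost or mangled.]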
# mount -o rsize=32768,wsize=32768 server:/dir /localdir

That specifies the maximum number of bytes for each read and write
request over the wire.

> [...]
>
> > Yep, that's the sort of test. I'll see if I can find out anything else
> > useful about the tools we were using at the time (not sure if they ever
> > got publicly released, unfortunately :-/)

Actually, they were, but on an "as-is" basis as tarballs, which may or
may not require some effort to get running. See:

http://lwn.net/Articles/326926/
http://oss.sgi.com/projects/nfs/testtools/

We would have been using genstream/checkstream from a single client for
those R/W tests. This has the advantage over dd that you can use it to
check for data corruption. You may also be interested in Weber, which
simulates multiple NFS clients.

> Any more info you can provide will be helpful. I need to get my
> testing done soon, as these boxes are going into production this
> weekend. I am way behind schedule, you would not believe how long it
> took me to build a 30TB array and then sync it via DRBD (4 days for
> Linux RAID, 10 days for DRBD). Actually, I had to split it into two
> volumes as the DRBD volume limit is a scant 18TB :-).

*ouch*

Good luck,

Tim

--
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
