On Thu, Mar 25, 2010 at 07:30:35PM -0600, Tim Serong wrote:
> On 3/26/2010 at 02:47 AM, Ben Timby <[email protected]> wrote:
> > On Thu, Mar 25, 2010 at 10:54 AM, Tim Serong <[email protected]> wrote:
> > >> > Now for a little potential nastiness... I did some work in this area
> > >> > a year or two ago, and at the time, we ran into some curious edge cases.
> > >> > Hopefully things have moved on a little since then in NFS-land (I was
> > >> > using SLES 10 SP2, from memory), but for reference, have a look at:
> > >> >
> > >> > http://marc.info/?l=linux-nfs&m=123175640421702&w=2
> > >> >
> > >> > This describes an edge case where (depending on what the clients are
> > >> > doing), it's possible that running "exportfs -i" to export one directory
> > >> > will result in an interruption of service to an unrelated exported
> > >> > directory on the same node.
> > >>
> > >> I think you are advocating additional testing; I address that below...
> > >
> > > Yes. But, I should probably explicitly state that the additional testing
> > > I'm advocating is focused on testing NFS in an HA environment, i.e. these
> > > issues (assuming they still exist) need to be resolved somewhere in the
> > > NFS server, and are not specific to your RA. It's just that you don't hit
> > > them until you try to do active/active, rather than active/passive (i.e.
> > > start/stop the entire NFS server).
> >
> > Actually, reading through that post, the testing I suggested is close,
> > but not quite. The problem was explicitly caused by write buffers from
> > the client in the 32K range, as these were small enough to send a lot
> > of them in a short amount of time, but large enough to be dropped by
> > the NFS server rather than deferred. This was the crux of the problem.
> > I am not sure how to get 32K writes, besides...
> >
> > dd if=/dev/zero of=/path/to/fs0/smallfile bs=32K count=1024
>
> # mount -o rsize=32768,wsize=32768 server:/dir /localdir
>
> That specifies the maximum number of bytes for each read and write request
> over the wire.
>
> > [...]
> >
> > > Yep, that's the sort of test. I'll see if I can find out anything else
> > > useful about the tools we were using at the time (not sure if they ever
> > > got publicly released, unfortunately :-/)
>
> Actually, they were, but on an "as-is" basis as tarballs, which may or
> may not require some effort to get running. See:
>
> http://lwn.net/Articles/326926/
> http://oss.sgi.com/projects/nfs/testtools/
>
> We would have been using genstream/checkstream from a single client for
> those R/W tests. This has the advantage over dd that you can use it
> to check for data corruption. You may also be interested in Weber, which
> simulates multiple NFS clients.
>
> > Any more info you can provide will be helpful. I need to get my
> > testing done soon, as these boxes are going into production this
> > weekend. I am way behind schedule; you would not believe how long it
> > took me to build a 30TB array and then sync it via DRBD (4 days for
> > Linux RAID, 10 days for DRBD). Actually, I had to split it into two
> > volumes as the DRBD volume limit is a scant 18TB :-).
>
> *ouch*
currently 16 TiB per drbd minor, to be exact.

if this was a new array, and you intended to mkfs /dev/drbdX anyways,
you could have skipped the initial DRBD sync, btw...

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
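For future reference, the usual way to skip the initial sync on a brand-new
DRBD device that you are about to mkfs anyway is to clear the sync bitmap
once both nodes are connected. A rough sketch, assuming DRBD 8.3 or later and
a hypothetical resource named r0 on /dev/drbd0 with an ext3 filesystem; the
exact syntax differs between DRBD versions, so check the DRBD User's Guide
for yours:

  # on both nodes: create the metadata and bring the resource up
  drbdadm create-md r0
  drbdadm up r0

  # on one node only, once both sides are Connected and Inconsistent/Inconsistent:
  # declare the current data "good" and clear the bitmap, so no full resync runs
  drbdadm -- --clear-bitmap new-current-uuid r0

  # promote and create the filesystem as usual
  drbdadm primary r0
  mkfs -t ext3 /dev/drbd0

This only makes sense when the device contents are garbage anyway, since
nothing is actually replicated by it; the filesystem you create afterwards is.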

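As for the 32K-write test discussed above, here is a minimal sketch of the
kind of reproduction Ben describes, using only standard tools; the server
name (nfsserver), export paths (/path/to/fs0, /path/to/fs1) and mount point
are made up, and the exact write pattern that triggered the original problem
may differ. The idea is to keep a stream of 32 KiB NFS writes going against
one export while re-exporting an unrelated directory with "exportfs -i", and
watch whether the writer stalls or errors out:

  # on the client: mount the first export with 32 KiB read/write sizes
  mount -t nfs -o rsize=32768,wsize=32768 nfsserver:/path/to/fs0 /mnt/fs0

  # on the client: generate a steady stream of 32 KiB writes
  dd if=/dev/zero of=/mnt/fs0/smallfile bs=32K count=100000

  # on the server, while the dd is still running: export an unrelated
  # directory bypassing /etc/exports, then watch the client for stalls/errors
  exportfs -i -o rw,sync '*:/path/to/fs1'

  # on the server: remove the extra export again when done
  exportfs -u '*:/path/to/fs1'

genstream/checkstream or Weber from the SGI test tools above would serve the
same purpose while also checking data integrity across the event.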