> Date: Wed, 22 Sep 2010 14:45:16 +0100
> From: Robert Horton <[email protected]>
> Subject: MD1220 + H800 nfs performance
> To: Dell poweredge Mailling-liste <[email protected]>
> Message-ID: <1285163116.1780.80.ca...@moelwyn>
> Content-Type: text/plain; charset="UTF-8"
> 
> Hi,
> 
> I'm having some problems getting decent nfs performance from some
> MD1220s connected to an R710. Here's a summary of the setup:
> 
> 3 x MD1220 each with 24 x 500GB 7.2k SAS disk
> All connected to a Perc H800 in an R710.
> 
> At present I have a single RAID60 volume with three spans of 23 disks,
> so each array holds one span plus a hot standby. Stripe element size is
> 64k.
> 
> I'm testing the performance with:
> 
> iozone -l 1 -u 1 -r 4k -s 10g -e
> 
> and getting write performance of:
> 
> Direct to filesystem:       1076 MB/s
> nfs via loopback interface: 217 MB/s
> nfs via IPoIB:              38 MB/s
> nfs via Ethernet:           24 MB/s

Which version of NFS?

NFS is synchronous by default: every write is acknowledged before the next one 
is issued.  That synchrony costs time.  Your loopback measurement already tells 
you to expect roughly 1/5 of local speed even on a "network" with zero latency 
and memory-speed bandwidth.  Add a slightly slower link (IB) or a few hundred 
microseconds of round-trip latency (Ethernet or IPoIB) and your throughput 
takes another big hit.
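As a rough sanity check (illustrative round-trip times, not measured on your 
hardware): with synchronous NFS you complete at most one wsize'd write per 
round trip, so throughput is capped near wsize / RTT:

```shell
# Sync-NFS throughput ceiling: one 32 KB write acknowledged per round trip.
# The RTT values below are illustrative assumptions, not measurements.
wsize_bytes=$((32 * 1024))
for rtt_us in 100 500 1000; do
    awk -v w="$wsize_bytes" -v r="$rtt_us" \
        'BEGIN { printf "RTT %4d us -> ceiling %6.1f MB/s\n", r, w / (r / 1e6) / 1e6 }'
done
```

A round trip anywhere near a millisecond is enough to explain throughput in 
the 24-38 MB/s range you measured.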

To get fast write speed don't use NFS (!).  Switch to FTP.  If you must use 
NFS, turn on some writeback caching (the async option in /etc/exports).  Ensure 
NFS uses TCP.  Verify your network links are direct, full-duplex, and 
error-free, and not something odd.
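For concreteness, a minimal sketch of those two changes (paths, export network, 
and hostname are hypothetical; async trades crash-safety for speed, since the 
server acknowledges writes before they reach disk):

```shell
# /etc/exports on the server -- 'async' lets the server ack writes
# before they are committed to disk (faster, riskier on a crash):
# /export/data  192.168.1.0/24(rw,async,no_subtree_check)

# On the client: force TCP and the largest block sizes the server allows:
mount -t nfs -o tcp,rsize=32768,wsize=32768 server:/export/data /mnt/data
```

Run `exportfs -ra` after editing /etc/exports, and check the negotiated 
options afterwards with `cat /proc/mounts`.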


> 
> Based on testing other systems I would expect the nfs over Ethernet to
> be around 100MB/s (ie saturating the GigE link) and the nfs over IPoIB
> to be higher than that. I've tested the network links with nttcp and
> there don't appear to be any problems.
> 
> I've tried various filesystems (ext3, ext4, xfs) but this didn't have a
> significant effect.
> 
> I'm wondering:
> 
> 1) Should the stripe size be smaller? Given that the nfs max block size
> is 32KB each write is going to be less than one stripe..?

Stripe makes a big difference if your filesystem is not aligned to the stripe.  
Allocate partitions on stripe boundaries, use XFS, and tell it what the stripe 
unit and stripe width are.  Remember what RAID-5/6 has to do to service a 
64 kbyte partial-stripe write across 21 data disks: read back the old data and 
parity, recompute, and rewrite.  You want XFS to optimize this.
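A sketch of what that looks like, assuming a single 23-disk RAID-6 span (21 
data disks, 64 KB stripe element); device names are hypothetical, and for the 
combined RAID-60 volume the stripe width would be the total data disks across 
all spans:

```shell
# su = stripe element size, sw = number of DATA disks per stripe.
mkfs.xfs -d su=64k,sw=21 /dev/sdb1

# Start the partition on a full-stripe boundary: 64 KB x 21 = 1344 KiB.
parted /dev/sdb mklabel gpt mkpart primary 1344KiB 100%
```

With su/sw set, XFS aligns allocations to full stripes and avoids the 
read-modify-write penalty for large writes.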

> 
> 2) Is there a better way of arranging the disks? Given that I want the
> dual parity I'm more or less stuck with some form of RAID 6, but I
> could
> have more spans or create separate volumes and stripe them with LVM.

You'll probably run into H800 bandwidth limits.  1GB/sec is pretty amazing 
throughput for a cheap^h^h^h^h^h inexpensive raid controller.  Your issue lies 
elsewhere, since the combination of network and software is taking away 
performance by the bucketful.

> 
> I'm happy to test different configurations but given the time needed to
> reinitialise the array it would be good to get some pointers first...
> 
> Any thoughts would be appreciated.

For that many disks, RAID-10 is indicated.  With 24 disks you'll see one or two 
failures per year per MD1220, so you'll be running the risk of overlapping 
failures killing your data.
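A back-of-envelope check on that claim, assuming (illustratively) a ~3% annual 
failure rate per drive across all three shelves:

```shell
# Expected disk failures per year across the whole 72-disk pool.
# The 3% AFR is an illustrative assumption, not a measured figure.
disks=72
awk -v n="$disks" -v afr=0.03 \
    'BEGIN { printf "Expected failures/year across %d disks: %.1f\n", n, n * afr }'
```

A couple of failures a year means long RAID-6 rebuilds on 500 GB drives, 
during which the array is one further failure away from trouble.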

Also, the filesystem is now huge, and you are relying on the disks reading and 
writing perfectly, on the RAID controller being perfect, and on Linux and XFS 
never corrupting the filesystem data and/or the metadata.

When a disk breaks, do you know what to do to replace it?  You could practise 
replacing disks now, when you have no data at risk. 


--John

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq
