Richard "Doc" Kinne wrote:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.19    0.00    1.68   23.85    0.00   73.28

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sda1              0.00         0.00         0.00          0          0
sda2              0.00         0.00         0.00          0          0
sda3              0.00         0.00         0.00          0          0
sda4              0.00         0.00         0.00          0          0
sda5              0.00         0.00         0.00          0          0
sdb              56.00         0.00      6921.60          0      34608
sdb1             56.00         0.00      6921.60          0      34608

Obviously the sdb device is what I was writing to. The Blk_wrtn/s comes out to ~3.2MB/s.  I'm not sure how to interpret the %iowait statistic.
%iowait is the percentage of time the CPU sits idle while there is an outstanding I/O request.  IIRC "top", "uptime", etc. (i.e. all the things you use to 'measure' system load) report threads in the iowait state as active.
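As a quick sanity check on that throughput figure: iostat's Blk_wrtn/s is in 512-byte blocks by default, so the conversion is simple arithmetic (a sketch, assuming the default block size):

```python
# Convert iostat's Blk_wrtn/s (512-byte blocks by default) to MiB/s.
BLOCK_BYTES = 512

def blocks_to_mib_per_sec(blk_per_sec):
    """Blocks per second -> MiB per second, assuming 512-byte blocks."""
    return blk_per_sec * BLOCK_BYTES / (1024 * 1024)

rate = blocks_to_mib_per_sec(6921.60)  # the sdb figure above
print(f"{rate:.2f} MiB/s")             # ~3.4 MiB/s, same ballpark as ~3.2MB/s
```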

Throughput and TPS are two *different* measures of performance, and they're maximized by fairly opposite operations.

The TPS I see from your sdb is on the order of magnitude I'd expect from a single, older-generation 7200 RPM SATA drive.

See http://storageadvisors.adaptec.com/2007/04/17/yet-another-raid-10-vs-raid-5-question/ for some discussion of RAID5 IOps performance.  The short answer is:  I'd expect 8-disk RAID5 write performance to be about 2x that of a single disk.  (Ugly, eh?)  So, you're still a little short.
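The 2x figure falls out of the classic RAID5 small-write penalty: each random write costs four physical I/Os (read data, read parity, write data, write parity).  A back-of-the-envelope sketch, assuming ~80 IOPS for a single older 7200 RPM SATA drive (that number is an assumption, not a measurement):

```python
def raid5_random_write_iops(n_disks, disk_iops):
    """RAID5 small-write penalty: 4 physical I/Os per logical write,
    spread across all spindles in the array."""
    return n_disks * disk_iops / 4

single = 80  # assumed random-write IOPS for one older 7200 RPM SATA drive
array = raid5_random_write_iops(8, single)
print(array, array / single)  # 160.0 IOPS -> 2x a single disk
```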

Now, RAID5 behaves radically differently for sequential (or large-block) writes, and NFS writes are fairly small.  If you're doing file copies, upping your NFS write size or read size might help you out (which effectively requires going to jumbo frames on your ethernet).  This didn't do anything for me, as my NFS was the back end of a VMWare system and VMWare appeared to read/write in storage blocks anyway.
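If you do experiment with that, the knobs are the rsize/wsize mount options on the client.  A hedged example; the 32K values, server name, and paths are placeholders, not a recommendation, and the kernel will negotiate the sizes down if the server's maximum is lower:

```shell
# Remount an NFS export with larger transfer sizes (illustrative values).
# server:/export and /mnt/nfs are placeholders for your own setup.
mount -o remount,rsize=32768,wsize=32768 server:/export /mnt/nfs

# Verify what was actually negotiated:
grep nfs /proc/mounts
```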
I'm a small place, so I don't have any managed switches; they're auto-sensing.  The ethernet ports on the computer in question are integral to the motherboard.  Since during straight copies the transfer rate is the same in both directions (transmit and receive each give me about 8MB/s over a 100Mb link), I would tend to think I don't have a duplex mismatch.
(Aside:  if you want a small managed switch, you might try the Netgear GS108T.  I'm running that at home.  I think there's a 16 port cousin.)
The machine uses a hardware RAID controller.
Battery backed write cache should help a great deal.  I know Adaptec makes an 8xSATA controller for <$500.  I've no experience with the controller directly.  At work I use HP and at home I'm just going with Linux software RAID.
I increased the NFS threads from the default of 8 to 32 and restarted NFS. However, based on nfsstat I don't think I even needed to do that: it shows there have been 0 retrans or authrefreshes.
Were you looking at retrans on the client?  That's the important one.  Retrans on the server will likely always be zero.

I mislead you somewhat in my last email (sorry about that -- was going from memory and didn't check).  The critical part is the "th" line in /proc/net/rpc/nfsd.  This example is taken from my home server:
r...@nottingham:~# grep th /proc/net/rpc/nfsd
th 8 11804 44107.272 404.636 97.604 0.000 103.032 196.976 40.616 98.468 0.000 41.980
See the article http://kamilkisiel.blogspot.com/2007/11/understanding-linux-nfsd-statistics.html, but in short:  the 2nd number there (the 11804 above) is the number of times since boot that all NFS threads were in use at once.  This example shows I need to tune my home server, which explains why my mail server performance sucks right now :-)
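For reference, that "th" line is easy to pull apart programmatically.  A small sketch using the sample line above (field meanings per the linked article; the tuning hint is my own rule of thumb, not gospel):

```python
# Parse the "th" line from /proc/net/rpc/nfsd.
# Fields: "th" <thread count> <times all threads busy> <10 histogram values>
line = ("th 8 11804 44107.272 404.636 97.604 0.000 103.032 "
        "196.976 40.616 98.468 0.000 41.980")

fields = line.split()
threads = int(fields[1])
all_busy = int(fields[2])      # times since boot every nfsd thread was in use
histogram = [float(f) for f in fields[3:]]

print(threads, all_busy)       # 8 11804
if all_busy > 0:
    print("all %d NFS threads were busy %d times -- consider raising "
          "the thread count" % (threads, all_busy))
```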
The problem at this point seems to be the NFS write speed. I've googled for this and I come up with problems that have a regular read speed but a write speed measured in KB/s, which is MUCH slower than what I have (perhaps I should be thankful!  :-)  ). Most of what I have seen also tends to be for kernels < 2.6.12. I'm at 2.6.27.37.
This feels like the same situation I had at work on a RedHat Enterprise 5.3 server serving VMWare storage to 6 pretty beefy hosts.  The issue was that when the RAID5 performance went to pieces, it clogged up the NFS threads and the clients started experiencing timeouts.

I'm out of time on this topic, but I just had an additional thought:  what is the write performance of a local copy on that machine?  How does it compare?
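One way to measure that, as a minimal sketch: write a good-sized file with dd and force it to disk so the page cache doesn't flatter the number.  The target path is a placeholder; point it at the RAID array, and bump count up for a steadier figure:

```shell
# Rough local write benchmark; point TARGET at a file on the RAID array.
# conv=fdatasync makes dd flush the data to disk before reporting its
# rate, so the MB/s figure reflects the disk rather than the page cache.
TARGET="${TARGET:-/tmp/local-write-test}"
dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fdatasync
rm -f "$TARGET"
```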
--
Dewey
_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
