I just switched our old SuSE-based server to Gentoo (2.6.14-hardened-r7) and am experiencing some problems, the most annoying of which is abysmally bad NFS performance when the server is even moderately loaded:
| msbethke ~ $ time (touch x; rm x)
|
| real 0m59.841s
| user 0m0.008s
| sys 0m0.036s
This is on a client, with the server unpacking a 6 GB gzip file. The slowness is the same on SuSE- and Gentoo-based clients. The previous installation handled the same load without any problems, which I'd certainly expect from a dual Xeon @ 3 GHz with $ GB RAM, a Compaq SmartArray 642 U320 host adapter and some 200 GB in a RAID5, connected to the clients via GBit Ethernet.
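For the record, a small loop makes the metadata latency easier to quantify than a single touch/rm; this is just a sketch (the directory and iteration count are arbitrary, point it at the NFS mount to test):

```shell
#!/bin/sh
# Time N create/remove round trips in DIR; on NFS each pair costs at
# least one CREATE and one REMOVE RPC, so this isolates metadata latency.
DIR=${1:-/tmp}
N=${2:-10}
start=$(date +%s)
i=0
while [ "$i" -lt "$N" ]; do
    : > "$DIR/probe.$$"      # create an empty file
    rm "$DIR/probe.$$"       # and remove it again
    i=$((i + 1))
done
end=$(date +%s)
echo "$N create/remove pairs in $((end - start))s"
```

Run with and without the server-side unpack job and compare the totals.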
To test whether only open/create/remove operations are affected, I tried
dd:
| msbethke ~ $ time dd if=/dev/zero of=test bs=1M count=100
| dd: closing output file `test': Input/output error
|
| real 0m50.500s
| user 0m0.012s
| sys 0m1.136s
| msbethke ~ $ ll test
| -rw------- 1 msbethke users 104857600 2006-04-19 18:17 test
Definitely not good for GBit, but not too bad either, considering it will have taken half a minute just to open that file. The file is complete despite the I/O error, but the error is definitely related to server load; it never happens normally (and I normally get 9-11 s for the 100 MB).
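Since the error shows up at close time, one way to narrow it down might be to force the data out before dd's close() with conv=fsync (GNU dd); if the run is still slow at the end, the stall is in the COMMIT/close path rather than the writes themselves. A small-scale sketch, writing to a local scratch file just to show the invocation:

```shell
# conv=fsync flushes the written data to stable storage before dd
# closes the file, so a cheap close afterwards would point the finger
# at the final COMMIT rather than the data transfer.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=10 conv=fsync
```

On the real mount one would of course write into the NFS directory instead of /tmp, with the 100 MB size from above.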
I have 16 nfsd processes running, but the problem occurs even if only a single client is active. nfsstat on the server shows a huge number of read operations (I've never used it before---is that too much for a server that's been running under very moderate load from half a dozen clients doing mostly word processing and programming?), but otherwise it looks fine to me:
| # nfsstat -s
| Server rpc stats:
| calls badcalls badauth badclnt xdrcall
| 433459545 0 0 0 0
| [snip unused NFSv2]
| Server nfs v3:
| null getattr setattr lookup access readlink
| 846 0% 6556617 1% 24798 0% 332257 0% 1258235 0% 855 0%
| read write create mkdir symlink mknod
| 424885404 98% 148929 0% 11172 0% 30 0% 137 0% 10 0%
| remove rmdir rename link readdir readdirplus
| 8882 0% 11 0% 6539 0% 2418 0% 889 0% 33264 0%
| fsstat fsinfo pathconf commit
| 2919 0% 1209 0% 0 0% 19881 0%
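Since the counters above are cumulative since boot, sampling them twice under load and diffing the snapshots shows which operations are actually growing right now (interval and temp paths are arbitrary):

```shell
# Two snapshots of the NFS server counters ten seconds apart;
# diff prints only the lines whose counts changed in the interval.
nfsstat -s -o nfs > /tmp/nfsstat.1
sleep 10
nfsstat -s -o nfs > /tmp/nfsstat.2
diff /tmp/nfsstat.1 /tmp/nfsstat.2
```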
On the client, however, I get some retransmissions and read/write counts that look very strange compared to the server's. I thought of a 32-bit overflow, but the counter is obviously wider than that---I can drive it beyond 2^32 on the server. Here's the client:
| # nfsstat -c
| Client rpc stats:
| calls retrans authrefrsh
| 2691313 1493 0
| [snip unused NFSv2]
| Client nfs v3:
| null getattr setattr lookup access readlink
| 0 0% 2292223 85% 848 0% 76917 2% 260072 9% 96 0%
| read write create mkdir symlink mknod
| 3595 0% 48255 1% 930 0% 4 0% 5 0% 0 0%
| remove rmdir rename link readdir readdirplus
| 1076 0% 0 0% 1265 0% 292 0% 0 0% 3055 0%
| fsstat fsinfo pathconf commit
| 2067 0% 141 0% 0 0% 331 0%
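Given the retransmissions, it may be worth checking what the clients actually negotiated (rsize/wsize, UDP vs. TCP) and experimenting with the transport. The options below are suggestions to try, not a known fix, and server:/export and /mnt are placeholders for the real paths:

```shell
# Show the effective NFS mount options for all NFS mounts:
grep nfs /proc/mounts
# Remount over TCP with a longer timeout to rule out UDP retransmit
# storms on the GBit link (adjust export and mount point):
umount /mnt && mount -t nfs -o tcp,timeo=600 server:/export /mnt
```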
I noticed a few things about the setup: the SA 642 adapter still has a stone-age firmware, V1.30, but we never saw a need to upgrade as it worked nicely with the kernel 2.4.21 cciss driver. Are there any known issues with this one under 2.6 kernels? I've just flashed the latest version and will try rebooting tonight.
Another thing: the kernel is still set to 250 Hz ticks, which was fine on the P4/HT test system where I built it. Would that really hurt so badly on a real SMP machine? In any case, 100 Hz should be fine for a server.
And one parameter I haven't tried to tweak is the I/O scheduler. I seem to remember a recommendation to use noop on RAID5, since the cylinder numbers are completely virtual anyway and the actual head scheduling should be left to the controller. Any opinions on this?
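For reference, on 2.6 the elevator can be inspected and switched per device at runtime through sysfs, so this is easy to try without a reboot; the cciss device name below is an example, adjust it for the actual array:

```shell
# List the available schedulers; the active one is shown in brackets.
cat '/sys/block/cciss!c0d0/queue/scheduler'
# Switch to noop and leave request ordering to the SmartArray:
echo noop > '/sys/block/cciss!c0d0/queue/scheduler'
```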
cheers & TIA
Matthias
--
I prefer encrypted and signed messages. KeyID: FAC37665
Fingerprint: 8C16 3F0A A6FC DF0D 19B0 8DEF 48D9 1700 FAC3 7665