On Feb 18, 2008, at 4:39 AM, Font Bella wrote:
> I tried the TCP and async options, but I get poor performance in my
> benchmarks (a dbench run with 10 clients). Below I tabulated the
> outcome of my tests, which show that in my setting there is a huge
> difference between sync and async, and between UDP and TCP. Any
> comments/suggestions are warmly welcome.
>
> I also tried setting 128 server threads as Chuck suggested, but this
> doesn't seem to affect performance. This makes sense, since we only
> have a dozen clients.

Each Linux client mount point can generate up to 16 server requests
by default. A dozen clients each with a single mount point can
generate 192 concurrent requests. So 128 server threads is not as
outlandish as you might think.
In this case, you are likely hitting some other bottleneck before the
clients can utilize all the server threads.
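
If you want to check that arithmetic against your own settings, a rough
sketch along these lines does it (assuming a Linux client that exposes
the slot table size at /proc/sys/sunrpc/tcp_slot_table_entries; the
client and mount counts are just example values):

  # Rough estimate of worst-case concurrent NFS requests at the server.
  # Assumes the sunrpc sysctl below is present; adjust CLIENTS and
  # MOUNTS_PER_CLIENT to match your site.
  def slot_table_entries(path="/proc/sys/sunrpc/tcp_slot_table_entries"):
      # Each slot is one RPC request a single mount can have in flight.
      with open(path) as f:
          return int(f.read())

  CLIENTS = 12
  MOUNTS_PER_CLIENT = 1
  peak = CLIENTS * MOUNTS_PER_CLIENT * slot_table_entries()
  print("worst-case concurrent NFS requests: %d" % peak)

With the default of 16 slots per mount that prints 192, which is why
128 threads is not as excessive as it sounds.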

> About sync/async: I am not very concerned about corrupt data if the
> cluster goes down. We do mostly computing, with no crucial database
> transactions or anything like that. Our users wouldn't mind some
> degree of data corruption in case of a power failure, but speed is
> crucial.

The data corruption is silent. If it weren't, you could simply
restore from a backup as soon as you recover from a server crash.
Silent corruption spreads into your backed up data, and starts
causing strange application errors, sometimes a long time after the
corruption first occurred.

> Our network setting is just a dozen servers connected to a switch.
> Everything (adapters/cables/switch) is 1 gigabit. We use ethernet
> bonding to double networking speed.
>
> Here are the test results. I didn't measure SYNC+UDP, since SYNC+TCP
> already gives me very poor performance. Admittedly, my test is very
> simple, and I should probably try something more complete, like
> IOzone. But the dbench run seems to reproduce the bottleneck we've
> been observing in our cluster.

I assume the dbench test is read and write only (little or no
metadata activity like file creation and deletion). How closely does
dbench reflect your production workload?
I see from your initial e-mail that your server file system is:
> SAS 10k disks.
>
> Filesystem: ext3 over LVM.

Have you tried testing over NFS with a file system that resides on a
single physical disk? If you have done a read-only test versus a
write-only test, how do the numbers compare? Have you tested a range
of write sizes, from small files up to files larger than the server's
memory?
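
For the write-size comparison, something as simple as the following is
usually enough (illustrative only: /mnt/nfs is a placeholder mount
point, and the largest size should exceed the server's RAM so its page
cache cannot absorb the whole file):

  import os, time

  MOUNT = "/mnt/nfs"                        # placeholder NFS mount point
  SIZES = [1 << 20, 256 << 20, 4 << 30]     # 1 MB, 256 MB, 4 GB
  CHUNK = b"\0" * (1 << 20)                 # 1 MB write buffer

  for size in SIZES:
      path = os.path.join(MOUNT, "writetest.%d" % size)
      start = time.time()
      with open(path, "wb") as f:
          for _ in range(size // len(CHUNK)):
              f.write(CHUNK)
          f.flush()
          os.fsync(f.fileno())              # push the data to the server
      elapsed = time.time() - start
      print("%11d bytes: %6.1f MB/s" % (size, size / elapsed / 1e6))
      os.remove(path)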

> ********************** ASYNC option in server **********************
>
> rsize,wsize    TCP (MB/s)    UDP (MB/s)
> 1024           24            34
> 2048           35            49
> 4096           37            75
> 8192           40.4          35
> 16386          40.2          19

As the size of the read and write requests increases, your UDP
throughput decreases markedly. This does indicate some packet loss,
so TCP is going to provide more consistent performance and much lower
risk to data integrity as your network and client workloads increase.

You might try this test again and watch your clients' ethernet
bandwidth and RPC retransmit rate to see what I mean. At the 16386
setting, the UDP test may be pumping significantly more packets onto
the network, but is getting only about 20 MB/s through. This will
certainly have some effect on other traffic on the network.
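
One way to see this is to sample the client's RPC counters while the
test runs.  The sketch below reads the "rpc" line (calls, retransmits,
authrefresh) from /proc/net/rpc/nfs, which is the same data nfsstat
reports; a retransmit percentage that climbs at the larger UDP sizes
means lost fragments are forcing whole requests to be resent.

  import time

  def rpc_counters(path="/proc/net/rpc/nfs"):
      # Returns (calls, retransmits) from the client's RPC statistics.
      with open(path) as f:
          for line in f:
              fields = line.split()
              if fields and fields[0] == "rpc":
                  return int(fields[1]), int(fields[2])
      raise RuntimeError("no rpc line in %s" % path)

  c0, r0 = rpc_counters()
  time.sleep(10)                            # sample a 10-second window
  c1, r1 = rpc_counters()
  calls, retrans = c1 - c0, r1 - r0
  print("calls %d  retrans %d  (%.2f%%)" %
        (calls, retrans, 100.0 * retrans / max(calls, 1)))
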
The first thing I check in these instances is that gigabit ethernet
flow control is enabled in both directions on all interfaces (both
host and switch).
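
A quick way to audit the host side is to loop over the bonded slaves
with ethtool (the interface names here are just examples; the switch
ports still have to be checked from the switch's own management
interface):

  import subprocess

  IFACES = ["eth0", "eth1"]                 # example bond slaves
  for dev in IFACES:
      # "ethtool -a" reports the pause frame (flow control) settings.
      out = subprocess.check_output(["ethtool", "-a", dev])
      print(out.decode().strip())
      print("-" * 40)
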
In addition, using larger r/wsize settings on your clients means the
server can perform disk reads and writes more efficiently, which will
help your server scale with increasing client workloads.
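
For reference, this is the kind of remount you would script when
testing the larger sizes (a sketch only: the server name, export path,
mount point, and option list are placeholders, and the kernel may still
clamp rsize/wsize to what the server supports):

  import subprocess

  OPTS = "rw,hard,intr,tcp,rsize=32768,wsize=32768"   # example options
  subprocess.check_call(["mount", "-t", "nfs", "-o", OPTS,
                         "server:/export", "/mnt/nfs"])
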
By examining your current network carefully, you might be able to
boost the performance of NFS over both UDP and TCP. With bonded
gigabit, you should be able to push network throughput past 200 MB/s
using a test like iPerf which doesn't touch disks. Thus, at least
NFS reads from files already in the server's page cache ought to fly
in this configuration.
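
If you don't have iperf handy, even a bare sockets test gives the same
memory-to-memory number.  A minimal sketch (the port and transfer size
are arbitrary choices): run it with "server" on the NFS server and
"client <server-ip>" on a client.

  import socket, sys, time

  PORT  = 5001                              # arbitrary port
  CHUNK = b"\0" * (1 << 20)                 # 1 MB send buffer
  TOTAL = 2 << 30                           # push 2 GB per run

  def server():
      s = socket.socket()
      s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      s.bind(("", PORT))
      s.listen(1)
      conn, addr = s.accept()
      received = 0
      while True:
          data = conn.recv(1 << 20)
          if not data:
              break
          received += len(data)
      print("received %d bytes from %s" % (received, addr[0]))

  def client(host):
      s = socket.create_connection((host, PORT))
      start, sent = time.time(), 0
      while sent < TOTAL:
          s.sendall(CHUNK)
          sent += len(CHUNK)
      s.close()
      print("%.1f MB/s" % (sent / (time.time() - start) / 1e6))

  if __name__ == "__main__":
      server() if sys.argv[1] == "server" else client(sys.argv[2])

Depending on the bonding mode, a single TCP stream may ride only one
slave link, so run a few clients in parallel to approach the 200 MB/s
aggregate.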

> ********************** SYNC option in server **********************
>
> rsize,wsize    TCP (MB/s)    UDP (MB/s)
> 1024           6             ??
> 2048           7.44          ??
> 4096           7.33          ??
> 8192           7             ??
> 16386          7             ??

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com