On Feb 18, 2008, at 4:39 AM, Font Bella wrote:
> I tried TCP and async options, but I get poor performance in my
> benchmarks (a dbench run with 10 clients). Below I have tabulated the
> outcome of my tests, which show that in my setting there is a huge
> difference between sync and async, and between UDP and TCP. Any
> comments/suggestions are warmly welcome.
>
> I also tried setting 128 server threads as Chuck suggested, but this
> doesn't seem to affect performance. This makes sense, since we only
> have a dozen clients.

Each Linux client mount point can generate up to 16 server requests by default. A dozen clients each with a single mount point can generate 192 concurrent requests. So 128 server threads is not as outlandish as you might think.

In this case, you are likely hitting some other bottleneck before the clients can utilize all the server threads.
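
For reference, raising and checking the thread count on a typical Linux NFS server looks something like this (the exact config file that makes the setting persistent varies by distribution, e.g. RPCNFSDCOUNT in /etc/sysconfig/nfs on Red Hat-style systems):

  # raise the count on the running server
  rpc.nfsd 128

  # the "th" line reports the thread count; the second number counts how
  # many times all threads were busy at once (if it keeps climbing, add threads)
  cat /proc/net/rpc/nfsd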

> About sync/async: I am not very concerned about corrupt data if the
> cluster goes down. We do mostly computing, no crucial database
> transactions or anything like that. Our users wouldn't mind some
> degree of data corruption in case of a power failure, but speed is
> crucial.

The data corruption is silent. If it weren't, you could simply restore from a backup as soon as you recover from a server crash. Silent corruption spreads into your backed up data, and starts causing strange application errors, sometimes a long time after the corruption first occurred.
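
Just so we're talking about the same knob: sync/async is a per-export option in /etc/exports, and "sync" makes the server commit data to stable storage before telling the client a write (or commit) has completed, while "async" lets it reply first. A sketch, with made-up paths and a made-up client subnet:

  /export/data     192.168.0.0/24(rw,sync,no_subtree_check)     # safe default
  /export/scratch  192.168.0.0/24(rw,async,no_subtree_check)    # faster, data at risk on a crash

  # re-export after editing
  exportfs -ra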

> Our network setup is just a dozen servers connected to a switch.
> Everything (adapters/cables/switch) is 1 gigabit. We use ethernet
> bonding to double the network bandwidth.
>
> Here are the test results. I didn't measure SYNC+UDP, since SYNC+TCP
> already gives me very poor performance. Admittedly, my test is very
> simple, and I should probably try something more complete, like
> IOzone. But the dbench run seems to reproduce the bottleneck we've
> been observing in our cluster.

I assume the dbench test is read and write only (little or no metadata activity like file creation and deletion). How closely does dbench reflect your production workload?

I see from your initial e-mail that your server file system is:

> SAS 10k disks.
>
> Filesystem: ext3 over LVM.

Have you tried testing over NFS with a file system that resides on a single physical disk? If you have done a read-only test versus a write-only test, how do the numbers compare? Have you tested a range of write sizes, from small file writes up to streaming writes of files larger than the server's memory?
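
A crude but useful way to get separate streaming write and read numbers is dd against the mount; the mount point and file size below are placeholders, and the file should exceed the server's RAM so the read-back can't be served from its page cache:

  # streaming write (16 GB here; size it to exceed server memory)
  dd if=/dev/zero of=/mnt/nfs/bigfile bs=1M count=16384

  # unmount and remount the client first so the read isn't satisfied
  # from the client's own cache, then:
  dd if=/mnt/nfs/bigfile of=/dev/null bs=1M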

> ********************** ASYNC option in server ******************************
>
> rsize/wsize (bytes)    TCP (MB/s)    UDP (MB/s)
>
> 1024                   24            34
> 2048                   35            49
> 4096                   37            75
> 8192                   40.4          35
> 16386                  40.2          19

As the size of the read and write requests increases, your UDP throughput drops markedly. Each NFS-over-UDP request larger than the ethernet MTU is carried as several IP fragments, and losing any one fragment forces the whole RPC to be retransmitted, so this pattern usually indicates packet loss. TCP recovers from loss far more gracefully, so it is going to provide consistent performance and much lower risk to data integrity as your network and client workloads increase.

You might try this test again and watch your clients' ethernet bandwidth and RPC retransmit rate to see what I mean. At the 16386 setting, the UDP test may be pumping significantly more packets onto the network, but is getting only about 20MB/s through. This will certainly have some effect on other traffic on the network.
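
On the clients, something as simple as this during a run will show it (interface names and exact output format vary):

  nfsstat -c         # "retrans" in the Client rpc stats should stay near zero
  cat /proc/net/dev  # per-interface byte, packet, and drop counters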

The first thing I check in these instances is that gigabit ethernet flow control is enabled in both directions on all interfaces (both host and switch).
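
On Linux you can check and set pause frames with ethtool; note that with bonding you run this against the physical slave interfaces (eth0 here is just an example), and the switch ports have to agree:

  ethtool -a eth0                  # show current pause (flow control) parameters
  ethtool -A eth0 rx on tx on      # enable flow control in both directions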

In addition, using larger r/wsize settings on your clients means the server can perform disk reads and writes more efficiently, which will help your server scale with increasing client workloads.
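
On the clients that usually just means bumping rsize/wsize in the mount options, for example (server name and export path are placeholders, and the server will cap the values it actually grants):

  mount -t nfs -o rsize=32768,wsize=32768,tcp,hard,intr server:/export /mnt/nfs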

By examining your current network carefully, you might be able to boost the performance of NFS over both UDP and TCP. With bonded gigabit, you should be able to push network throughput past 200 MB/s using a test like iperf, which doesn't touch the disks at all. Thus, at least NFS reads from files already in the server's page cache ought to fly in this configuration.
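
If you want to check the raw network path, something like iperf between a client and the server works (30-second run, four parallel streams; depending on your bonding mode a single TCP stream may be limited to one physical link, which is why the parallel streams matter):

  iperf -s                           # on the server
  iperf -c <servername> -t 30 -P 4   # on a client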

> ********************** SYNC option in server ******************************
>
> rsize/wsize (bytes)    TCP (MB/s)    UDP (MB/s)
>
> 1024                   6             not measured
> 2048                   7.44          not measured
> 4096                   7.33          not measured
> 8192                   7             not measured
> 16386                  7             not measured

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com