On Feb 15, 2008, at 10:37 AM, Font Bella wrote:
Dear all,
I finally got it to work, after much pain/testing. Here are my config
notes (just for the record).
Thanks Marcelo and Chuck!
NFS setup
=========
Documentation
-------------
* http://billharlan.com/pub/papers/NFS_for_clusters.html
* http://nfs.sourceforge.net/nfs-howto/ar01s05.html#nfsd_daemon_instances
Setting
-------
We use the nfs-kernel-server package, i.e. the kernel-space nfs server,
which is faster than nfs-user-server.
We use NFS version 3.
Configuration
-------------
Make sure we are using nfs version 3. This seems to be the default with
the nfs-kernel-server package. Check from the client side with::

  cat /proc/mounts
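For example, a v3 mount over UDP shows up in /proc/mounts roughly like
this (the line below is illustrative, not copied from our machines; the
exact set of options printed depends on the kernel)::

  server:/srv/homes/user /mnt/user nfs rw,vers=3,proto=udp,rsize=8192,wsize=8192,hard,intr 0 0

Look for 'vers=3', and later for the proto and rsize/wsize values.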
Use UDP for packet transmission, i.e. use option 'proto=udp' in your
/etc/fstab, /etc/auto.home (if using automounts), or in general, in any
mount command. Check from the client side also with 'cat /proc/mounts'.
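For instance, an /etc/fstab entry along these lines (server name and
mount point are placeholders, borrowed from the old cluster's setup)
requests UDP explicitly::

  server:/srv/homes/user  /mnt/user  nfs  rw,hard,intr,proto=udp,rsize=8192,wsize=8192  0 0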
Make sure you have enough nfsd server threads. See if your server is
receiving too many overlapping requests with

$ grep th /proc/net/rpc/nfsd

Ours wasn't, but we increased the number of threads used by the server
to 32 anyway, by setting RPCNFSDCOUNT=32 in
/etc/default/nfs-kernel-server (the Debian configuration file for the
startup scripts). Remember to restart nfs-kernel-server for changes to
take effect.
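For reference, the check and the change look roughly like this (the
numbers in the 'th' line are made up; the first field is the thread
count, the second is how many times all threads were busy at once, and
the remaining ten are a usage histogram)::

  $ grep th /proc/net/rpc/nfsd
  th 8 421 3012.430 81.200 22.770 0.000 0.000 0.000 0.000 0.000 0.000 0.020

  # /etc/default/nfs-kernel-server
  RPCNFSDCOUNT=32

  $ /etc/init.d/nfs-kernel-server restart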
On the server side, use the 'async' option in /etc/exports. This was a
crucial step to get good performance.
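For example (path and subnet taken from our export below; note that in
exports syntax there must be no space between the client spec and the
option list)::

  /srv/homes 192.168.1.0/255.255.255.0(rw,async,no_root_squash)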
Finally, try different values of rsize and wsize in your /etc/fstab,
/etc/auto.home (if using automounts), or in general, in any mount
command. Check from the client side also with 'cat /proc/mounts'. Test
your favourite benchmark with different rsize/wsize values and look for
an optimal one.
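One way to sweep the values is to remount with each candidate size and
rerun the benchmark, roughly like this (mount point, export and sizes
are illustrative; dbench's -D option points it at a directory and the
trailing number is the client count)::

  for sz in 1024 2048 4096 8192 16384 32768; do
      umount /mnt/user
      mount -t nfs -o rw,hard,intr,proto=udp,rsize=$sz,wsize=$sz \
          server:/srv/homes/user /mnt/user
      dbench -D /mnt/user 10
  done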
ALL the steps above were necessary for me to get good performance, but
the last one was crucial, since I got very different performance
depending on the value of rsize/wsize.
I'm glad you were able to make progress. 32 server threads is
actually fairly conservative; you might consider 128 or more if you
have more than a few clients.
I want to make sure you understand the limitations and risks of using
UDP and the "async" export option, however.
1. "async" is no longer the default because it introduces a silent
data corruption risk. With NFSv3, data write operations are already
asynchronous, with a subsequent COMMIT, so that they are safe. The
client now knows when data has hit stable storage and can thus delete
its cached copy safely.
I urge you to read the NFS FAQ discussion on the "async" export
option and reconsider its use in production.
2. UDP is no longer the default because it also introduces a silent
data corruption risk, since the IP ID field (which UDP depends on for
reassembling datagrams larger than a single link-layer frame) is only
16 bits wide. If this field should wrap, datagram reassembly is
compromised. The UDP datagram checksum is weak enough that the
receiving end probably won't detect the reassembly errors.
In addition, UDP will likely perform poorly in situations involving
more than a few clients. Its congestion control algorithm is unable
to handle large amounts of concurrent network traffic since it
doesn't have a packet ACK mechanism like TCP does. The fact that
your performance was best at such a small r/wsize (you mentioned 2048
in your earlier e-mail) suggests you have a network environment that
would benefit enormously from using TCP.
So, our recommendation these days is to use the default "sync" export
setting, and use NFSv3 over TCP if at all possible. (The HOWTO may
be out of date in this regard). If you are not able to achieve good
performance results with these settings, you can e-mail the list
again and we can do further analysis.
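Concretely, the suggestion amounts to something like this (a sketch
reusing the paths from your own setup; rsize/wsize are left to be
negotiated)::

  # /etc/exports on the server -- default 'sync', no 'async'
  /srv/homes 192.168.1.0/255.255.255.0(rw,sync,no_root_squash)

  # /etc/fstab on the clients -- NFSv3 over TCP
  server:/srv/homes/user  /mnt/user  nfs  rw,hard,intr,nfsvers=3,proto=tcp  0 0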
On Thu, Feb 14, 2008 at 5:56 PM, Chuck Lever <[EMAIL PROTECTED]> wrote:
On Feb 14, 2008, at 11:27 AM, Marcelo Leal wrote:
Hello all,
There is a big difference between accessing the raw disks directly and
going through LVM, with some kind of RAID, etc. I think you should use
NFS v3, and it's hard to believe it would be using v2 unless you
explicitly configured it to.
A big difference between v2 and v3 is that v2 is always "async", which
is a performance boost. Are you sure the new environment is not v3?
In the new stable version (nfs-utils), Debian defaults to sync. I'm
used to "8192" transfer sizes, which gave the best performance in my
tests.
As Marcelo suggested, this could be nothing more than the change in
default export options (see exports(8) -- the description of the
sync/async option) between sarge and etch. This was a change in the
nfs-utils package done a while back to improve data integrity
guarantees during server instability.
You can test this easily by explicitly specifying sync or async in
your /etc/exports and trying your test.
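For example, using the export from your message (run the benchmark once
with each variant; 'exportfs -ra' re-exports everything after editing
the file, without restarting the server)::

  # /etc/exports -- first pass
  /srv/homes 192.168.1.0/255.255.255.0(rw,sync,no_root_squash)
  # /etc/exports -- second pass
  /srv/homes 192.168.1.0/255.255.255.0(rw,async,no_root_squash)

  $ exportfs -ra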
It especially affects NFSv2, as all NFSv2 writes are FILE_SYNC (i.e.
they must be committed to permanent storage before the server replies)
-- the async export option breaks that guarantee to improve
performance. There is some further description in the NFS FAQ at
http://nfs.sourceforge.net/ .
The preferred way to get "async" write performance is to use NFSv3.
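On the client side, NFSv3 can be requested explicitly with the
'nfsvers=3' (or 'vers=3') mount option, e.g. (illustrative fstab line,
reusing the paths from the original report)::

  server:/srv/homes/user  /mnt/user  nfs  rw,hard,intr,nfsvers=3,rsize=8192,wsize=8192  0 0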
It would be nice if you could test another network service writing to
that server, like ftp or iscsi.
Another question: are the discs "local" or SAN? Is there any
concurrency?
ps.: v2 has a 2GB file size limit AFAIK.
Leal.
2008/2/14, Font Bella <[EMAIL PROTECTED]>:
Hi,
some of our apps are experiencing slow nfs performance in our new
cluster, in comparison with the old one. The nfs setups for both
clusters are very similar, and we are wondering what's going on. The
details of both setups are given below for reference.
The problem seems to occur with apps that do heavy i/o, creating,
writing, reading, and deleting many files. However, writing or reading
a large file (as measured with `time dd if=/dev/zero of=2gbfile bs=1024
count=2000`) is not slow.
We have performed some tests with the disk benchmark 'dbench', which
reports i/o performance of 60 Mb/sec in the old cluster down to about
6 Mb/sec in the new one.
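For context, a typical dbench run against the NFS mount looks roughly
like this (client count, directory and the output line are
illustrative, not our exact figures)::

  $ dbench -D /mnt/user 10
  ...
  Throughput 6.02 MB/sec 10 procs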
After noticing this problem, we tried the user-mode nfs server instead
of the kernel-mode server, and just installing the user-mode server
helped improve throughput, up to 12 Mb/sec, but still far away from the
good old 60 Mb/sec.
After going through the "Optimizing NFS performance" section of the
NFS-Howto and tweaking the rsize,wsize parameters (the optimum seems to
be 2048, which seems kind of weird to me, especially compared to the
8192 used in the old cluster), throughput increased to 21 Mb/sec, but
is still too far from the old 60 Mb/sec.
We are stuck at this point. Any help/comment/suggestion will be greatly
appreciated.
/P
**************************** OLD CLUSTER *****************************
SATA disks.
Filesystem: ext3.
* the version of nfs-utils you are using: I don't know. It's the most
recent version in debian sarge (oldstable).
user-mode nfs server.
nfs version 2, as reported with rpcinfo.
* the version of the kernel and any non-stock applied kernels: 2.6.12
* the distribution of linux you are using: Debian sarge x386 on Intel
Xeon processors.
* the version(s) of other operating systems involved: no other OS.
It is also useful to know the networking configuration connecting the
hosts: Typical beowulf setup, with all servers connected to a switch,
1Gb network.
/etc/exports:
/srv/homes 192.168.1.0/255.255.255.0 (rw,no_root_squash)
/etc/fstab:
server:/srv/homes/user /mnt/user nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
**************************** NEW CLUSTER *****************************
SAS 10k disks.
Filesystem: ext3 over LVM.
* the version of nfs-utils you are using: I don't know. It's the most
recent version in debian etch (stable).
kernel-mode nfs server.
nfs version 2, as reported with rpcinfo.
* the version of the kernel and any non-stock applied kernels:
2.6.18-5-amd64
* the distribution of linux you are using: Debian etch AMD64 on Intel
Xeon processors.
* the version(s) of other operating systems involved: no other OS.
It is also useful to know the networking configuration connecting the
hosts: Typical beowulf setup, with all servers connected to a switch,
1Gb network.
/etc/exports:
/srv/homes 192.168.1.0/255.255.255.0 (no_root_squash)
mount options:
rsize=8192,wsize=8192
--
pOSix rules
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com