erik.ableson wrote:
> OK - I'm at my wit's end here as I've looked everywhere to find some
> means of tuning NFS performance with ESX into returning something
> acceptable using osol 2008.11. I've eliminated everything but the NFS
> portion of the equation and am looking for some pointers in the right
> direction.
Any time you have NFS, ZFS as the backing store, JBOD, and a performance
concern, you need to look at the sync activity on the server. ESX issues
synchronous writes over NFS, and ZFS honors each sync request through the
ZIL; with no separate log device, every commit waits on the pool's rotating
disks. That activity will often be visible as ZIL traffic, which you can
see clearly with zilstat.
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat

The cure is not to disable the ZIL or break NFS. The cure is lower-latency
I/O for the ZIL.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_NFS_Server_Performance

A few concrete checks follow below your quoted message.
 -- richard

> Configuration: PE2950 bi-pro Xeon, 32GB RAM with an MD1000 using a
> zpool of 7 mirror vdevs. ESX 3.5 and 4.0. Pretty much a vanilla
> install across the board, no additional software other than the
> Adaptec StorMan to manage the disks.
>
> Local performance via dd - 463MB/s write, 1GB/s read (8GB file).
> iSCSI performance - 90MB/s write, 120MB/s read (800MB file from a VM).
> NFS performance - 1.4MB/s write, 20MB/s read (800MB file from the
> Service Console; transfer of an 8GB file via the datastore browser).
>
> I just found the tool latencytop, which points the finger at the ZIL
> (tip of the hat to Lejun Zhu). Ref:
> <http://www.infrageeks.com/zfs/nfsd.png> &
> <http://www.infrageeks.com/zfs/fsflush.png>. Log file:
> <http://www.infrageeks.com/zfs/latencytop.log>
>
> Now I can understand that there is a performance hit associated with
> this feature of ZFS for ensuring data integrity, but this drastic a
> difference makes no sense whatsoever. The pool is natively capable of
> handling (at worst) 120*7 IOPS, and I'm not even seeing enough to
> saturate a USB thumb drive. This still doesn't explain why the read
> performance is so bad, either. According to latencytop, the culprit
> would be genunix`cv_timedwait_sig rpcmod`svc
>
> From my searching it appears that there's no async setting for the
> osol nfsd, and ESX does not offer any mount controls to force an async
> connection. Other than putting in an SSD as a ZIL device (which still
> strikes me as overkill for basic NFS services), I'm looking for any
> information that can bring me up to at least reasonable throughput.
>
> Would a dedicated 15K SAS drive help the situation by moving the ZIL
> traffic off to a dedicated device? Significantly? This is the sort of
> thing I don't want to do without some reasonable assurance that it
> will help, since you can't remove a ZIL device from a pool at the
> moment.
>
> Hints and tips appreciated,
>
> Erik
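On the latencytop evidence: you can confirm the sync-write theory directly
by watching the ZIL while the ESX host writes to the datastore. A minimal
sketch, assuming the zilstat script downloaded from the link above (exact
option syntax varies by version):

    # ./zilstat.ksh 1 10    # ZIL bytes/ops, 1-second samples, 10 samples

If you'd rather not fetch a script, a crude DTrace one-liner that counts
zil_commit() calls per process tells a similar story:

    # dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); } tick-10s { exit(0); }'

Heavy zil_commit traffic from nfsd while the client sees 1.4MB/s is the
signature of synchronous writes stalling on rotating-disk latency.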
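On the "no async setting" point: you can bound the possible win without
buying anything by disabling the ZIL temporarily as a measurement, never
as a fix, since it discards the sync guarantee NFS clients depend on. A
sketch, assuming the zil_disable tunable present in builds of this era:

    # echo zil_disable/W0t1 | mdb -kw    # diagnostic only
    (remount or re-share the filesystem, rerun the ESX copy test)
    # echo zil_disable/W0t0 | mdb -kw    # restore immediately

The tunable is read when the dataset is mounted on builds of this vintage,
hence the remount. If throughput jumps from 1.4MB/s to something near your
iSCSI numbers, a low-latency log device will recover most of that gap
safely.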
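On the dedicated 15K SAS drive question: adding a log device is one
command (pool and device names below are examples only):

    # zpool add tank log c5t0d0
    # zpool add tank log mirror c5t0d0 c6t0d0    # safer: slogs can't be removed yet

But temper expectations: a 15K drive still pays a rotational latency per
commit, so it helps mainly by keeping ZIL traffic off the data spindles.
The big win comes from an SSD or NVRAM-backed device, which is why the
Best Practices Guide points that way. And since, as you note, log devices
cannot currently be removed from a pool, test on a scratch pool first.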