erik.ableson wrote:
> OK - I'm at my wit's end here, as I've looked everywhere for some 
> means of tuning NFS performance with ESX into something 
> acceptable using osol 2008.11.  I've eliminated everything but the NFS 
> portion of the equation and am looking for some pointers in the right 
> direction.

Any time you have NFS, ZFS as the backing store, a JBOD, and a performance
concern, you need to look at the sync activity on the server.  This will
often be visible as ZIL activity, which you can see clearly with zilstat.
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
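
As a rough sketch (the exact zilstat invocation may differ slightly
depending on the version you download, and the DTrace one-liner assumes
the fbt provider is available), you could watch the synchronous commit
traffic while the ESX guest is writing:

  # sample ZIL activity once a second for 60 seconds
  ./zilstat.ksh 1 60

  # confirm that nfsd is the process driving zil_commit() calls
  dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); } tick-60s { exit(0); }'

If the ZIL counters climb every time the VM writes, the NFS sync
semantics are what you are paying for.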

The cure is not to disable the ZIL or to break NFS.  The cure is
lower-latency I/O for the ZIL.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_NFS_Server_Performance
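
For illustration only (pool and device names below are hypothetical),
the separate-log approach boils down to giving the ZIL a dedicated,
low-latency log vdev rather than turning it off:

  # add a dedicated (ideally SSD or NVRAM-backed) log device to the pool
  zpool add tank log c3t0d0

  # or mirror the log if you can spare two devices
  zpool add tank log mirror c3t0d0 c3t1d0

  # the device should then show up under a separate "logs" section
  zpool status tank

Bear in mind that, as you note below, a log device cannot be removed
from a pool at the moment, so try this on a scratch pool first.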
 -- richard
>
> Configuration: PE2950, dual Xeon processors, 32 GB RAM, with an MD1000 
> using a zpool of 7 mirror vdevs.  ESX 3.5 and 4.0.  Pretty much a 
> vanilla install across the board, no additional software other than 
> the Adaptec StorMan to manage the disks.
>
> local performance via dd - 463 MB/s write, 1 GB/s read (8 GB file)
> iSCSI performance - 90 MB/s write, 120 MB/s read (800 MB file from a VM)
> NFS performance - 1.4 MB/s write, 20 MB/s read (800 MB file from the 
> Service Console, transfer of an 8 GB file via the datastore browser)
>
> I just found the tool latencytop, which points the finger at the ZIL 
> (tip of the hat to Lejun Zhu).  Ref: 
> <http://www.infrageeks.com/zfs/nfsd.png> & 
> <http://www.infrageeks.com/zfs/fsflush.png>.  Log file: 
> <http://www.infrageeks.com/zfs/latencytop.log>
>
> Now I can understand that there is a performance hit associated with 
> this feature of ZFS for ensuring data integrity, but this drastic a 
> difference makes no sense whatsoever. The pool is natively capable of 
> handling (at worst) 120*7 IOPS, and I'm not even seeing enough to 
> saturate a USB thumb drive. This still doesn't explain why the read 
> performance is so bad either.  According to latencytop, the culprit 
> would be genunix`cv_timedwait_sig rpcmod`svc
>
> From my searching, it appears that there's no async setting for the 
> osol nfsd, and ESX does not offer any mount options to force an async 
> connection.  Other than putting in an SSD as a ZIL (which still 
> strikes me as overkill for basic NFS services), I'm looking for any 
> information that can bring me up to at least reasonable throughput.
>
> Would a dedicated 15K SAS drive help the situation by moving the ZIL 
> traffic off to a dedicated device? Significantly? This is the sort of 
> thing that I don't want to do without some reasonable assurance that 
> it will help, since you can't remove a ZIL device from a pool at the 
> moment.
>
> Hints and tips appreciated,
>
> Erik