Re: [zfs-discuss] Bugid 6535160

2008-01-04 Thread eric kustarz

 So either we're hitting a pretty serious zfs bug, or they're purposely
 holding back performance in Solaris 10 so that we all have a good
 reason to upgrade to 11.  ;)

In general, for ZFS we try to push all changes from Nevada back to  
s10 updates.

In particular, 6535160 "Lock contention on zl_lock from zil_commit"
is pegged for s10u6.  And I believe we're going for an early build of
update 6, so point patches should hopefully be available even earlier.

Nice to see filebench validating our performance work,
eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bugid 6535160

2008-01-03 Thread Vincent Fox
We loaded Nevada_78 on a peer T2000 unit.  Imported the same ZFS pool.  I 
didn't even upgrade the pool since we wanted to be able to move it back to 
10u4.  Cut 'n paste of my colleague's email with the results:
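A quick aside before the pasted results: moving the pool between the two hosts is
just an export/import. A minimal sketch, assuming a pool named "mailpool" (the real
pool name isn't given in the post):

  # on the Solaris 10u4 host
  zpool export mailpool

  # on the Nevada 78 host
  zpool import mailpool

As long as "zpool upgrade" is never run on the newer build, the pool stays at the
older on-disk version and can be imported back on 10u4.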

Here's the latest Pepsi Challenge results.

Sol10u4 vs Nevada78. Same tuning options, same zpool, same storage, same SAN
switch - you get the idea. The only difference is the OS.

Sol10u4:
 4984: 82.878: Per-Operation Breakdown
closefile4         404ops/s   0.0mb/s   0.0ms/op    19us/op-cpu
readfile4          404ops/s   6.3mb/s   0.1ms/op   109us/op-cpu
openfile4          404ops/s   0.0mb/s   0.1ms/op   112us/op-cpu
closefile3         404ops/s   0.0mb/s   0.0ms/op    25us/op-cpu
fsyncfile3         404ops/s   0.0mb/s  18.7ms/op  1168us/op-cpu
appendfilerand3    404ops/s   6.3mb/s   0.2ms/op   192us/op-cpu
readfile3          404ops/s   6.3mb/s   0.1ms/op   111us/op-cpu
openfile3          404ops/s   0.0mb/s   0.1ms/op   111us/op-cpu
closefile2         404ops/s   0.0mb/s   0.0ms/op    24us/op-cpu
fsyncfile2         404ops/s   0.0mb/s  19.0ms/op  1162us/op-cpu
appendfilerand2    404ops/s   6.3mb/s   0.2ms/op   173us/op-cpu
createfile2        404ops/s   0.0mb/s   0.3ms/op   334us/op-cpu
deletefile1        404ops/s   0.0mb/s   0.2ms/op   173us/op-cpu

 4984: 82.879: IO Summary:  318239 ops 5251.8 ops/s, (808/808 r/w)  25.2mb/s,  1228us cpu/op,  9.7ms latency


Nevada78:
 1107: 82.554: Per-Operation Breakdown
closefile4        1223ops/s   0.0mb/s   0.0ms/op    22us/op-cpu
readfile4         1223ops/s  19.4mb/s   0.1ms/op   112us/op-cpu
openfile4         1223ops/s   0.0mb/s   0.1ms/op   128us/op-cpu
closefile3        1223ops/s   0.0mb/s   0.0ms/op    29us/op-cpu
fsyncfile3        1223ops/s   0.0mb/s   4.6ms/op   256us/op-cpu
appendfilerand3   1223ops/s  19.1mb/s   0.2ms/op   191us/op-cpu
readfile3         1223ops/s  19.9mb/s   0.1ms/op   116us/op-cpu
openfile3         1223ops/s   0.0mb/s   0.1ms/op   127us/op-cpu
closefile2        1223ops/s   0.0mb/s   0.0ms/op    28us/op-cpu
fsyncfile2        1223ops/s   0.0mb/s   4.4ms/op   239us/op-cpu
appendfilerand2   1223ops/s  19.1mb/s   0.1ms/op   159us/op-cpu
createfile2       1223ops/s   0.0mb/s   0.5ms/op   389us/op-cpu
deletefile1       1223ops/s   0.0mb/s   0.2ms/op   198us/op-cpu

 1107: 82.581: IO Summary:  954637 ops 15903.4 ops/s, (2447/2447 r/w)  77.5mb/s,  590us cpu/op,  2.6ms latency


That's roughly a 3x improvement in ops/s and a 4x improvement in average fsync time.


Here are the results from our UFS software mirror for comparison:
 4984: 211.056: Per-Operation Breakdown
closefile4         465ops/s   0.0mb/s   0.0ms/op    23us/op-cpu
readfile4          465ops/s  12.6mb/s   0.1ms/op   142us/op-cpu
openfile4          465ops/s   0.0mb/s   0.1ms/op    83us/op-cpu
closefile3         465ops/s   0.0mb/s   0.0ms/op    24us/op-cpu
fsyncfile3         465ops/s   0.0mb/s   6.0ms/op   498us/op-cpu
appendfilerand3    465ops/s   7.3mb/s   1.7ms/op   282us/op-cpu
readfile3          465ops/s  11.1mb/s   0.1ms/op   132us/op-cpu
openfile3          465ops/s   0.0mb/s   0.1ms/op    84us/op-cpu
closefile2         465ops/s   0.0mb/s   0.0ms/op    26us/op-cpu
fsyncfile2         465ops/s   0.0mb/s   5.9ms/op   445us/op-cpu
appendfilerand2    465ops/s   7.3mb/s   1.1ms/op   231us/op-cpu
createfile2        465ops/s   0.0mb/s   2.2ms/op   443us/op-cpu
deletefile1        465ops/s   0.0mb/s   2.0ms/op   269us/op-cpu

 4984: 211.057: IO Summary:  366557 ops 6049.2 ops/s, (931/931 r/w)  38.2mb/s,  912us cpu/op,  4.8ms latency


So either we're hitting a pretty serious zfs bug, or they're purposely
holding back performance in Solaris 10 so that we all have a good reason to
upgrade to 11.  ;) 
 

-Nick
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Bugid 6535160

2007-12-14 Thread Vincent Fox
So does anyone have any insight on BugID 6535160?

We have verified on a similar system that ZFS shows big latency in the filebench
varmail test.

We formatted the same LUN with UFS and latency went down from 300 ms to 1-2 ms.
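
For reference, the numbers in this thread come from the filebench "varmail"
workload. A minimal sketch of the kind of invocation used (the target directory
and run length here are placeholders, not taken from the original post):

  # filebench
  filebench> load varmail
  filebench> set $dir=/testpool/mail
  filebench> run 60

The run prints the "Per-Operation Breakdown" and "IO Summary" tables quoted
elsewhere in this thread.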

http://sunsolve.sun.com/search/document.do?assetkey=1-1-6535160-1

We run Solaris 10u4 on our production systems and don't see any indication of a
patch for this.

I'll try downloading a recent Nevada build, load it on the same system, and see if
the problem has indeed vanished post snv_71.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bugid 6535160

2007-12-14 Thread Neil Perrin
Vincent Fox wrote:
 So does anyone have any insight on BugID 6535160?
 
 We have verified on a similar system that ZFS shows big latency in the filebench
 varmail test.
 
 We formatted the same LUN with UFS and latency went down from 300 ms to 1-2 ms.

This is such a big difference it makes me think something else is going on.
I suspect one of two possible causes:

A) The disk write cache is enabled and volatile. UFS knows nothing of write caches
   and requires the write cache to be disabled, otherwise corruption can occur.
B) The write cache is non-volatile, but ZFS hasn't been configured
   to stop flushing it (set zfs:zfs_nocacheflush = 1; see the sketch below).
   Note, ZFS enables the write cache and will flush it as necessary.
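
For case B, the tunable goes in /etc/system and takes effect on reboot. A minimal
sketch (the comments are mine, not part of any shipped configuration):

  * /etc/system
  * Stop ZFS from sending cache-flush commands to the array.
  * Only safe when the array's write cache is non-volatile (battery backed).
  set zfs:zfs_nocacheflush = 1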

 
 http://sunsolve.sun.com/search/document.do?assetkey=1-1-6535160-1
 
 We run Solaris 10u4 on our production systems and don't see any indication
 of a patch for this.
 
 I'll try downloading a recent Nevada build, load it on the same system, and see
 if the problem has indeed vanished post snv_71.

Yes, please try this. I think it will make a difference, but the delta
will be small.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bugid 6535160

2007-12-14 Thread Vincent Fox
 B) The write cache is non-volatile, but ZFS hasn't been configured
 to stop flushing it (set zfs:zfs_nocacheflush = 1).

These are a pair of 2540 arrays with dual controllers, so the cache is definitely non-volatile.

We set zfs_nocacheflush=1 and that improved things considerably.

ZFS filesystem (2540 arrays):
 fsyncfile3         434ops/s   0.0mb/s  17.3ms/op   977us/op-cpu
 fsyncfile2         434ops/s   0.0mb/s  17.8ms/op   981us/op-cpu

However, that's still not very good compared to UFS.

We turned off ZIL with zil_disable=1 and WOW!
ZFS ZIL disabled:
 fsyncfile3   1148ops/s   0.0mb/s  0.0ms/op   18us/op-cpu
 fsyncfile2   1148ops/s   0.0mb/s  0.0ms/op   18us/op-cpu

Not a good setting to use in production but useful data.
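
For the record, this is the sort of /etc/system setting used for that experiment.
A sketch only, since disabling the ZIL means fsync() returns before data is on
stable storage:

  * /etc/system
  * Benchmarking only: disable the ZFS intent log. NOT for production use.
  set zfs:zil_disable = 1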

Anyhow, it will take some time to get OpenSolaris onto the system; I'll report back
then.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss