Re: [nfs-discuss] Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-11 Thread Ben Rockwood

Robert Milkowski wrote:

Hello eric,

Saturday, December 9, 2006, 7:07:49 PM, you wrote:

ek Jim Mauro wrote:
  
Could be NFS synchronous semantics on file create (followed by 
repeated flushing of the write cache).  What kind of storage are you 
using (feel free to send privately if you need to) - is it a thumper? 

It's not clear why NFS-enforced synchronous semantics would induce 
different behavior than the same load to a local ZFS.
  


ek Actually i forgot he had 'zil_disable' turned on, so it won't matter in
ek this case.


Ben, are you sure zil_disable was set to 1 BEFORE pool was imported?
  


Yes, absolutely.  Set var in /etc/system, reboot, system come up.  That 
happened almost 2 months ago, long before this lock insanity problem 
popped up.


To be clear, the ZIL issue was a problem for the creation of a handful of 
files of any size.  Untar'ing a file was a massive performance drain.  
This issue, on the other hand, deals with thousands of little files 
being created all the time (IMAP locks).  These are separate issues from 
my point of view.  With ZIL slowness, NFS performance was just slow, but 
we didn't see massive CPU usage; with this issue, on the other hand, we 
were seeing waves in roughly 10-second cycles where the run queue would go 
sky high with 0 idle.  Please see the earlier mails for examples of the 
symptoms.


benr.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [nfs-discuss] Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-11 Thread Robert Milkowski
Hello Ben,

Monday, December 11, 2006, 9:34:18 PM, you wrote:

BR Robert Milkowski wrote:
 Hello eric,

 Saturday, December 9, 2006, 7:07:49 PM, you wrote:

 ek Jim Mauro wrote:
   
 Could be NFS synchronous semantics on file create (followed by 
 repeated flushing of the write cache).  What kind of storage are you 
 using (feel free to send privately if you need to) - is it a thumper? 
 
 It's not clear why NFS-enforced synchronous semantics would induce 
 different behavior than the same
 load to a local ZFS.
   

 ek Actually i forgot he had 'zil_disable' turned on, so it won't matter in
 ek this case.


 Ben, are you sure zil_disable was set to 1 BEFORE pool was imported?
   

BR Yes, absolutely.  Set var in /etc/system, reboot, system come up. That
BR happened almost 2 months ago, long before this lock insanity problem 
BR popped up.

BR To be clear, the ZIL issue was a problem for the creation of a handful of 
BR files of any size.  Untar'ing a file was a massive performance drain.
BR This issue, on the other hand, deals with thousands of little files
BR being created all the time (IMAP locks).  These are separate issues from
BR my point of view.  With ZIL slowness, NFS performance was just slow, but
BR we didn't see massive CPU usage; with this issue, on the other hand, we 
BR were seeing waves in roughly 10-second cycles where the run queue would go
BR sky high with 0 idle.  Please see the earlier mails for examples of the
BR symptoms.

Ok.

And just another question: is the NFS file system mounted with the noac
option on the IMAP server, and does the application chdir() into the NFS
directories? I'm not sure whether this has been fixed; if not, your NFS
client will generate a lot of small traffic to the NFS server on every
chdir(), starving it of CPU.
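Checking for this condition amounts to scanning the client's mount option strings; a rough Python sketch (the mnttab-style sample lines are illustrative, not output from this setup):

```python
# Flag NFS mounts whose option string carries 'noac', given lines in the
# style of a Solaris /etc/mnttab: special  mountp  fstype  options  time
def noac_mounts(mnttab_lines):
    """Return the mount points of NFS file systems mounted with noac."""
    flagged = []
    for line in mnttab_lines:
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "nfs":
            opts = fields[3].split(",")
            if "noac" in opts:
                flagged.append(fields[1])
    return flagged

# Hypothetical sample lines, for illustration only:
sample = [
    "thumper:/export/mail /mnt/mail nfs rw,noac,vers=3,proto=tcp 1165872000",
    "thumper:/export/home /mnt/home nfs rw,vers=3,proto=tcp 1165872000",
]
print(noac_mounts(sample))  # ['/mnt/mail']
```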


-- 
Best regards,
 Robert                       mailto:[EMAIL PROTECTED]
                              http://milek.blogspot.com



Re: [nfs-discuss] Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-11 Thread Richard Elling

BR Yes, absolutely.  Set var in /etc/system, reboot, system come up. That
BR happened almost 2 months ago, long before this lock insanity problem
BR popped up.

For the archives, a high level of lock activity can always be a problem.
The worst cases I've experienced were with record locking over NFS.
The worst offenders were programs written with a local file system in
mind, especially PC-based applications.  This has become one of the
first things I check in such instances, and the network traffic will
be consistently full of lock activity.  Just one of those things you
need to watch out for in a networked world :-)
 -- richard


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Spencer Shepler
On Fri, Ben Rockwood wrote:
 eric kustarz wrote:
 So i'm guessing there's lots of files being created over NFS in one 
 particular dataset?
 
 We should figure out how many creates/second you are doing over NFS (i 
 should have put a timeout on the script).  Here's a real simple one 
 (from your snoop it looked like you're only doing NFSv3, so i'm not 
 tracking NFSv4):
 
 #!/usr/sbin/dtrace -s
 
 rfs3_create:entry,
 zfs_create:entry
 {
 @creates[probefunc] = count();
 }
 
 tick-60s
 {
 exit(0);
 }
 
 
 
 Eric, I love you. 
 
 Running this bit of DTrace revealed more than 4,000 files being created 
 in almost any given 60-second window.  And I've only got one system that 
 would fit that sort of mass file creation: our Joyent Connector product's 
 Courier IMAP server, which uses Maildir.  As a test I simply shut down 
 Courier and unmounted the mail NFS share for good measure, and sure 
 enough the problem vanished and could not be reproduced.  10 minutes 
 later I re-enabled Courier and our problem came back. 

 Clearly ZFS file creation is just amazingly heavy even with the ZIL 
 disabled.  If creating 4,000 files in a minute squashes 4 2.6GHz Opteron 
 cores, we're in big trouble in the longer term.  In the meantime I'm 
 going to find a new home for our IMAP mail so that the other things 
 served from that NFS server at least aren't affected.
 
 You asked for the zpool and zfs info, which I don't want to share 
 because it's confidential (if you want it privately I'll do so, but not 
 on a public list), but I will say that it's a single massive zpool in 
 which we're using less than 2% of the capacity.  But in thinking about 
 this problem, even if we used 2 or more pools, the CPU consumption still 
 would have choked the system, right?  This leaves me really nervous 
 about what we'll do when it's not an internal mail server that's creating 
 all those files but a customer. 
 
 Oddly enough, this might be a very good reason to use iSCSI instead of 
 NFS on the Thumper.
 
 Eric, I owe you a couple cases of beer for sure.  I can't tell you how 
 much I appreciate your help.  Thanks to everyone else who chimed in with 
 ideas and suggestions, all of you guys are the best!

Good to hear that you have figured out what is happening, Ben.

For future reference, there are two commands that you may want to
make use of in observing the behavior of the NFS server and individual
filesystems.

There is the trusty nfsstat command.  In this case, you would have been
able to do something like:
nfsstat -s -v3 60

This will provide all of the server side NFSv3 statistics on 60 second
intervals.  

Then there is a new command, fsstat, that will provide vnode-level
activity on a per-filesystem basis.  Therefore, if the NFS server
has multiple filesystems active and you want to look at just one,
something like this can be helpful:

fsstat /export/foo 60

Fsstat has a 'full' option that will list all of the vnode operations
or just certain types.  It will also watch a filesystem type (e.g. zfs, nfs).
Very useful.
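Whichever tool is used, the interval counters reduce to simple per-second rates; a rough sketch of the arithmetic (the counter values are illustrative, chosen to match the ~4,000 creates per minute seen earlier in the thread):

```python
# Turn two cumulative per-op counter snapshots (as you might parse out of
# nfsstat -s style output) into per-second rates over the interval.
def interval_rates(prev, curr, interval_s):
    """Per-second rate for each op, diffing cumulative counters."""
    return {op: (curr[op] - prev.get(op, 0)) / interval_s for op in curr}

# Hypothetical snapshots 60 seconds apart:
prev = {"create": 120000, "getattr": 900000}
curr = {"create": 124000, "getattr": 930000}
print(interval_rates(prev, curr, 60))
```

4,000 creates over 60 seconds works out to roughly 67 creates/sec, which is the load the server was struggling under.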

Spencer


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Bill Moore
On Fri, Dec 08, 2006 at 12:15:27AM -0800, Ben Rockwood wrote:
 Clearly ZFS file creation is just amazingly heavy even with the ZIL 
 disabled.  If creating 4,000 files in a minute squashes 4 2.6GHz Opteron 
 cores, we're in big trouble in the longer term.  In the meantime I'm 
 going to find a new home for our IMAP mail so that the other things 
 served from that NFS server at least aren't affected.

For local tests, this is not true of ZFS.  It seems that file creation
only swamps us when coming over NFS.  We can do thousands of files a
second on a Thumper with room to spare if NFS isn't involved.

Next step is to figure out why NFS kills us.


--Bill


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Ben Rockwood

Spencer Shepler wrote:

Good to hear that you have figured out what is happening, Ben.

For future reference, there are two commands that you may want to
make use of in observing the behavior of the NFS server and individual
filesystems.

There is the trusty nfsstat command.  In this case, you would have been
able to do something like:
nfsstat -s -v3 60

This will provide all of the server side NFSv3 statistics on 60 second
intervals.  


Then there is a new command, fsstat, that will provide vnode-level
activity on a per-filesystem basis.  Therefore, if the NFS server
has multiple filesystems active and you want to look at just one,
something like this can be helpful:

fsstat /export/foo 60

Fsstat has a 'full' option that will list all of the vnode operations
or just certain types.  It will also watch a filesystem type (e.g. zfs, nfs).
Very useful.
  



NFSstat I've been using, but fsstat I was unaware of.  Wish I'd used it 
rather than duplicating most of its functionality with a D script. :)


Thanks for the tip.

benr.


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Ben Rockwood

Bill Moore wrote:

On Fri, Dec 08, 2006 at 12:15:27AM -0800, Ben Rockwood wrote:
  
Clearly ZFS file creation is just amazingly heavy even with the ZIL 
disabled.  If creating 4,000 files in a minute squashes 4 2.6GHz Opteron 
cores, we're in big trouble in the longer term.  In the meantime I'm 
going to find a new home for our IMAP mail so that the other things 
served from that NFS server at least aren't affected.



For local tests, this is not true of ZFS.  It seems that file creation
only swamps us when coming over NFS.  We can do thousands of files a
second on a Thumper with room to spare if NFS isn't involved.

Next step is to figure out why NFS kills us.
  


Agreed.  If mass file creation was a problem locally I'd think that we'd 
have people beating down the doors with complaints.


One thought I had as a workaround was to move all my mail on NFS to an 
iSCSI LUN and then put a zpool on that.  I'm willing to bet that'd work 
fine.  Hopefully I can try it.



To round out the discussion, the root cause of this whole mess was 
Courier IMAP locking.  After isolating the problem last night and 
writing a little D script to find out what files were being created, it 
was obviously lock files; turning off locking dropped file creations to 
a reasonable level and our problem vanished.


If I can help at all with testing or analysis please let me know.


benr.


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread eric kustarz

Ben Rockwood wrote:

Bill Moore wrote:


On Fri, Dec 08, 2006 at 12:15:27AM -0800, Ben Rockwood wrote:
 

Clearly ZFS file creation is just amazingly heavy even with the ZIL 
disabled.  If creating 4,000 files in a minute squashes 4 2.6GHz 
Opteron cores, we're in big trouble in the longer term.  In the 
meantime I'm going to find a new home for our IMAP mail so that the 
other things served from that NFS server at least aren't affected.




For local tests, this is not true of ZFS.  It seems that file creation
only swamps us when coming over NFS.  We can do thousands of files a
second on a Thumper with room to spare if NFS isn't involved.

Next step is to figure out why NFS kills us.
  



Agreed.  If mass file creation was a problem locally I'd think that we'd 
have people beating down the doors with complaints.


One thought I had as a workaround was to move all my mail on NFS to an 
iSCSI LUN and then put a zpool on that.  I'm willing to bet that'd work 
fine.  Hopefully I can try it.


Could be NFS synchronous semantics on file create (followed by repeated 
flushing of the write cache).  What kind of storage are you using (feel 
free to send privately if you need to) - is it a thumper?





To round out the discussion, the root cause of this whole mess was 
Courier IMAP locking.  After isolating the problem last night and 
writing a little D script to find out what files were being created, it 
was obviously lock files; turning off locking dropped file creations to 
a reasonable level and our problem vanished.


Gotta love that D.



If I can help at all with testing or analysis please let me know.


I want to create a test case in-house first and see if it's reproducible.

eric

ps: i like lots of different types of beers


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Jim Mauro


Could be NFS synchronous semantics on file create (followed by 
repeated flushing of the write cache).  What kind of storage are you 
using (feel free to send privately if you need to) - is it a thumper? 
It's not clear why NFS-enforced synchronous semantics would induce 
different behavior than the same load to a local ZFS.

File creates are metadata intensive, right? And these operations need to 
be synchronous to guarantee file system consistency (yes, I am familiar 
with the ZFS COW model).

Anyway... I'm feeling rather naive here, but I've seen the "NFS-enforced 
synchronous semantics" phrase kicked around many times as the explanation 
for suboptimal performance for metadata-intensive operations when ZFS is 
the underlying file system, but I never really understood what's 
unsynchronous about doing the same thing to a local ZFS.

And yes, there is certainly a network latency component to the NFS 
configuration, so for any synchronous operation I would expect things 
to be slower when done over NFS.


Awaiting enlightenment... :^)

/jim



Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread eric kustarz

Jim Mauro wrote:


Could be NFS synchronous semantics on file create (followed by 
repeated flushing of the write cache).  What kind of storage are you 
using (feel free to send privately if you need to) - is it a thumper? 


It's not clear why NFS-enforced synchronous semantics would induce 
different behavior than the same load to a local ZFS.


Actually i forgot he had 'zil_disable' turned on, so it won't matter in 
this case.


The big difference is that the NFS server does a VOP_FSYNC on each file 
create, as it needs to guarantee that the changes are on stable storage. 
Locally, we don't need that same guarantee, so you don't have to wait 
for a disk I/O to complete on each create.
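That per-create fsync can be put into rough numbers; a back-of-envelope sketch (the 5 ms stable-storage write latency and the 16-thread concurrency are assumptions for illustration, not measurements from this thread):

```python
# Rough upper bound on synchronous creates/sec when every create must
# wait for one stable-storage write before replying to the client.
def max_creates_per_sec(io_latency_s, concurrency):
    """concurrency independent threads, each blocked io_latency_s per create."""
    return concurrency / io_latency_s

# Assumed figures: ~5 ms per cache-flushed disk write, 16 nfsd threads
# issuing creates in parallel.
print(max_creates_per_sec(0.005, 16))  # 3200.0
```

Locally, with no wait for stable storage, the same creates are limited only by CPU and the transaction group cadence, which is why the local case is so much faster.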


eric



File creates are metadata intensive, right? And these operations need to 
be synchronous to guarantee file system consistency (yes, I am familiar 
with the ZFS COW model).

Anyway... I'm feeling rather naive here, but I've seen the "NFS-enforced 
synchronous semantics" phrase kicked around many times as the explanation 
for suboptimal performance for metadata-intensive operations when ZFS is 
the underlying file system, but I never really understood what's 
unsynchronous about doing the same thing to a local ZFS.

And yes, there is certainly a network latency component to the NFS 
configuration, so for any
synchronous operation, I would expect things to be slower when done over 
NFS.


Awaiting enlightenment... :^)

/jim





Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Ed Gould

On Dec 9, 2006, at 8:59 , Jim Mauro wrote:
Anyway... I'm feeling rather naive here, but I've seen the "NFS-enforced 
synchronous semantics" phrase kicked around many times as the explanation 
for suboptimal performance for metadata-intensive operations when ZFS is 
the underlying file system, but I never really understood what's 
unsynchronous about doing the same thing to a local ZFS.


If I remember correctly, the difference is that NFS requires that the  
operation be committed to stable storage before the return to the  
client.  This is definitely a heavier operation than the local case,  
where the return to the caller may happen as soon as the operation is  
cached.  If there's a crash, the local system does not guarantee to  
the caller that the operation is on disk, but NFS does.


Both guarantee consistency but NFS makes stronger guarantees of  
completeness.


--Ed




Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Casper . Dik

On Dec 9, 2006, at 8:59 , Jim Mauro wrote:
 Anyway... I'm feeling rather naive here, but I've seen the "NFS-enforced
 synchronous semantics" phrase kicked around many times as the explanation
 for suboptimal performance for metadata-intensive operations when ZFS
 is the underlying file system, but I never really understood what's
 unsynchronous about doing the same thing to a local ZFS.

If I remember correctly, the difference is that NFS requires that the  
operation be committed to stable storage before the return to the  
client.  This is definitely a heavier operation than the local case,  
where the return to the caller may happen as soon as the operation is  
cached.  If there's a crash, the local system does not guarantee to  
the caller that the operation is on disk, but NFS does.

Both guarantee consistency but NFS makes stronger guarantees of  
completeness.


The current way to solve this from the NFS server's perspective
is to call something equivalent to sync().  It was my understanding
that disabling the ZIL would make such operations more or less
asynchronous, but it still seems that NFS is doing a sync()-like thing.

Perhaps there should be a different mechanism for making VOP_*
calls: calls which do not return until the specific call was
committed to stable storage.  (Interested parties get notified
when such events happen.)

Unfortunately, that would cause the NFS client system calls to sleep
until the calls are completed (as they do now), but there doesn't seem
to be a need for that; writes currently need a specific commit, and
it would perhaps be worthwhile to investigate more asynchronous
behaviour.  As long as the client remembers the non-committed calls,
there is no data loss issue; when the client crashes, you expect
data loss; when the server crashes, the client resends.

Casper


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-08 Thread Ben Rockwood

eric kustarz wrote:
So i'm guessing there's lots of files being created over NFS in one 
particular dataset?


We should figure out how many creates/second you are doing over NFS (i 
should have put a timeout on the script).  Here's a real simple one 
(from your snoop it looked like you're only doing NFSv3, so i'm not 
tracking NFSv4):


#!/usr/sbin/dtrace -s

rfs3_create:entry,
zfs_create:entry
{
@creates[probefunc] = count();
}

tick-60s
{
exit(0);
}




Eric, I love you. 

Running this bit of DTrace revealed more than 4,000 files being created 
in almost any given 60-second window.  And I've only got one system that 
would fit that sort of mass file creation: our Joyent Connector product's 
Courier IMAP server, which uses Maildir.  As a test I simply shut down 
Courier and unmounted the mail NFS share for good measure, and sure 
enough the problem vanished and could not be reproduced.  10 minutes 
later I re-enabled Courier and our problem came back. 

Clearly ZFS file creation is just amazingly heavy even with the ZIL 
disabled.  If creating 4,000 files in a minute squashes 4 2.6GHz Opteron 
cores, we're in big trouble in the longer term.  In the meantime I'm 
going to find a new home for our IMAP mail so that the other things 
served from that NFS server at least aren't affected.


You asked for the zpool and zfs info, which I don't want to share 
because it's confidential (if you want it privately I'll do so, but not 
on a public list), but I will say that it's a single massive zpool in 
which we're using less than 2% of the capacity.  But in thinking about 
this problem, even if we used 2 or more pools, the CPU consumption still 
would have choked the system, right?  This leaves me really nervous 
about what we'll do when it's not an internal mail server that's creating 
all those files but a customer. 

Oddly enough, this might be a very good reason to use iSCSI instead of 
NFS on the Thumper.


Eric, I owe you a couple cases of beer for sure.  I can't tell you how 
much I appreciate your help.  Thanks to everyone else who chimed in with 
ideas and suggestions, all of you guys are the best!


benr.


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Jim Mauro


Hey Ben - I need more time to look at this and connect some dots,
but real quick

Some nfsstat data that we could use to potentially correlate to the local
server activity would be interesting. zfs_create() seems to be the
heavy hitter, but a periodic kernel profile (especially if we can catch
a 97% SYS period) would help:

#lockstat -i997 -Ik -s 10 sleep 60

Alternatively:

#dtrace -n 'profile-997hz / arg0 != 0 / { @s[stack()]=count(); }'

It would also be interesting to see what the zfs_create()'s are doing.
Perhaps a quick:

#dtrace -n 'zfs_create:entry { printf("ZFS Create: %s\n", 
stringof(args[0]->v_path)); }'
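The one-liner prints one path per create, so its output can be aggregated to find the hot directories; a rough Python sketch of such post-processing (the sample paths are invented for illustration):

```python
from collections import Counter
import os

def top_create_dirs(lines, n=3):
    """Count creates per parent directory from lines like
    'ZFS Create: /pool/fs/dir/file'."""
    prefix = "ZFS Create: "
    dirs = Counter()
    for line in lines:
        if line.startswith(prefix):
            path = line[len(prefix):].strip()
            dirs[os.path.dirname(path)] += 1
    return dirs.most_common(n)

# Hypothetical DTrace output lines:
sample = [
    "ZFS Create: /joyous/mail/alice/courierlock.1",
    "ZFS Create: /joyous/mail/alice/courierlock.2",
    "ZFS Create: /joyous/mail/bob/new/msg.1",
]
print(top_create_dirs(sample))
# [('/joyous/mail/alice', 2), ('/joyous/mail/bob/new', 1)]
```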


It would also be interesting to see the network stats.  Grab Brendan's 
nicstat and collect some samples.

Your reference to low traffic is in bandwidth, which, as you indicate, 
is really, really low.  But the data, at least up to this point, 
suggests the workload is not data/bandwidth intensive, but more 
attribute intensive.  Note again that zfs_create() is the heavy ZFS 
function, along with zfs_getattr().  Perhaps it's the attribute-intensive 
nature of the load that is at the root of this.

I can spend more time on this tomorrow (traveling today).

Thanks,
/jim


Ben Rockwood wrote:

I've got a Thumper doing nothing but serving NFS.  It's using B43 with 
zil_disable set.  The system is being consumed in waves, but by what I don't know. 
 Notice vmstat:

 3 0 0 25693580 2586268 0 0  0  0  0  0  0  0  0  0  0  926   91  703  0 25 75
 21 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 13 14 1720   21 1105  0 92  8
 20 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 17 18 2538   70  834  0 100 0
 25 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0  0  0  745   18  179  0 100 0
 37 0 0 25693552 2586240 0 0 0  0  0  0  0  0  0  7  7 1152   52  313  0 100 0
 16 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0 15 13 1543   52  767  0 100 0
 17 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0  2  2  890   72  192  0 100 0
 27 0 0 25693572 2586260 0 0 0  0  0  0  0  0  0 15 15 3271   19 3103  0 98  2
 0 0 0 25693456 2586144 0 11 0  0  0  0  0  0  0 281 249 34335 242 37289 0 46 54
 0 0 0 25693448 2586136 0 2  0  0  0  0  0  0  0  0  0 2470  103 2900  0 27 73
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1062  105  822  0 26 74
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1076   91  857  0 25 75
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0  917  126  674  0 25 75

These spikes of sys load come in waves like this.  While there are close to a 
hundred systems mounting NFS shares on the Thumper, the amount of traffic is 
really low.  Nothing to justify this.  We're talking less than 10MB/s.

NFS is pathetically slow.  We're using NFSv3 TCP shared via ZFS sharenfs on a 
3Gbps aggregation (3*1Gbps).

I've been slamming my head against this problem for days and can't make 
headway.  I'll post some of my notes below.  Any thoughts or ideas are welcome!

benr.

===

Step 1 was to disable any ZFS features that might consume large amounts of CPU:

# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous

These changes had no effect.

Next was to consider that perhaps NFS was doing name lookups when it shouldn't.  Indeed, 
dns was specified in /etc/nsswitch.conf, which won't work given that no DNS 
servers are accessible from the storage or private networks, but again, no improvement. 
In this process I removed dns from nsswitch.conf, deleted /etc/resolv.conf, and disabled 
the dns/client service in SMF.

Turning back to CPU usage, we can see the activity is all SYStem time and comes 
in waves:

[private:/tmp] root# sar 1 100

SunOS private.thumper1 5.11 snv_43 i86pc    12/07/2006

10:38:05    %usr    %sys    %wio   %idle
10:38:06   0  27   0  73
10:38:07   0  27   0  73
10:38:09   0  27   0  73
10:38:10   1  26   0  73
10:38:11   0  26   0  74
10:38:12   0  26   0  74
10:38:13   0  24   0  76
10:38:14   0   6   0  94
10:38:15   0   7   0  93
10:38:22   0  99   0   1  --
10:38:23   0  94   0   6  --
10:38:24   0  28   0  72
10:38:25   0  27   0  73
10:38:26   0  27   0  73
10:38:27   0  27   0  73
10:38:28   0  27   0  73
10:38:29   1  30   0  69
10:38:30   0  27   0  73

And so we consider whether or not there is a pattern to the frequency. The 
following is sar output from any lines in which sys is above 90%:

10:40:04    %usr    %sys    %wio   %idle    Delta
10:40:11   0  97   0   3
10:40:45   0  98   0   2   34 seconds
10:41:02   0  94   0   6   17 seconds
10:41:26   0 100   0   0   24 seconds
10:42:00   0 100   0   0   34 seconds
10:42:25   (end of sample) 25 seconds
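The Delta column is just the difference between consecutive high-%sys timestamps; the computation can be sketched as (sample values taken from the table above):

```python
def high_sys_deltas(samples, threshold=90):
    """Given (hh:mm:ss, %sys) sar samples, return the seconds between
    consecutive samples whose %sys exceeds the threshold."""
    def secs(t):
        h, m, s = (int(x) for x in t.split(":"))
        return h * 3600 + m * 60 + s
    hits = [secs(t) for t, sys_pct in samples if sys_pct > threshold]
    return [b - a for a, b in zip(hits, hits[1:])]

samples = [("10:40:11", 97), ("10:40:20", 30), ("10:40:45", 98),
           ("10:41:02", 94), ("10:41:26", 100), ("10:42:00", 100)]
print(high_sys_deltas(samples))  # [34, 17, 24, 34]
```

The 17 to 34 second spacing is consistent with a periodic burst rather than a steady load.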

Looking at the 

Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Neil Perrin

Ben,

The attached dscript might help determining the zfs_create issue.
It prints:
- a count of all functions called from zfs_create
- average wall count time of the 30 highest functions
- average cpu time of the 30 highest functions

Note, please ignore warnings of the following type:

dtrace: 1346 dynamic variable drops with non-empty dirty list

Neil.

Ben Rockwood wrote On 12/07/06 06:01,:

I've got a Thumper doing nothing but serving NFS.  It's using B43 with 
zil_disable set.  The system is being consumed in waves, but by what I don't know. 
 Notice vmstat:

 3 0 0 25693580 2586268 0 0  0  0  0  0  0  0  0  0  0  926   91  703  0 25 75
 21 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 13 14 1720   21 1105  0 92  8
 20 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 17 18 2538   70  834  0 100 0
 25 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0  0  0  745   18  179  0 100 0
 37 0 0 25693552 2586240 0 0 0  0  0  0  0  0  0  7  7 1152   52  313  0 100 0
 16 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0 15 13 1543   52  767  0 100 0
 17 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0  2  2  890   72  192  0 100 0
 27 0 0 25693572 2586260 0 0 0  0  0  0  0  0  0 15 15 3271   19 3103  0 98  2
 0 0 0 25693456 2586144 0 11 0  0  0  0  0  0  0 281 249 34335 242 37289 0 46 54
 0 0 0 25693448 2586136 0 2  0  0  0  0  0  0  0  0  0 2470  103 2900  0 27 73
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1062  105  822  0 26 74
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1076   91  857  0 25 75
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0  917  126  674  0 25 75

These spikes of sys load come in waves like this.  While there are close to a 
hundred systems mounting NFS shares on the Thumper, the amount of traffic is 
really low.  Nothing to justify this.  We're talking less than 10MB/s.

NFS is pathetically slow.  We're using NFSv3 TCP shared via ZFS sharenfs on a 
3Gbps aggregation (3*1Gbps).

I've been slamming my head against this problem for days and can't make 
headway.  I'll post some of my notes below.  Any thoughts or ideas are welcome!

benr.

===

Step 1 was to disable any ZFS features that might consume large amounts of CPU:

# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous

These changes had no effect.

Next was to consider that perhaps NFS was doing name lookups when it shouldn't.  Indeed, 
dns was specified in /etc/nsswitch.conf, which won't work given that no DNS 
servers are accessible from the storage or private networks, but again, no improvement. 
In this process I removed dns from nsswitch.conf, deleted /etc/resolv.conf, and disabled 
the dns/client service in SMF.

Turning back to CPU usage, we can see the activity is all SYStem time and comes 
in waves:

[private:/tmp] root# sar 1 100

SunOS private.thumper1 5.11 snv_43 i86pc    12/07/2006

10:38:05    %usr    %sys    %wio   %idle
10:38:06   0  27   0  73
10:38:07   0  27   0  73
10:38:09   0  27   0  73
10:38:10   1  26   0  73
10:38:11   0  26   0  74
10:38:12   0  26   0  74
10:38:13   0  24   0  76
10:38:14   0   6   0  94
10:38:15   0   7   0  93
10:38:22   0  99   0   1  --
10:38:23   0  94   0   6  --
10:38:24   0  28   0  72
10:38:25   0  27   0  73
10:38:26   0  27   0  73
10:38:27   0  27   0  73
10:38:28   0  27   0  73
10:38:29   1  30   0  69
10:38:30   0  27   0  73

And so we consider whether or not there is a pattern to the frequency. The 
following is sar output from any lines in which sys is above 90%:

10:40:04    %usr    %sys    %wio   %idle    Delta
10:40:11   0  97   0   3
10:40:45   0  98   0   2   34 seconds
10:41:02   0  94   0   6   17 seconds
10:41:26   0 100   0   0   24 seconds
10:42:00   0 100   0   0   34 seconds
10:42:25   (end of sample) 25 seconds

Looking at the congestion in the run queue:

[private:/tmp] root# sar -q 5 100

10:45:43 runq-sz %runocc swpq-sz %swpocc
10:45:51    27.0      85     0.0       0
10:45:57     1.0      20     0.0       0
10:46:02     2.0      60     0.0       0
10:46:13    19.8      99     0.0       0
10:46:23    17.7      99     0.0       0
10:46:34    24.4      99     0.0       0
10:46:41    22.1      97     0.0       0
10:46:48    13.0      96     0.0       0
10:46:55    25.3     102     0.0       0

Looking at the per-CPU breakdown:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   324  224000  1540 00 100   0   0
  10   00   1140  2260   10   130860   1   0  99
  20   00   162  138  1490540 00  

Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread eric kustarz

Ben Rockwood wrote:

Eric Kustarz wrote:


Ben Rockwood wrote:
 I've got a Thumper doing nothing but serving NFS.  It's using B43 with
 zil_disable set.  The system is being consumed in waves, but by what I
 don't know.  Notice vmstat:

We made several performance fixes in the NFS/ZFS area in recent 
builds, so if possible it would be great to upgrade you from snv_43.  
That said, there might be something else going on that we haven't 
accounted for.

...



Step 1 was to disable any ZFS features that might consume large 
amounts of CPU:


# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous



In our performance testing, we haven't found checksums to be anywhere 
near a large consumer of CPU, so i would recommend leaving that on 
(due to its benefits).


I suspect your apps/clients don't depend on atime, so i think it's a 
good idea to turn that off.  We've gotten better NFS performance with 
this off.


More of a heads up, as it sounds like compression on/off isn't your 
problem.  If you are not getting good I/O BW with compression turned 
on, it's most likely due to:

6460622 zio_nowait() doesn't live up to its name

As Jim mentioned, using lockstat to figure out where your CPU is 
being spent is the first step.  I've been using 'lockstat -kgIW -D 60 
sleep 60'.  That collects data for the top 60 callers over a 1-minute 
period.  If you see 'mutex_enter' high up in the results, then we have 
at least mutex lock contention.



...


 Interestingly, using prstat -mL to monitor thread latency, we see 
 that a handful of threads are the culprits for consuming mass CPU:


[private:/tmp] root# prstat -mL
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 416   1   0   0 nfsd/1506
 22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 415   0   0   0 nfsd/1563
 22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 417   0   0   0 nfsd/1554
 22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 419   0   0   0 nfsd/1551
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0  26  74 418   0   0   0 nfsd/1553
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1536
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1555
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1539
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1562
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1545
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1559
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 419   1   0   0 nfsd/1541
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1546
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1543
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1560

Total: 33 processes, 218 lwps, load averages: 4.64, 6.20, 5.86



The high SYS times being charged to the userland nfsd threads are 
representative of what the kernel threads are doing (which is most 
likely going to be NFS and ZFS).
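Picking the high-SYS threads out of prstat -mL output is mechanical; a rough sketch (the sample rows are abbreviated from the output above):

```python
def high_sys_threads(rows, threshold=50.0):
    """From (process/lwpid, usr_pct, sys_pct) microstate samples, return
    thread ids spending more than `threshold` percent of time in SYS."""
    return [tid for tid, usr, sys_pct in rows if sys_pct > threshold]

# Abbreviated rows in the shape of the prstat output above:
rows = [("nfsd/1506", 0.0, 75.0), ("nfsd/1553", 0.0, 0.2),
        ("nfsd/1563", 0.0, 75.0)]
print(high_sys_threads(rows))  # ['nfsd/1506', 'nfsd/1563']
```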



Running zvop_times.d 
(http://blogs.sun.com/erickustarz/resource/zvop_times.d) I get an 
idea of what ZFS is doing:


[private:/] root# dtrace -s  zvop_times.d
dtrace: script 'zvop_times.d' matched 66 probes
^C
CPU IDFUNCTION:NAME
  1  2 :END ZFS COUNT

  zfs_getsecattr4
  zfs_space70
  zfs_rename  111
  zfs_readdir 284
  zfs_read367
  zfs_mkdir   670
  zfs_setattr1054
  zfs_frlock 1562
  zfs_putpage3916
  zfs_write  4110
  zfs_create 4178
  zfs_remove 7794
  zfs_fid   14960
  zfs_inactive  17637
  zfs_access20809
  zfs_fsync 29668
  zfs_lookup31273
  zfs_getattr  175457



ZFS AVG TIME

  zfs_fsync  2337
  zfs_getattr2735
  zfs_access 2774
  zfs_fid2948
  zfs_inactive