Ben Rockwood wrote:
 > I've got a Thumper doing nothing but serving NFS.  It's using B43 with
 > zil_disabled.  The system is being consumed in waves, but by what I
 > don't know.  Notice vmstat:

We made several performance fixes in the NFS/ZFS area in recent builds, 
so if possible it would be great to upgrade you from snv_43.  That said, 
there might be something else going on that we haven't accounted for.
...

> 
> Step 1 was to disable any ZFS features that might consume large amounts of 
> CPU:
> 
> # zfs set compression=off joyous
> # zfs set atime=off joyous
> # zfs set checksum=off joyous

In our performance testing, we haven't found checksums to be anywhere 
near a large consumer of CPU, so I would recommend leaving that on 
(given its data-integrity benefits).

I suspect your apps/clients don't depend on atime, so I think it's a 
good idea to turn that off.  We've gotten better NFS performance with 
atime off.

This is more of a heads-up, since it sounds like compression on/off 
isn't your problem.  If you are not getting good I/O bandwidth with 
compression turned on, it's most likely due to:
6460622 zio_nowait() doesn't live up to its name

As Jim mentioned, using lockstat to figure out where your CPU is being 
spent is the first step.  I've been using 'lockstat -kgIW -D 60 sleep 
60', which collects data for the top 60 callers over a one-minute 
period.  If you see 'mutex_enter' high up in the results, then we have 
at least some mutex lock contention.
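If the lockstat output is hard to read, a plain DTrace profile over the 
same one-minute window should show roughly the same "top kernel 
functions on CPU" picture (just a sketch; the 997 Hz rate and the 
60-entry cutoff are arbitrary choices on my part):

# dtrace -n '
    profile-997 /arg0/ { @[func(arg0)] = count(); }
    tick-60s { trunc(@, 60); exit(0); }'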


...


> Interestingly, using prstat -mL to monitor thread latency, we see that a 
> handful of threads are the culprits for consuming mass CPU:
> 
> [private:/tmp] root# prstat -mL
>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
>  22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 416   1   0   0 nfsd/1506
>  22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 415   0   0   0 nfsd/1563
>  22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 417   0   0   0 nfsd/1554
>  22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 419   0   0   0 nfsd/1551
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0  26  74 418   0   0   0 nfsd/1553
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1536
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1555
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1539
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1562
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1545
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1559
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 419   1   0   0 nfsd/1541
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1546
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1543
>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1560
> Total: 33 processes, 218 lwps, load averages: 4.64, 6.20, 5.86

The high SYS time being charged to the userland nfsd threads is 
representative of what the kernel threads are doing (which is most 
likely NFS and ZFS work).
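If you want to see exactly which kernel paths those nfsd LWPs are 
burning SYS time in, something like this should aggregate the kernel 
stacks sampled while nfsd is on CPU (just a sketch; the 30-second 
window and the top-10 cutoff are arbitrary):

# dtrace -n '
    profile-997 /execname == "nfsd" && arg0/ { @[stack()] = count(); }
    tick-30s { trunc(@, 10); exit(0); }'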


> Running zvop_times.d (http://blogs.sun.com/erickustarz/resource/zvop_times.d) 
> I get an idea of what ZFS is doing:
> 
> [private:/] root# dtrace -s  zvop_times.d
> dtrace: script 'zvop_times.d' matched 66 probes
> ^C
> CPU     ID                    FUNCTION:NAME
>   1      2                             :END 
> ZFS COUNT
> 
>   zfs_getsecattr                                                    4
>   zfs_space                                                        70
>   zfs_rename                                                      111
>   zfs_readdir                                                     284
>   zfs_read                                                        367
>   zfs_mkdir                                                       670
>   zfs_setattr                                                    1054
>   zfs_frlock                                                     1562
>   zfs_putpage                                                    3916
>   zfs_write                                                      4110
>   zfs_create                                                     4178
>   zfs_remove                                                     7794
>   zfs_fid                                                       14960
>   zfs_inactive                                                  17637
>   zfs_access                                                    20809
>   zfs_fsync                                                     29668
>   zfs_lookup                                                    31273
>   zfs_getattr                                                  175457
> 
> 
> 
> ZFS AVG TIME
> 
>   zfs_fsync                                                      2337
>   zfs_getattr                                                    2735
>   zfs_access                                                     2774
>   zfs_fid                                                        2948
>   zfs_inactive                                                   4081
>   zfs_lookup                                                     6067
>   zfs_frlock                                                     6750
>   zfs_putpage                                                    6840
>   zfs_mkdir                                                      9259
>   zfs_getsecattr                                                10940
>   zfs_setattr                                                   15364
>   zfs_space                                                     25214
>   zfs_readdir                                                   25726
>   zfs_remove                                                    26117
>   zfs_write                                                     31942
>   zfs_rename                                                    60504
>   zfs_read                                                     144449
>   zfs_create                                                 71215587

An average of 71 msec per create seems high.  Doing a quantize() 
instead of just an average on zfs_create() would be interesting (I 
should have included that in my script); a rough sketch is below.
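Something along these lines should do it (an untested sketch in the 
same style as zvop_times.d, assuming the fbt probes for the zfs module 
resolve the same way on snv_43):

#!/usr/sbin/dtrace -s

/* record entry time per thread */
fbt:zfs:zfs_create:entry
{
        self->ts = timestamp;
}

/* histogram of per-call latency in nanoseconds */
fbt:zfs:zfs_create:return
/self->ts/
{
        @["zfs_create latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}

That will show whether the 71 msec average is a uniform cost or a few 
huge outliers dragging the mean up.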

...


> Locks...
> 
> [private:/tmp] root# lockstat -P sleep 10
> 
> Adaptive mutex spin: 20406 events in 10.026 seconds (2035 events/sec)
> 
> Count indv cuml rcnt     spin Lock                   Caller                  
> -------------------------------------------------------------------------------
>    32 100% 100% 0.00 59963092 0xffffffffb282a1c8     dmu_object_alloc+0x75 \

That's interesting.  This is build snv_43, right?  It doesn't exactly 
line up with current bits, but I would guess this is the 
"mutex_enter(&osi->os_obj_lock);" in dmu_object_alloc().

Can you show us the output for a longer sampling period, such as 
'lockstat -P -D 30 sleep 60'?  The "Adaptive mutex spin" category is 
usually the most/only interesting one in my experience.
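The lockstat DTrace provider can give much the same information with 
stacks attached, if that's easier to gather; a rough sketch (the stack 
depth, 60-second window, and top-10 cutoff are arbitrary):

# dtrace -n '
    lockstat:::adaptive-spin { @[stack(5)] = count(); }
    tick-60s { trunc(@, 10); exit(0); }'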

I would be curious to know your memory usage.  What is the physical 
memory on the machine?

And can you run:
# echo 'arc::print -d' | mdb -k
# echo '::kmastat' | mdb -k | sort -n -k 5 | tail -40

Matt has a nice little D script that figures out how long txgs are 
taking (I'll attach it).  I believe it will work with old bits (I 
defined "OLD" for that to work).  Can you try running it?

Can you run 'zpool list' and 'zpool status'?  I'd like to know how full 
your pool is.

Let's start with that info,
eric
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: zfs_sync.d
URL: 
<http://mail.opensolaris.org/pipermail/nfs-discuss/attachments/20061207/9b11f986/attachment.ksh>
