Ben Rockwood wrote:
> Eric Kustarz wrote:
> 
>> Ben Rockwood wrote:
>> > I've got a Thumper doing nothing but serving NFS.  It's using B43 with
>> > zil_disabled.  The system is being consumed in waves, but by what I
>> > don't know.  Notice vmstat:
>>
>> We made several performance fixes in the NFS/ZFS area in recent 
>> builds, so if possible it would be great to upgrade you from snv_43.  
>> That said, there might be something else going on that we haven't 
>> accounted for.
>> ...
>>
>>>
>>> Step 1 was to disable any ZFS features that might consume large 
>>> amounts of CPU:
>>>
>>> # zfs set compression=off joyous
>>> # zfs set atime=off joyous
>>> # zfs set checksum=off joyous
>>
>>
>> In our performance testing, we haven't found checksums to be anywhere 
>> near a large consumer of CPU, so i would recommend leaving that on 
>> (due to its benefits).
>>
>> I suspect your apps/clients don't depend on atime, so i think it's a 
>> good idea to turn that off.  We've gotten better NFS performance with 
>> this off.
>>
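(To flip checksums back on after your earlier test, it's the same syntax
you used above: '# zfs set checksum=on joyous'.)
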
>> More of a heads-up, since it sounds like compression on/off isn't your 
>> problem.  If you are not getting good I/O bandwidth with compression 
>> turned on, it's most likely due to:
>> 6460622 zio_nowait() doesn't live up to its name
>>
>> As Jim mentioned, using lockstat to figure out where your CPU is 
>> being spent is the first step.  I've been using 'lockstat -kgIW -D 60 
>> sleep 60'.  That collects data for a 1-minute period and reports the 
>> top 60 callers.  If you see 'mutex_enter' high up in the results, then 
>> we have at least mutex lock contention.
>>
>>
>> ...
>>
>>
>>> Interestingly, using prstat -mL to monitor thread latency, we see 
>>> that a handful of threads are the culprits consuming massive CPU:
>>>
>>> [private:/tmp] root# prstat -mL
>>>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
>>>  22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 416   1   0   0 nfsd/1506
>>>  22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 415   0   0   0 nfsd/1563
>>>  22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 417   0   0   0 nfsd/1554
>>>  22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 419   0   0   0 nfsd/1551
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0  26  74 418   0   0   0 nfsd/1553
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1536
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1555
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1539
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1562
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1545
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1559
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 419   1   0   0 nfsd/1541
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1546
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1543
>>>  22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1560
>>> Total: 33 processes, 218 lwps, load averages: 4.64, 6.20, 5.86
>>
>>
>> The high SYS time being charged to the userland nfsd threads is 
>> representative of what the kernel threads are doing (which is most 
>> likely going to be NFS and ZFS).
>>
>>
>>> Running zvop_times.d 
>>> (http://blogs.sun.com/erickustarz/resource/zvop_times.d) I get an 
>>> idea of what ZFS is doing:
>>>
>>> [private:/] root# dtrace -s  zvop_times.d
>>> dtrace: script 'zvop_times.d' matched 66 probes
>>> ^C
>>> CPU     ID                    FUNCTION:NAME
>>>   1      2                             :END ZFS COUNT
>>>
>>>   zfs_getsecattr                                                    4
>>>   zfs_space                                                        70
>>>   zfs_rename                                                      111
>>>   zfs_readdir                                                     284
>>>   zfs_read                                                        367
>>>   zfs_mkdir                                                       670
>>>   zfs_setattr                                                    1054
>>>   zfs_frlock                                                     1562
>>>   zfs_putpage                                                    3916
>>>   zfs_write                                                      4110
>>>   zfs_create                                                     4178
>>>   zfs_remove                                                     7794
>>>   zfs_fid                                                       14960
>>>   zfs_inactive                                                  17637
>>>   zfs_access                                                    20809
>>>   zfs_fsync                                                     29668
>>>   zfs_lookup                                                    31273
>>>   zfs_getattr                                                  175457
>>>
>>>
>>>
>>> ZFS AVG TIME
>>>
>>>   zfs_fsync                                                      2337
>>>   zfs_getattr                                                    2735
>>>   zfs_access                                                     2774
>>>   zfs_fid                                                        2948
>>>   zfs_inactive                                                   4081
>>>   zfs_lookup                                                     6067
>>>   zfs_frlock                                                     6750
>>>   zfs_putpage                                                    6840
>>>   zfs_mkdir                                                      9259
>>>   zfs_getsecattr                                                10940
>>>   zfs_setattr                                                   15364
>>>   zfs_space                                                     25214
>>>   zfs_readdir                                                   25726
>>>   zfs_remove                                                    26117
>>>   zfs_write                                                     31942
>>>   zfs_rename                                                    60504
>>>   zfs_read                                                     144449
>>>   zfs_create                                                 71215587
>>
>>
>> An average of 71 msecs per create seems high.  Doing a quantize 
>> instead of just an average on zfs_create() would be interesting (i 
>> should have included that in my script)...
>>
>> ...
>>
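For what i mean by quantize - here's a minimal D sketch (untested; it
assumes the fbt entry/return probes for zfs_create() exist on your build)
that shows the latency distribution instead of just the average:
"
#!/usr/sbin/dtrace -s

/* record entry time per thread */
fbt::zfs_create:entry
{
         self->ts = timestamp;
}

/* histogram the elapsed time in nanoseconds */
fbt::zfs_create:return
/self->ts/
{
         @["zfs_create latency (ns)"] = quantize(timestamp - self->ts);
         self->ts = 0;
}
"
A few huge outliers in the top buckets would confirm that a handful of
very slow creates are dragging the average up to 71 msecs.
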
>>
>>> Locks...
>>>
>>> [private:/tmp] root# lockstat -P sleep 10
>>>
>>> Adaptive mutex spin: 20406 events in 10.026 seconds (2035 events/sec)
>>>
>>> Count indv cuml rcnt     spin Lock                   Caller
>>> -------------------------------------------------------------------------------
>>>    32 100% 100% 0.00 59963092 0xffffffffb282a1c8     dmu_object_alloc+0x75
>>
>>
>> That's interesting.  This is build snv_43, right?  It doesn't exactly 
>> line up with current bits, but i would guess this is 
>> "mutex_enter(&osi->os_obj_lock);".
>>
>> Can you show us the output for a longer sampling period, such as 
>> 'lockstat -P -D 30 sleep 60'?  The "Adaptive mutex spin" category is 
>> usually the most/only interesting one in my experience.
>>
>> I would be curious to know your memory usage.  What is the physical 
>> memory on the machine?
>>
>> And can you run:
>> # echo 'arc::print -d' | mdb -k
>> # echo '::kmastat' | mdb -k | sort -n -k 5 | tail -40
>>
>> Matt has a nice little dscript that figures out how long txgs are 
>> taking (i'll attach it).  I believe it will work with old bits (i 
>> defined "OLD" for that to work).  Can you try running that?
>>
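In the meantime, here's a minimal sketch of the same idea - note this is
not Matt's script, just a guess at it: it times each spa_sync() call, and
it assumes the fbt probes for spa_sync() exist on your build:
"
#!/usr/sbin/dtrace -s

/* time each txg sync from spa_sync() entry to return */
fbt::spa_sync:entry
{
         self->ts = timestamp;
}

fbt::spa_sync:return
/self->ts/
{
         @["txg sync time (ms)"] = quantize((timestamp - self->ts) / 1000000);
         self->ts = 0;
}
"
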
>> Can you run 'zpool list' and 'zpool status'?  I'd like to know how 
>> full your pool is.
>>
>> Let's start with that info,
>> eric
> 
> 
> 
> Here is a bunch of the data you asked for.  I can't tell you how much I 
> appreciate the help from everyone.
> 
> Lockstat:
> 
> [private:/tmp] root# lockstat -kgIW -D 60 sleep 60

Hmm, looks like we caught the wrong 60 seconds :/ ... from the output 
below, we have 78% idle CPU and 20% at and "below" nfssys.  Still, we 
have good data here to go on.  Most of the CPU time the NFS server is 
burning is in zfs_mknode()/dmu_object_alloc() - 17%.  Also, seeing 8% 
of the CPU in mutex_vector_enter() suggests lock contention (which, from 
your previous lockstat output, is probably 'osi->os_obj_lock').

dmu_object_alloc() is being a pig at 17% because each of the three main 
functions called under 'osi->os_obj_lock' is a mini-pig at 4% apiece: 
dnode_next_offset(), dnode_hold_impl(), and dmu_object_next().

So i'm guessing lots of files are being created over NFS in one 
particular dataset?

We should figure out how many creates/second you are doing over NFS (i 
should have put a timeout on the script).  Here's a real simple one 
(from your snoop it looked like you're only doing NFSv3, so i'm not 
tracking NFSv4):
"
#!/usr/sbin/dtrace -s

/* count NFSv3 creates and the zfs_create()s they turn into */
rfs3_create:entry,
zfs_create:entry
{
         @creates[probefunc] = count();
}

/* exit after one minute so the totals are per-minute rates */
tick-60s
{
         exit(0);
}
"


If i'm reading the code right, and if you really are creating a lot of 
files in just one dataset, perhaps you could create another dataset or 
two in place of the one - not sure if that is possible for you.  It 
would be interesting to see if that alleviates the pain.
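For example (the dataset names here are hypothetical - substitute
whatever fits your customer-number scheme):

# zfs create joyous/customers-a
# zfs create joyous/customers-b

then spread the create-heavy clients across the new filesystems.  Each
dataset is a separate objset, so each one gets its own object-allocation
lock.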

> 
> Profiling interrupt: 23296 events in 60.046 seconds (388 events/sec)
> 
> Count genr cuml rcnt     nsec Hottest CPU+PIL        Caller
> -------------------------------------------------------------------------------
> 18290  79% ---- 0.00    21994 cpu[3]                 thread_start
> 18102  78% ---- 0.00    21988 cpu[3]                 idle
> 18024  77% ---- 0.00    21983 cpu[3]                 cpu_halt
>  4626  20% ---- 0.00    25530 cpu[2]                 sys_syscall32
>  4626  20% ---- 0.00    25530 cpu[2]                 nfssys
>  4626  20% ---- 0.00    25530 cpu[2]                 svc_do_run
>  4624  20% ---- 0.00    25539 cpu[2]                 svc_run
>  4522  19% ---- 0.00    25642 cpu[2]                 svc_getreq
>  4502  19% ---- 0.00    25662 cpu[2]                 rfs_dispatch
>  4495  19% ---- 0.00    25660 cpu[2]                 common_dispatch
>  3951  17% ---- 0.00    26143 cpu[2]                 rfs3_create
>  3942  17% ---- 0.00    26156 cpu[2]                 fop_create
>  3941  17% ---- 0.00    26154 cpu[2]                 zfs_create
>  3922  17% ---- 0.00    26150 cpu[2]                 zfs_mknode
>  3913  17% ---- 0.00    26163 cpu[2]                 dmu_object_alloc
>  1845   8% ---- 0.00    20288 cpu[0]                 mutex_vector_enter
>   970   4% ---- 0.00    31591 cpu[2]                 dmu_object_next
>   917   4% ---- 0.00    31673 cpu[2]                 dnode_next_offset
>   899   4% ---- 0.00    30573 cpu[2]                 dnode_hold_impl
>   883   4% ---- 0.00    31639 cpu[2]                 dnode_next_offset_level
>   725   3% ---- 0.00    30896 cpu[2]                 dbuf_hold_impl
>   627   3% ---- 0.00    28908 cpu[2]                 mutex_enter
>   361   2% ---- 0.00    31135 cpu[2]                 dmu_buf_rele
>   350   2% ---- 0.00    29945 cpu[2]                 dbuf_hold
>   350   2% ---- 0.00    31147 cpu[2]+6               mac_rx
>   341   1% ---- 0.00    31344 cpu[2]                 arc_buf_add_ref
>   303   1% ---- 0.00    31154 cpu[2]+6               intr_thread
>   290   1% ---- 0.00    31074 cpu[2]+6               av_dispatch_autovect
>   259   1% ---- 0.00    24534 cpu[1]                 zio_next_stage
>   257   1% ---- 0.00    31007 cpu[2]                 arc_buf_remove_ref
>   244   1% ---- 0.00    31165 cpu[2]+6               e1000g_intr
>   243   1% ---- 0.00    31507 cpu[2]                 buf_hash
>   218   1% ---- 0.00    25154 cpu[0]                 putnext
>   186   1% ---- 0.00    21154 cpu[0]                 svc_sendreply
>   184   1% ---- 0.00    21194 cpu[0]                 svc_cots_ksend
>   175   1% ---- 0.00    31147 cpu[2]+6               aggr_recv_cb
>   172   1% ---- 0.00    31151 cpu[2]+6               i_dls_link_ether_rx
>   159   1% ---- 0.00    31155 cpu[2]+6               ip_input
>   158   1% ---- 0.00    21275 cpu[0]                 put
>   155   1% ---- 0.00    30127 cpu[2]                 dbuf_find
>   152   1% ---- 0.00    21597 cpu[0]                 rmm_wput
>   147   1% ---- 0.00    21688 cpu[0]                 mir_wput
>   141   1% ---- 0.00    31153 cpu[2]+6               squeue_enter_chain
>   138   1% ---- 0.00    26152 cpu[3]                 zio_next_stage_async
>   137   1% ---- 0.00    31165 cpu[2]+6               tcp_rput_data
>   134   1% ---- 0.00    21688 cpu[0]                 tcp_wput
>   128   1% ---- 0.00    21804 cpu[0]                 squeue_enter
>   123   1% ---- 0.00    21659 cpu[0]                 tcp_output
>   104   0% ---- 0.00    25306 cpu[1]                 zio_nowait
>   102   0% ---- 0.00    30574 cpu[2]                 atomic_add_ptr_nv
>    91   0% ---- 0.00    22605 cpu[0]                 tcp_send_data
>    89   0% ---- 0.00    21472 cpu[1]                 nfs3_fhtovp
>    87   0% ---- 0.00    27427 cpu[2]                 rw_enter
>    86   0% ---- 0.00    21176 cpu[1]                 fsop_vget
>    84   0% ---- 0.00    24945 cpu[1]                 zio_vdev_io_start
>    81   0% ---- 0.00    21110 cpu[1]                 zfs_vget
>    81   0% ---- 0.00    25006 cpu[3]                 zio_wait_for_children
>    78   0% ---- 0.00    31423 cpu[2]                 atomic_add_ptr
>    74   0% ---- 0.00    31148 cpu[2]+11              rmm_rput
>    73   0% ---- 0.00    22876 cpu[0]                 dls_tx
> -------------------------------------------------------------------------------
>  
> 
> 
> 
> NFS Stats:
> 
> [private:/tmp] root# nfsstat -s -v 3
> 
> Server rpc:
> Connection oriented:
> calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
> 154161041  0          0          0          0          35669015   0
> Connectionless:
> calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
> 0          0          0          0          0          0          0
> 
> Server NFSv3:
> calls      badcalls
> 153760861  0
> Version 3: (153482035 calls)
> null         getattr      setattr      lookup       access       readlink
> 1 0%         68265095 44% 2334522 1%   33186002 21% 9373979 6%   4430 0%
> read         write        create       mkdir        symlink      mknod
> 776093 0%    6496524 4%   6353313 4%   1883254 1%   814 0%       2 0%
> remove       rmdir        rename       link         readdir      readdirplus
> 12084296 7%  2692 0%      208683 0%    5969808 3%   23751 0%     335079 0%
> fsstat       fsinfo       pathconf     commit
> 6 0%         1 0%         2 0%         6183688 4%
> 
> Server nfs_acl:
> Version 3: (278383 calls)
> null        getacl      setacl      getxattrdir
> 0 0%        32060 11%   5 0%        246318 88%
> 
> Memory:
> 
> [private:/tmp] root# echo 'arc::print -d' | mdb -k
> {
>    anon = ARC_anon
>    mru = ARC_mru
>    mru_ghost = ARC_mru_ghost
>    mfu = ARC_mfu
>    mfu_ghost = ARC_mfu_ghost
>    size = 0t7337733120

6.8GB in the ARC out of a possible 16GB (i think that's how much you 
have) is pretty decent.
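(If you want to double-check the physical memory, '# prtconf | grep
Memory' reports the total in megabytes.)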

>    p = 0t3673915285
>    c = 0t7338496468
>    c_min = 0t533448320
>    c_max = 0t15996604416
>    hits = 0t6864679166
>    misses = 0t518711

A 99.99% cache hit ratio (6864679166 hits vs. 518711 misses) - impressive!

>    deleted = 0t295199
>    skipped = 0t1701
>    hash_elements = 0t500010
>    hash_elements_max = 0t511362
>    hash_collisions = 0t1820702
>    hash_chains = 0t149161
>    hash_chain_max = 0t10
>    no_grow = 0
> }
> [private:/tmp] root# echo '::kmastat' | mdb -k | sort -n -k 5 | tail -40
> zio_buf_10240              10240   1524   1642  16814080     46367     0
> zio_buf_3072                3072   4663   5604  17215488    354215     0
> streams_dblk_1936           2048     43   8418  17240064 159700970     0
> zio_buf_77824              77824    221    226  17588224      1464     0
> zio_buf_28672              28672    660    664  19038208      9233     0
> zio_buf_32768              32768    578    595  19496960     74273     0
> zio_buf_3584                3584   4492   5952  21331968    186767     0
> zio_buf_24576              24576    876    885  21749760     15378     0
> zio_buf_14336              14336   1008   1556  22306816   2101355     0
> kmem_va_8192                8192   2361   2752  22544384      4067     0
> kmem_alloc_128               128 161894 173042  22863872  41686966     0
> zio_buf_4096                4096   6002   6016  24641536    193926     0
> kmem_magazine_15             128 207407 212939  28135424    237076     0
> zio_buf_5120                5120   5031   6144  31457280    126567     0
> zio_buf_20480              20480   1557   1567  32092160     39694     0
> kmem_alloc_8                   8 4064087 4064240  32964608  15892691     0
> kmem_va_32768              32768    595   1028  33685504      1731     0
> kmem_va_24576              24576    885   1560  40894464      2278     0
> kmem_alloc_112               112 289900 366192  41664512   7743803     0
> kmem_alloc_40                 40 806400 1048077  42504192  14060780     0
> kmem_alloc_256               256 187308 191970  52420608  33326285     0
> arc_buf_hdr_t                128 500219 512089  67661824  14104876     0
> kmem_slab_cache               56 1239939 1250640  71147520   1612006     0
> kmem_va_12288              12288   4609   5690  74579968      8137     0
> zfs_znode_cache              192 544134 545643 106426368   7771071     0

ok, cool, the 'znode_cache' is not a problem (we're actually using most 
of the buffers allocated to the cache).  That's good news.

> kmem_va_28672              28672   2651   3292 107872256      6270     0
> rootnex_dmahdl              2504  42668  42832 109649920  32118262     0
> kmem_io_64G_4096            4096  32768  32768 134217728     32768     0
> vn_cache                     240 552163 552255 150802432    634861     0
> kmem_bufctl_cache             24 7085442 7085568 172752896   8534442     0
> kmem_va_20480              20480   9939  11568 252706816     14227     0
> zio_buf_512                  512 578348 604080 309288960  29468142     0
> dmu_buf_impl_t               328 874033 908160 309985280  21869642     0

Same thing for the 'dbuf_cache'.

> kmem_alloc_96                 96 3879869 3879918 378384384  21906309     0
> zio_buf_65536              65536   7269   7273 476643328   1368534     0
> kmem_va_4096                4096 1178724 1178752 533200896   1485372     0
> dnode_t                      640 4641005 4641006 3168260096   8165695     0

And the 'dnode_cache'.

> zio_buf_131072            131072  25251  25256 3310354432    245308     0
> zio_buf_16384              16384 214643 214667 3517104128    747095     0
> kmem_va_16384              16384 214783 217944 3570794496    375400     0
> ZPool (I'm omitting 'zfs list' output because the filesystems are named 
> by customer numbers):

Oops, i think the 'zpool list' output got cut out.  Mostly curious what 
percentage of capacity you're at.

> 
> [private:/tmp] root# zpool status
>  pool: joyous
> state: ONLINE
> scrub: none requested
> config:
> 
>        NAME        STATE     READ WRITE CKSUM
>        joyous      ONLINE       0     0     0
>          raidz2    ONLINE       0     0     0
>            c5t4d0  ONLINE       0     0     0
>            c4t4d0  ONLINE       0     0     0
>            c7t4d0  ONLINE       0     0     0
>            c6t4d0  ONLINE       0     0     0
>            c1t4d0  ONLINE       0     0     0
>            c0t4d0  ONLINE       0     0     0
>            c4t0d0  ONLINE       0     0     0
>            c7t0d0  ONLINE       0     0     0
>            c6t0d0  ONLINE       0     0     0
>            c1t0d0  ONLINE       0     0     0
>            c0t0d0  ONLINE       0     0     0
>          raidz2    ONLINE       0     0     0
>            c5t5d0  ONLINE       0     0     0
>            c4t5d0  ONLINE       0     0     0
>            c7t5d0  ONLINE       0     0     0
>            c6t5d0  ONLINE       0     0     0
>            c1t5d0  ONLINE       0     0     0
>            c0t5d0  ONLINE       0     0     0
>            c4t1d0  ONLINE       0     0     0
>            c7t1d0  ONLINE       0     0     0
>            c6t1d0  ONLINE       0     0     0
>            c1t1d0  ONLINE       0     0     0
>            c0t1d0  ONLINE       0     0     0
>          raidz2    ONLINE       0     0     0
>            c5t6d0  ONLINE       0     0     0
>            c4t6d0  ONLINE       0     0     0
>            c7t6d0  ONLINE       0     0     0
>            c6t6d0  ONLINE       0     0     0
>            c1t6d0  ONLINE       0     0     0
>            c0t6d0  ONLINE       0     0     0
>            c4t2d0  ONLINE       0     0     0
>            c7t2d0  ONLINE       0     0     0
>            c6t2d0  ONLINE       0     0     0
>            c1t2d0  ONLINE       0     0     0
>            c0t2d0  ONLINE       0     0     0
>          raidz2    ONLINE       0     0     0
>            c5t7d0  ONLINE       0     0     0
>            c4t7d0  ONLINE       0     0     0
>            c7t7d0  ONLINE       0     0     0
>            c6t7d0  ONLINE       0     0     0
>            c1t7d0  ONLINE       0     0     0
>            c0t7d0  ONLINE       0     0     0
>            c4t3d0  ONLINE       0     0     0
>            c7t3d0  ONLINE       0     0     0
>            c6t3d0  ONLINE       0     0     0
>            c1t3d0  ONLINE       0     0     0
>            c0t3d0  ONLINE       0     0     0
>        spares
>          c5t1d0    AVAIL
>          c5t2d0    AVAIL
>          c5t3d0    AVAIL
> errors: No known data errors
> 
> 
> ZFS_Sync Dtrace:
> 
> [private:/tmp] root# ./zfs_sync.d
>     txg     time    sync-write      async-write      sync-read      async-read
>       0  5206ms.    0/s   0MB/s;     0/s   0MB/s;   10/s   0MB/s;    0/s   0MB/s
> 2234681   281ms.  156/s   0MB/s; 32249/s  29MB/s;   14/s   0MB/s;    0/s   0MB/s
> 2234682   249ms.  176/s   0MB/s; 23216/s  23MB/s;   40/s   2MB/s;    0/s   0MB/s
>       0  5840ms.    0/s   0MB/s;     0/s   0MB/s;   24/s   1MB/s;    0/s   0MB/s
>       0  6239ms.    0/s   0MB/s;     0/s   0MB/s;   67/s   4MB/s;    0/s   0MB/s
> 2234683   427ms.  103/s   0MB/s; 22346/s  24MB/s;  117/s   7MB/s;    0/s   0MB/s
>       0  6729ms.    0/s   0MB/s;     0/s   0MB/s;    6/s   0MB/s;    0/s   0MB/s
> 2234684   228ms.  192/s   0MB/s; 24052/s  23MB/s;   17/s   1MB/s;    0/s   0MB/s
>       0  4770ms.    0/s   0MB/s;     0/s   0MB/s;    7/s   0MB/s;    0/s   0MB/s
> 2234685   212ms.  207/s   0MB/s; 25382/s  20MB/s;    0/s   0MB/s;    0/s   0MB/s
>       0  4787ms.    0/s   0MB/s;     0/s   0MB/s;   31/s   1MB/s;    0/s   0MB/s
> 2234686   215ms.  200/s   0MB/s; 19544/s  17MB/s;   18/s   1MB/s;    0/s   0MB/s
> 2234687   197ms.  223/s   0MB/s; 25253/s  23MB/s;   15/s   0MB/s;    0/s   0MB/s
>       0  8010ms.    0/s   0MB/s;     0/s   0MB/s;    3/s   0MB/s;    0/s   0MB/s
>       0  5611ms.    0/s   0MB/s;     0/s   0MB/s;    4/s   0MB/s;    0/s   0MB/s
> 2234688   255ms.  172/s   0MB/s; 23250/s  21MB/s;   19/s   1MB/s;    0/s   0MB/s
>       0  5357ms.    0/s   0MB/s;     0/s   0MB/s;    8/s   0MB/s;    0/s   0MB/s
> 2234689   249ms.  176/s   0MB/s; 21678/s  27MB/s;  100/s   6MB/s;    0/s   0MB/s
>       0  6465ms.    0/s   0MB/s;     0/s   0MB/s;    1/s   0MB/s;    0/s   0MB/s
> 2234690   240ms.  183/s   0MB/s; 20162/s  16MB/s;   25/s   1MB/s;    0/s   0MB/s

Ok, the txgs look really good.  It's taking about 1/4 of a second to 
push out all the dirty data.  We like txgs to take 5 seconds or under.

In summary, at this data collection point everything looked good except 
dmu_object_alloc().

If you can grab the same lockstat data as before ('lockstat -kgIW -D 30 
sleep 60' and 'lockstat -P -D 30 sleep 60') at the same time 'vmstat 1' 
shows no idle CPU, that would be great, as then we could verify.  Perhaps 
the sleep times for the lockstat collection need to be bumped, or a 
couple of successive runs grabbed, as i believe you were seeing spiky 
CPU usage - so we might have grabbed data in between the spikes.
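One low-tech way to catch a spike is to just let lockstat run
back-to-back and timestamp each run, something like:

# while :; do date; lockstat -kgIW -D 30 sleep 60; done > /tmp/lockstat.out

with 'vmstat 1' going in another window - then we can line the busy
intervals up against the matching lockstat sample.  ^C it once you've
caught a spike or two.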

eric

> 
> 
> And finally, 30 seconds of nicstat:
> 
>     Time       Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs  %Util    Sat
> 21:00:58   aggr0/1   130.3   136.2   657.5   629.0   203.0   221.8   0.22   0.00
> 21:00:58 e1000g1/0   19.00   17.87   112.6   103.4   172.7   177.0   0.03   0.00
> 21:00:58     aggr1   130.3   136.2   657.4   629.0   203.0   221.8   0.22   0.00
> 21:00:58 e1000g3/0   51.48   27.27   272.0   156.1   193.8   178.9   0.06   0.00
> 21:00:58   e1000g0    0.16    0.47    2.39    3.99   67.50   120.6   0.00   0.00
> 21:00:58 e1000g2/0   59.83   91.11   272.8   369.6   224.6   252.4   0.12   0.00
> 21:00:58 e1000g0/0    0.16    0.47    2.39    3.99   67.50   120.5   0.00   0.00
>     Time       Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs  %Util    Sat
> 21:01:08   aggr0/1  1158.0  1213.2  6497.7  5838.5   182.5   212.8   1.94   0.00
> 21:01:08 e1000g1/0   506.0   158.4  2851.2   368.6   181.7   440.0   0.54   0.00
> 21:01:08     aggr1  1158.0  1213.1  6497.3  5838.0   182.5   212.8   1.94   0.00
> 21:01:08 e1000g3/0   567.9   31.65  3296.7   175.4   176.4   184.8   0.49   0.00
> 21:01:08   e1000g0    0.12    0.42    1.99    3.49   64.00   123.4   0.00   0.00
> 21:01:08 e1000g2/0   83.36  1023.5   345.1  5296.2   247.4   197.9   0.91   0.00
> 21:01:08 e1000g0/0    0.12    0.42    1.99    3.49   64.00   123.4   0.00   0.00
>     Time       Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs  %Util    Sat
> 21:01:18   aggr0/1   762.1   876.4  4133.2  3775.1   188.8   237.7   1.34   0.00
> 21:01:18 e1000g1/0   492.2   171.6  2733.4   671.8   184.4   261.6   0.54   0.00
> 21:01:18     aggr1   762.1   876.5  4133.6  3775.6   188.8   237.7   1.34   0.00
> 21:01:18 e1000g3/0   191.2   25.95  1081.8   141.3   181.0   188.1   0.18   0.00
> 21:01:18   e1000g0    0.14    0.41    2.19    3.39   64.00   123.9   0.00   0.00
> 21:01:18 e1000g2/0   79.32   678.5   322.3  2960.3   252.0   234.7   0.62   0.00
> 21:01:18 e1000g0/0    0.14    0.41    2.19    3.39   64.00   123.9   0.00   0.00
> 
> 
> 
> We know there are things that can be improved in our network and we're 
> rushing to fix them, but more and more I've become less certain that 
> this is, at heart, a network issue.
> 
> Again, all the help is greatly appreciated.
> 
> benr.
> 
> 

