Ben Rockwood wrote:
> I've got a Thumper doing nothing but serving NFS.  It's using B43 with
> zil_disabled.  The system is being consumed in waves, but by what I
> don't know.  Notice vmstat:
We made several performance fixes in the NFS/ZFS area in recent builds, so
if possible it would be great to upgrade you from snv_43.  That said, there
might be something else going on that we haven't accounted for.

...

> Step 1 was to disable any ZFS features that might consume large amounts of
> CPU:
>
> # zfs set compression=off joyous
> # zfs set atime=off joyous
> # zfs set checksum=off joyous

In our performance testing, we haven't found checksums to be anywhere near a
large consumer of CPU, so I would recommend leaving that on (due to its
benefits).

I suspect your apps/clients don't depend on atime, so I think it's a good
idea to turn that off.  We've gotten better NFS performance with it off.

More of a heads-up, since it sounds like compression on/off isn't your
problem: if you are not getting good I/O bandwidth with compression turned
on, it's most likely due to:

        6460622 zio_nowait() doesn't live up to its name

As Jim mentioned, using lockstat to figure out where your CPU is being spent
is the first step.  I've been using 'lockstat -kgIW -D 60 sleep 60'.  That
profiles for one minute ('sleep 60') and reports the top 60 callers
('-D 60').  If you see 'mutex_enter' high up in the results, then we have at
least mutex lock contention.

...

> Interestingly, using prstat -mL to monitor thread latency, we see that a
> handful of threads are the culprits for consuming mass CPU:
>
> [private:/tmp] root# prstat -mL
>    PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
>  22643 daemon    0.0  75 0.0 0.0 0.0 0.0  25 0.0 416   1   0   0 nfsd/1506
>  22643 daemon    0.0  75 0.0 0.0 0.0 0.0  25 0.0 415   0   0   0 nfsd/1563
>  22643 daemon    0.0  74 0.0 0.0 0.0 0.0  26 0.0 417   0   0   0 nfsd/1554
>  22643 daemon    0.0  74 0.0 0.0 0.0 0.0  26 0.0 419   0   0   0 nfsd/1551
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0  26  74 418   0   0   0 nfsd/1553
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1536
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1555
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1539
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1562
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1545
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1559
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 419   1   0   0 nfsd/1541
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1546
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1543
>  22643 daemon    0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1560
> Total: 33 processes, 218 lwps, load averages: 4.64, 6.20, 5.86

The high SYS times being charged to the userland nfsd threads are
representative of what the kernel threads are doing (which is most likely
going to be NFS and ZFS).
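If you want to see exactly which kernel code paths those nfsd threads are
burning SYS time in (a narrower view than the system-wide lockstat profile),
a quick profile-provider sketch like the following should work.  The 997Hz
rate and the execname filter are just my starting point, not anything from
your setup; adjust as needed:

# dtrace -n 'profile-997 /execname == "nfsd" && arg0/ { @[stack()] = count(); }'

Let it run for a minute or so during one of the waves, then ^C; the kernel
stacks with the highest counts are where that 75% SYS time is going.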
> Running zvop_times.d (http://blogs.sun.com/erickustarz/resource/zvop_times.d)
> I get an idea of what ZFS is doing:
>
> [private:/] root# dtrace -s zvop_times.d
> dtrace: script 'zvop_times.d' matched 66 probes
> ^C
> CPU     ID                    FUNCTION:NAME
>   1      2                             :END
>
> ZFS COUNT
>
>   zfs_getsecattr              4
>   zfs_space                  70
>   zfs_rename                111
>   zfs_readdir               284
>   zfs_read                  367
>   zfs_mkdir                 670
>   zfs_setattr              1054
>   zfs_frlock               1562
>   zfs_putpage              3916
>   zfs_write                4110
>   zfs_create               4178
>   zfs_remove               7794
>   zfs_fid                 14960
>   zfs_inactive            17637
>   zfs_access              20809
>   zfs_fsync               29668
>   zfs_lookup              31273
>   zfs_getattr            175457
>
> ZFS AVG TIME
>
>   zfs_fsync                2337
>   zfs_getattr              2735
>   zfs_access               2774
>   zfs_fid                  2948
>   zfs_inactive             4081
>   zfs_lookup               6067
>   zfs_frlock               6750
>   zfs_putpage              6840
>   zfs_mkdir                9259
>   zfs_getsecattr          10940
>   zfs_setattr             15364
>   zfs_space               25214
>   zfs_readdir             25726
>   zfs_remove              26117
>   zfs_write               31942
>   zfs_rename              60504
>   zfs_read               144449
>   zfs_create           71215587

An average of 71 msecs per create seems high.  Doing a quantize instead of
just an average on zfs_create() would be interesting (I should have included
that in my script)...

...

> Locks...
>
> [private:/tmp] root# lockstat -P sleep 10
>
> Adaptive mutex spin: 20406 events in 10.026 seconds (2035 events/sec)
>
> Count indv cuml rcnt     spin Lock                   Caller
> -------------------------------------------------------------------------------
>    32 100% 100% 0.00 59963092 0xffffffffb282a1c8     dmu_object_alloc+0x75

That's interesting.  This is build snv_43, right?  It doesn't exactly line up
with current bits, but I would guess this is
"mutex_enter(&osi->os_obj_lock);".  Can you show us the output for a longer
sampling period, such as 'lockstat -P -D 30 sleep 60'?  The "Adaptive mutex
spin" category is usually the most (or only) interesting one in my experience.

I would be curious to know your memory usage.  What is the physical memory on
the machine?  And can you run:

# echo 'arc::print -d' | mdb -k
# echo '::kmastat' | mdb -k | sort -n -k 5 | tail -40

Matt has a nice little dscript that figures out how long txgs are taking
(I'll attach that).  I believe it will work with old bits (I defined "OLD"
for that to work).  Can you try running that?

Can you run 'zpool list' and 'zpool status'?  I'd like to know how full your
pool is.

Let's start with that info,

eric

-------------- next part --------------
Name: zfs_sync.d
URL: <http://mail.opensolaris.org/pipermail/nfs-discuss/attachments/20061207/9b11f986/attachment.ksh>
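P.S. On the quantize idea for zfs_create(): a minimal sketch (not part of
zvop_times.d; it just pairs fbt entry/return probes on the zfs_create() VOP,
assuming that function name hasn't changed in your bits) would look like:

fbt::zfs_create:entry
{
        self->ts = timestamp;
}

fbt::zfs_create:return
/self->ts/
{
        @["zfs_create latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}

Save it as, say, zfs_create_q.d, run 'dtrace -s zfs_create_q.d' during one of
the waves, and ^C after a minute or so.  If the distribution is mostly fast
with a long tail out in the hundreds-of-msec buckets, that would fit lock
contention (e.g. the dmu_object_alloc spin above) rather than creates being
uniformly slow.

And if the attached zfs_sync.d doesn't run cleanly on snv_43, a rough
approximation (not Matt's script; it just times spa_sync() per txg with fbt)
is:

fbt::spa_sync:entry
{
        self->start = timestamp;
}

fbt::spa_sync:return
/self->start/
{
        @["spa_sync time (ms)"] = quantize((timestamp - self->start) / 1000000);
        self->start = 0;
}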