Hello, I've been trying to write a DTrace script to do some latency analysis inside Illumos's kmem code. Sadly, I'm running into an issue where DTrace seems to *sometimes* fail to output anything other than an empty line when I printa() my latency aggregate. I tried to simplify the script as much as possible, but I still see the problem.
Attached are both the full-fledged and the simplified scripts. I hope that I'm just missing something silly. It misbehaves like this on several (new-ish) versions of illumos-gate.

Anyway, I run the script simply as "dtrace -s kmem-reap-auto-minimal.d" - nothing fancy anymore. (The full script uses the pre-processor, "this", printf, fancy arithmetic, llquantize, arguments, and two aggregates.)

To generate reaping, I run two dd's (see below) on a system with 16GB RAM and these datasets:

# zfs list -r -o name,mountpoint,recordsize storage/test
NAME               MOUNTPOINT          RECSIZE
storage/test       /storage/test          128K
storage/test/16k   /storage/test/16k       16K
storage/test/64k   /storage/test/64k       64K
# cd /storage/test
# ls -lh inp
-rw-r--r-- 1 root root 16G Dec 8 15:49 inp
# dd if=inp of=16k/file bs=1M ; dd if=inp of=64k/file bs=1M

This tries to write out 32GB of data (total), which will obviously not fit in RAM. Because of how ZFS behaves, it will cause the ARC to reap the ZFS-related caches. Since the kernel's reaping code is single-threaded, there shouldn't be any racing threads clearing the (global) @lat aggregate.

Now, what do I see exactly when I run the minimal script? Eventually, while running my dd(1) test, there is a reap that takes more than 1ms. This causes the first of the two kmem_depot_ws_reap:return probes to execute. I see the trace line dumping the latency, indicating that it was over the threshold, and then a newline which comes from the printa(@lat). In a previous version of this minimal script, I had a counter which told me how many times kmem_slab_destroy:return fired - it's always *many* times when the latency is over the threshold.

Here's sample output:

...
  2  11151      kmem_depot_ws_reap:entry   start
  2  11152     kmem_depot_ws_reap:return             19599

  2  11151      kmem_depot_ws_reap:entry   start
  2  11152     kmem_depot_ws_reap:return              1654

  2  11151      kmem_depot_ws_reap:entry   start
  2  11152     kmem_depot_ws_reap:return              3537

  2  11151      kmem_depot_ws_reap:entry   start
  2  11152     kmem_depot_ws_reap:return              1652

  2  11151      kmem_depot_ws_reap:entry   start
  2  11152     kmem_depot_ws_reap:return          84775031

<terminates due to exit(0)>

And for completeness, here's what I see with the full script:

# dtrace -s kmem-reap-auto.d -C
dtrace: script 'kmem-reap-auto.d' matched 21 probes
CPU     ID                    FUNCTION:NAME
  6      1                           :BEGIN Started.

  2  11158     kmem_depot_ws_reap:return zio_buf_512

  2  11158     kmem_depot_ws_reap:return
2014 Dec 17 11:33:59 zio_data_buf_512 7736915 ns, 279 mags, 940 objs, 103 slabs, 103 TLB invals, 103 xcalls
reap 7736915ns, mags 6773367ns, objs 1743102ns, slabs 2015422ns, TLB invals 767839ns, xcalls 297358ns

  zio_data_buf_512 (us)
           value  ------------- Distribution ------------- count
            7000 |                                         0
            7500 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            8000 |                                         0

<terminates due to exit(0)>

There should be a per-function histogram between the two summary lines and the per-cache histogram. Based on those two lines, there were plenty of function calls which should have added data to @lat, yet nothing is printed.

Any ideas why I'm seeing this odd behavior? Keep in mind it's non-deterministic.

Thanks,

Jeff.

--
The obvious mathematical breakthrough would be development of an easy way
to factor large prime numbers.
                - Bill Gates, The Road Ahead, pg. 265
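P.S. To double-check the "reaping is single-threaded" assumption, something along these lines should do it (just a sketch I jotted down, I haven't captured its output here): it counts kmem_depot_ws_reap entries per thread, so more than one tid showing up would mean my no-racing-threads reasoning above is wrong.

# sketch: count reap entries per thread id; multiple tids would
# invalidate the "nothing else clears @lat" assumption
dtrace -n 'fbt::kmem_depot_ws_reap:entry { @reapers[tid] = count(); }'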
kmem-reap-auto-minimal.d
------------------------

#!/usr/sbin/dtrace -s

fbt::kmem_depot_ws_reap:entry
{
        self->reap_tq = timestamp;
        trace("start");
}

fbt::kmem_slab_destroy:entry
/self->reap_tq/
{
        self->start = timestamp;
}

fbt::kmem_slab_destroy:return
/self->reap_tq && self->start/
{
        @lat = quantize(timestamp - self->start);
        self->start = 0;
}

fbt::kmem_depot_ws_reap:return
/self->reap_tq && ((timestamp - self->reap_tq) >= 1000000)/
{
        trace(timestamp - self->reap_tq);
        printa(@lat);
        trunc(@lat);
        self->reap_tq = 0;
        exit(0); /* debug */
}

/* end of reap - clear all the per/cache stats */
fbt::kmem_depot_ws_reap:return
/self->reap_tq/
{
        trace(timestamp - self->reap_tq);
        printa(@lat);
        trunc(@lat);
        self->reap_tq = 0;
}
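(FWIW, the next thing on my list to try against the minimal script is the variant below - completely untested so far, so treat it as a sketch. It bumps aggrate/switchrate on the hunch that the default aggregation snapshot rate interacts badly with the trunc() that immediately follows the printa(), and it keeps a separate count() so I can tell whether @lat actually received data before printing.)

#!/usr/sbin/dtrace -s

/*
 * Sketch only -- untested. The aggrate/switchrate values are a hunch,
 * not a known fix.
 */
#pragma D option aggrate=10hz
#pragma D option switchrate=10hz

fbt::kmem_depot_ws_reap:entry
{
        self->reap_tq = timestamp;
}

fbt::kmem_slab_destroy:entry
/self->reap_tq/
{
        self->start = timestamp;
}

fbt::kmem_slab_destroy:return
/self->reap_tq && self->start/
{
        @lat = quantize(timestamp - self->start);
        @destroys = count();    /* so I can tell whether @lat got data */
        self->start = 0;
}

fbt::kmem_depot_ws_reap:return
/self->reap_tq && ((timestamp - self->reap_tq) >= 1000000)/
{
        trace(timestamp - self->reap_tq);
        printa(@destroys);
        printa(@lat);
        trunc(@lat);
        trunc(@destroys);
        self->reap_tq = 0;
        exit(0); /* debug */
}

/* end of reap - clear everything for the next reap */
fbt::kmem_depot_ws_reap:return
/self->reap_tq/
{
        trunc(@lat);
        trunc(@destroys);
        self->reap_tq = 0;
}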
kmem-reap-auto.d
----------------

#!/usr/sbin/dtrace -sCq

/* minimum latency we care about in ns */
#define LIMIT           1000000

/* llquantize usecs given nsec latency */
#define Q(x)            llquantize((x) / 1000, 10, 0, 6, 20)

#define UNITS_SUFFIX    " (us)"

/* measure function latency */
#define FBT_LAT(fxn, counter, run) \
fbt::kmem_depot_ws_reap:entry \
/self->reap_tq/ \
{ \
        self->counter = 0; \
        self->run = 0ull; \
} \
fbt::fxn:entry \
/self->reap_tq/ \
{ \
        self->counter++; \
        self->start[probefunc] = timestamp; \
} \
fbt::fxn:return \
/self->reap_tq && self->start[probefunc]/ \
{ \
        this->latency = timestamp - self->start[probefunc]; \
        self->start[probefunc] = 0; \
\
        self->run += this->latency; \
        @lat[strjoin(probefunc, UNITS_SUFFIX)] = Q(this->latency); \
}

BEGIN
{
        printf("Started.\n");
        already_reaping = 0;
}

/*
 * gather stats of what's happening on the kmem reap taskq
 */
fbt::kmem_depot_ws_reap:entry
/already_reaping/
{
        stack();
        exit(1);
}

fbt::kmem_depot_ws_reap:entry
/strstr(args[0]->cache_name, "zio_") != NULL/
{
        already_reaping = 1;
        self->reap_tq = timestamp;
        self->kmemcache = args[0];
}

/* count magazines freed */
FBT_LAT(kmem_magazine_destroy, magcount, magrun)

/* count objects returned to slabs */
FBT_LAT(kmem_slab_free, objcount, objrun)

/* count slabs freed */
FBT_LAT(kmem_slab_destroy, slabcount, slabrun)

/* count tlb invals */
FBT_LAT(hat_tlb_inval, tlb_invals, tlb_invalrun)

/* count xcalls issued */
FBT_LAT(xc_common, xcalls, xcallrun)

/*
 * end of reap - stash away the reap latency for this cache, and print out
 * stats for this reap
 */
fbt::kmem_depot_ws_reap:return
/self->reap_tq && ((this->latency = timestamp - self->reap_tq) >= LIMIT)/
{
        /*
         * Results:
         *
         *  reap latency:                       this->latency
         *  mags processed:                     self->magcount
         *  objects processed:                  self->objcount
         *  slabs processed:                    self->slabcount
         *  TLB invals issued:                  self->tlb_invals
         *  xcalls issued:                      self->xcalls
         *  mag destroy runtime:                self->magrun
         *  objects processing runtime:         self->objrun
         *  slab destroying runtime:            self->slabrun
         *  TLB invals runtime:                 self->tlb_invalrun
         *  xcall runtime:                      self->xcallrun
         *  magazine destroy latencies:         @lat["kmem_magazine_destroy"]
         *  object return to slab latencies:    @lat["kmem_slab_free"]
         *  slab free latencies:                @lat["kmem_slab_destroy"]
         *  TLB inval latencies:                @lat["hat_tlb_inval"]
         *  xcall latencies:                    @lat["xc_common"]
         */
        printf("%Y %s %u ns, %u mags, %u objs, %u slabs, %u TLB invals, %u xcalls\n",
            walltimestamp, self->kmemcache->cache_name, this->latency,
            self->magcount, self->objcount, self->slabcount,
            self->tlb_invals, self->xcalls);
        printf("reap %luns, mags %lluns, objs %lluns, slabs %lluns, TLB invals %lluns, xcalls %lluns\n",
            this->latency, self->magrun, self->objrun, self->slabrun,
            self->tlb_invalrun, self->xcallrun);
        printa(@lat);

        @kmem_reaps[strjoin(self->kmemcache->cache_name, UNITS_SUFFIX)] =
            Q(this->latency);

        /* clear(@lat); */
        trunc(@lat);
        self->reap_tq = 0;
        self->kmemcache = 0;
        already_reaping = 0;

        printa(@kmem_reaps); /* debug */
        exit(0); /* debug */
}

/* end of reap - clear all the per/cache stats */
fbt::kmem_depot_ws_reap:return
/self->reap_tq/
{
        printf("%s\n", self->kmemcache->cache_name);

        /* clear(@lat); */
        trunc(@lat);
        self->reap_tq = 0;
        self->kmemcache = 0;
        already_reaping = 0;
}

tick-60sec
{
        printf("Periodic kmem_reap latency dump:\n");
        printa(@kmem_reaps);
        printf("done.\n");
}
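(In case the macro makes the full script harder to read: here is roughly what a single FBT_LAT() instantiation expands to after cpp. I wrote this expansion out by hand for FBT_LAT(kmem_slab_destroy, slabcount, slabrun), so forgive any transcription slips.)

fbt::kmem_depot_ws_reap:entry
/self->reap_tq/
{
        self->slabcount = 0;
        self->slabrun = 0ull;
}

fbt::kmem_slab_destroy:entry
/self->reap_tq/
{
        self->slabcount++;
        self->start[probefunc] = timestamp;
}

fbt::kmem_slab_destroy:return
/self->reap_tq && self->start[probefunc]/
{
        this->latency = timestamp - self->start[probefunc];
        self->start[probefunc] = 0;

        self->slabrun += this->latency;
        @lat[strjoin(probefunc, " (us)")] =
            llquantize(this->latency / 1000, 10, 0, 6, 20);
}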