Hello,

I've been trying to write a DTrace script to do some latency analysis inside
Illumos's kmem code.  Sadly, I'm running into an issue where DTrace seems to
*sometimes* fail to output anything other than an empty line when I printa()
my latency aggregate.  I tried to simplify the script as much as possible,
but I still see the problem.

Attached are both the full-fledged and the simplified scripts.

I hope that I'm just missing something silly.  It misbehaves like this on
several (new-ish) versions of illumos-gate.

Anyway, I run the script simply as "dtrace -s kmem-reap-auto-minimal.d";
nothing fancy anymore.  (The full script uses the preprocessor, "this",
printf, fancy arithmetic, llquantize, arguments, and two aggregates.)

To generate reaping, I run two dd's (see below) on a system with 16GB RAM
and these datasets:

# zfs list -r -o name,mountpoint,recordsize storage/test
NAME              MOUNTPOINT         RECSIZE
storage/test      /storage/test         128K
storage/test/16k  /storage/test/16k      16K
storage/test/64k  /storage/test/64k      64K

# cd /storage/test
# ls -lh inp
-rw-r--r-- 1 root root 16G Dec  8 15:49 inp
# dd if=inp of=16k/file bs=1M ; dd if=inp of=64k/file bs=1M

This tries to write out 32GB of data (total) which will obviously not fit
in RAM.  And because of how ZFS behaves, it will cause the ARC to reap the
ZFS-related caches.

Since the kernel's reaping code is single-threaded, there shouldn't be any
racing threads clearing the (global) @lat aggregate.
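
(If anyone wants to double-check that single-threadedness claim, an untested
sketch like this should do it; it tracks the maximum number of threads ever
inside kmem_depot_ws_reap at once, so a max of 1 would confirm it:)

#!/usr/sbin/dtrace -s

fbt::kmem_depot_ws_reap:entry
{
        inflight++;
        @maxconc["max concurrent reapers"] = max(inflight);
}

fbt::kmem_depot_ws_reap:return
/inflight > 0/
{
        inflight--;
}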

Now, what do I see exactly when I run the minimal script?

Eventually, while running my dd(1) test, there is a reap that takes more
than 1ms.  This causes the first of the two kmem_depot_ws_reap:return clauses
to fire.  I see the trace line dumping the latency, indicating that it was
over the threshold, and then a newline, which comes from the printa(@lat).  In
a previous version of this minimal script, I had a counter which told me how
many times kmem_slab_destroy:return fired - it's always *many* times when
the latency is over the threshold.  Here's sample output:

...
  2  11151         kmem_depot_ws_reap:entry   start                            
  2  11152        kmem_depot_ws_reap:return             19599

  2  11151         kmem_depot_ws_reap:entry   start                            
  2  11152        kmem_depot_ws_reap:return              1654

  2  11151         kmem_depot_ws_reap:entry   start                            
  2  11152        kmem_depot_ws_reap:return              3537

  2  11151         kmem_depot_ws_reap:entry   start                            
  2  11152        kmem_depot_ws_reap:return              1652

  2  11151         kmem_depot_ws_reap:entry   start                            
  2  11152        kmem_depot_ws_reap:return          84775031

<terminates due to exit(0)>


And for completeness, here's what I see with the full script:

# dtrace -s kmem-reap-auto.d -C 
dtrace: script 'kmem-reap-auto.d' matched 21 probes
CPU     ID                    FUNCTION:NAME
  6      1                           :BEGIN Started.

  2  11158        kmem_depot_ws_reap:return zio_buf_512

  2  11158        kmem_depot_ws_reap:return 2014 Dec 17 11:33:59 zio_data_buf_512 7736915 ns, 279 mags, 940 objs, 103 slabs, 103 TLB invals, 103 xcalls
reap 7736915ns, mags 6773367ns, objs 1743102ns, slabs 2015422ns, TLB invals 767839ns, xcalls 297358ns


  zio_data_buf_512 (us)                             
           value  ------------- Distribution ------------- count    
            7000 |                                         0        
            7500 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
            8000 |                                         0        

<terminates due to exit(0)>


There should be a per-function histogram between the two summary lines and
the per-cache histogram.  Based on those two lines, there were plenty of
function calls which should have added data to @lat, yet nothing is printed.

Any ideas why I'm seeing this odd behavior?  Keep in mind it's
non-deterministic.

Thanks,

Jeff.

-- 
The obvious mathematical breakthrough would be development of an easy way to
factor large prime numbers.
                - Bill Gates, The Road Ahead, pg. 265

#!/usr/sbin/dtrace -s

fbt::kmem_depot_ws_reap:entry
{
        self->reap_tq = timestamp;
        trace("start");
}

fbt::kmem_slab_destroy:entry
/self->reap_tq/
{
        self->start = timestamp;
}

fbt::kmem_slab_destroy:return
/self->reap_tq && self->start/
{
        @lat = quantize(timestamp - self->start);
        self->start = 0;
}


/* slow reap (>= 1ms): dump and truncate @lat, then exit */
fbt::kmem_depot_ws_reap:return
/self->reap_tq && ((timestamp - self->reap_tq) >= 1000000)/
{
        trace(timestamp - self->reap_tq);

        printa(@lat);

        trunc(@lat);

        self->reap_tq = 0;

        exit(0); /* debug */
}

/* end of reap - clear all the per-cache stats */
fbt::kmem_depot_ws_reap:return
/self->reap_tq/
{
        trace(timestamp - self->reap_tq);

        printa(@lat);

        trunc(@lat);

        self->reap_tq = 0;
}
#!/usr/sbin/dtrace -sCq

/* minimum latency we care about in ns */
#define LIMIT   1000000

/* llquantize usecs given nsec latency */
#define Q(x)    llquantize((x) / 1000, 10, 0, 6, 20)
#define UNITS_SUFFIX    " (us)"

/* measure function latency */
#define FBT_LAT(fxn, counter, run)                      \
fbt::kmem_depot_ws_reap:entry                           \
/self->reap_tq/                                         \
{                                                       \
        self->counter = 0;                              \
        self->run = 0ull;                               \
}                                                       \
fbt::fxn:entry                                          \
/self->reap_tq/                                         \
{                                                       \
        self->counter++;                                \
        self->start[probefunc] = timestamp;             \
}                                                       \
fbt::fxn:return                                         \
/self->reap_tq && self->start[probefunc]/               \
{                                                       \
        this->latency = timestamp - self->start[probefunc];     \
        self->start[probefunc] = 0;                     \
                                                        \
        self->run += this->latency;                     \
        @lat[strjoin(probefunc, UNITS_SUFFIX)] = Q(this->latency);      \
}

BEGIN
{
        printf("Started.\n");
        already_reaping = 0;
}

/*
 * gather stats of what's happening on the kmem reap taskq
 */
fbt::kmem_depot_ws_reap:entry
/already_reaping/
{
        stack();
        exit(1);
}
fbt::kmem_depot_ws_reap:entry
/strstr(args[0]->cache_name, "zio_") != NULL/
{
        already_reaping = 1;
        self->reap_tq = timestamp;
        self->kmemcache = args[0];
}

/* count magazines freed */
FBT_LAT(kmem_magazine_destroy, magcount, magrun)

/* count objects returned to slabs */
FBT_LAT(kmem_slab_free, objcount, objrun)

/* count slabs freed */
FBT_LAT(kmem_slab_destroy, slabcount, slabrun)

/* count tlb invals */
FBT_LAT(hat_tlb_inval, tlb_invals, tlb_invalrun)

/* count xcalls issued */
FBT_LAT(xc_common, xcalls, xcallrun)

/*
 * end of reap - stash away the reap latency for this cache, and print out
 * stats for this reap
 */
fbt::kmem_depot_ws_reap:return
/self->reap_tq && ((this->latency = timestamp - self->reap_tq) >= LIMIT)/
{
        /*
         * Results:
         *
         * reap latency:                     this->latency
         * mags processed:                   self->magcount
         * objects processed:                self->objcount
         * slabs processed:                  self->slabcount
         * TLB invals issued:                self->tlb_invals
         * xcalls issued:                    self->xcalls
         * mag destroy runtime:              self->magrun
         * objects processing runtime:       self->objrun
         * slab destroying runtime:          self->slabrun
         * TLB invals runtime:               self->tlb_invalrun
         * xcall runtime:                    self->xcallrun
         * magazine destroy latencies:       @lat["kmem_magazine_destroy"]
         * object return to slab latencies:  @lat["kmem_slab_free"]
         * slab free latencies:              @lat["kmem_slab_destroy"]
         * TLB inval latencies:              @lat["hat_tlb_inval"]
         * xcall latencies:                  @lat["xc_common"]
         */

        printf("%Y %s %u ns, %u mags, %u objs, %u slabs, %u TLB invals, %u xcalls\n",
               walltimestamp, self->kmemcache->cache_name, this->latency,
               self->magcount, self->objcount, self->slabcount,
               self->tlb_invals, self->xcalls);
        printf("reap %luns, mags %lluns, objs %lluns, slabs %lluns, TLB invals %lluns, xcalls %lluns\n",
               this->latency, self->magrun, self->objrun, self->slabrun,
               self->tlb_invalrun, self->xcallrun);
        printa(@lat);

        @kmem_reaps[strjoin(self->kmemcache->cache_name, UNITS_SUFFIX)] =
            Q(this->latency);

        /* clear(@lat); */
        trunc(@lat);

        self->reap_tq = 0;
        self->kmemcache = 0;
        already_reaping = 0;

        printa(@kmem_reaps); /* debug */
        exit(0); /* debug */
}

/* end of reap - clear all the per-cache stats */
fbt::kmem_depot_ws_reap:return
/self->reap_tq/
{
        printf("%s\n", self->kmemcache->cache_name);

        /* clear(@lat); */
        trunc(@lat);

        self->reap_tq = 0;
        self->kmemcache = 0;
        already_reaping = 0;
}

tick-60sec
{
        printf("Periodic kmem_reap latency dump:\n");
        printa(@kmem_reaps);
        printf("done.\n");
}

