I've started working with Adam Leventhal and Jon Haslam on 6325485 ("A
stdev() aggregator would be a nice adjunct to avg()").  Here's the
proposal; comments are welcome.

Chad

SUMMARY

        This fast-track enhances the DTrace utility to address an
        existing RFE[1] requesting an aggregating function to calculate
        standard deviation, similar to the current aggregating function
        for average.

        The new function is a committed interface; this case seeks patch
        release binding.

DETAILS

    Overview

        Currently, the DTrace utility includes an aggregating function
        to calculate the average of a set of numbers but does not
        provide one to calculate standard deviation.  Because this
        would be useful for statistical analysis, we plan to
        introduce an aggregating function to calculate standard
        deviation.

        We plan to use the following approximation to standard deviation:

        sqrt(average(x^2) - average(x)^2)

        It is recognised that this is an imprecise approximation to
        standard deviation, but it is calculable as an aggregation, and
        it should be sufficient for most of the purposes to which
        DTrace is put.  The approximation and its imprecision should be
        noted in documentation for DTrace.
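
        As a rough illustration of why this formula is calculable as an
        aggregation, here is a minimal C sketch.  It is not the
        planned DTrace code: the names stddev_state_t, stddev_update,
        stddev_result and isqrt64 are hypothetical, the sum of x^2 is
        kept in 64 bits for brevity, and the use of truncating integer
        division is an assumption made here to show one way imprecision
        can enter.

```c
#include <stdint.h>

/* Hypothetical running state: only these three values are kept per key. */
typedef struct {
        uint64_t count;         /* number of values seen */
        uint64_t sum;           /* sum of x */
        uint64_t sumsq;         /* sum of x^2 (64 bits here for brevity) */
} stddev_state_t;

/* Fold one value into the running state; no history is retained. */
static void
stddev_update(stddev_state_t *s, uint64_t x)
{
        s->count++;
        s->sum += x;
        s->sumsq += x * x;      /* can overflow for large x */
}

/* Integer square root: largest r such that r * r <= n. */
static uint64_t
isqrt64(uint64_t n)
{
        uint64_t res = 0;
        uint64_t bit = (uint64_t)1 << 62;

        while (bit > n)
                bit >>= 2;
        while (bit != 0) {
                if (n >= res + bit) {
                        n -= res + bit;
                        res = (res >> 1) + bit;
                } else {
                        res >>= 1;
                }
                bit >>= 2;
        }
        return (res);
}

/* sqrt(average(x^2) - average(x)^2), using truncating integer division. */
static uint64_t
stddev_result(const stddev_state_t *s)
{
        uint64_t avg, avgsq;

        if (s->count == 0)
                return (0);
        avg = s->sum / s->count;
        avgsq = s->sumsq / s->count;
        if (avgsq < avg * avg)  /* guard against truncation artifacts */
                return (0);
        return (isqrt64(avgsq - avg * avg));
}
```

        Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 through
        stddev_update() leaves count = 8, sum = 40 and sumsq = 232, so
        stddev_result() computes isqrt64(29 - 25) = 2.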

        The planned implementation involves storing three values:  the
        total count and the sum of x (64 bits each), and the sum of
        x^2.  These values will be post-processed in user-land to
        present the standard deviation.  (This is similar to the
        implementation of the avg() aggregating function, which stores
        the total count and the sum of x.)  Storing the sum of x^2
        presents a very real possibility of integer overflow, so we
        plan to store it as a 128-bit value in two unsigned 64-bit
        integers.  This will require implementing 128-bit addition and
        multiplication.  It will also involve implementing an
        arbitrary-precision square root function in user-land to handle
        those cases in which a long double is insufficient.
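
        To make the 128-bit bookkeeping concrete, the following C
        sketch shows one way to square a 64-bit value into a 128-bit
        product and add it to a 128-bit running sum held in two
        unsigned 64-bit integers.  The names u128_t, u128_add,
        u128_mul64 and sumsq_update are hypothetical, and the sketch
        assumes unsigned inputs; it is not taken from the DTrace
        source.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value held as two 64-bit halves. */
typedef struct {
        uint64_t lo;
        uint64_t hi;
} u128_t;

/* a += b; a carry out of the low half is propagated into the high half. */
static void
u128_add(u128_t *a, u128_t b)
{
        uint64_t lo = a->lo + b.lo;

        a->hi += b.hi + (lo < a->lo ? 1 : 0);
        a->lo = lo;
}

/* Full 64x64 -> 128-bit product built from four 32-bit partial products. */
static u128_t
u128_mul64(uint64_t x, uint64_t y)
{
        uint64_t x0 = x & 0xffffffffu, x1 = x >> 32;
        uint64_t y0 = y & 0xffffffffu, y1 = y >> 32;
        uint64_t p00 = x0 * y0, p01 = x0 * y1;
        uint64_t p10 = x1 * y0, p11 = x1 * y1;
        /* Middle column: cannot overflow 64 bits (three 32-bit terms). */
        uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) +
            (p10 & 0xffffffffu);
        u128_t r;

        r.lo = (p00 & 0xffffffffu) | (mid << 32);
        r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
        return (r);
}

/* Add x^2 to the running 128-bit sum of squares. */
static void
sumsq_update(u128_t *sumsq, uint64_t x)
{
        u128_add(sumsq, u128_mul64(x, x));
}
```

        With this representation, squaring a value as large as
        2^64 - 1 still fits, and the running sum overflows only after
        the total exceeds 2^128 - 1.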

EXAMPLE

        This is an example D script demonstrating the use of the stddev()
        aggregating function:

#pragma D option quiet

syscall::exece:entry,
syscall::exec:entry
{
        self->ts = timestamp;
}

syscall::exece:return,
syscall::exec:return
/ self->ts /
{
        /* clause-local variable avoids races on a shared global */
        this->t = timestamp - self->ts;
        @foo[probefunc] = avg(this->t);
        @bar[probefunc] = stddev(this->t);
        @baz[probefunc] = quantize(this->t);

        self->ts = 0;
}

END
{
        printf("AVERAGE:");
        printa(@foo);
        printf("\nSTDDEV:");
        printa(@bar);
        printf("\n");
        printa(@baz);
}

        With sample output as follows:

# ./stddev.d
^C
AVERAGE:
  exece                                                        567257

STDDEV:
  exece                                                        158867


  exece
           value  ------------- Distribution ------------- count
          131072 |                                         0
          262144 |@@@@@@@@@@@@@@@@@@@                      128
          524288 |@@@@@@@@@@@@@@@@@@@@@                    144
         1048576 |                                         0


#

REFERENCES

[1] A stdev() aggregator would be a nice adjunct to avg()
    (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6325485)
_______________________________________________
dtrace-discuss mailing list