Dan Price pointed out the confusing wording in the "The planned
implementation involves" paragraph. Here's a better version:
SUMMARY
This fast-track enhances the DTrace utility to address an
existing RFE[1] requesting an aggregating function to calculate
standard deviation, similar to the current aggregating function
for average.
The new function is a committed interface; this case seeks patch
release binding.
DETAILS
Overview
Currently, the DTrace utility includes an aggregating function
to calculate the average of a set of numbers but does not
provide the same to calculate standard deviation. Because this
would be useful for statistical analysis, we plan to
introduce an aggregating function to calculate standard
deviation.
We plan to use the following approximation to standard deviation:
sqrt(average(x^2) - average(x)^2)
It is recognised that this is an imprecise approximation to
standard deviation, but it is calculable as an aggregation, and
it should be sufficient for most of the purposes to which
DTrace is put. The approximation and its imprecision should be
noted in documentation for DTrace.
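
To make the approximation concrete, here is a minimal C sketch of
how the three running totals could be turned into a result. It is
not the DTrace implementation: the names stddev_acc_t,
stddev_acc_add(), stddev_acc_result() and isqrt64() are
hypothetical, and the sum of x^2 is kept in a plain 64-bit integer
here, so overflow is simply ignored in this sketch.

```c
#include <stdint.h>

/* Hypothetical accumulator holding the three running totals. */
typedef struct {
    uint64_t count;   /* number of values seen */
    int64_t  sum;     /* sum of x */
    uint64_t sumsq;   /* sum of x^2 (64-bit only: may overflow in this sketch) */
} stddev_acc_t;

/* Integer square root (floor), computed one result bit at a time. */
static uint64_t
isqrt64(uint64_t n)
{
    uint64_t res = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four in 64 bits */

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (res);
}

/* Fold one value into the running totals. */
static void
stddev_acc_add(stddev_acc_t *a, int64_t x)
{
    uint64_t ux = x < 0 ? -(uint64_t)x : (uint64_t)x;   /* |x| */

    a->count++;
    a->sum += x;
    a->sumsq += ux * ux;
}

/* sqrt(average(x^2) - average(x)^2), using truncating integer division. */
static uint64_t
stddev_acc_result(const stddev_acc_t *a)
{
    int64_t mean;
    uint64_t m;

    if (a->count == 0)
        return (0);
    mean = a->sum / (int64_t)a->count;
    m = mean < 0 ? -(uint64_t)mean : (uint64_t)mean;    /* |mean| */
    /* m * m can wrap for very large means; ignored in this sketch */
    return (isqrt64(a->sumsq / a->count - m * m));
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the totals are count = 8,
sum = 40 and sum of squares = 232, so the result is
sqrt(232/8 - 5^2) = sqrt(4) = 2. The truncating divisions are one
visible source of the imprecision mentioned above.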
The planned implementation involves storing two 64-bit values
(the total count and the sum of x) and one 128-bit value (the
sum of x^2). These values will be post-processed in user-land
to present the standard deviation. (This is similar to the
implementation of the avg() aggregating function, which stores
the total count and the sum of x.) Because accumulating the sum
of x^2 in a 64-bit integer could overflow, we plan to store it
as a 128-bit value in two unsigned 64-bit integers. This will
require implementing 128-bit addition and multiplication. It
will also involve implementing an arbitrary-precision square
root function in user-land to handle those cases in which a
long double is insufficient.
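
As an illustration of the 128-bit arithmetic this calls for, the
following C sketch shows one common way to add and multiply using
two unsigned 64-bit halves. The type u128_t and the functions
u128_add() and u128_mul64() are hypothetical names chosen for this
example, not the routines that will ship in DTrace.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value held as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_t;

/* a += b, propagating the carry from the low half into the high half. */
static void
u128_add(u128_t *a, const u128_t *b)
{
    uint64_t lo = a->lo + b->lo;

    /* unsigned wraparound of the low half signals a carry */
    a->hi += b->hi + (lo < a->lo);
    a->lo = lo;
}

/* Full 64x64 -> 128-bit product, built from four 32x32 partial products. */
static u128_t
u128_mul64(uint64_t x, uint64_t y)
{
    uint64_t x0 = x & 0xffffffffULL, x1 = x >> 32;
    uint64_t y0 = y & 0xffffffffULL, y1 = y >> 32;
    uint64_t p00 = x0 * y0;
    uint64_t p01 = x0 * y1;
    uint64_t p10 = x1 * y0;
    uint64_t p11 = x1 * y1;
    /* bits 32..63 of the result, plus carries into the high half */
    uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffULL) +
        (p10 & 0xffffffffULL);
    u128_t r;

    r.lo = (p00 & 0xffffffffULL) | (mid << 32);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return (r);
}
```

Accumulating the sum of x^2 would then amount to computing
u128_mul64(x, x) for each non-negative value x and folding the
product into the running total with u128_add().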
EXAMPLE
This is an example D script demonstrating the use of the stddev()
aggregating function:
#pragma D option quiet
syscall::exece:entry,
syscall::exec:entry
{
        self->ts = timestamp;
}
syscall::exece:return,
syscall::exec:return
/ self->ts /
{
        t = timestamp - self->ts;
        @foo[probefunc] = avg(t);
        @bar[probefunc] = stddev(t);
        @baz[probefunc] = quantize(t);
        self->ts = 0;
}
END
{
        printf("AVERAGE:");
        printa(@foo);
        printf("\nSTDDEV:");
        printa(@bar);
        printf("\n");
        printa(@baz);
}
With sample output as follows:
# ./stddev.d
^C
AVERAGE:
  exece                                                        567257
STDDEV:
  exece                                                        158867
  exece
           value  ------------- Distribution ------------- count
          131072 |                                         0
          262144 |@@@@@@@@@@@@@@@@@@@                      128
          524288 |@@@@@@@@@@@@@@@@@@@@@                    144
         1048576 |                                         0
#
REFERENCES
[1] A stdev() aggregator would be a nice adjunct to avg()
(http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6325485)