How 'bout random sample request profiling? The Alpha processor used to do this (and still does, if you are using EV6 or later) with a feature called ProfileMe:
Alpha 21264A processors (and later) use a different method called "instruction sampling." PC sampling on out-of-order execution engines like the Alpha 21264 smears and skews sample data and profile information cannot be precisely attributed to specific instructions. Instruction sampling solves this problem by periodically selecting a specific instruction and collecting data about it as it flows through the processor pipeline. The program counter is known precisely as well as the execution history of the instruction. The problems of smear and skew are eliminated. Like PC sampling, the sampling period is randomized to get a statistically meaningful estimate of program behavior.

[From http://h21007.www2.hp.com/portal/download/files/unprot/tru64/metrics.pdf, Section 1.1, third paragraph]

One could randomly sample requests in a similar manner, with each sampled request "profiled" to document all the choices made on the way to its result. A listener can then capture those samples, collect a bunch of them, and sift through the data to find out what is happening.

It means either adding branches on the fast path, or compiling two sets of routines, one that collects and one that does not, to avoid any hit on the fast path.

Once that is done, clients can be built to analyze the performance offline.

Or not.

-peter

On Mon, Aug 1, 2011 at 1:12 AM, dormando <[email protected]> wrote:
> > I owe all of you better tap documentation (the last couple of weeks
> > have really killed me). It does some pretty great stuff in this area
> > and has many practical uses.
>
> Now would be a great time to sell us on it, then :)
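For what it's worth, here is a rough sketch of the randomized request sampling idea from the top of this message. It is Python rather than anything resembling the real server code, and every name in it (RequestSampler, handle_get, the trace tuples) is made up for illustration. The gap between sampled requests is randomized around a mean, in the spirit of the randomized sampling period in the quoted paragraph, and unsampled requests pay only for one counter decrement and one branch.

```python
import random


class RequestSampler:
    """Hypothetical sampler: picks about one request in `mean_period`."""

    def __init__(self, mean_period, seed=None):
        self.mean_period = mean_period
        self.rng = random.Random(seed)
        self.listeners = []  # callables that receive each finished trace
        self.countdown = self._next_gap()

    def _next_gap(self):
        # Uniform on 1 .. 2*mean_period - 1, so the average gap is
        # mean_period. Randomizing avoids locking onto periodic workloads.
        return self.rng.randint(1, 2 * self.mean_period - 1)

    def begin(self):
        """Return an empty trace if this request is sampled, else None."""
        self.countdown -= 1
        if self.countdown > 0:
            return None
        self.countdown = self._next_gap()
        return []

    def end(self, trace):
        """Hand a finished trace to every registered listener."""
        for listener in self.listeners:
            listener(trace)


def handle_get(sampler, cache, key):
    """Toy request handler that records its decisions when sampled."""
    trace = sampler.begin()
    hit = key in cache
    if trace is not None:  # the extra branch on the fast path
        trace.append(("op", "get"))
        trace.append(("key", key))
        trace.append(("hit", hit))
        sampler.end(trace)
    return cache.get(key)
```

A listener here is just a callable, so appending a list's append method to sampler.listeners is enough to collect traces for offline analysis. The "two sets of routines" alternative would replace the `if trace is not None` branch with a separate instrumented copy of handle_get that is only dispatched to for sampled requests.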
