Before I talk about performance monitoring units (PMUs) and KVM, let
me sketch PMUs and the software we have to put them to use.  You may
wish to skip to the next occurrence of "KVM".


Modern processors sport PMUs in various forms and shapes.  The
simplest form is a couple of performance counters, each of which can
be programmed to count occurrences of a certain event and raise an
interrupt after a certain number of them.  Supported events depend on
the PMU, and typically include
cycles spent executing instructions, cache misses, stall cycles and
such.  Precision varies; there's undercounting, and the interrupt can
occur pretty far from the instruction that triggered the event.
Smarter hardware exists that can record samples in a buffer and
interrupt only when that buffer fills up.
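To make that concrete, here's a rough sketch of programming one such
counter on an Intel-style x86 PMU.  The MSR numbers and bit layout are
from the x86 manuals; wrmsr() stands in for the kernel's MSR accessor,
and real code would also mask the initial count to the counter's
actual width:

    #include <stdint.h>

    #define MSR_PERFEVTSEL0 0x186   /* event select for counter 0 */
    #define MSR_PMC0        0x0c1   /* the counter itself */

    #define EVTSEL_USR (1 << 16)    /* count in user mode */
    #define EVTSEL_OS  (1 << 17)    /* count in kernel mode */
    #define EVTSEL_INT (1 << 20)    /* interrupt on counter overflow */
    #define EVTSEL_EN  (1 << 22)    /* enable the counter */

    extern void wrmsr(uint32_t msr, uint64_t val);  /* stand-in */

    /* Count 'event' and interrupt after 'period' occurrences:
     * counters interrupt on overflow, so start at -period.
     */
    static void program_counter0(uint8_t event, uint8_t umask,
                                 uint64_t period)
    {
            wrmsr(MSR_PMC0, -period);   /* wraps; overflows after 'period' */
            wrmsr(MSR_PERFEVTSEL0,
                  event | ((uint32_t)umask << 8)
                  | EVTSEL_USR | EVTSEL_OS | EVTSEL_INT | EVTSEL_EN);
    }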

Our existing tool for putting the PMUs to use is OProfile.  OProfile
is a low-overhead, transparent (no instrumentation), system-wide
profiler.

OProfile consists of a kernel part that messes with the hardware and
delivers samples to user space, a daemon that records these samples,
and utilities for control and analysis.

OProfile is a very useful tool, but it has its limitations.  It uses
PMUs only in their simple form.  Users also want to monitor single
threads instead of the whole system, write applications that monitor
selected parts of themselves, and more.  Perfmon2 attempts to provide a
generic interface to all the various PMUs that can support all that.
But it's a big, complex hairball out of tree, and merging it will take
time and hard work.


So, what does all this have to do with virtualization in general and
KVM in particular?

As I explained above, use of the PMU beyond what OProfile can do is
quite a hairball.  Adding virtualization to it can only make it
hairier.  I feel that hairball needs to be untangled elsewhere before
we touch it.  That leaves system-wide profiling.

System-wide profiling comes with two competing definitions of system:
virtual and real.  Both are useful.  And both need work.


System-wide profiling of the *virtual* machine is related to profiling
just a process.  That's hard.  I guess building on Perfmon2 would make
sense there, but as long as it's out of tree...  Can we wait for it?
If not, what then?


System-wide profiling of the *real* machine we already have: OProfile.
The fact that we're running guests doesn't disturb it.
However, the presence of guests makes it harder to interpret samples:
we need to map code addresses to programs/DSOs.  The information
necessary for that lives in the guest kernel.

An obvious way to join the samples with that information is to
delegate the recording of samples to the guest.  Note, however, that
you then
need to set up the recording of samples in each guest in addition to
the host, which is inconvenient.
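For illustration, the raw sample the host could forward might carry
little more than this (a made-up layout, not an existing ABI);
everything needed to turn the pc into a program/DSO name lives on the
guest side:

    #include <stdint.h>

    /* Hypothetical per-sample record the host hands to the guest. */
    struct guest_sample {
            uint64_t pc;        /* guest-virtual instruction pointer */
            uint32_t vcpu;      /* virtual CPU the sample came from */
            uint8_t  cpu_mode;  /* guest kernel vs. guest user space */
            uint8_t  event;     /* which counter/event fired */
    };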


Such a real-system-wide profiler already exists for Xen: Xenoprof,
which is a patch to OProfile.  Here's how it works.

Xenoprof splits OProfile's kernel part: the hardware part moves into
the hypervisor, while the deliver-to-user-space part remains in the
kernel.  Kernel and hypervisor talk through hypercalls, shared memory
and virtual interrupts.  Instead of the driver for the real PMU, the
kernel uses a special Xenoprof driver that talks to the hypervisor.
The hypervisor accepts commands controlling the PMU only from the
privileged guest (dom0).
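As far as I remember, the shared memory is a simple per-VCPU ring of
samples, roughly shaped like this (field names approximate, not the
exact Xen ABI):

    #include <stdint.h>

    struct xenoprof_sample {
            uint64_t eip;          /* sampled instruction pointer */
            uint8_t  mode;         /* kernel, user or hypervisor */
            uint8_t  event;        /* which event overflowed */
    };

    struct xenoprof_ring {
            uint32_t event_head;   /* bumped by the hypervisor */
            uint32_t event_tail;   /* bumped by the kernel */
            uint32_t lost_samples; /* overruns when the ring is full */
            struct xenoprof_sample samples[];  /* the ring proper */
    };

The hypervisor logs samples at event_head, the kernel consumes them at
event_tail, and a virtual interrupt tells the kernel there's work to
do.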

Xen guests (domains in Xen parlance) running Xenoprof are called
active: they receive their samples from the hypervisor and record
them.  Domains not running Xenoprof are called passive, and the
hypervisor routes their samples to dom0 for recording.  Dom0 can make
sense of a passive domain's Linux kernel space samples, if given
suitable kernel symbols, but can't make sense of passive user space.

Active Xenoprof is useful because it gets you the most data.  It's
also quite a rain dance to use: starting and stopping profiling takes
several steps spread over all active domains, and if you misstep,
things tend to fail in the most confusing way imaginable.  More robust
error handling and better automation should help there.

Passive Xenoprof is useful because it works even when a domain can't
cooperate (no Xenoprof), or you can't be bothered to start OProfile
there.


The same ideas should work for KVM.  The whole hypervisor headache
just evaporates, of course.  What remains is the host kernel routing
samples to active guests (over virtio, I guess), and guest kernels
receiving samples from there instead of from the hardware PMU.  In
other words, the sample channel from the host becomes our virtual PMU
for the guest, and the guest needs a driver for it.  It's a weird PMU,
because you can't program its performance counters.  That's left to
the host.
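A minimal sketch of what that guest driver's receive path might look
like, assuming the hypothetical guest_sample record from above and a
pop_sample() placeholder for the virtio plumbing; oprofile_add_pc() is
OProfile's in-kernel interface as I remember it:

    #include <linux/types.h>
    #include <linux/oprofile.h>

    struct guest_sample {       /* hypothetical, as sketched above */
            u64 pc;
            u32 vcpu;
            u8  cpu_mode;       /* non-zero: guest kernel space */
            u8  event;
    };

    /* Placeholder: pull the next sample off the virtio queue. */
    extern bool pop_sample(struct guest_sample *s);

    /* Called when the host signals that samples are pending.
     * Note there is no counter programming anywhere: the host
     * owns the real PMU, we just feed its samples to OProfile
     * as if a local PMU interrupt had delivered them.
     */
    static void vpmu_drain(void)
    {
            struct guest_sample s;

            while (pop_sample(&s))
                    oprofile_add_pc(s.pc, s.cpu_mode != 0, s.event);
    }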

How much of Xenoprof's kernel code we could use I don't know.  A
common user space should be quite feasible.


So, what do you think?  Is this worthwhile?  Other ideas?
