[Note cc: Will, who knows much more about OProfile than I do]

Avi Kivity <[EMAIL PROTECTED]> writes:

> Markus Armbruster wrote:
>> Avi Kivity <[EMAIL PROTECTED]> writes:
>>
>>   
>>> Markus Armbruster wrote:
>>>     
>>>> System-wide profiling of the *virtual* machine is related to profiling
>>>> just a process.  That's hard.  I guess building on Perfmon2 would make
>>>> sense there, but as long as it's out of tree...  Can we wait for it?
>>>> If not, what then?
>>>>
>>>>         
>>> Give the guest access to the real PMU.  Save them on every exit
>>> (switching profiling off), and restore them on every entry (switching
>>> profiling on).  The only problem with this is that it is very cpu
>>> model dependent, losing the hardware independence that virtual
>>> machines have.  If you are satisfied with the architectural
>>> performance counters, then we even have hardware independence.
>>>     
>>
>> Saving and restoring the PMU is *slow* on most machines.  Especially
>> bad on machines where reading / writing a PMU register involves
>> serializing instructions.
>>
>>   
>
> That is true. It will increase vmexit latencies by several microseconds.
>
>> Want to try anyway?
>>   
>
> If we want to support unmodified oprofile/VTune, we have to. I can't
> judge how important it would be to users.

Neither can I, at this time.

We'd have to make sure that we pay only as we go: save / restore the
PMU only when, and only to the extent that, the guest actually uses it.
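
Roughly what I have in mind (a sketch only; the struct, the MSR list
and where exactly this would hook into KVM's entry/exit paths are made
up for illustration, rdmsrl()/wrmsrl() are just the usual kernel
helpers):

#include <linux/types.h>
#include <asm/msr.h>		/* rdmsrl(), wrmsrl() */

#define NR_PMU_MSRS 4

struct vcpu_pmu_state {
	int	guest_uses_pmu;		/* set once the guest touches a counter MSR */
	u64	guest_msrs[NR_PMU_MSRS];
	u64	host_msrs[NR_PMU_MSRS];
};

static const u32 pmu_msrs[NR_PMU_MSRS] = {
	0x186, 0x187,		/* P6-style EVNTSEL0/1, just as an example */
	0xc1,  0xc2,		/* PERFCTR0/1 */
};

/* On VM exit: park the guest's counters, bring back the host's. */
static void pmu_switch_to_host(struct vcpu_pmu_state *pmu)
{
	int i;

	if (!pmu->guest_uses_pmu)	/* the common, cheap case */
		return;

	for (i = 0; i < NR_PMU_MSRS; i++) {
		rdmsrl(pmu_msrs[i], pmu->guest_msrs[i]);
		wrmsrl(pmu_msrs[i], pmu->host_msrs[i]);
	}
}

/* On VM entry: the mirror image. */
static void pmu_switch_to_guest(struct vcpu_pmu_state *pmu)
{
	int i;

	if (!pmu->guest_uses_pmu)
		return;

	for (i = 0; i < NR_PMU_MSRS; i++) {
		rdmsrl(pmu_msrs[i], pmu->host_msrs[i]);
		wrmsrl(pmu_msrs[i], pmu->guest_msrs[i]);
	}
}

As long as guest_uses_pmu stays clear, the exit path costs one branch.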

Migration was discussed elsewhere in this thread.  Andi suggested
providing a virtual architectural PMU instead of virtualizing the real
PMU.  That way we could migrate between dissimilar hardware.  But it
would involve faking a CPU with an architectural PMU.  Hmm.
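
For what it's worth, the architectural PMU is enumerated through CPUID
leaf 0xA, so the CPUID half of faking it is not much code; the real
work is emulating the handful of MSRs behind it.  A sketch of the leaf,
with arbitrary example values for what we'd advertise (this is not
KVM's actual cpuid code):

#include <linux/types.h>

/* Fabricate CPUID leaf 0xA so the guest sees an architectural PMU,
 * independent of the host CPU.  Version 1, two 40-bit general-purpose
 * counters, no fixed-function counters -- all arbitrary choices. */
static void fake_arch_pmu_cpuid(u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
{
	u32 version    = 1;	/* architectural PMU version ID        */
	u32 nr_gp      = 2;	/* general-purpose counters per CPU    */
	u32 gp_width   = 40;	/* counter bit width                   */
	u32 ev_vec_len = 7;	/* length of the EBX event bit vector  */

	*eax = version | (nr_gp << 8) | (gp_width << 16) | (ev_vec_len << 24);
	*ebx = 0;	/* 0 => all seven architectural events available */
	*ecx = 0;
	*edx = 0;	/* version 1: no fixed-function counters         */
}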

Isn't it asking a bit much to want migration between hardware with
different real PMUs *and* virtual performance monitoring at the same
time?

As far as OProfile is concerned: we can make it work with whatever
kind of virtual PMU we want, without a complete CPU fake.  It just
needs to be able to detect our virtual PMU.
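
Detection could work the same way as the other paravirtual features: a
tiny guest driver probes a CPUID leaf in the hypervisor range and bails
out if the feature bit isn't there.  In the sketch below only the
general mechanism is real; the feature leaf and the KVM_FEATURE_VPMU
bit are invented:

#include <linux/types.h>
#include <linux/string.h>
#include <asm/processor.h>	/* cpuid() */

#define HYPERVISOR_CPUID_BASE	0x40000000
#define HYPERVISOR_CPUID_FEAT	(HYPERVISOR_CPUID_BASE + 1)
#define KVM_FEATURE_VPMU	(1 << 4)	/* made-up bit */

static int have_virtual_pmu(void)
{
	u32 eax, ebx, ecx, edx;
	char sig[13];

	cpuid(HYPERVISOR_CPUID_BASE, &eax, &ebx, &ecx, &edx);
	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	sig[12] = '\0';
	if (strcmp(sig, "KVMKVMKVM"))
		return 0;		/* not running on KVM at all */

	cpuid(HYPERVISOR_CPUID_FEAT, &eax, &ebx, &ecx, &edx);
	return !!(eax & KVM_FEATURE_VPMU);
}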

>>>> The same ideas should work for KVM.  The whole hypervisor headache
>>>> just evaporates, of course.  What remains is the host kernel routing
>>>> samples to active guests (over virtio, I guess), and guests kernels
>>>> receiving samples from there instead of the hardware PMU.  In other
>>>> words, the sample channel from the host becomes our virtual PMU for
>>>> the guest.  Which needs a driver for it.  It's a weird PMU, because
>>>> you can't program its performance counters.  That's left to the host.
>>>>         
>>> Is there really a requirement to profile several userspace programs,
>>> on several guests, simultaneously?  If not, passing through the PMU
>>> will work best, with the additional advantage that guests will not
>>> need modification (so you can run Windows with VTune, for example).
>>>     
>>
>> There are uses for both kinds of system-wide profiling.
>>
>>   
>
> Okay; we can do both. Pass-through should be quite simple.
>
>>> If this three-tier profiling is actually needed, perhaps we can do all
>>> recording on the host, but have an interface to let the guest
>>> translate rip samples to something more meaningful.  This might work
>>> in this way:
>>>
>>> - oprofile on the host receives the pmu nmi
>>> - oprofile calls a hook (placed there by kvm) when it sees that the
>>> task is actually a virtual machine, instead of the usual translation
>>> process
>>> - kvm injects an interrupt into the guest
>>> - the guest converts the pmu rip value into a meaningful string and
>>> writes it into memory
>>> - (later) kvm picks this up and passes it back to oprofile
>>>
>>> The advantage here is that besides a fairly simple driver that needs
>>> to be loaded into the guest (and can be loaded automatically),
>>> everything is controlled from the host.  All the information is
>>> available on the host, so that sorting by counter occurrences, for
>>> example, works.
>>>     
>>
>> Yep.
>>
>> Problems include:
>>
>> * OProfile user space receives dcookies from the kernel, which it
>>   passes to lookup_dcookie().  We'd have to delegate that to the
>>   appropriate guest.
>>   
>
> This part looks doable.
>
>> * OProfile user space needs to be taught where to find each guest's
>>   debug info.
>>   
>
> This one seems too horrible to contemplate. NFS exports on each guest
> and mounts on the host? With fuse sshfs?

OProfile searches for debug info in a couple of places in the
filesystem.  Perhaps we could teach it to take a guest root directory
parameter, and search for a guest's debuginfo below that.  How the
debuginfo gets there is then the user's problem (NFS mount, fetch &
unpack RPMs, ...).
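
The lookup itself would be trivial: try the path under the guest root
first, fall back to the host's filesystem.  Sketch (the --guest-root
option and this helper are hypothetical; the real OProfile
post-processing tools are organized rather differently):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

static const char *guest_root;	/* e.g. "/srv/guests/fc8", from --guest-root */

/* Map an image path reported for a guest to something we can open on
 * the host, preferring the copy below the guest root. */
static char *resolve_guest_image(const char *image_path)
{
	char buf[4096];

	if (guest_root) {
		snprintf(buf, sizeof(buf), "%s/%s", guest_root, image_path);
		if (access(buf, R_OK) == 0)
			return strdup(buf);
	}
	return strdup(image_path);	/* last resort: hope host paths match */
}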

> Collecting and analyzing all the data on the host looks much better
> than distributing it to guests, however, if we can manage to transfer
> the debug information.

Yes, it liberates the user from a whole lot of hassle.

However, there are uses for collecting a guest's data on the guest as
well.  Say you run some very low-overhead, system-wide sampling
continuously on the host, and let guests subscribe to it (no root on
host required).  Kind of like a very limited virtual PMU that doesn't
give you many choices on how to sample.

If we need this capability anyway, we might just as well start with it,
because it seems easier than collecting everything on the host.
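
What a guest would receive over such a channel could be as simple as a
stream of fixed-size sample records, say (everything below is invented,
nothing of it exists yet):

#include <stdint.h>

/* One sample, pushed by the host onto a virtio queue the guest
 * subscribed to.  Field names and layout are made up. */
struct vpmu_sample {
	uint64_t rip;		/* guest-virtual instruction pointer      */
	uint64_t cr3;		/* lets the guest attribute it to a task  */
	uint32_t event;		/* which counter / event fired            */
	uint32_t cpu;		/* virtual cpu the sample was taken on    */
	uint8_t  in_kernel;	/* guest was at CPL 0 vs. user mode       */
	uint8_t  pad[7];
};

The guest-side driver would feed these into its profiler (for a Linux
guest presumably through OProfile's in-kernel sample interface) instead
of programming a hardware PMU.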

> [Wild idea: rewrite lookup_dcookie in a systemtap-like language, and
> execute it on the host instead of on the guest. Basically the host
> would use the guest's vmlinux debug info to decode the information
> from raw kernel memory]

Urgs.  A bit too wild for my taste, I have to admit :)
