I was messing around with using the perf counters a couple weeks ago
as a way to get deterministic exits in the instruction stream of the
guest. I used the h/w msr save/restore area to disable the counters
and save the values on guest exit and restore them on entry. I also
set up the LVT to deliver NMIs on overflow.

This basically worked as expected, but I never got around the problem
of inconsistent NMI delivery. A large majority of the time the NMI
would be delivered in non-root mode and a vmexit would occur, as
expected. Occasionally, though, the NMI is delivered in root mode. It
seems that if the overflow occurs near the time a vmexit occurs for some
other reason, the NMI takes long enough to propagate that it's
delivered in root mode.

Based on Avi's recommendation, I just hacked the host IDT to still do
the necessary handling and reset the counters, but I'm interested in
whether or not others have seen the same thing. If not, I'm interested
in why.  I'm still dealing with other synchronization issues and
haven't been able to verify if my current approach to using the perf
counters will work consistently, but I'd like to avoid the IDT hacking
in any case.

-Casey

> > >>
> > >
> > >The performance counters (PMU) cannot be fully virtualized; they need to
> > >run on the actual MSR registers. The PMU interrupt is controlled by the
> > >local APIC. To get overflow-based sampling to work in a guest, we need to
> > >allow the PMU to interrupt. Supposing we have allowed wrmsr/rdmsr on the
> > >PMU registers, the guest perfmon will set up the virtual APIC and virtual
> > >IDT as it normally would on real HW. VT-x takes care of the IDT but not
> > >of the APIC. The guest never touches the real APIC; qemu handles this.
> > >However if the host kernel is running perfmon, it does already have the
> > >actual APIC programmed for the PMU.
> > >
> > >In this configuration, the host perfmon interrupt driver catches the PMU
> > >interrupt generated while running in non-root VMX mode. At that point,
> > >there is a VM-exit. I have now been able to track down the type of exit
> > >in this case. You have a VM-exit for an external interrupt, which is fine;
> > >however, the intr_info (VM_EXIT_INTR_INFO) is 0x0. In other words, VT-x
> > >does not give you any useful information about why you exited. As soon as
> > >you leave the VM_RESUME code, you branch to the host perfmon interrupt
> > >handler.
> > >
> >
> > Actually it can be convinced to give the interrupt number.  Right now,
> > we program VT not to ack interrupts, so we don't know their number, and
> > they are dispatched by the processor as soon as we enable interrupts on
> > the host.
> >
> > An alternative mechanism exists.  We can tell VT to ack the interrupt,
> > in which case the vector number becomes valid, but we need to dispatch
> > the interrupt ourselves using the 'int' instruction.
> >
> Ok, I missed that control but I see it now (bit 15).
>
> > As I'd rather not do that, perhaps we can program the apic to issue an
> > nmi instead of an interrupt while in guest mode.  On receipt of nmi, we
> > can call the host perfmon handler directly to interpret the performance
> > counters.
> >
> Yes, but that would be no different from what I have now without the ack-intr.
> What you'd like is to catch the PMU intr right away and re-inject it without
> using the host perfmon interrupt handler. It seems the only way to do this
> is by acking intr. Unfortunately, it is an all-or-nothing control.
>
> The other worry in this scheme is that the injection would be done without
> qemu intervening. Thus you would not be able to check whether the virtual APIC
> LVT vector is currently masked. Its configuration may be different from the
> actual APIC. But that is probably ok for now. Is there a plan to move the
> APIC emulation into KVM?
>
> > >In any case, the current solution I have for this is sort of hybrid because
> > >you rely on the host APIC to be programmed correctly, and then you need
> > >communication between the host perfmon code and the KVM kernel code to be
> > >able to inject the PMU interrupt back into the guest. Another solution I
> > >have experimented with is for the host perfmon to notify the user-level
> > >qemu APIC code (SIGIO), which then issues the right KVM_INTERRUPT ioctl(),
> > >but that is slow and has a race condition with the guest.
> > >
> >
> > That looks promising.  The slowness can be addressed by (first) moving
> > to queued signals instead of delivered signals and (later) pushing the
> > apic emulation into the kernel.
> >
> > VT also has a facility to swap msrs on entry to the guest and back.
> >
> Yes, I am using some of that to stop monitoring when entering KVM.
>
> >
> > It really depends on what one wants to do with the performance monitor
> > on the guest:
> >
> > - if it's just to shut up the nmi watchdog, we can report a cpu model
> > that does not have the performance monitor (which would be a classic
> > Pentium? or maybe a 486?)
>
> No, the goal is to provide full access to the PMU for performance monitoring
> just as you would have on bare HW.
>
> > - if we want something like the nmi watchdog to run, we can emulate all
> > counters based on cpu cycles, even if they count branches or something
> > else.  That gives an inaccurate but sort-of-working counter, which we
> > can emulate using host timers.
>
> No, that's not my goal. I want to allow monitoring tools to run in a guest.
> I think people would want to assess performance of their applications when
> running in a guest. You can get the outside view using the host perfmon,
> but you also want the inside view.
>
> > - if we want real performance monitoring, we need to do the msr swap.
>
> You mean if you do not want to conflict with the host using the PMU
> for itself? Well, the host perfmon can take care of this.
>
> --
> -Stephane

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
