Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code

Alexander Graf Tue, 24 Jun 2014 15:42:00 -0700


On 24.06.14 20:53, Scott Wood wrote:

On Sun, 2014-06-22 at 23:23 +0200, Alexander Graf wrote:

Howdy,


Ben reminded me a while back that we have a nasty race in our KVM PV code.

We replace a few instructions with longer streams of instructions to check
whether it's necessary to trap out from it (like mtmsr, no need to trap if
we only disable interrupts). During those replacement chunks we must not get
any interrupts, because they might overwrite scratch space that we already
used to save otherwise clobbered register state into.

So we have a thing called "critical sections" which allows us to atomically
get in and out of "interrupt disabled" modes without touching MSR. When we
are supposed to deliver an interrupt into the guest while we are in a critical
section, we just don't inject the interrupt yet, but leave it be until the
next trap.

However, we never really know when the next trap would be. For all we know it
could be never. At this point we created a race that is a potential source
for interrupt loss or at least deferral.

This patch set aims at solving the race. Instead of merely deferring an
interrupt when we see such a situation, we go into a special instruction
interpretation mode. In this mode, we interpret all PPC assembler instructions
that happen until we are out of the critical section again, at which point
we can now inject the interrupt.
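
A minimal sketch of that idea as a toy C model. This is not code from the
patch set: every name here (toy_vcpu, toy_emulate_one, toy_deliver_irq, the
toy_op values) is hypothetical, and real PPC instruction emulation is reduced
to a single "leave critical section" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names come from the patch set. */
enum toy_op { TOY_OP_OTHER, TOY_OP_LEAVE_CRITICAL };

struct toy_vcpu {
	bool in_critical;   /* guest is inside a PV critical section */
	bool irq_pending;   /* an interrupt is waiting to be delivered */
	size_t pc;          /* index of the next instruction to run */
	unsigned injected;  /* interrupts delivered so far */
};

/* Interpret one instruction and advance the program counter. */
static void toy_emulate_one(struct toy_vcpu *v, const enum toy_op *prog,
			    size_t len)
{
	assert(v->pc < len);
	if (prog[v->pc] == TOY_OP_LEAVE_CRITICAL)
		v->in_critical = false;
	v->pc++;
}

/*
 * Deliver a pending interrupt. If the guest is in a critical section,
 * interpret instructions until it leaves the section instead of
 * deferring the interrupt to some unknown future trap.
 */
static void toy_deliver_irq(struct toy_vcpu *v, const enum toy_op *prog,
			    size_t len)
{
	if (!v->irq_pending)
		return;

	while (v->in_critical && v->pc < len)
		toy_emulate_one(v, prog, len);

	/* Only inject once the critical section has really ended. */
	if (!v->in_critical) {
		v->irq_pending = false;
		v->injected++;
	}
}
```

The point of the loop is that the interrupt is delivered at a well-defined
moment (the first instruction boundary outside the critical section) rather
than whenever the guest next happens to trap.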

This bug only affects KVM implementations that make use of the magic page, so
e500v2, book3s_32 and book3s_64 PR KVM.

Would it be possible to single step through the critical section
instead?  Or set a high res timer to expire very quickly?


There are a few other alternatives to this implementation:

  1) Unmap the magic page, emulate all memory access to it while in
     critical and irq pending
  2) Trigger a timer that sends a request to the vcpu to wake it from
     potential sleep and inject the irq

  3) Single step until we're beyond the critical section
  4) Probably more that I can't think of right now :)

Each has their good and bad sides. Unmapping the magic page adds
complexity to the MMU mapping code, since we need to make sure we don't
map it back in on demand and treat faults to it specially.

The timer interrupt works, but I'm not fully convinced that it's a good
idea for things like MC events which we also block during critical
sections on e500v2.

Single stepping is hard enough to get right on interaction between QEMU,
KVM and the guest. I didn't really want to make that stuff any more
complicated.

This approach is really just one out of many - and it's one that's
nicely self-contained and shouldn't have any impact at all on
implementations that don't care about it ;).



Alex
