On 19.05.19 02:40, [email protected] wrote:
Hello,

As part of my research, I’ve been looking to modify Jailhouse slightly to take 
advantage of the preemption timer provided by VMX for Intel x86-64. I know 
Jailhouse already uses it, but I was hoping to also use it to have the root 
cell periodically “check up” on the inmate (the intended use case of the 
preemption timer). I guess it’s similar to this topic, but for x86, not arm: 
https://groups.google.com/d/topic/jailhouse-dev/F074sQtFvao/discussion

Reading the 2014 LWN Jailhouse article, I found this paragraph:

“Currently, NMIs can only come from the hypervisor itself which uses them to 
control CPUs... When NMI occurs in VM, it exits and Jailhouse re-throws NMI in 
host mode. The CPU dispatches it through the host IDT... It schedules another 
VM exit using VMX feature known as preemption timer. vmcs_setup() sets this 
timer to zero, so if it is enabled, VM exit occurs immediately after VM entry. 
The reason behind this indirection is serialization: this way, NMIs (which are 
asynchronous by nature) are always delivered after guest entries (VM entry).”

So I have a few questions about this:

* What does ‘serializing NMIs’ mean?
* Why is that important?


Like most hypervisors, Jailhouse tries to intercept as little of the guest
activities as possible. On x86, this means that we do not route interrupts to the
hypervisor first but rather let them hit the guest directly. They are actually
blocked while the hypervisor is running.

But we need some event so that CPU 1 can inform CPU 2 that there is something to
do in hypervisor mode, e.g. to stop a guest. For that purpose, we intercept
NMIs, which are separate events on x86. But NMIs are usually non-maskable. Thus,
they may not only hit the guest and lead to a vmexit, they may also hit us while
in host mode. On AMD, we can actually block them (clgi), but Intel does not have
an equivalent feature.

Now, we could simply process the NMI reason on Intel in the NMI handler. But
that would significantly complicate the state machine. It's much easier to only
react to vmexit reasons. And now the preemption timer comes into play: The NMI
handler only ensures that an NMI event is translated into a follow-up vmexit,
right after the next entry - as if the NMI hit us in guest mode.
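
To illustrate, here is a minimal, self-contained model of that indirection. It
is only a sketch, not the actual Jailhouse code: struct vcpu_model, the
model_*() functions and the exit_reason values are hypothetical names made up
for this example. It assumes the timer value stays at 0 (as set up once at VMCS
initialization) and that only the enable bit is toggled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exit reasons; not Jailhouse's real constants. */
enum exit_reason { EXIT_NONE, EXIT_PREEMPTION_TIMER };

/* Hypothetical per-CPU model of the relevant VMCS state. */
struct vcpu_model {
	bool timer_enabled;          /* models the "activate timer" control */
	uint32_t timer_value;        /* stays 0 in this model */
	unsigned int events_handled; /* counts event checks after timer exits */
};

/* Host-mode NMI handler: does no real work, only requests a follow-up exit. */
void model_nmi_handler(struct vcpu_model *v)
{
	v->timer_enabled = true;
}

/* VM entry: with the timer enabled and at 0, the guest exits immediately. */
enum exit_reason model_vm_entry(const struct vcpu_model *v)
{
	if (v->timer_enabled && v->timer_value == 0)
		return EXIT_PREEMPTION_TIMER;
	return EXIT_NONE; /* guest keeps running */
}

/* Exit handler: the pending event is processed here, like any other exit. */
void model_handle_exit(struct vcpu_model *v, enum exit_reason reason)
{
	if (reason == EXIT_PREEMPTION_TIMER) {
		v->timer_enabled = false; /* disarm before checking events */
		v->events_handled++;
	}
}
```

In this model, an NMI arriving in host mode changes nothing but the enable bit;
the actual event processing happens in model_handle_exit() after the next
entry, so the exit path remains the only place that reacts to events.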

* How does Jailhouse reinject/deliver the NMI to the guest?


We ignore all NMIs targeting the guest, in fact. That's mostly harmless, but it
breaks perf & Co. It could be fixed, but doing so is not simple, and there has
been no urgent need so far.

This is my current understanding of the code:

1) NMI occurs in guest, causing a VM exit.
2) vm_exit --> vcpu_handle_exit() --> vmx_handle_exception_nmi()
3) asm volatile("int %0" : : "i" (NMI_VECTOR));
4) nmi_entry --> vcpu_nmi_handler() --> enable preemption timer
5) vcpu_nmi_handler() returns
6) nmi_entry returns
7) vmx_check_events() --> disable preemption timer, x86_check_events()
8) vmx_check_events() returns
9) vmx_handle_exception_nmi() returns
10) vcpu_handle_exit() returns, (triggering a VM entry?)

There must be a VM entry somehow between steps 4 and 7, or else the preemption 
timer would continue to be disabled after step 10 and would never trigger (but 
it does). So where is the VM entry?

There is none in your case. As described above, the preemption timer comes into
play when the original vmexit reason was not NMI but something different. We
only get one reason per exit, and an NMI hitting us in host mode will not
trigger a second reason - unless we play the preemption timer trick.


Also, I don’t understand how this serializes the NMI, because I don’t see how 
the host delivers the NMI interrupt to the guest. Steps 2-10 are all on the 
host, correct?


As explained above, there is no NMI delivery to the guest at this point.

So, regarding your additional use case for the preemption timer: If you start
using it for two purposes, you must ensure that programming the timer is not
causing it to tick longer than required. If there was a value programmed that is
supposed to trigger immediate exit on next entry, this must be preserved.

Here is a possible pattern to achieve this: Whenever you program the preemption
timer with a larger timeout than 0, check if there is already an immediate exit
pending. However, this has to take into account that you cannot make the
conditional write atomic with respect to the NMI handler. So the NMI handler
should also set a this_cpu_data()->immediate_exit flag and write 0 into the
timer. Then
the site trying to program a larger timeout can do the following:

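        /* Program the new timeout first... */
        vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, timeout);
        /* ...then restore 0 if the NMI handler asked for an immediate exit. */
        if (this_cpu_data()->immediate_exit)
                vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, 0);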

By checking after writing a non-zero timeout you ensure that you reliably
restore any zero timeout that might have been written by the NMI handler 
earlier.
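
To make the ordering argument concrete, here is a small stand-alone model that
replays an NMI at each possible point around the two steps. It is a sketch
under assumptions, not hypervisor code: struct cpu_model, enum nmi_point and
the functions below are hypothetical, and plain struct fields stand in for the
VMCS field and the per-CPU flag. The check-then-write variant is included only
to show the race it would have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU state; not Jailhouse's real layout. */
struct cpu_model {
	uint32_t timer_value;  /* models VMX_PREEMPTION_TIMER_VALUE */
	bool immediate_exit;   /* models this_cpu_data()->immediate_exit */
};

/* Where the simulated NMI lands relative to the two steps of each variant. */
enum nmi_point { NMI_NEVER, NMI_BEFORE, NMI_BETWEEN, NMI_AFTER };

/* What the NMI handler would do: set the flag and write 0 into the timer. */
void model_nmi(struct cpu_model *c)
{
	c->immediate_exit = true;
	c->timer_value = 0;
}

/* Write first, check afterwards (the order suggested above). */
void program_write_then_check(struct cpu_model *c, uint32_t timeout,
			      enum nmi_point nmi)
{
	if (nmi == NMI_BEFORE)
		model_nmi(c);
	c->timer_value = timeout;
	if (nmi == NMI_BETWEEN)
		model_nmi(c);
	if (c->immediate_exit)
		c->timer_value = 0; /* restore the handler's zero timeout */
	if (nmi == NMI_AFTER)
		model_nmi(c);
}

/* Check first, write afterwards: shown only to illustrate the race. */
void program_check_then_write(struct cpu_model *c, uint32_t timeout,
			      enum nmi_point nmi)
{
	bool pending;

	if (nmi == NMI_BEFORE)
		model_nmi(c);
	pending = c->immediate_exit;
	if (nmi == NMI_BETWEEN)
		model_nmi(c);
	if (!pending)
		c->timer_value = timeout; /* can overwrite the handler's 0 */
	if (nmi == NMI_AFTER)
		model_nmi(c);
}
```

With write-then-check, the timer ends up at 0 wherever the NMI lands. With
check-then-write, an NMI between the check and the write leaves the larger
timeout in place, and the immediate exit is lost.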

HTH,
Jan
