Handle multiple interrupts injection in one vmexit
Hi there,

External interrupts are injected in the function vcpu_enter_guest after checking KVM_REQ_EVENT. If more than one event is pending at a vmexit (e.g. an NMI and an external interrupt arrive concurrently), KVM will handle only the interrupt of the highest priority (e.g. the NMI), right? So only the NMI is injected on this vmexit; when will the other external events be injected? I don't see any extra setting of KVM_REQ_EVENT in KVM to handle injection of the lower-priority interrupts.

Thanks,
Arthur

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handle multiple interrupts injection in one vmexit
Thanks Jan.

On Mon, May 26, 2014 at 6:44 PM, Jan Kiszka jan.kis...@web.de wrote:
On 2014-05-26 15:51, Arthur Chunqi Li wrote:
Hi there, External interrupts are injected in the function vcpu_enter_guest after checking KVM_REQ_EVENT. If more than one event is pending at a vmexit (e.g. an NMI and an external interrupt arrive concurrently), KVM will handle only the interrupt of the highest priority (e.g. the NMI), right? So only the NMI is injected on this vmexit; when will the other external events be injected? I don't see any extra setting of KVM_REQ_EVENT in KVM to handle injection of the lower-priority interrupts.

[you should mention that you are talking about x86 here]

If both events are pending, inject_pending_event will try to inject the NMI. vcpu_enter_guest will then notice that there are still pending interrupts and request the interrupt-window vmexit. If the NMI should be blocked, an NMI-window exit is requested. But on NMI injection, another KVM_REQ_EVENT is sent (see e.g. handle_nmi_window in vmx.c).

Yes, I see that bit 2 (interrupt-window exiting) and bit 22 (NMI-window exiting) of the Primary Processor-Based VM-Execution Controls are used to handle simultaneous (or back-to-back) interrupt/NMI injection. See Intel SDM chapter 24.6.2 for anyone who needs this information.

Arthur

Jan
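The scheme Jan describes (inject at most one event per entry, request a window exit for whatever is left) can be sketched as a toy userspace model. This is NOT the kernel code; all names and the struct are invented for illustration, under the assumption that NMIs outrank external interrupts as the thread says.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of one injection pass before vmentry (names invented). */
struct vcpu_model {
    bool nmi_pending;
    bool irq_pending;
    bool nmi_blocked;          /* e.g. still inside an NMI handler */
    bool irq_window_requested; /* models "interrupt-window exiting" (bit 2) */
    bool nmi_window_requested; /* models "NMI-window exiting" (bit 22) */
};

/* Inject at most one event, highest priority first; anything left
 * pending causes a window-exit request so the next vmexit gives KVM
 * another KVM_REQ_EVENT-style chance. Returns what was injected. */
const char *inject_pending_event(struct vcpu_model *v)
{
    const char *injected = "none";

    if (v->nmi_pending && !v->nmi_blocked) {
        v->nmi_pending = false;
        injected = "NMI";          /* NMI has the higher priority */
    } else if (v->irq_pending) {
        v->irq_pending = false;
        injected = "IRQ";
    }

    /* Leftover events request the corresponding window exit. */
    v->irq_window_requested = v->irq_pending;
    v->nmi_window_requested = v->nmi_pending;
    return injected;
}
```

With both an NMI and an IRQ pending, the first pass injects only the NMI and leaves the interrupt-window request set; the IRQ is injected on the next pass, matching the two-step behavior discussed above.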
Re: How to disable IDE DMA in KVM or in guest OS
On Thu, May 15, 2014 at 2:39 PM, Jan Kiszka jan.kis...@web.de wrote:
On 2014-05-15 07:54, Arthur Chunqi Li wrote:
Hi Jan and there, I want to disable IDE BMDMA in Qemu/KVM and let the guest OS use only PIO mode. Are there any configurations in Qemu or KVM to disable the hardware support of DMA?

Not that I know. These features are built into the chipsets we emulate, and there seems to be no option to disable them. Maybe the isapc will not expose DMA capabilities - but it will also lack a lot of other things like PCI...

Well, if I boot guest Linux with ide-core.nodma=0.0 libata.dma=0 ide=nodma ide0=nodma, why are bmdma irqs (14 and 15) still triggered? I think the guest OS should only use PIO in this situation.

Arthur

Jan

I have tried to disable IDE DMA in the guest OS boot params as follows:
ide-core.nodma=0.0 libata.dma=0 ide=nodma ide0=nodma
But I still get the following in dmesg:
[0.533276] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14
[0.533641] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15
and I did track irq 14 and irq 15 in ioapic_deliver when reading/writing the disk. How can I totally disable IDE BMDMA from the guest's boot time?

Thanks,
Arthur
How to disable IDE DMA in KVM or in guest OS
Hi Jan and there,

I want to disable IDE BMDMA in Qemu/KVM and let the guest OS use only PIO mode. Are there any configurations in Qemu or KVM to disable the hardware support of DMA?

I have tried to disable IDE DMA in the guest OS boot params as follows:
ide-core.nodma=0.0 libata.dma=0 ide=nodma ide0=nodma
But I still get the following in dmesg:
[0.533276] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14
[0.533641] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15
and I did track irq 14 and irq 15 in ioapic_deliver when reading/writing the disk. How can I totally disable IDE BMDMA from the guest's boot time?

Thanks,
Arthur
CPUs support APIC virtualization
Hi there,

I have noticed in the Intel SDM that some CPUs support APIC virtualization (e.g. virtual-interrupt delivery). I checked all my Intel CPUs' MSRs and found that none of them support this. So does anybody know which Intel CPUs support APIC virtualization? Or where can I get the related information?

Thanks,
Arthur
The action of Accessed and Dirty bit for EPT
Hi there,

I wrote a piece of code to test the behavior of the Accessed and Dirty bits of EPT on an Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz. First I build a completely new EPT paging structure with A/D logging on, then run some operating system code and log all the EPT violations (the trap log). At some point I pause the OS, parse the EPT paging structure, and log all the entries built during that period (the A/D log). Here I get some interesting points:

1. Some EPT entries are built with neither the Accessed nor the Dirty bit set. Does this mean that the CPU only constructed these entries but didn't touch them?

2. Some entries only exist in the A/D log. Does the A/D logging module have some bias or mistake? These two logs (trap log and A/D log) should be the same according to my understanding, and when I tried on an older CPU without A/D bit support, the two logs were exactly the same, though there I could only parse the EPT paging structure and could not distinguish Accessed from Dirty in it.

Thanks ahead,
Arthur
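For anyone parsing EPT entries the way described above, a minimal sketch of the bit tests follows. It assumes the SDM layout for EPT leaf entries: read/write/execute permissions in bits 2:0, Accessed in bit 8 and Dirty in bit 9 when A/D logging is enabled; this is a toy decoder, not the author's tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT entry bits per the Intel SDM (EPT accessed/dirty flags). */
#define EPT_READ     (1ULL << 0)
#define EPT_WRITE    (1ULL << 1)
#define EPT_EXEC     (1ULL << 2)
#define EPT_ACCESSED (1ULL << 8)
#define EPT_DIRTY    (1ULL << 9)

/* An EPT entry is present if any of R/W/X is set. */
static bool ept_present(uint64_t pte)
{
    return pte & (EPT_READ | EPT_WRITE | EPT_EXEC);
}

static bool ept_accessed(uint64_t pte) { return (pte & EPT_ACCESSED) != 0; }
static bool ept_dirty(uint64_t pte)    { return (pte & EPT_DIRTY) != 0; }
```

An entry like 0x7 (R/W/X set, A and D clear) corresponds to point 1 above: the entry was constructed (e.g. by prefetching) but never used for a translation, so the CPU left A/D clear.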
Guest VMs access a strange address
Hi there,

I have tried to log the EPT construction status at VM startup; that is, I added some code in the function __direct_map (arch/x86/kvm/mmu.c). __direct_map constructs the EPT paging structure when a guest page is first touched, and I can get the related gfn and pfn there. But I tracked a strange address: vcpu 0, pfn 0x8000, gfn 0xfebf1. Here pfn and gfn are the values of the params of __direct_map. How can pfn be 0x8000? Besides, I searched for 0xfebf1 in the kvm memslots and cannot find it in any memslot, but __direct_map catches this memory access and builds the mapping. Why does this happen?

Thanks ahead,
Arthur

A1. Here's my code in __direct_map:

	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
		if (iterator.level == level) {
			printk(KERN_NOTICE "vcpu %d\tpfn 0x%llx\tgfn 0x%llx\n",
			       vcpu->vcpu_id, pfn, gfn);
			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
				     write, emulate, level, gfn, pfn,
				     prefault, map_writable);
			direct_pte_prefetch(vcpu, iterator.sptep);
			++vcpu->stat.pf_fixed;
			break;
		}
	}
Re: How to get to know vcpu status from outside
Hi Paolo,

On Tue, Dec 17, 2013 at 8:28 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 17/12/2013 12:43, Arthur Chunqi Li wrote:
Hi Paolo, Thanks very much. And... (see below)
On Tue, Dec 17, 2013 at 7:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 17/12/2013 07:11, Arthur Chunqi Li wrote:
Hi Paolo, Since a VCPU is managed the same as a process in the kernel, how can I know the status (running, sleeping, etc.) of a vcpu in the kernel? Is there a variable in struct kvm_vcpu or something else that indicates this?

waitqueue_active(&vcpu->wq) means that the VCPU is sleeping in the kernel (i.e. in a halted state). vcpu->mode == IN_GUEST_MODE means that the VCPU is running. Anything else means that the host is running some kind of glue code (either kernel or userspace).

Another question about the scheduler. When I have 4 vcpus and the workload of the VM is low, I noticed that it tends to activate only 1 or 2 vcpus. Does this mean the other 2 vcpus are scheduled out or put into sleeping status?

This depends on what the guest scheduler is doing. The other 2 VCPUs are probably running for so little time (a few microseconds every 1/100th of a second) that you do not see them, and they stay halted the rest of the time. Remember that KVM has no scheduler of its own. What you see is the combined result of the guest and host schedulers.

Besides, if vcpu1 is running on pcpu1, and a kernel thread is running on pcpu0, can the kernel thread send a message to force vcpu1 to trap to the VMM? How can I do this?

Yes, with kvm_vcpu_kick. KVM tracks internally which pcpu will run the vcpu in vcpu->cpu, and kvm_vcpu_kick sends either a wakeup (if the vcpu is sleeping) or an IPI (if it is running).

What is the vcpu's action on kvm_vcpu_kick(vcpu)? What is the exit_reason of the kicked vcpu?

No exit reason, you just get a lightweight exit to the host kernel.
If you want a userspace exit, you'd need to set a bit in vcpu->requests before kvm_vcpu_kick (which you can do best with kvm_make_request), and change that to a userspace exit in vcpu_enter_guest. There's already an example of that: search arch/x86/kvm/x86.c for KVM_REQ_TRIPLE_FAULT.

I failed to kvm_vcpu_kick inactive vcpus at the beginning of the boot time (from power-up to grub) of a VM. I think this may be because the other vcpus are not yet activated by the SMP system at boot time, right? How can I distinguish vcpus in such a status?

Thanks,
Arthur

Paolo
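Paolo's three-way rule for reading a vcpu's state can be written down as a tiny classifier. This is a sketch only; the enum and function names are invented, and the two booleans stand in for the real checks (waitqueue_active(&vcpu->wq) and vcpu->mode == IN_GUEST_MODE).

```c
#include <assert.h>
#include <stdbool.h>

/* Invented names; models the rule described in the thread. */
enum vcpu_state { VCPU_HALTED, VCPU_IN_GUEST, VCPU_GLUE };

enum vcpu_state classify_vcpu(bool wq_active, bool in_guest_mode)
{
    if (wq_active)
        return VCPU_HALTED;    /* sleeping in the kernel, i.e. halted */
    if (in_guest_mode)
        return VCPU_IN_GUEST;  /* vcpu->mode == IN_GUEST_MODE */
    return VCPU_GLUE;          /* host glue code, kernel or userspace */
}
```

A vcpu that has not yet been started by the guest's SMP bring-up would show up as halted under this rule, which matches the boot-time observation in the mail.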
Re: How to trace every memory access
Hi Paolo,

When using EPT in KVM, does every vcpu have its own EPT paging structure, or do all vcpus share one?

Thanks,
Arthur

On Wed, Nov 20, 2013 at 6:41 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 20/11/2013 08:55, Arthur Chunqi Li wrote:
Hi Paolo, Currently I can trap every first write/read to a memory page from the guest VM (by adding code in tdp_page_fault). If I want to trace every memory access to a page, how can I achieve that in KVM?

You don't. :) If you are looking for something like this, a dynamic recompilation engine (such as QEMU's TCG) probably ends up being faster.

Paolo
Re: How to trace every memory access
Hi Paolo,

I want to rebuild the EPT paging structure, so I use kvm_mmu_unload() followed by kvm_mmu_reload(). But it seems to fail, because I cannot trap the EPT_VIOLATIONs I expect after the rebuild. How can I completely rebuild the EPT paging structure?

Arthur

On Fri, Dec 20, 2013 at 7:58 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 20/12/2013 10:15, Arthur Chunqi Li wrote:
Hi Paolo, When using EPT in KVM, does every vcpu have its own EPT paging structure, or do all vcpus share one?

All MMU structures are in vcpu->arch.mmu and vcpu->arch.nested_mmu, so they're per-VCPU.

Paolo
Re: How to trace every memory access
On Fri, Dec 20, 2013 at 7:58 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 20/12/2013 10:15, Arthur Chunqi Li wrote:
Hi Paolo, When using EPT in KVM, does every vcpu have its own EPT paging structure, or do all vcpus share one?

All MMU structures are in vcpu->arch.mmu and vcpu->arch.nested_mmu, so they're per-VCPU.

If an EPT entry is built by one VCPU, will this entry be propagated to the other VCPUs' EPT paging structures?

Arthur

Paolo
About preemption timer
Hi Jan and Paolo,

I've tried to use the preemption timer in KVM to trap vcpus regularly, but something unexpected happens. I run a VM with 4 vcpus and give them the same preemption timer value (e.g. 100) with all the relevant bits set (the activate/save bits), then reset the value in the preemption time-out handler. Thus I expected these vcpus to trap regularly in some rotation. But I found that when the VM is not busy, some vcpus trap much less frequently than others. In the Intel SDM, I noticed that the preemption timer is only related to the TSC, so I think all the vcpus should trap at a similar frequency. Could you help me explain this phenomenon?

Thanks,
Arthur
Re: About preemption timer
Hi Jan,

On Tue, Dec 17, 2013 at 7:21 PM, Jan Kiszka jan.kis...@siemens.com wrote:
On 2013-12-17 10:32, Arthur Chunqi Li wrote:
Hi Jan and Paolo, I've tried to use the preemption timer in KVM to trap vcpus regularly, but something unexpected happens. I run a VM with 4 vcpus and give them the same preemption timer value (e.g. 100) with all the relevant bits set (the activate/save bits), then reset the value in the preemption time-out handler. Thus I expected these vcpus to trap regularly in some rotation. But I found that when the VM is not busy, some vcpus trap much less frequently than others. In the Intel SDM, I noticed that the preemption timer is only related to the TSC, so I think all the vcpus should trap at a similar frequency. Could you help me explain this phenomenon?

Are you on a CPU that has non-broken preemption timer support? Anything prior to Haswell is known to tick with arbitrary frequencies.

My CPU is an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz. Besides, what do you mean by arbitrary frequencies?

Arthur

BTW, we will have to re-implement preemption timer support with the help of a regular host timer due to the breakage when halting L2 (see my test case).

Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Re: How to get to know vcpu status from outside
Hi Paolo,

Thanks very much. And... (see below)

On Tue, Dec 17, 2013 at 7:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 17/12/2013 07:11, Arthur Chunqi Li wrote:
Hi Paolo, Since a VCPU is managed the same as a process in the kernel, how can I know the status (running, sleeping, etc.) of a vcpu in the kernel? Is there a variable in struct kvm_vcpu or something else that indicates this?

waitqueue_active(&vcpu->wq) means that the VCPU is sleeping in the kernel (i.e. in a halted state). vcpu->mode == IN_GUEST_MODE means that the VCPU is running. Anything else means that the host is running some kind of glue code (either kernel or userspace).

Another question about the scheduler. When I have 4 vcpus and the workload of the VM is low, I noticed that it tends to activate only 1 or 2 vcpus. Does this mean the other 2 vcpus are scheduled out or put into sleeping status?

Besides, if vcpu1 is running on pcpu1, and a kernel thread is running on pcpu0, can the kernel thread send a message to force vcpu1 to trap to the VMM? How can I do this?

Yes, with kvm_vcpu_kick. KVM tracks internally which pcpu will run the vcpu in vcpu->cpu, and kvm_vcpu_kick sends either a wakeup (if the vcpu is sleeping) or an IPI (if it is running).

What is the vcpu's action on kvm_vcpu_kick(vcpu)? What is the exit_reason of the kicked vcpu?

Paolo

Besides, can I pin a vcpu to a pcpu? That is to say, can I assign a pcpu exclusively to a vcpu so that the pcpu only runs this vcpu?

Thanks,
Arthur
Re: About preemption timer
On Tue, Dec 17, 2013 at 8:43 PM, Jan Kiszka jan.kis...@siemens.com wrote:
On 2013-12-17 12:31, Arthur Chunqi Li wrote:
Hi Jan,
On Tue, Dec 17, 2013 at 7:21 PM, Jan Kiszka jan.kis...@siemens.com wrote:
On 2013-12-17 10:32, Arthur Chunqi Li wrote:
Hi Jan and Paolo, I've tried to use the preemption timer in KVM to trap vcpus regularly, but something unexpected happens. I run a VM with 4 vcpus and give them the same preemption timer value (e.g. 100) with all the relevant bits set (the activate/save bits), then reset the value in the preemption time-out handler. Thus I expected these vcpus to trap regularly in some rotation. But I found that when the VM is not busy, some vcpus trap much less frequently than others. In the Intel SDM, I noticed that the preemption timer is only related to the TSC, so I think all the vcpus should trap at a similar frequency. Could you help me explain this phenomenon?

Are you on a CPU that has non-broken preemption timer support? Anything prior to Haswell is known to tick with arbitrary frequencies.

My CPU is an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz.

Hmm, this one seems unaffected. Didn't find a specification update. Just like Paolo asked: does your original test case pass?

Besides, what do you mean by arbitrary frequencies?

On older CPUs, the tick rate of the preemption timer does not correlate with the TSC, definitely not in the way the spec defines it. Back to your original question: are we talking about native use of the preemption timer via a patched KVM, or nested use inside a KVM virtual machine?

It is about native use. I think it may be due to scheduling. When a vcpu is scheduled out of a pcpu, will the preemption timer still run? Oh, another thing: I use the released kernel 3.11, not the latest one. Does this matter?
Arthur

Jan
How to get to know vcpu status from outside
Hi Paolo,

Since a VCPU is managed the same as a process in the kernel, how can I know the status (running, sleeping, etc.) of a vcpu in the kernel? Is there a variable in struct kvm_vcpu or something else that indicates this?

Besides, if vcpu1 is running on pcpu1, and a kernel thread is running on pcpu0, can the kernel thread send a message to force vcpu1 to trap to the VMM? How can I do this?

Thanks very much,
Arthur
PMU in KVM
Hi Gleb,

I noticed that arch/x86/kvm/pmu.c is maintained by you, and I have some questions about the PMU in KVM. Thanks ahead if you can spare the time to answer them.

1. How does the PMU cooperate with Intel VT? For example, I only find flags in the IA32_PERFEVTSELx MSRs to count in OS and USER mode (ring 0 and the other rings). What happens when I execute VMXON with the PMU enabled? Can I distinguish the counts in root and non-root mode? I cannot find the related description in the Intel manual.

2. What is the current status of the vPMU in KVM? Is it enabled automatically? And how can I use (or enable/disable) it?

Thanks,
Arthur
How to trace every memory access
Hi Paolo,

Currently I can trap every first write/read to a memory page from the guest VM (by adding code in tdp_page_fault). If I want to trace every memory access to a page, how can I achieve that in KVM?

Thanks,
Arthur
Re: EPT page fault procedure
Hi Paolo,

On Thu, Oct 31, 2013 at 6:54 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 31/10/2013 10:07, Arthur Chunqi Li wrote:
Sorry to disturb you with so many trivial questions about KVM EPT memory management, and thanks for your patience.

No problem, please remain on-list though. Adding back kvm@vger.kernel.org.

I got confused by the EPT page fault processing function (tdp_page_fault). I think that when Qemu registers the memory region for a VM, the physical memory mapped to this PVA region isn't actually allocated yet. So the page fault procedure for an EPT violation, which maps a GFN to a PFN, should allocate the real physical memory and establish the real mapping from PVA to PFA in Qemu's page table.

Do you mean HVA to PFN? If so, you can look at the function hva_to_pfn. :)

I mean, in this procedure, how is physical memory actually allocated? When qemu first initializes the mapping of its userspace memory region to the VM, the physical memory corresponding to this region is not actually allocated. So I think KVM should do this allocation somewhere.

What is the point in tdp_page_fault() that handles such a mapping from PVA to PFA?

The EPT page table entry is created in __direct_map using the pfn returned by try_async_pf. try_async_pf itself gets the pfn from gfn_to_pfn_async and gfn_to_pfn_prot. Both of them call __gfn_to_pfn with different arguments. __gfn_to_pfn first goes from GFN to HVA using the memslots (gfn_to_memslot and, in __gfn_to_pfn_memslot, __gfn_to_hva_many), then it calls hva_to_pfn. Ultimately, hva_to_pfn_fast and hva_to_pfn_slow are where KVM calls functions from the kernel's get_user_pages family.

What will KVM do if get_user_page() returns a page that does not really exist in physical memory?
Thanks,
Arthur

Paolo
Re: Calling to kvm_mmu_load
Hi Paolo,

On Tue, Oct 29, 2013 at 8:55 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 29/10/2013 06:39, Arthur Chunqi Li wrote:
What is the dirty page tracking code path? I found an obsolete flag dirty_page_log_all in very old code, but I cannot find the most recent version of dirty page tracking.

Basically everything that accesses the dirty_bitmap field of struct kvm_memory_slot is involved. It all starts when the KVM_SET_USER_MEMORY_REGION ioctl is called with the KVM_MEM_LOG_DIRTY_PAGES flag set.

I see that the mechanism here is to set all pages read-only in order to track the dirty pages. But EPT provides such a dirty bit in the EPT paging structures. Why don't we use this?

Arthur

Besides, I noticed that memory management in KVM uses the mechanism based on struct kvm_memory_slot. How is kvm_memory_slot used in cooperation with Linux memory management?

kvm_memory_slot just maps a host userspace address range to a guest physical address range. Cooperation with Linux memory management is done with the Linux MMU notifiers. MMU notifiers let KVM know that a page has been swapped out, and KVM reacts by invalidating the shadow page tables for the corresponding guest physical address.

Paolo
Re: Calling to kvm_mmu_load
On Tue, Oct 29, 2013 at 8:55 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 29/10/2013 06:39, Arthur Chunqi Li wrote:
What is the dirty page tracking code path? I found an obsolete flag dirty_page_log_all in very old code, but I cannot find the most recent version of dirty page tracking.

Basically everything that accesses the dirty_bitmap field of struct kvm_memory_slot is involved. It all starts when the KVM_SET_USER_MEMORY_REGION ioctl is called with the KVM_MEM_LOG_DIRTY_PAGES flag set.

Besides, I noticed that memory management in KVM uses the mechanism based on struct kvm_memory_slot. How is kvm_memory_slot used in cooperation with Linux memory management?

kvm_memory_slot just maps a host userspace address range to a guest physical address range. Cooperation with Linux memory management is done with the Linux MMU notifiers. MMU notifiers let KVM know that a page has been swapped out, and KVM reacts by invalidating the shadow page tables for the corresponding guest physical address.

So for each VM, qemu needs to register its memory region, KVM stores this GPA-to-HVA mapping in kvm_memory_slot, and at the first page fault KVM uses EPT to map the GPA to an HPA. Am I right? In this design, how is the ballooning mechanism implemented in the KVM memory management module?

Thanks,
Arthur

Paolo
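The GPA-to-HVA step discussed above (a memslot mapping a guest-frame-number range onto a host userspace address) can be sketched as a toy lookup. The struct fields mimic, but are not, the real struct kvm_memory_slot, and the search is a simplified stand-in for gfn_to_memslot plus __gfn_to_hva_many.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Toy memslot: maps [base_gfn, base_gfn + npages) to host userspace
 * addresses starting at userspace_addr (field names mimic KVM's). */
struct memslot {
    uint64_t base_gfn;
    uint64_t npages;
    uint64_t userspace_addr;
};

/* Linear search over the slots, then offset into the slot, as the
 * gfn -> hva translation described in the thread. Returns 0 when the
 * gfn falls outside every slot (i.e. it is unmapped guest memory). */
uint64_t gfn_to_hva_model(const struct memslot *slots, size_t n, uint64_t gfn)
{
    for (size_t i = 0; i < n; i++) {
        const struct memslot *s = &slots[i];
        if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
            return s->userspace_addr + ((gfn - s->base_gfn) << PAGE_SHIFT);
    }
    return 0;
}
```

After this step, the HVA would be turned into a host PFN (hva_to_pfn in the real code) and only then written into the EPT entry, which is the GPA-to-HPA mapping built at the first fault.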
Re: Calling to kvm_mmu_load
Hi Paolo,

On Fri, Oct 25, 2013 at 8:43 AM, Paolo Bonzini pbonz...@redhat.com wrote:
On 24/10/2013 08:55, Arthur Chunqi Li wrote:
Hi Paolo, Thanks for your reply.
On Wed, Oct 23, 2013 at 2:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 21/10/2013 08:56, Arthur Chunqi Li wrote:
Hi there, I noticed that kvm_mmu_reload() is called every time on vcpu entry, and kvm_mmu_load() is called in that function when root_hpa is INVALID_PAGE. I am confused about why and when root_hpa can be set to INVALID_PAGE. I found one condition: if the vcpu gets the request KVM_REQ_MMU_RELOAD, kvm_mmu_unload() is called to invalidate root_hpa, but this condition cannot cover all occasions.

Look also at mmu_free_roots, kvm_mmu_unload and kvm_mmu_reset_context. In normal cases and without EPT, it should be called when CR3 changes or when the paging mode changes (32-bit, PAE, 64-bit, no paging). With EPT, this kind of change won't reset the MMU (CR3 changes won't cause a vmexit at all, in fact).

When EPT is enabled, why will root_hpa be set to INVALID_PAGE when a VM boots?

Because EPT page tables are only built lazily. The EPT page tables start all-invalid, and are built as the guest accesses pages at new guest physical addresses (instead, shadow page tables are built as the guest accesses pages at new guest virtual addresses).

I find that Qemu resets root_hpa with the KVM_REQ_MMU_RELOAD request several times when booting a VM. Why?

This happens when the memory map changes. A previously-valid guest physical address might become invalid now, and the EPT page tables have to be emptied.

And will the VM use EPT from the very beginning when booting?

Yes. But it's not the VM, it's KVM that uses EPT. The VM only uses EPT if you're using nested virtualization and EPT is enabled. L1's KVM uses EPT, L2 doesn't (because it doesn't run KVM).
With nested virtualization, roots are invalidated whenever kvm->arch.mmu changes meaning from L1->L0 to L2->L0 or vice versa (in the special case where EPT is disabled on L0, this is trivially because vmentry loads CR3 from the vmcs02).

Besides, in the function tdp_page_fault(), I find two different execution flows which may not reach __direct_map() (which I think is the normal path to handle a PF): fast_page_fault() and try_async_pf(). When will these two paths be taken when handling an EPT page fault?

fast_page_fault() is called if you're using dirty page tracking. It checks if we have a read-only page that is in a writeable memory slot (SPTE_HOST_WRITEABLE) and whose PTE allows writes (SPTE_MMU_WRITEABLE). If these conditions are satisfied, the page was read-only because of dirty page tracking; it is made read-write with a single cmpxchg, and the bit for the page is set in the dirty bitmap.

What is the dirty page tracking code path? I found an obsolete flag dirty_page_log_all in very old code, but I cannot find the most recent version of dirty page tracking.

Besides, I noticed that memory management in KVM uses the mechanism based on struct kvm_memory_slot. How is kvm_memory_slot used in cooperation with Linux memory management?

Thanks,
Arthur

try_async_pf will inject a dummy pagefault instead of creating the EPT page table, and create the page table in the background. The guest will do something else (run another task) until the EPT page table has been created; then a second dummy pagefault is injected. kvm_arch_async_page_not_present signals the first page fault, kvm_arch_async_page_present signals the second. For this to happen, the guest must have enabled the asynchronous page fault feature with a write to a KVM-specific MSR.

Paolo
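The write-protect dirty-logging scheme Paolo describes (pages start read-only; the first write faults, marks the dirty bitmap, and re-enables writes) can be modeled in a few lines. This is a toy single-threaded sketch with invented names; the real fast_page_fault uses a cmpxchg on the spte precisely because it runs concurrently with hardware A/D updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 64
static uint64_t spte_writable; /* one bit per page: write allowed */
static uint64_t dirty_bitmap;  /* one bit per page: dirtied since last harvest */

/* Modeled after the fast_page_fault idea: if the page is write-protected
 * only for dirty logging, record it as dirty and re-allow writes.
 * Returns true when a dirty-logging fault was handled. */
bool handle_write_fault(unsigned page)
{
    if (spte_writable & (1ULL << page))
        return false;              /* already writable: not our fault to fix */
    spte_writable |= 1ULL << page; /* make the spte read-write again */
    dirty_bitmap  |= 1ULL << page; /* set the page's bit in the dirty bitmap */
    return true;
}

/* Harvesting (as KVM_GET_DIRTY_LOG does conceptually) returns the bitmap
 * and write-protects the dirtied pages so the next write faults again. */
uint64_t get_and_clear_dirty(void)
{
    uint64_t d = dirty_bitmap;
    dirty_bitmap = 0;
    spte_writable &= ~d;
    return d;
}
```

Only the first write to a page between two harvests takes a fault; subsequent writes go straight through, which is what makes the scheme cheap for write-hot pages.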
Re: Calling to kvm_mmu_load
Hi Paolo,

Thanks for your reply.

On Wed, Oct 23, 2013 at 2:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 21/10/2013 08:56, Arthur Chunqi Li wrote:
Hi there, I noticed that kvm_mmu_reload() is called every time on vcpu entry, and kvm_mmu_load() is called in that function when root_hpa is INVALID_PAGE. I am confused about why and when root_hpa can be set to INVALID_PAGE. I found one condition: if the vcpu gets the request KVM_REQ_MMU_RELOAD, kvm_mmu_unload() is called to invalidate root_hpa, but this condition cannot cover all occasions.

Look also at mmu_free_roots, kvm_mmu_unload and kvm_mmu_reset_context. In normal cases and without EPT, it should be called when CR3 changes or when the paging mode changes (32-bit, PAE, 64-bit, no paging). With EPT, this kind of change won't reset the MMU (CR3 changes won't cause a vmexit at all, in fact).

When EPT is enabled, why will root_hpa be set to INVALID_PAGE when a VM boots? I find that Qemu resets root_hpa with the KVM_REQ_MMU_RELOAD request several times when booting a VM. Why? And will the VM use EPT from the very beginning when booting?

With nested virtualization, roots are invalidated whenever kvm->arch.mmu changes meaning from L1->L0 to L2->L0 or vice versa (in the special case where EPT is disabled on L0, this is trivially because vmentry loads CR3 from the vmcs02).

Besides, in the function tdp_page_fault(), I find two different execution flows which may not reach __direct_map() (which I think is the normal path to handle a PF): fast_page_fault() and try_async_pf(). When will these two paths be taken when handling an EPT page fault?

Thanks,
Arthur

Paolo
Calling to kvm_mmu_load
Hi there, I noticed that kvm_mmu_reload() is called every time on vcpu entry, and kvm_mmu_load() is called from this function when root_hpa is INVALID_PAGE. I am confused about why and when root_hpa can be set to INVALID_PAGE. I found one condition: if the vcpu gets the request KVM_REQ_MMU_RELOAD, kvm_mmu_unload() is called to invalidate root_hpa, but this condition cannot cover all occasions. Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Hi Jan, On Fri, Oct 11, 2013 at 12:12 AM, Jan Kiszka jan.kis...@siemens.com wrote: On 2013-10-02 20:47, Jan Kiszka wrote: On 2013-09-30 11:08, Jan Kiszka wrote: On 2013-09-26 17:04, Paolo Bonzini wrote: Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs for reasons not emulated by L1, the preemption timer value should be saved in such exits. 2. Add support of the Save VMX-preemption timer value VM-Exit control to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v4: Format changes and remove a flag in nested_vmx. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 44 +++-- 2 files changed, 43 insertions(+), 2 deletions(-) Hi all, the test fails for me if the preemption timer value is set to a value that is above ~2000 (which means ~65000 TSC cycles on this machine). The preemption timer seems to count faster than what is expected, for example only up to 4 million cycles if you set it to one million. So, I am leaving the patch out of kvm/queue for now, until I can test it on more processors. I've done some measurements with the help of ftrace on the time it takes to let the preemption timer trigger (no adjustments via Arthur's patch were involved): On my Core i7-620M, the preemption timer seems to tick almost 10 times faster than the spec and scale value (5) suggest. I've loaded a value of 10, and it took about 130 µs until I got a vmexit with reason PREEMPTION_TIMER (no other exits in between).

qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 10
qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0

That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz * 2^5 is 24.6 GHz.
I've read the spec several times, but it seems pretty clear on this. It just doesn't match reality. Very strange. ...but documented: I found a related erratum for my processor (AAT59) and also for Xeon 5500 (AAK139). At least the current Haswell generation is not affected. I can test the patch on a Haswell board I have at work later this week. To complete this story: Arthur's patch works fine on a non-broken CPU (here: i7-4770S). Arthur, find some fix-ups for your test case below. It avoids printing from within L2 as this could deadlock when the timer fires and L1 then tries to print something. Also, it disables the preemption timer on leave so that it cannot fire later on again. If you want to fold this into your patch, feel free. Otherwise I can post a separate patch on top. I think this can be treated as a separate patch to our test suite. You can post it on top. I have tested it and it works fine. Arthur Jan

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 4372878..66a4201 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -141,6 +141,9 @@ void preemption_timer_init()
 	preempt_val = 1000;
 	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
 	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
 }

 void preemption_timer_main()
@@ -150,9 +153,7 @@ void preemption_timer_main()
 		printf("\tPreemption timer is not supported\n");
 		return;
 	}
-	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
-		printf("\tSave preemption value is not supported\n");
-	else {
+	if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
 		set_stage(0);
 		vmcall();
 		if (get_stage() == 1)
@@ -161,8 +162,8 @@
 	while (1) {
 		if (((rdtsc() - tsc_val) >> preempt_scale)
 				> 10 * preempt_val) {
-			report("Preemption timer", 0);
-			break;
+			set_stage(2);
+			vmcall();
 		}
 	}
 }
@@ -183,7 +184,7 @@ int preemption_timer_exit_handler()
 			report("Preemption timer", 0);
 		else
 			report("Preemption timer", 1);
-		return VMX_TEST_VMEXIT;
+		break;
 	case
VMX_VMCALL: switch (get_stage()) { case 0: @@ -195,24 +196,29 @@ int preemption_timer_exit_handler() EXI_SAVE_PREEMPT) ctrl_exit_rev.clr; vmcs_write(EXI_CONTROLS, ctrl_exit); } - break
[PATCH v2] kvm-unit-tests: VMX: Comments on the framework and writing test cases
Add some comments on the framework of nested VMX testing, and guides of how to write new test cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 30 ++ x86/vmx_tests.c | 13 + 2 files changed, 43 insertions(+) diff --git a/x86/vmx.c b/x86/vmx.c index 9db4ef4..d5ae609 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -1,3 +1,33 @@ +/* + * x86/vmx.c : Framework for testing nested virtualization + * This is a framework to test nested VMX for KVM, which + * started as a project of GSoC 2013. All test cases should + * be located in x86/vmx_tests.c and framework related + * functions should be in this file. + * + * How to write test cases? + * Add callbacks of test suite in variant vmx_tests. You can + * write: + * 1. init function used for initializing test suite + * 2. main function for codes running in L2 guest, + * 3. exit_handler to handle vmexit of L2 to L1 + * 4. syscall handler to handle L2 syscall vmexit + * 5. vmenter fail handler to handle direct failure of vmenter + * 6. guest_regs is loaded when vmenter and saved when + * vmexit, you can read and set it in exit_handler + * If no special function is needed for a test suite, use + * coressponding basic_* functions as callback. More handlers + * can be added to vmx_tests, see details of struct vmx_test + * and function test_run(). + * + * Currently, vmx test framework only set up one VCPU and one + * concurrent guest test environment with same paging for L2 and + * L1. For usage of EPT, only 1:1 mapped paging is used from VFN + * to PFN. 
+ * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ + #include libcflat.h #include processor.h #include vm.h diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 0759e10..5fc16a3 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,3 +1,8 @@ +/* + * All test cases of nested virtualization should be in this file + * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ #include vmx.h #include msr.h #include processor.h @@ -782,6 +787,14 @@ struct insn_table { u32 test_field; }; +/* + * Add more test cases of instruction intercept here. Elements in this + * table are: + * name/control flag/insn function/type/exit reason/exit qualification/ + * instruction info/field to test + * The last field defines which fields (exit_qual and insn_info) need to be + * tested in the exit handler. If set to 0, only the reason is checked. + */ static struct insn_table insn_table[] = { // Flags for Primary Processor-Based VM-Execution Controls {"HLT", CPU_HLT, insn_hlt, INSN_CPU0, 12, 0, 0, 0}, -- 1.7.9.5
[PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v4: Format changes and remove a flag in nested_vmx. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 44 +++-- 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index bb04650..b93e09a 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -536,6 +536,7 @@ /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL 29) +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f1da43..e1fa13a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high PIN_BASED_VMX_PREEMPTION_TIMER) || + !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_exit_ctls_high = ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + nested_vmx_pinbased_ctls_high = ~PIN_BASED_VMX_PREEMPTION_TIMER; + } nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) 
+{ + u64 delta_tsc_l1; + u32 preempt_val_l1, preempt_val_l2, preempt_scale; + + if (!(get_vmcs12(vcpu)-pin_based_vm_exec_control + PIN_BASED_VMX_PREEMPTION_TIMER)) + return; + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc()) + - vcpu-arch.last_guest_tsc; + preempt_val_l1 = delta_tsc_l1 preempt_scale; + if (preempt_val_l2 = preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); +} + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. @@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) atomic_switch_perf_msrs(vmx); debugctlmsr = get_debugctlmsr(); + if (is_guest_mode(vcpu) !(vmx-nested.nested_run_pending)) + nested_adjust_preemption_timer(vcpu); vmx-__launched = vmx-loaded_vmcs-launched; asm( /* Store host registers */ @@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 exec_control; + u32 exit_control; vmcs_write16(GUEST_ES_SELECTOR, vmcs12-guest_es_selector); vmcs_write16(GUEST_CS_SELECTOR, vmcs12-guest_cs_selector); @@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + exit_control = vmcs_config.vmexit_ctrl; + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + vmcs_write32(VM_EXIT_CONTROLS, exit_control); /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. 
@@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs12-guest_pending_dbg_exceptions = vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); + if ((vmcs12-pin_based_vm_exec_control + PIN_BASED_VMX_PREEMPTION_TIMER) + (vmcs12-vm_exit_controls + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
[PATCH] kvm-unit-tests: VMX: Comments on the framework and writing test cases
Add some comments on the framework of nested VMX testing, and guides of how to write new test cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 25 + x86/vmx_tests.c | 13 + 2 files changed, 38 insertions(+) diff --git a/x86/vmx.c b/x86/vmx.c index 9db4ef4..3aa8600 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -1,3 +1,28 @@ +/* + * x86/vmx.c : Framework for testing nested virtualization + * This is a framework to test nested VMX for KVM, which is + * a project of GSoC 2013. All test cases are located in + * vmx_tests, which is defined in x86/vmx_tests.c. All test + * cases should be located in x86/vmx_tests.c and framework + * related functions should be in this file. + * + * How to write test suite? + * Add functions of test suite in variant vmx_tests. You can + * write: + * init function used for initializing test suite + * main function for codes running in L2 guest, + * exit_handler to handle vmexit of L2 to L1 (framework) + * syscall handler to handle L2 syscall vmexit + * vmenter fail handler to handle direct failure of vmenter + * init registers used to store register value in initialization + * If no special function is needed for a test suite, you can use + * basic_* series of functions. More handlers can be added to + * vmx_tests, see details of struct vmx_test and function + * test_run(). + * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ + #include libcflat.h #include processor.h #include vm.h diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 0759e10..5fc16a3 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,3 +1,8 @@ +/* + * All test cases of nested virtualization should be in this file + * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ #include vmx.h #include msr.h #include processor.h @@ -782,6 +787,14 @@ struct insn_table { u32 test_field; }; +/* + * Add more test cases of instruction intercept here. 
Elements in this + * table are: + * name/control flag/insn function/type/exit reason/exit qualification/ + * instruction info/field to test + * The last field defines which fields (exit_qual and insn_info) need to be + * tested in the exit handler. If set to 0, only the reason is checked. + */ static struct insn_table insn_table[] = { // Flags for Primary Processor-Based VM-Execution Controls {"HLT", CPU_HLT, insn_hlt, INSN_CPU0, 12, 0, 0, 0}, -- 1.7.9.5
Re: [PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
On Sat, Sep 14, 2013 at 3:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-09-13 19:15, Paolo Bonzini wrote: Il 06/09/2013 04:04, Arthur Chunqi Li ha scritto: + preempt_val_l1 = delta_tsc_l1 >> preempt_scale; + if (preempt_val_l2 <= preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); Did you test that a value of 0 triggers an immediate exit, rather than counting down by 2^32? Perhaps it's safer to limit the value to 1 instead of 0. In my experience, 0 triggers an immediate exit when the preemption timer is enabled. Yes, the L2 VM will exit immediately when the value is 0 with my patch. Arthur Jan
Re: [PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
On Sat, Sep 14, 2013 at 1:15 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 06/09/2013 04:04, Arthur Chunqi Li ha scritto: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v3: Move nested_adjust_preemption_timer to the latest place just before vmenter. Some minor changes. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 49 +++-- 2 files changed, 48 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index bb04650..b93e09a 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -536,6 +536,7 @@ /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL 29) +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f1da43..f364d16 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -374,6 +374,8 @@ struct nested_vmx { */ struct page *apic_access_page; u64 msr_ia32_feature_control; + /* Set if vmexit is L2-L1 */ + bool nested_vmx_exit; }; #define POSTED_INTR_ON 0 @@ -2204,7 +2206,17 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high + PIN_BASED_VMX_PREEMPTION_TIMER) || + !(nested_vmx_exit_ctls_high + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { Align this under the other !. 
Also, I prefer to have one long line for the whole !(... ...) || (and likewise below), but I don't know if Gleb agrees + nested_vmx_exit_ctls_high = + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); Please remove parentheses around ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, and likewise elsewhere in the patch. + nested_vmx_pinbased_ctls_high = + (~PIN_BASED_VMX_PREEMPTION_TIMER); + } nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6707,6 +6719,24 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) +{ + u64 delta_tsc_l1; + u32 preempt_val_l1, preempt_val_l2, preempt_scale; Should this exit immediately if the preemption timer pin-based control is disabled? Hi Paolo, How can I get pin-based control here from struct kvm_vcpu *vcpu? Arthur + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + delta_tsc_l1 = kvm_x86_ops-read_l1_tsc(vcpu, + native_read_tsc()) - vcpu-arch.last_guest_tsc; Please format this like: delta_tsc_l1 = kvm_x86_ops-read_l1_tsc(vcpu, native_read_tsc()) - vcpu-arch.last_guest_tsc; + preempt_val_l1 = delta_tsc_l1 preempt_scale; + if (preempt_val_l2 = preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); Did you test that a value of 0 triggers an immediate exit, rather than counting down by 2^32? Perhaps it's safer to limit the value to 1 instead of 0. +} + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. 
@@ -6736,9 +6766,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) vmx-nested.nested_run_pending = 0; if (is_guest_mode(vcpu) nested_vmx_exit_handled(vcpu)) { + vmx-nested.nested_vmx_exit = true; I think this assignment should be in nested_vmx_vmexit, since it is called from other places as well. nested_vmx_vmexit(vcpu); return 1; } + vmx-nested.nested_vmx_exit = false; if (exit_reason VMX_EXIT_REASONS_FAILED_VMENTRY) { vcpu-run-exit_reason = KVM_EXIT_FAIL_ENTRY; @@ -7132,6 +7164,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu
Re: [PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Sep 15, 2013 at 8:31 PM, Gleb Natapov g...@redhat.com wrote: On Fri, Sep 06, 2013 at 10:04:51AM +0800, Arthur Chunqi Li wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v3: Move nested_adjust_preemption_timer to the latest place just before vmenter. Some minor changes. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 49 +++-- 2 files changed, 48 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index bb04650..b93e09a 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -536,6 +536,7 @@ /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL 29) +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f1da43..f364d16 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -374,6 +374,8 @@ struct nested_vmx { */ struct page *apic_access_page; u64 msr_ia32_feature_control; + /* Set if vmexit is L2-L1 */ + bool nested_vmx_exit; Do not see why it is needed, see bellow. 
}; #define POSTED_INTR_ON 0 @@ -2204,7 +2206,17 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high + PIN_BASED_VMX_PREEMPTION_TIMER) || + !(nested_vmx_exit_ctls_high + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_exit_ctls_high = + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + nested_vmx_pinbased_ctls_high = + (~PIN_BASED_VMX_PREEMPTION_TIMER); + } nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6707,6 +6719,24 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) +{ + u64 delta_tsc_l1; + u32 preempt_val_l1, preempt_val_l2, preempt_scale; + + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + delta_tsc_l1 = kvm_x86_ops-read_l1_tsc(vcpu, + native_read_tsc()) - vcpu-arch.last_guest_tsc; + preempt_val_l1 = delta_tsc_l1 preempt_scale; + if (preempt_val_l2 = preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); +} + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. 
@@ -6736,9 +6766,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) vmx->nested.nested_run_pending = 0; if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) { + vmx->nested.nested_vmx_exit = true; nested_vmx_vmexit(vcpu); return 1; } + vmx->nested.nested_vmx_exit = false; if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) { vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY; @@ -7132,6 +7164,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) debugctlmsr = get_debugctlmsr(); vmx->__launched = vmx->loaded_vmcs->launched; + if (is_guest_mode(vcpu) && !(vmx->nested.nested_vmx_exit)) How can is_guest_mode() and nested_vmx_exit both be true? The only place nested_vmx_exit is set to true is just before the call to nested_vmx_vmexit(). The first thing nested_vmx_vmexit() does is make is_guest_mode() false. To enter guest mode again, at least one other vmexit from L1 to L0 is needed, at which point nested_vmx_exit will be reset to false again. If you want to avoid calling nested_adjust_preemption_timer() during vmlaunch/vmresume emulation (and it looks like this is what you are trying to achieve here) you can check nested_run_pending. Besides vmlaunch/vmresume emulation, every exit from L2->L1 should not call
[RFC PATCH 1/2] kvm-unit-tests: VMX: Add vmentry failed handler to framework
Add vmentry failed handler to vmx framework to catch direct fail of vmentry. When vmlaunch/vmresume directly fail to the next instruction, a entry failed handler is used to handle this failure. Resume failure from entry failed handler will cause entry double fail and directly exit to L1. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|3 +++ x86/vmx.c | 34 -- x86/vmx.h | 15 +-- x86/vmx_tests.c | 31 +-- 4 files changed, 57 insertions(+), 26 deletions(-) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index 6e0ce2b..c8565b5 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -19,7 +19,10 @@ #define X86_CR0_PE 0x0001 #define X86_CR0_MP 0x0002 #define X86_CR0_TS 0x0008 +#define X86_CR0_ET 0x0010 #define X86_CR0_WP 0x0001 +#define X86_CR0_NW 0x2000 +#define X86_CR0_CD 0x4000 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 #define X86_CR4_TSD 0x0004 diff --git a/x86/vmx.c b/x86/vmx.c index 9db4ef4..6a2bf44 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -44,14 +44,6 @@ void report(const char *name, int result) } } -static int make_vmcs_current(struct vmcs *vmcs) -{ - bool ret; - - asm volatile (vmptrld %1; setbe %0 : =q (ret) : m (vmcs) : cc); - return ret; -} - /* entry_sysenter */ asm( .align 4, 0x90\n\t @@ -631,6 +623,7 @@ static int exit_handler() static int vmx_run() { u32 ret = 0, fail = 0; + bool entry_double_fail = false; while (1) { asm volatile ( @@ -657,28 +650,41 @@ static int vmx_run() ); if (fail) - ret = launched ? VMX_TEST_RESUME_ERR : - VMX_TEST_LAUNCH_ERR; + if (entry_double_fail) + ret = launched ? 
VMX_TEST_RESUME_ERR : + VMX_TEST_LAUNCH_ERR; + else { + ret = current-entry_failed_handler(launched); + if (ret == VMX_TEST_RESUME) { + entry_double_fail = true; + host_rflags = ~(X86_EFLAGS_ZF | + X86_EFLAGS_CF); + } + } else { launched = 1; + entry_double_fail = false; ret = exit_handler(); } if (ret != VMX_TEST_RESUME) break; + ret = fail = 0; } launched = 0; switch (ret) { case VMX_TEST_VMEXIT: return 0; case VMX_TEST_LAUNCH_ERR: - printf(%s : vmlaunch failed.\n, __func__); + printf(%s : vmlaunch failed, entry_double_fail=%d.\n, + __func__, entry_double_fail); if ((!(host_rflags X86_EFLAGS_CF) !(host_rflags X86_EFLAGS_ZF)) || ((host_rflags X86_EFLAGS_CF) (host_rflags X86_EFLAGS_ZF))) printf(\tvmlaunch set wrong flags\n); report(test vmlaunch, 0); break; case VMX_TEST_RESUME_ERR: - printf(%s : vmresume failed.\n, __func__); + printf(%s : vmresume failed, entry_double_fail=%d.\n, + __func__, entry_double_fail); if ((!(host_rflags X86_EFLAGS_CF) !(host_rflags X86_EFLAGS_ZF)) || ((host_rflags X86_EFLAGS_CF) (host_rflags X86_EFLAGS_ZF))) printf(\tvmresume set wrong flags\n); @@ -700,12 +706,12 @@ static int test_run(struct vmx_test *test) return 1; } init_vmcs((test-vmcs)); + current = test; /* Directly call test-init is ok here, init_vmcs has done vmcs init, vmclear and vmptrld*/ if (test-init) - test-init(test-vmcs); + test-init(); test-exits = 0; - current = test; regs = test-guest_regs; vmcs_write(GUEST_RFLAGS, regs.rflags | 0x2); launched = 0; diff --git a/x86/vmx.h b/x86/vmx.h index dc1ebdf..469b4dc 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -4,7 +4,8 @@ #include libcflat.h struct vmcs { - u32 revision_id; /* vmcs revision identifier */ + u32 revision_id:31, /* vmcs revision identifier */ + shadow:1; /* shadow-VMCS indicator */ u32 abort; /* VMX-abort indicator */ /* VMCS data */ char data[0]; @@ -32,10 +33,11 @@ struct regs { struct vmx_test { const char *name; - void (*init)(struct vmcs *vmcs); + void (*init)(); void (*guest_main)(); int (*exit_handler)(); 
void
[RFC PATCH 2/2] kvm-unit-tests: VMX: Add test cases for vmentry checks
This patch design a framwork to check vmentry fields, then test all supported features in Intel SDM 26.1 and 26.2. Unsupported features are not tested, but listed in the code. To add new tests for vmentry checks, just write functions for test initialization and add related item in vmentry_cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h | 22 +- x86/vmx_tests.c | 647 +++ 2 files changed, 668 insertions(+), 1 deletion(-) diff --git a/x86/vmx.h b/x86/vmx.h index 469b4dc..aeee602 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -344,6 +344,8 @@ enum Ctrl_exi { enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_ENT_SMM = 1UL 10, + ENT_DEATV_DM= 1UL 11, ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -375,10 +377,13 @@ enum Ctrl0 { }; enum Ctrl1 { + CPU_VIRT_APIC = 1ul 0, CPU_EPT = 1ul 1, + CPU_VIRT_X2APIC = 1ul 4, CPU_VPID= 1ul 5, - CPU_URG = 1ul 7, CPU_WBINVD = 1ul 6, + CPU_URG = 1ul 7, + CPU_VIRT_INTR = 1ul 9, CPU_RDRAND = 1ul 11, CPU_SHADOW = 1ul 14, }; @@ -453,6 +458,7 @@ enum Ctrl1 { #define HYPERCALL_VMEXIT 0x1 #define EPTP_PG_WALK_LEN_SHIFT 3ul +#define EPTP_PG_WALK_LEN_MASK 0x31 #define EPTP_AD_FLAG (1ul 6) #define EPT_MEM_TYPE_UC0ul @@ -460,6 +466,7 @@ enum Ctrl1 { #define EPT_MEM_TYPE_WT4ul #define EPT_MEM_TYPE_WP5ul #define EPT_MEM_TYPE_WB6ul +#define EPT_MEM_TYPE_MASK 0x7 #define EPT_RA 1ul #define EPT_WA 2ul @@ -506,6 +513,19 @@ enum Ctrl1 { #define INVEPT_SINGLE 1 #define INVEPT_GLOBAL 2 +#define INTR_INFO_TYPE_MASK0x0700 +#define INTR_INFO_TYPE_SHIFT 8 +#define INTR_INFO_TYPE_EXT 0 +#define INTR_INFO_TYPE_REV 1 +#define INTR_INFO_TYPE_NMI 2 +#define INTR_INFO_TYPE_HARD_EXP3 +#define INTR_INFO_TYPE_SOFT_INTR 4 +#define INTR_INFO_TYPE_PSE 5 +#define INTR_INFO_TYPE_SOFT_EXP6 +#define INTR_INFO_TYPE_OTHER 7 +#define INTR_INFO_DELIVER_ERR 0x0800 +#define INTR_INFO_VALID0x8000 + extern struct regs regs; extern union vmx_basic basic; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index e95e6b8..4372878 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c 
@@ -15,6 +15,13 @@ bool init_fail; unsigned long *pml4; u64 eptp; void *data_page1, *data_page2; +static u32 cur_test; +volatile static bool test_success; +static u32 phy_addr_width; + +extern struct vmx_test *current; +extern u64 host_rflags; +extern bool launched; static inline void vmcall() { @@ -1113,6 +1120,643 @@ static int ept_exit_handler() return VMX_TEST_VMEXIT; } +static int reset_vmstat(struct vmcs *vmcs) +{ + if (vmcs_clear(current-vmcs)) { + printf(\tERROR : %s : vmcs_clear failed.\n, __func__); + return -1; + } + if (make_vmcs_current(current-vmcs)) { + printf(\tERROR : %s : make_vmcs_current failed.\n, __func__); + return -1; + } + launched = 0; + return 0; +} + +static int vmentry_vmcs_absence() +{ + vmcs_clear(current-vmcs); + return 0; +} + +static int vmentry_vmlaunch_err() +{ + launched = 0; + return 0; +} + +static int vmentry_vmresume_err() +{ + if (reset_vmstat(current-vmcs)) + return -1; + launched = 1; + return 0; +} + +static int vmentry_pin_ctrl() +{ + vmcs_write(PIN_CONTROLS, ~(ctrl_pin_rev.clr)); + return 0; +} + +static int vmentry_cpu0_ctrl() +{ + vmcs_write(CPU_EXEC_CTRL0, ~(ctrl_cpu_rev[0].clr)); + return 0; +} + +static int vmentry_cpu1_ctrl() +{ + u32 ctrl_cpu[2]; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY)) { + printf(\t%s : Features are not supported for nested.\n, __func__); + test_success = true; + return 0; + } + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0] | CPU_SECONDARY); + vmcs_write(CPU_EXEC_CTRL1, ~(ctrl_cpu_rev[1].clr)); + return 0; +} + +static int vmentry_cr3_target_count() +{ + vmcs_write(CR3_TARGET_COUNT, 5); + return 0; +} + +static int vmentry_iobmp_invalid1() +{ + u32 ctrl_cpu0; + if (!(ctrl_cpu_rev[0].clr CPU_IO_BITMAP)) { + printf(\t%s : Features are not supported for nested.\n, __func__); + test_success = true; + return 0; + } + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO
[RFC PATCH 0/2] kvm-unit-tests: VMX: vmentry checks
This series implements a framework to capture early vmentry failures, meaning vmlaunch/vmresume falls through to the next instruction instead of causing a vmexit. All supported checks referred to in Intel SDM 26.1 and 26.2 are then tested. Some test cases are commented out since they may cause a fatal error in KVM, which would crash the test environment and affect the following tests. They should be uncommented once the related bugs are fixed. Arthur Chunqi Li (2): kvm-unit-tests: VMX: Add vmentry failed handler to framework kvm-unit-tests: VMX: Add test cases for vmentry checks lib/x86/vm.h|3 + x86/vmx.c | 34 +-- x86/vmx.h | 37 ++- x86/vmx_tests.c | 678 ++- 4 files changed, 725 insertions(+), 27 deletions(-) -- 1.7.9.5
Re: [RFC PATCH 0/2] kvm-unit-tests: VMX: vmentry checks
Hi Gleb, Paolo and Jan, There are indeed many vmentry checks that KVM currently fails to perform; the results on my machine are:

Test suite : vmentry check
PASS: No current VMCS vmenter
PASS: VMLAUNCH with state not clear
PASS: VMRESUME with state not launched
PASS: Reserved bits in PIN_CONTROLS field
PASS: Reserved bits in primary CPU CONTROLS field
PASS: Reserved bits in secondary CPU CONTROLS field
FAIL: CR3 target count greater than 4
FAIL: I/O bitmap address invalid (aligned)
FAIL: I/O bitmap address invalid (exceed)
PASS: MSR bitmap address invalid (aligned)
FAIL: MSR bitmap address invalid (exceed)
FAIL: Consistency of NMI exiting and virtual NMIs
PASS: APIC-accesses address invalid (aligned)
FAIL: APIC-accesses address invalid (exceed)
FAIL: EPTP memory type
FAIL: EPTP page walk length
FAIL: EPTP page reserved bits (11:7)
PASS: Reserved bits in EXI_CONTROLS field
FAIL: Consistency of VMX-preemption timer (activate and save)
PASS: Reserved bits in ENT_CONTROLS field
PASS: Entry to SMM with processor not in SMM
PASS: Deactivate dual-monitor treatment with processor not in SMM
PASS: Invalid bits in host CR0
FAIL: Invalid bits in host CR4
FAIL: Invalid bits in host CR3
FAIL: Invalid host sysenter esp addr
FAIL: Invalid host sysenter eip addr
FAIL: Invalid CS selector field - TI flag
FAIL: Invalid TR selector field - TI flag
FAIL: Invalid CS selector field - H
FAIL: Invalid TR selector field - H
FAIL: Invalid base address of FS
FAIL: Invalid base address of GS
FAIL: Invalid base address of GDTR
FAIL: Invalid base address of IDTR
FAIL: Invalid base address of TR
FAIL: Consistency of EXI_HOST_64 and CR4.PAE
SUMMARY: 90 tests, 24 failures

Besides, all commented-out cases also fail; they are:
EPTP page reserved bits (63:N)
Invalid host PAT
Invalid host EFER - bits reserved
Invalid host EFER - LMA LME
Invalid CS selector field - RPL
Invalid TR selector field - RPL

You can find detailed descriptions of these cases in Intel SDM 26.1 and 26.2.
Arthur On Fri, Sep 13, 2013 at 2:35 PM, Arthur Chunqi Li yzt...@gmail.com wrote: This series implement a framework to capture early exit in vmenter, means vmenter fails to next instruction instead of causing vmexit. Then all supported features referred in Intel SDM 26.1 and 26.2 are tested. Some test cases are commented since they may cause fatal error of KVM, thus will crash test environment and affect the following tests. They are hoped to uncomment after related bugs are fixed. Arthur Chunqi Li (2): kvm-unit-tests: VMX: Add vmentry failed handler to framework kvm-unit-tests: VMX: Add test cases for vmentry checks lib/x86/vm.h|3 + x86/vmx.c | 34 +-- x86/vmx.h | 37 ++- x86/vmx_tests.c | 678 ++- 4 files changed, 725 insertions(+), 27 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
The state of vmexit/vmenter MSR store/load in nested vmx
Hi Jan and the mailing list, Does nested VMX support VM-exit MSR store/load and VM-entry MSR load now? I tried to set the VM-exit MSR-store address to a valid address and the VM-exit MSR-store count to 1, but then the vmentry fails. Is there anything else I should set to use these features? Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
How to recreate MOV-SS blocking vmentry fail
Hi Gleb, Paolo and related folks, I was trying to recreate a MOV-SS-blocking vmentry failure (Intel SDM 26.1, check 5.a). The manual refers to Table 24-3 there, and 26.3.1.5 also describes it. I am confused about how this scenario can be recreated. Do you have any ideas? Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: How to recreate MOV-SS blocking vmentry fail
On Wed, Sep 11, 2013 at 8:53 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Sep 11, 2013 at 08:49:28PM +0800, Arthur Chunqi Li wrote: Hi Gleb, Paolo and related folks, I was trying to recreate a MOV-SS-blocking vmentry failure (Intel SDM 26.1, check 5.a). Here the manual refers to Table 24-3, but 26.3.1.5 also describes it. I got confused about how this scenario can be recreated. Do you have any ideas? mov $0, %ss vmlaunch Should these two instructions execute sequentially? Thanks, Arthur -- Gleb.
Re: How to recreate MOV-SS blocking vmentry fail
On Wed, Sep 11, 2013 at 9:03 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Sep 11, 2013 at 03:01:07PM +0200, Paolo Bonzini wrote: On 11/09/2013 14:53, Gleb Natapov wrote: I was trying to recreate a MOV-SS-blocking vmentry failure (Intel SDM 26.1, check 5.a). Here the manual refers to Table 24-3, but 26.3.1.5 also describes it. I got confused about how this scenario can be recreated. Do you have any ideas? mov $0, %ss vmlaunch Probably better to save %ss somewhere around these instructions... :) Details, details :) It can be: mov %ss, tmp mov tmp, %ss vmlaunch Well, this seems hard to test in our framework ;( vmlaunch is surrounded by many instructions, and we cannot add a vmlaunch in the exit handler. Thanks, Arthur -- Gleb.
Re: [PATCH v3 1/6] KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in load_vmcs12_host_state
On Mon, Sep 2, 2013 at 4:21 PM, Gleb Natapov g...@redhat.com wrote: On Thu, Aug 08, 2013 at 04:26:28PM +0200, Jan Kiszka wrote: Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the Not a typo :) That what Avi asked for do during initial nested VMX review: http://markmail.org/message/hhidqyhbo2mrgxxc But there is at least one transition check that kvm_set_cr0() does that should not be done during vmexit emulation, namely CS.L bit check, so I tend to agree that kvm_set_cr0() is not appropriate here, at lest not as it is. But can we skip other checks kvm_set_cr0() does? For instance what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0 during nested vmexit? What _should_ prevent it is vmentry check from 26.2.4 If the host address-space size VM-exit control is 1, the following must hold: - Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1. Hi Jan and Gleb, Our nested VMX testing framework may not support such testing modes. Here we need to catch the failed result (ZF flag) close to vmresume, but vmresume/vmlaunch is well encapsulated in our framework. If we simply write a vmresume inline function, the VMX will act unexpectedly when it doesn't cause vmresume fail. Do you have any ideas about this? Arthur But I do not see that we do that check on vmentry. What about NW/CD bit checks, or reserved bits checks? 27.5.1 says: The following bits are not modified: For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64 architecture), 28:19, 17, and 15:6; and any bits that are fixed in VMX operation (see Section 23.8). But again current vmexit code does not emulate this properly and just sets everything from host_cr0. vmentry should also preserve all those bit but it looks like it doesn't too. state transition that may prevent loading L1's cr0. 
Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- arch/x86/kvm/vmx.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..d001b019 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8185,7 +8185,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, * fpu_active (which may have changed). * Note that vmx_set_cr0 refers to efer set above. */ - kvm_set_cr0(vcpu, vmcs12-host_cr0); + vmx_set_cr0(vcpu, vmcs12-host_cr0); /* * If we did fpu_activate()/fpu_deactivate() during L2's run, we need * to apply the same changes to L1's vmcs. We just set cr0 correctly, -- 1.7.3.4 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm-unit-tests: VMX: Fix two minor bugs
This patch just contains two minor changes to the EPT framework. 1. Reorder macro definitions. 2. Fix bug of setting CPU_EPT without check. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |2 +- x86/vmx_tests.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index e02183f..dc1ebdf 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -366,9 +366,9 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul << 22, CPU_IO = 1ul << 24, CPU_IO_BITMAP = 1ul << 25, + CPU_MSR_BITMAP = 1ul << 28, CPU_MONITOR = 1ul << 29, CPU_PAUSE = 1ul << 30, - CPU_MSR_BITMAP = 1ul << 28, CPU_SECONDARY = 1ul << 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index e891a9f..0759e10 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -925,7 +925,7 @@ static void ept_init() ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) & ctrl_cpu_rev[1].clr; vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); - vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]); if (setup_ept()) init_fail = true; data_page1 = alloc_page(); -- 1.7.9.5
Re: [PATCH] kvm-unit-tests: VMX: Fix two minor bugs
Hi Paolo, Sorry but I should trouble you merging these two minor changes to vmx branch. Until now, all the commits in vmx branch seems fine (if others have no comments). Because I have some patches to commit based on vmx branch, should we merge this branch to master or I just commit patches based on vmx? Thanks, Arthur On Wed, Sep 11, 2013 at 11:11 AM, Arthur Chunqi Li yzt...@gmail.com wrote: This patch just contains two minor changes to EPT framwork. 1. Reorder macro definition 2. Fix bug of setting CPU_EPT without check. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |2 +- x86/vmx_tests.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index e02183f..dc1ebdf 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -366,9 +366,9 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_MONITOR = 1ul 29, CPU_PAUSE = 1ul 30, - CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index e891a9f..0759e10 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -925,7 +925,7 @@ static void ept_init() ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) ctrl_cpu_rev[1].clr; vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); - vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]); if (setup_ept()) init_fail = true; data_page1 = alloc_page(); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm-unit-tests: VMX: Test nested EPT features
On Mon, Sep 9, 2013 at 3:17 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-09-09 06:57, Arthur Chunqi Li wrote: This series of patches provides the framework of nested EPT and some test cases for nested EPT features. Arthur Chunqi Li (2): kvm-unit-tests: VMX: The framework of EPT for nested VMX testing kvm-unit-tests: VMX: Test cases for nested EPT x86/vmx.c | 159 - x86/vmx.h | 76 x86/vmx_tests.c | 266 +++ 3 files changed, 497 insertions(+), 4 deletions(-) I suppose this is v2 of the previous patch? What is the delta? A meta changelog could go here. Yes, v1 just provided the framework of EPT (similar to the first patch of this series), and some more tests for nested EPT are added in this series (the second patch). Arthur Jan
Re: [PATCH] kvm-unit-tests: VMX: Test suite for preemption timer
On Mon, Sep 9, 2013 at 8:51 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 05/09/2013 11:22, Arthur Chunqi Li wrote: Hi Jan, Gleb and Paolo, It suddenly occurred to me that, if the guest's PIN_PREEMPT is disabled while EXI_SAVE_PREEMPT_VALUE is enabled, what will happen? The preemption timer value in the vmcs will not be affected, yes? Indeed. The VMX preemption timer will not count down, so it will remain set to the same value. Saving it on exit will actually do nothing. This case is not covered by this patch. You can add it as a follow-up. Yep. Because this patch was submitted several weeks ago, I have sent a second version; you can just review that patch. Arthur Paolo -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT
On Mon, Sep 9, 2013 at 9:56 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 09/09/2013 06:57, Arthur Chunqi Li wrote: Some test cases for nested EPT features, including: 1. EPT basic framework tests: read, write and remap. 2. EPT misconfiguration test cases: page permission misconfiguration and memory type misconfiguration. 3. EPT violation test cases: page permission violation and paging structure violation. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx_tests.c | 266 +++ 1 file changed, 266 insertions(+) diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..a0b9824 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,36 @@ #include "vmx.h" +#include "processor.h" +#include "vm.h" +#include "msr.h" +#include "fwcfg.h" + +volatile u32 stage; +volatile bool init_fail; Why volatile? Because init_fail is only set but not used later in ept_init(), and if I don't add volatile, the compiler may optimize the store to init_fail away. This first occurred to me when I wrote set_stage/get_stage: if a variable is set in a function but not used later, the compiler usually treats the store as a redundant assignment and removes it. Arthur The patch looks good.
+unsigned long *pml4; +u64 eptp; +void *data_page1, *data_page2; + +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + +static inline void vmcall() +{ + asm volatile (vmcall); +} void basic_init() { @@ -76,6 +108,238 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +static int setup_ept() +{ + int support_2m; + unsigned long end_of_memory; + + if (!(ept_vpid.val EPT_CAP_UC) + !(ept_vpid.val EPT_CAP_WB)) { + printf(\tEPT paging-structure memory type + UCWB are not supported\n); + return 1; + } + if (ept_vpid.val EPT_CAP_UC) + eptp = EPT_MEM_TYPE_UC; + else + eptp = EPT_MEM_TYPE_WB; + if (!(ept_vpid.val EPT_CAP_PWL4)) { + printf(\tPWL4 is not supported\n); + return 1; + } + eptp |= (3 EPTP_PG_WALK_LEN_SHIFT); + pml4 = alloc_page(); + memset(pml4, 0, PAGE_SIZE); + eptp |= virt_to_phys(pml4); + vmcs_write(EPTP, eptp); + support_2m = !!(ept_vpid.val EPT_CAP_2M_PAGE); + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE); + if (end_of_memory (1ul 32)) + end_of_memory = (1ul 32); + if (setup_ept_range(pml4, 0, end_of_memory, + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) { + printf(\tSet ept tables failed.\n); + return 1; + } + return 0; +} + +static void ept_init() +{ + u32 ctrl_cpu[2]; + + init_fail = false; + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1); + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY) + ctrl_cpu_rev[0].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) + ctrl_cpu_rev[1].clr; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + if (setup_ept()) + init_fail = true; + data_page1 = alloc_page(); + data_page2 = alloc_page(); + memset(data_page1, 0x0, PAGE_SIZE); + memset(data_page2, 0x0, PAGE_SIZE); + *((u32 *)data_page1) = MAGIC_VAL_1; + *((u32 *)data_page2) = MAGIC_VAL_2; + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2, + 
EPT_RA | EPT_WA | EPT_EA); +} + +static void ept_main() +{ + if (init_fail) + return; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY) + !(ctrl_cpu_rev[1].clr CPU_EPT)) { + printf(\tEPT is not supported); + return; + } + set_stage(0); + if (*((u32 *)data_page2) != MAGIC_VAL_1 + *((u32 *)data_page1) != MAGIC_VAL_1) + report(EPT basic framework - read, 0); + else { + *((u32 *)data_page2) = MAGIC_VAL_3; + vmcall(); + if (get_stage() == 1) { + if (*((u32 *)data_page1) == MAGIC_VAL_3 + *((u32 *)data_page2) == MAGIC_VAL_2) + report(EPT basic framework, 1); + else + report(EPT basic framework - remap, 1); + } + } + // Test EPT Misconfigurations + set_stage(1); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 2) { + report(EPT misconfigurations, 0); + goto t1; + } + set_stage(2); + vmcall
Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT
On Mon, Sep 9, 2013 at 12:57 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Some test cases for nested EPT features, including: 1. EPT basic framework tests: read, write and remap. 2. EPT misconfigurations test cases: page permission mieconfiguration and memory type misconfiguration 3. EPT violations test cases: page permission violation and paging structure violation Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx_tests.c | 266 +++ 1 file changed, 266 insertions(+) diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..a0b9824 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,36 @@ #include vmx.h +#include processor.h +#include vm.h +#include msr.h +#include fwcfg.h + +volatile u32 stage; +volatile bool init_fail; +unsigned long *pml4; +u64 eptp; +void *data_page1, *data_page2; + +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + +static inline void vmcall() +{ + asm volatile (vmcall); +} void basic_init() { @@ -76,6 +108,238 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +static int setup_ept() +{ + int support_2m; + unsigned long end_of_memory; + + if (!(ept_vpid.val EPT_CAP_UC) + !(ept_vpid.val EPT_CAP_WB)) { + printf(\tEPT paging-structure memory type + UCWB are not supported\n); + return 1; + } + if (ept_vpid.val EPT_CAP_UC) + eptp = EPT_MEM_TYPE_UC; + else + eptp = EPT_MEM_TYPE_WB; + if (!(ept_vpid.val EPT_CAP_PWL4)) { + printf(\tPWL4 is not supported\n); + return 1; + } + eptp |= (3 EPTP_PG_WALK_LEN_SHIFT); + pml4 = alloc_page(); + memset(pml4, 0, PAGE_SIZE); + eptp |= virt_to_phys(pml4); + vmcs_write(EPTP, eptp); + support_2m = !!(ept_vpid.val EPT_CAP_2M_PAGE); + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE); + if (end_of_memory (1ul 32)) + end_of_memory = (1ul 32); + if (setup_ept_range(pml4, 0, end_of_memory, + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) { + printf(\tSet ept tables 
failed.\n); + return 1; + } + return 0; +} + +static void ept_init() +{ + u32 ctrl_cpu[2]; + + init_fail = false; + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1); + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY) +ctrl_cpu_rev[0].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) +ctrl_cpu_rev[1].clr; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + if (setup_ept()) + init_fail = true; + data_page1 = alloc_page(); + data_page2 = alloc_page(); + memset(data_page1, 0x0, PAGE_SIZE); + memset(data_page2, 0x0, PAGE_SIZE); + *((u32 *)data_page1) = MAGIC_VAL_1; + *((u32 *)data_page2) = MAGIC_VAL_2; + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2, + EPT_RA | EPT_WA | EPT_EA); +} + +static void ept_main() +{ + if (init_fail) + return; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY) +!(ctrl_cpu_rev[1].clr CPU_EPT)) { + printf(\tEPT is not supported); + return; + } + set_stage(0); + if (*((u32 *)data_page2) != MAGIC_VAL_1 + *((u32 *)data_page1) != MAGIC_VAL_1) + report(EPT basic framework - read, 0); + else { + *((u32 *)data_page2) = MAGIC_VAL_3; + vmcall(); + if (get_stage() == 1) { + if (*((u32 *)data_page1) == MAGIC_VAL_3 + *((u32 *)data_page2) == MAGIC_VAL_2) + report(EPT basic framework, 1); + else + report(EPT basic framework - remap, 1); + } + } + // Test EPT Misconfigurations + set_stage(1); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 2) { + report(EPT misconfigurations, 0); + goto t1; + } + set_stage(2); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 3) { + report(EPT misconfigurations, 0); + goto t1; + } + report(EPT misconfigurations, 1); +t1: + // Test EPT violation
[PATCH] kvm-unit-tests: VMX: Fix some nested EPT related bugs
This patch fixes 3 bugs in the VMX framework and EPT framework: 1. Fix bug of setting the default value of CPU_SECONDARY. 2. Fix bug of reading MSR_IA32_VMX_PROCBASED_CTLS2 and MSR_IA32_VMX_EPT_VPID_CAP without checking support. 3. For EPT violation and misconfiguration reduced vmexits, the vmcs field VM-exit instruction length is not defined and will return an unexpected value when read. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 13 ++--- x86/vmx_tests.c |2 -- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index 87d1d55..9db4ef4 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -304,7 +304,8 @@ static void init_vmcs_ctrl(void) /* Disable VMEXIT of IO instruction */ vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); if (ctrl_cpu_rev[0].set & CPU_SECONDARY) { - ctrl_cpu[1] |= ctrl_cpu_rev[1].set & ctrl_cpu_rev[1].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | ctrl_cpu_rev[1].set) & ctrl_cpu_rev[1].clr; vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]); } vmcs_write(CR3_TARGET_COUNT, 0); @@ -489,8 +490,14 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ?
MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); - ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); + if ((ctrl_cpu_rev[0].clr & CPU_SECONDARY) != 0) + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + else + ctrl_cpu_rev[1].val = 0; + if ((ctrl_cpu_rev[1].clr & (CPU_EPT | CPU_VPID)) != 0) + ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); + else + ept_vpid.val = 0; write_cr0((read_cr0() & fix_cr0_clr) | fix_cr0_set); write_cr4((read_cr4() & fix_cr4_clr) | fix_cr4_set | X86_CR4_VMXE); diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 6d972c0..e891a9f 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1075,7 +1075,6 @@ static int ept_exit_handler() print_vmexit_info(); return VMX_TEST_VMEXIT; } - vmcs_write(GUEST_RIP, guest_rip + insn_len); return VMX_TEST_RESUME; case VMX_EPT_VIOLATION: switch(get_stage()) { @@ -1100,7 +1099,6 @@ static int ept_exit_handler() print_vmexit_info(); return VMX_TEST_VMEXIT; } - vmcs_write(GUEST_RIP, guest_rip + insn_len); return VMX_TEST_RESUME; default: printf(Unknown exit reason, %d\n, reason); -- 1.7.9.5
Re: Correct way of tracking reads on given gfn ?
On Mon, Sep 9, 2013 at 8:29 PM, Gleb Natapov g...@redhat.com wrote: On Mon, Sep 09, 2013 at 12:53:02PM +0200, Paolo Bonzini wrote: On 09/09/2013 12:22, SPA wrote: Thanks Paolo. Is there a way where reads would trap? I explored a bit on PM_PRESENT_MASK. Though it's not a READ bit but a PRESENT bit, it looks like it should generate traps on reads if this bit is cleared. From the code, it looks like an rmap_write_protect()-like function, as I stated in the previous mail, should do. Would this approach work? Are there any glaring problems with this approach? I cannot say right away. Another way could be to set reserved bits to generate EPT misconfigurations. See ept_set_mmio_spte_mask and is_mmio_spte. This would trap both reads and writes. Dropping all sptes will also work, but trapping each read access will be dog slow. QEMU emulation will be much faster. Hi Gleb, I'm interested in this topic: what do you mean by QEMU emulation? Do you mean the functions in arch/x86/kvm/emulate.c? In what scenario will KVM call these functions? Thanks, Arthur -- Gleb.
[PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT
Some test cases for nested EPT features, including: 1. EPT basic framework tests: read, write and remap. 2. EPT misconfigurations test cases: page permission mieconfiguration and memory type misconfiguration 3. EPT violations test cases: page permission violation and paging structure violation Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx_tests.c | 266 +++ 1 file changed, 266 insertions(+) diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..a0b9824 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,36 @@ #include vmx.h +#include processor.h +#include vm.h +#include msr.h +#include fwcfg.h + +volatile u32 stage; +volatile bool init_fail; +unsigned long *pml4; +u64 eptp; +void *data_page1, *data_page2; + +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + +static inline void vmcall() +{ + asm volatile (vmcall); +} void basic_init() { @@ -76,6 +108,238 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +static int setup_ept() +{ + int support_2m; + unsigned long end_of_memory; + + if (!(ept_vpid.val EPT_CAP_UC) + !(ept_vpid.val EPT_CAP_WB)) { + printf(\tEPT paging-structure memory type + UCWB are not supported\n); + return 1; + } + if (ept_vpid.val EPT_CAP_UC) + eptp = EPT_MEM_TYPE_UC; + else + eptp = EPT_MEM_TYPE_WB; + if (!(ept_vpid.val EPT_CAP_PWL4)) { + printf(\tPWL4 is not supported\n); + return 1; + } + eptp |= (3 EPTP_PG_WALK_LEN_SHIFT); + pml4 = alloc_page(); + memset(pml4, 0, PAGE_SIZE); + eptp |= virt_to_phys(pml4); + vmcs_write(EPTP, eptp); + support_2m = !!(ept_vpid.val EPT_CAP_2M_PAGE); + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE); + if (end_of_memory (1ul 32)) + end_of_memory = (1ul 32); + if (setup_ept_range(pml4, 0, end_of_memory, + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) { + printf(\tSet ept tables failed.\n); + return 1; + } + return 0; +} + +static void ept_init() +{ + u32 
ctrl_cpu[2]; + + init_fail = false; + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1); + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY) +ctrl_cpu_rev[0].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) +ctrl_cpu_rev[1].clr; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + if (setup_ept()) + init_fail = true; + data_page1 = alloc_page(); + data_page2 = alloc_page(); + memset(data_page1, 0x0, PAGE_SIZE); + memset(data_page2, 0x0, PAGE_SIZE); + *((u32 *)data_page1) = MAGIC_VAL_1; + *((u32 *)data_page2) = MAGIC_VAL_2; + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2, + EPT_RA | EPT_WA | EPT_EA); +} + +static void ept_main() +{ + if (init_fail) + return; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY) +!(ctrl_cpu_rev[1].clr CPU_EPT)) { + printf(\tEPT is not supported); + return; + } + set_stage(0); + if (*((u32 *)data_page2) != MAGIC_VAL_1 + *((u32 *)data_page1) != MAGIC_VAL_1) + report(EPT basic framework - read, 0); + else { + *((u32 *)data_page2) = MAGIC_VAL_3; + vmcall(); + if (get_stage() == 1) { + if (*((u32 *)data_page1) == MAGIC_VAL_3 + *((u32 *)data_page2) == MAGIC_VAL_2) + report(EPT basic framework, 1); + else + report(EPT basic framework - remap, 1); + } + } + // Test EPT Misconfigurations + set_stage(1); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 2) { + report(EPT misconfigurations, 0); + goto t1; + } + set_stage(2); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 3) { + report(EPT misconfigurations, 0); + goto t1; + } + report(EPT misconfigurations, 1); +t1: + // Test EPT violation + set_stage(3); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() == 4) + report(EPT violation - page permission, 1); + else + report(EPT violation - page permission, 0
[PATCH 1/2] kvm-unit-tests: VMX: The framework of EPT for nested VMX testing
The framework of EPT for nested VMX, including functions to build up EPT paging structures, read/set EPT PTEs and setup a range of 1:1 map EPT. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 159 +++-- x86/vmx.h | 76 + 2 files changed, 231 insertions(+), 4 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..87d1d55 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -143,6 +143,159 @@ asm( call hypercall\n\t ); +/* EPT paging structure related functions */ +/* install_ept_entry : Install a page to a given level in EPT + @pml4 : addr of pml4 table + @pte_level : level of PTE to set + @guest_addr : physical address of guest + @pte : pte value to set + @pt_page : address of page table, NULL for a new page + */ +void install_ept_entry(unsigned long *pml4, + int pte_level, + unsigned long guest_addr, + unsigned long pte, + unsigned long *pt_page) +{ + int level; + unsigned long *pt = pml4; + unsigned offset; + + for (level = EPT_PAGE_LEVEL; level pte_level; --level) { + offset = (guest_addr ((level-1) * EPT_PGDIR_WIDTH + 12)) +EPT_PGDIR_MASK; + if (!(pt[offset] (EPT_PRESENT))) { + unsigned long *new_pt = pt_page; + if (!new_pt) + new_pt = alloc_page(); + else + pt_page = 0; + memset(new_pt, 0, PAGE_SIZE); + pt[offset] = virt_to_phys(new_pt) + | EPT_RA | EPT_WA | EPT_EA; + } + pt = phys_to_virt(pt[offset] 0xff000ull); + } + offset = ((unsigned long)guest_addr ((level-1) * + EPT_PGDIR_WIDTH + 12)) EPT_PGDIR_MASK; + pt[offset] = pte; +} + +/* Map a page, @perm is the permission of the page */ +void install_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 1, guest_addr, (phys PAGE_MASK) | perm, 0); +} + +/* Map a 1G-size page */ +void install_1g_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 3, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* Map a 2M-size page */ +void install_2m_ept(unsigned long 
*pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 2, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* setup_ept_range : Setup a range of 1:1 mapped page to EPT paging structure. + @start : start address of guest page + @len : length of address to be mapped + @map_1g : whether 1G page map is used + @map_2m : whether 2M page map is used + @perm : permission for every page + */ +int setup_ept_range(unsigned long *pml4, unsigned long start, + unsigned long len, int map_1g, int map_2m, u64 perm) +{ + u64 phys = start; + u64 max = (u64)len + (u64)start; + + if (map_1g) { + while (phys + PAGE_SIZE_1G = max) { + install_1g_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_1G; + } + } + if (map_2m) { + while (phys + PAGE_SIZE_2M = max) { + install_2m_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_2M; + } + } + while (phys + PAGE_SIZE = max) { + install_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE; + } + return 0; +} + +/* get_ept_pte : Get the PTE of a given level in EPT, +@level == 1 means get the latest level*/ +unsigned long get_ept_pte(unsigned long *pml4, + unsigned long guest_addr, int level) +{ + int l; + unsigned long *pt = pml4, pte; + unsigned offset; + + for (l = EPT_PAGE_LEVEL; l 1; --l) { + offset = (guest_addr (((l-1) * EPT_PGDIR_WIDTH) + 12)) +EPT_PGDIR_MASK; + pte = pt[offset]; + if (!(pte (EPT_PRESENT))) + return 0; + if (l == level) + return pte; + if (l 4 (pte EPT_LARGE_PAGE)) + return pte; + pt = (unsigned long *)(pte 0xff000ull); + } + offset = (guest_addr (((l-1) * EPT_PGDIR_WIDTH) + 12)) +EPT_PGDIR_MASK
[PATCH 0/2] kvm-unit-tests: VMX: Test nested EPT features
This series of patches provides the framework of nested EPT and some test cases for nested EPT features.

Arthur Chunqi Li (2):
  kvm-unit-tests: VMX: The framework of EPT for nested VMX testing
  kvm-unit-tests: VMX: Test cases for nested EPT

 x86/vmx.c       | 159 -
 x86/vmx.h       |  76
 x86/vmx_tests.c | 266 +++
 3 files changed, 497 insertions(+), 4 deletions(-)

--
1.7.9.5
Re: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
> Arthur Chunqi Li wrote on 2013-09-04:
>> This patch contains the following two changes:
>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>> occurs with some reason not emulated by L1, the preemption timer value
>> should be saved in such exits.
>> 2. Add support of "Save VMX-preemption timer value" VM-Exit control to
>> nVMX. With this patch, nested VMX preemption timer features are fully
>> supported.
>>
>> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
>> ---
>> This series depends on queue.
>>
>>  arch/x86/include/uapi/asm/msr-index.h |  1 +
>>  arch/x86/kvm/vmx.c                    | 51 ++---
>>  2 files changed, 48 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>> index bb04650..b93e09a 100644
>> --- a/arch/x86/include/uapi/asm/msr-index.h
>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>> @@ -536,6 +536,7 @@
>>  /* MSR_IA32_VMX_MISC bits */
>>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
>> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
>>
>>  /* AMD-V MSRs */
>>  #define MSR_VM_CR 0xc0010114
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 1f1da43..870caa8 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>  #ifdef CONFIG_X86_64
>>  		VM_EXIT_HOST_ADDR_SPACE_SIZE |
>>  #endif
>> -		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
>> +		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>> +		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>> +	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
>> +		nested_vmx_exit_ctls_high &=
>> +			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>> +	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>> +		nested_vmx_pinbased_ctls_high &=
>> +			(~PIN_BASED_VMX_PREEMPTION_TIMER);
>
> The following logic is more clear:
>
> if (nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)
> 	nested_vmx_exit_ctls_high |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER

Here I have the following consideration: this logic is wrong if the CPU supports PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know whether that ever occurs. So the code above reads the MSR and keeps the features it supports, and here I just check whether these two features are supported simultaneously.

You remind me that this piece of code can be written like this:

if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
    !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
	nested_vmx_exit_ctls_high &= (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
	nested_vmx_pinbased_ctls_high &= (~PIN_BASED_VMX_PREEMPTION_TIMER);
}

This reflects the logic described above, that these two flags should be supported simultaneously, and brings less confusion.

> BTW: I don't see nested_vmx_setup_ctls_msrs() considering the hardware's
> capability when exposing those vmx features (not just the preemption
> timer) to L1.

In the code just above, when setting the pin-based controls for nested VMX, it first rdmsrs MSR_IA32_VMX_PINBASED_CTLS, then uses the result to mask out the features the hardware does not support. So do the other control fields.

>>  	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>  		VM_EXIT_LOAD_IA32_EFER);
>>
>> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>>  	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>>  }
>>
>> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
>> +{
>> +	u64 delta_tsc_l1;
>> +	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
>> +
>> +	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
>> +			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
>> +	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>> +	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
>> +			native_read_tsc()) - vcpu->arch.last_guest_tsc;
>> +	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
>> +	if (preempt_val_l2 - preempt_val_l1 < 0)
>> +		preempt_val_l2 = 0;
>> +	else
>> +		preempt_val_l2 -= preempt_val_l1;
>> +	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
>> +}
>>
>>  /*
>>   * The guest has exited. See if we can fix it or if we need userspace
>>   * assistance.
>> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>  	u32 exit_reason = vmx->exit_reason;
>>  	u32 vectoring_info = vmx->idt_vectoring_info;
>> +	int ret;
>>
>>  	/* If guest state is invalid, start emulating */
>>  	if (vmx->emulation_required)
>> @@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>  	if (exit_reason < kvm_vmx_max_exit_handlers
Re: [PATCH] kvm-unit-tests: VMX: Test suite for preemption timer
Hi Jan, Gleb and Paolo,

It suddenly occurred to me: if the guest's PIN_PREEMPT is disabled while EXI_SAVE_PREEMPT is enabled, what will happen? The preemption timer value in the vmcs will not be affected, yes? This case is not tested in this patch.

Arthur

On Wed, Sep 4, 2013 at 11:26 PM, Arthur Chunqi Li <yzt...@gmail.com> wrote:
> Test cases for preemption timer in nested VMX. Two aspects are tested:
> 1. Save preemption timer on VMEXIT if the relevant bit is set in
> EXIT_CONTROL
> 2. Test a relevant bug of KVM. The bug will not save the preemption
> timer value if we exit L2->L0 for some reason and then enter L0->L2.
> Thus the preemption timer will never trigger if the value is large
> enough.
>
> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
> ---
>  x86/vmx.h       |   3 ++
>  x86/vmx_tests.c | 117 +++
>  2 files changed, 120 insertions(+)
>
> diff --git a/x86/vmx.h b/x86/vmx.h
> index 28595d8..ebc8cfd 100644
> --- a/x86/vmx.h
> +++ b/x86/vmx.h
> @@ -210,6 +210,7 @@ enum Encoding {
>  	GUEST_ACTV_STATE	= 0x4826ul,
>  	GUEST_SMBASE		= 0x4828ul,
>  	GUEST_SYSENTER_CS	= 0x482aul,
> +	PREEMPT_TIMER_VALUE	= 0x482eul,
>
>  	/* 32-Bit Host State Fields */
>  	HOST_SYSENTER_CS	= 0x4c00ul,
> @@ -331,6 +332,7 @@ enum Ctrl_exi {
>  	EXI_LOAD_PERF	= 1UL << 12,
>  	EXI_INTA	= 1UL << 15,
>  	EXI_LOAD_EFER	= 1UL << 21,
> +	EXI_SAVE_PREEMPT= 1UL << 22,
>  };
>
>  enum Ctrl_ent {
> @@ -342,6 +344,7 @@ enum Ctrl_pin {
>  	PIN_EXTINT	= 1ul << 0,
>  	PIN_NMI		= 1ul << 3,
>  	PIN_VIRT_NMI	= 1ul << 5,
> +	PIN_PREEMPT	= 1ul << 6,
>  };
>
>  enum Ctrl0 {
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index c1b39f4..d358148 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -1,4 +1,30 @@
>  #include "vmx.h"
> +#include "msr.h"
> +#include "processor.h"
> +
> +volatile u32 stage;
> +
> +static inline void vmcall()
> +{
> +	asm volatile("vmcall");
> +}
> +
> +static inline void set_stage(u32 s)
> +{
> +	barrier();
> +	stage = s;
> +	barrier();
> +}
> +
> +static inline u32 get_stage()
> +{
> +	u32 s;
> +
> +	barrier();
> +	s = stage;
> +	barrier();
> +	return s;
> +}
>
>  void basic_init()
>  {
> @@ -76,6 +102,95 @@ int vmenter_exit_handler()
>  	return VMX_TEST_VMEXIT;
>  }
>
> +u32 preempt_scale;
> +volatile unsigned long long tsc_val;
> +volatile u32 preempt_val;
> +
> +void preemption_timer_init()
> +{
> +	u32 ctrl_pin;
> +
> +	ctrl_pin = vmcs_read(PIN_CONTROLS) | PIN_PREEMPT;
> +	ctrl_pin &= ctrl_pin_rev.clr;
> +	vmcs_write(PIN_CONTROLS, ctrl_pin);
> +	preempt_val = 1000;
> +	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
> +	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
> +}
> +
> +void preemption_timer_main()
> +{
> +	tsc_val = rdtsc();
> +	if (!(ctrl_pin_rev.clr & PIN_PREEMPT)) {
> +		printf("\tPreemption timer is not supported\n");
> +		return;
> +	}
> +	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> +		printf("\tSave preemption value is not supported\n");
> +	else {
> +		set_stage(0);
> +		vmcall();
> +		if (get_stage() == 1)
> +			vmcall();
> +	}
> +	while (1) {
> +		if (((rdtsc() - tsc_val) >> preempt_scale)
> +				> 10 * preempt_val) {
> +			report("Preemption timer", 0);
> +			break;
> +		}
> +	}
> +}
> +
> +int preemption_timer_exit_handler()
> +{
> +	u64 guest_rip;
> +	ulong reason;
> +	u32 insn_len;
> +	u32 ctrl_exit;
> +
> +	guest_rip = vmcs_read(GUEST_RIP);
> +	reason = vmcs_read(EXI_REASON) & 0xff;
> +	insn_len = vmcs_read(EXI_INST_LEN);
> +	switch (reason) {
> +	case VMX_PREEMPT:
> +		if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
> +			report("Preemption timer", 0);
> +		else
> +			report("Preemption timer", 1);
> +		return VMX_TEST_VMEXIT;
> +	case VMX_VMCALL:
> +		switch (get_stage()) {
> +		case 0:
> +			if (vmcs_read(PREEMPT_TIMER_VALUE) != preempt_val)
> +				report("Save preemption value", 0);
> +			else {
> +				set_stage(get_stage() + 1);
> +				ctrl_exit = (vmcs_read(EXI_CONTROLS) |
> +					EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
> +				vmcs_write(EXI_CONTROLS, ctrl_exit);
> +			}
> +			break;
> +		case 1:
> +			if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
> +				report("Save preemption value", 0);
> +			else
Re: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
On Thu, Sep 5, 2013 at 5:24 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
> Arthur Chunqi Li wrote on 2013-09-05:
>> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
>>> Arthur Chunqi Li wrote on 2013-09-04:
>>>> This patch contains the following two changes:
>>>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>>>> occurs with some reason not emulated by L1, the preemption timer
>>>> value should be saved in such exits.
>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit control
>>>> to nVMX. With this patch, nested VMX preemption timer features are
>>>> fully supported.
>>>>
>>>> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
>>>> ---
>>>> This series depends on queue.
>>>>
>>>>  arch/x86/include/uapi/asm/msr-index.h |  1 +
>>>>  arch/x86/kvm/vmx.c                    | 51 ++---
>>>>  2 files changed, 48 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>>>> index bb04650..b93e09a 100644
>>>> --- a/arch/x86/include/uapi/asm/msr-index.h
>>>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>>>> @@ -536,6 +536,7 @@
>>>>  /* MSR_IA32_VMX_MISC bits */
>>>>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
>>>> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
>>>>
>>>>  /* AMD-V MSRs */
>>>>  #define MSR_VM_CR 0xc0010114
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index 1f1da43..870caa8 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>  #ifdef CONFIG_X86_64
>>>>  		VM_EXIT_HOST_ADDR_SPACE_SIZE |
>>>>  #endif
>>>> -		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
>>>> +		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>>>> +		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>>>> +	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
>>>> +		nested_vmx_exit_ctls_high &=
>>>> +			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>>>> +	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>>>> +		nested_vmx_pinbased_ctls_high &=
>>>> +			(~PIN_BASED_VMX_PREEMPTION_TIMER);
>>>
>>> The following logic is more clear:
>>>
>>> if (nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)
>>> 	nested_vmx_exit_ctls_high |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
>>
>> Here I have the following consideration: this logic is wrong if the
>> CPU supports PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support
>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know whether that
>> ever occurs. So the code above reads the MSR and keeps the features it
>> supports, and here I just check whether these two features are
>> supported simultaneously.
>
> No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on
> PIN_BASED_VMX_PREEMPTION_TIMER. PIN_BASED_VMX_PREEMPTION_TIMER is an
> independent feature.
>
>> You remind me that this piece of code can be written like this:
>>
>> if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
>>     !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
>> 	nested_vmx_exit_ctls_high &= (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>> 	nested_vmx_pinbased_ctls_high &= (~PIN_BASED_VMX_PREEMPTION_TIMER);
>> }
>>
>> This reflects the logic described above, that these two flags should
>> be supported simultaneously, and brings less confusion.
>>
>>> BTW: I don't see nested_vmx_setup_ctls_msrs() considering the
>>> hardware's capability when exposing those vmx features (not just the
>>> preemption timer) to L1.
>>
>> In the code just above, when setting the pin-based controls for nested
>> VMX, it first rdmsrs MSR_IA32_VMX_PINBASED_CTLS, then uses the result
>> to mask out the features the hardware does not support. So do the
>> other control fields.
>
> Yes, I saw it.
>
>>>>  	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>  		VM_EXIT_LOAD_IA32_EFER);
>>>>
>>>> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>>>>  	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>>>>  }
>>>>
>>>> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	u64 delta_tsc_l1;
>>>> +	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
>>>> +
>>>> +	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
>>>> +			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
>>>> +	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>>>> +	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
>>>> +			native_read_tsc()) - vcpu->arch.last_guest_tsc;
>>>> +	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
>>>> +	if (preempt_val_l2 - preempt_val_l1 < 0)
>>>> +		preempt_val_l2 = 0;
>>>> +	else
>>>> +		preempt_val_l2 -= preempt_val_l1;
>>>> +	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
>>>> +}
>>>>
>>>>  /*
>>>>   * The guest has exited. See if we can fix it or if we need userspace
>>>>   * assistance.
>>>> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>>>  	struct vcpu_vmx *vmx
Re: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
On Thu, Sep 5, 2013 at 7:05 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
> Arthur Chunqi Li wrote on 2013-09-05:
>> Arthur Chunqi Li wrote on 2013-09-05:
>>> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
>>>> Arthur Chunqi Li wrote on 2013-09-04:
>>>>> This patch contains the following two changes:
>>>>> 1. Fix the bug in nested preemption timer support. If a vmexit
>>>>> L2->L0 occurs with some reason not emulated by L1, the preemption
>>>>> timer value should be saved in such exits.
>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit control
>>>>> to nVMX. With this patch, nested VMX preemption timer features are
>>>>> fully supported.
>>>>>
>>>>> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
>>>>> ---
>>>>> This series depends on queue.
>>>>>
>>>>>  arch/x86/include/uapi/asm/msr-index.h |  1 +
>>>>>  arch/x86/kvm/vmx.c                    | 51 ++---
>>>>>  2 files changed, 48 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>>>>> index bb04650..b93e09a 100644
>>>>> --- a/arch/x86/include/uapi/asm/msr-index.h
>>>>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>>>>> @@ -536,6 +536,7 @@
>>>>>  /* MSR_IA32_VMX_MISC bits */
>>>>>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
>>>>> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
>>>>>
>>>>>  /* AMD-V MSRs */
>>>>>  #define MSR_VM_CR 0xc0010114
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 1f1da43..870caa8 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>>  #ifdef CONFIG_X86_64
>>>>>  		VM_EXIT_HOST_ADDR_SPACE_SIZE |
>>>>>  #endif
>>>>> -		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
>>>>> +		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>>>>> +		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>>>>> +	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
>>>>> +		nested_vmx_exit_ctls_high &=
>>>>> +			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>>>>> +	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>>>>> +		nested_vmx_pinbased_ctls_high &=
>>>>> +			(~PIN_BASED_VMX_PREEMPTION_TIMER);
>>>>
>>>> The following logic is more clear:
>>>>
>>>> if (nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)
>>>> 	nested_vmx_exit_ctls_high |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
>>>
>>> Here I have the following consideration: this logic is wrong if the
>>> CPU supports PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support
>>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know whether that
>>> ever occurs. So the code above reads the MSR and keeps the features
>>> it supports, and here I just check whether these two features are
>>> supported simultaneously.
>>
>> No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on
>> PIN_BASED_VMX_PREEMPTION_TIMER. PIN_BASED_VMX_PREEMPTION_TIMER is an
>> independent feature.
>>
>>> You remind me that this piece of code can be written like this:
>>>
>>> if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
>>>     !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
>>> 	nested_vmx_exit_ctls_high &= (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>>> 	nested_vmx_pinbased_ctls_high &= (~PIN_BASED_VMX_PREEMPTION_TIMER);
>>> }
>>>
>>> This reflects the logic described above, that these two flags should
>>> be supported simultaneously, and brings less confusion.
>>>
>>>> BTW: I don't see nested_vmx_setup_ctls_msrs() considering the
>>>> hardware's capability when exposing those vmx features (not just the
>>>> preemption timer) to L1.
>>>
>>> In the code just above, when setting the pin-based controls for
>>> nested VMX, it first rdmsrs MSR_IA32_VMX_PINBASED_CTLS, then uses the
>>> result to mask out the features the hardware does not support. So do
>>> the other control fields.
>>
>> Yes, I saw it.
>>
>>>>>  	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>>  		VM_EXIT_LOAD_IA32_EFER);
>>>>>
>>>>> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>>>>>  	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>>>>>  }
>>>>>
>>>>> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +	u64 delta_tsc_l1;
>>>>> +	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
>>>>> +
>>>>> +	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
>>>>> +			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
>>>>> +	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>>>>> +	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
>>>>> +			native_read_tsc()) - vcpu->arch.last_guest_tsc;
>>>>> +	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
>>>>> +	if (preempt_val_l2 - preempt_val_l1 < 0)
>>>>> +		preempt_val_l2 = 0;
>>>>> +	else
>>>>> +		preempt_val_l2 -= preempt_val_l1;
>>>>> +	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
>>>>> +}
>>>>>
>>>>>  /*
>>>>>   * The guest has exited. See if we can fix it or if we need userspace
[PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs with some reason not emulated by L1, the preemption timer value should be saved in such exits.
2. Add support of "Save VMX-preemption timer value" VM-Exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
ChangeLog to v3:
	Move nested_adjust_preemption_timer to the latest place just before vmenter.
	Some minor changes.

 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 49 +++++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F

 /* AMD-V MSRs */
 #define MSR_VM_CR 0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..f364d16 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -374,6 +374,8 @@ struct nested_vmx {
 	 */
 	struct page *apic_access_page;
 	u64 msr_ia32_feature_control;
+	/* Set if vmexit is L2->L1 */
+	bool nested_vmx_exit;
 };

 #define POSTED_INTR_ON 0
@@ -2204,7 +2206,17 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high &
+			PIN_BASED_VMX_PREEMPTION_TIMER) ||
+	    !(nested_vmx_exit_ctls_high &
+			VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
+	}
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER);

@@ -6707,6 +6719,24 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }

+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 delta_tsc_l1;
+	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+	if (preempt_val_l2 <= preempt_val_l1)
+		preempt_val_l2 = 0;
+	else
+		preempt_val_l2 -= preempt_val_l1;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
+
 /*
  * The guest has exited. See if we can fix it or if we need userspace
  * assistance.
@@ -6736,9 +6766,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	vmx->nested.nested_run_pending = 0;

 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
+		vmx->nested.nested_vmx_exit = true;
 		nested_vmx_vmexit(vcpu);
 		return 1;
 	}
+	vmx->nested.nested_vmx_exit = false;

 	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
 		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
@@ -7132,6 +7164,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	debugctlmsr = get_debugctlmsr();

 	vmx->__launched = vmx->loaded_vmcs->launched;
+	if (is_guest_mode(vcpu) && !(vmx->nested.nested_vmx_exit))
+		nested_adjust_preemption_timer(vcpu);
 	asm(
 		/* Store host registers */
 		"push %%" _ASM_DX "; push %%" _ASM_BP ";"
@@ -7518,6 +7552,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7726,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control
[PATCH v2] kvm-unit-tests: VMX: Test suite for preemption timer
Test cases for the preemption timer in nested VMX. The following aspects are tested:
1. Save the preemption timer on VMEXIT if the relevant bit is set in EXIT_CONTROL.
2. Test a relevant bug of KVM. The bug will not save the preemption timer value if we exit L2->L0 for some reason and then enter L0->L2. Thus the preemption timer will never trigger if the value is large enough.
3. Some other aspects are tested, e.g. preempt without save, preempt when the value is 0.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
ChangeLog to v1:
1. Add test of EXI_SAVE_PREEMPT enabled and PIN_PREEMPT disabled
2. Add test of PIN_PREEMPT enabled and EXI_SAVE_PREEMPT enabled/disabled
3. Add test of preemption value being 0

 x86/vmx.h       |   3 +
 x86/vmx_tests.c | 175 +++
 2 files changed, 178 insertions(+)

diff --git a/x86/vmx.h b/x86/vmx.h
index 28595d8..ebc8cfd 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -210,6 +210,7 @@ enum Encoding {
 	GUEST_ACTV_STATE	= 0x4826ul,
 	GUEST_SMBASE		= 0x4828ul,
 	GUEST_SYSENTER_CS	= 0x482aul,
+	PREEMPT_TIMER_VALUE	= 0x482eul,

 	/* 32-Bit Host State Fields */
 	HOST_SYSENTER_CS	= 0x4c00ul,
@@ -331,6 +332,7 @@ enum Ctrl_exi {
 	EXI_LOAD_PERF	= 1UL << 12,
 	EXI_INTA	= 1UL << 15,
 	EXI_LOAD_EFER	= 1UL << 21,
+	EXI_SAVE_PREEMPT= 1UL << 22,
 };

 enum Ctrl_ent {
@@ -342,6 +344,7 @@ enum Ctrl_pin {
 	PIN_EXTINT	= 1ul << 0,
 	PIN_NMI		= 1ul << 3,
 	PIN_VIRT_NMI	= 1ul << 5,
+	PIN_PREEMPT	= 1ul << 6,
 };

 enum Ctrl0 {
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index c1b39f4..2e32031 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1,4 +1,30 @@
 #include "vmx.h"
+#include "msr.h"
+#include "processor.h"
+
+volatile u32 stage;
+
+static inline void vmcall()
+{
+	asm volatile("vmcall");
+}
+
+static inline void set_stage(u32 s)
+{
+	barrier();
+	stage = s;
+	barrier();
+}
+
+static inline u32 get_stage()
+{
+	u32 s;
+
+	barrier();
+	s = stage;
+	barrier();
+	return s;
+}

 void basic_init()
 {
@@ -76,6 +102,153 @@ int vmenter_exit_handler()
 	return VMX_TEST_VMEXIT;
 }

+u32 preempt_scale;
+volatile unsigned long long tsc_val;
+volatile u32 preempt_val;
+
+void preemption_timer_init()
+{
+	u32 ctrl_exit;
+
+	// Enable EXI_SAVE_PREEMPT with PIN_PREEMPT disabled
+	ctrl_exit = (vmcs_read(EXI_CONTROLS) |
+			EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
+	vmcs_write(EXI_CONTROLS, ctrl_exit);
+	preempt_val = 1000;
+	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+	set_stage(0);
+	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+}
+
+void preemption_timer_main()
+{
+	int i, j;
+
+	if (!(ctrl_pin_rev.clr & PIN_PREEMPT)) {
+		printf("\tPreemption timer is not supported\n");
+		return;
+	}
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
+	else {
+		// Test EXI_SAVE_PREEMPT enabled and PIN_PREEMPT disabled
+		set_stage(0);
+		// Consume enough time to let L2->L0->L2 occur
+		for (i = 0; i < 10; i++)
+			for (j = 0; j < 1; j++);
+		vmcall();
+		// Test PIN_PREEMPT enabled and EXI_SAVE_PREEMPT enabled/disabled
+		set_stage(1);
+		vmcall();
+		// Test both enabled
+		if (get_stage() == 2)
+			vmcall();
+	}
+	// Test the bug of resetting the preempt value when L2->L0->L2
+	set_stage(3);
+	vmcall();
+	tsc_val = rdtsc();
+	while (1) {
+		if (((rdtsc() - tsc_val) >> preempt_scale)
+				> 10 * preempt_val) {
+			report("Preemption timer timeout", 0);
+			break;
+		}
+		if (get_stage() == 4)
+			break;
+	}
+	// Test preempt val is 0
+	set_stage(4);
+	report("Preemption timer, val=0", 0);
+}
+
+int preemption_timer_exit_handler()
+{
+	u64 guest_rip;
+	ulong reason;
+	u32 insn_len;
+	u32 ctrl_exit;
+	u32 ctrl_pin;
+
+	guest_rip = vmcs_read(GUEST_RIP);
+	reason = vmcs_read(EXI_REASON) & 0xff;
+	insn_len = vmcs_read(EXI_INST_LEN);
+	switch (reason) {
+	case VMX_PREEMPT:
+		switch (get_stage()) {
+		case 3:
+			if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
+				report("Preemption timer timeout", 0);
+			else
+				report("Preemption timer timeout", 1);
+			set_stage(get_stage() + 1);
+			break;
+		case 4
[PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs with some reason not emulated by L1, the preemption timer value should be saved in such exits.
2. Add support of "Save VMX-preemption timer value" VM-Exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
This series depends on queue.

 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 51 ++---
 2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F

 /* AMD-V MSRs */
 #define MSR_VM_CR 0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..870caa8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER);

@@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }

+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 delta_tsc_l1;
+	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+	if (preempt_val_l2 - preempt_val_l1 < 0)
+		preempt_val_l2 = 0;
+	else
+		preempt_val_l2 -= preempt_val_l1;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
+
 /*
  * The guest has exited. See if we can fix it or if we need userspace
  * assistance.
@@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
+	int ret;

 	/* If guest state is invalid, start emulating */
 	if (vmx->emulation_required)
@@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	if (exit_reason < kvm_vmx_max_exit_handlers
 	    && kvm_vmx_exit_handlers[exit_reason])
-		return kvm_vmx_exit_handlers[exit_reason](vcpu);
+		ret = kvm_vmx_exit_handlers[exit_reason](vcpu);
 	else {
 		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
 		vcpu->run->hw.hardware_exit_reason = exit_reason;
+		ret = 0;
 	}
-	return 0;
+	if (is_guest_mode(vcpu))
+		nested_adjust_preemption_timer(vcpu);
+	return ret;
 }

 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7518,6 +7546,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7720,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);

 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER
[PATCH] kvm-unit-tests: VMX: Test suite for preemption timer
Test cases for the preemption timer in nested VMX. Two aspects are tested:
1. Save the preemption timer on VMEXIT if the relevant bit is set in EXIT_CONTROL.
2. Test a relevant bug of KVM. The bug will not save the preemption timer value if we exit L2->L0 for some reason and then enter L0->L2. Thus the preemption timer will never trigger if the value is large enough.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
 x86/vmx.h       |   3 ++
 x86/vmx_tests.c | 117 +++
 2 files changed, 120 insertions(+)

diff --git a/x86/vmx.h b/x86/vmx.h
index 28595d8..ebc8cfd 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -210,6 +210,7 @@ enum Encoding {
 	GUEST_ACTV_STATE	= 0x4826ul,
 	GUEST_SMBASE		= 0x4828ul,
 	GUEST_SYSENTER_CS	= 0x482aul,
+	PREEMPT_TIMER_VALUE	= 0x482eul,

 	/* 32-Bit Host State Fields */
 	HOST_SYSENTER_CS	= 0x4c00ul,
@@ -331,6 +332,7 @@ enum Ctrl_exi {
 	EXI_LOAD_PERF	= 1UL << 12,
 	EXI_INTA	= 1UL << 15,
 	EXI_LOAD_EFER	= 1UL << 21,
+	EXI_SAVE_PREEMPT= 1UL << 22,
 };

 enum Ctrl_ent {
@@ -342,6 +344,7 @@ enum Ctrl_pin {
 	PIN_EXTINT	= 1ul << 0,
 	PIN_NMI		= 1ul << 3,
 	PIN_VIRT_NMI	= 1ul << 5,
+	PIN_PREEMPT	= 1ul << 6,
 };

 enum Ctrl0 {
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index c1b39f4..d358148 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1,4 +1,30 @@
 #include "vmx.h"
+#include "msr.h"
+#include "processor.h"
+
+volatile u32 stage;
+
+static inline void vmcall()
+{
+	asm volatile("vmcall");
+}
+
+static inline void set_stage(u32 s)
+{
+	barrier();
+	stage = s;
+	barrier();
+}
+
+static inline u32 get_stage()
+{
+	u32 s;
+
+	barrier();
+	s = stage;
+	barrier();
+	return s;
+}

 void basic_init()
 {
@@ -76,6 +102,95 @@ int vmenter_exit_handler()
 	return VMX_TEST_VMEXIT;
 }

+u32 preempt_scale;
+volatile unsigned long long tsc_val;
+volatile u32 preempt_val;
+
+void preemption_timer_init()
+{
+	u32 ctrl_pin;
+
+	ctrl_pin = vmcs_read(PIN_CONTROLS) | PIN_PREEMPT;
+	ctrl_pin &= ctrl_pin_rev.clr;
+	vmcs_write(PIN_CONTROLS, ctrl_pin);
+	preempt_val = 1000;
+	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+}
+
+void preemption_timer_main()
+{
+	tsc_val = rdtsc();
+	if (!(ctrl_pin_rev.clr & PIN_PREEMPT)) {
+		printf("\tPreemption timer is not supported\n");
+		return;
+	}
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
+	else {
+		set_stage(0);
+		vmcall();
+		if (get_stage() == 1)
+			vmcall();
+	}
+	while (1) {
+		if (((rdtsc() - tsc_val) >> preempt_scale)
+				> 10 * preempt_val) {
+			report("Preemption timer", 0);
+			break;
+		}
+	}
+}
+
+int preemption_timer_exit_handler()
+{
+	u64 guest_rip;
+	ulong reason;
+	u32 insn_len;
+	u32 ctrl_exit;
+
+	guest_rip = vmcs_read(GUEST_RIP);
+	reason = vmcs_read(EXI_REASON) & 0xff;
+	insn_len = vmcs_read(EXI_INST_LEN);
+	switch (reason) {
+	case VMX_PREEMPT:
+		if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
+			report("Preemption timer", 0);
+		else
+			report("Preemption timer", 1);
+		return VMX_TEST_VMEXIT;
+	case VMX_VMCALL:
+		switch (get_stage()) {
+		case 0:
+			if (vmcs_read(PREEMPT_TIMER_VALUE) != preempt_val)
+				report("Save preemption value", 0);
+			else {
+				set_stage(get_stage() + 1);
+				ctrl_exit = (vmcs_read(EXI_CONTROLS) |
+					EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
+				vmcs_write(EXI_CONTROLS, ctrl_exit);
+			}
+			break;
+		case 1:
+			if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
+				report("Save preemption value", 0);
+			else
+				report("Save preemption value", 1);
+			break;
+		default:
+			printf("Invalid stage.\n");
+			print_vmexit_info();
+			return VMX_TEST_VMEXIT;
+		}
+		vmcs_write(GUEST_RIP, guest_rip + insn_len);
+		return VMX_TEST_RESUME;
+	default:
+		printf("Unknown exit reason, %d\n", reason);
+		print_vmexit_info
Re: [PATCH] kvm-unit-tests: VMX: Add the framework of EPT
Hi Xiao Guangrong, Jun Nakajima, Yang Zhang, Gleb and Paolo, If you have any ideas about how nested EPT should be tested and which aspects to cover, please tell me and I will write the relevant test cases. Besides, I would be glad if you could help review this patch or propose other suggestions. Thanks very much, Arthur On Mon, Sep 2, 2013 at 5:38 PM, Arthur Chunqi Li yzt...@gmail.com wrote: There must be some minor revisions to be done in this patch, so this is mainly an RFC mail. Besides, I'm not quite clear about what we should test in the nested EPT module, and I bet the writers of nested EPT have ideas on how to continue and refine this testing part. Any suggestions on which parts of nested EPT to test, and how, are welcome. Please help me CC EPT-related people if anyone knows them. Thanks, Arthur On Mon, Sep 2, 2013 at 5:26 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add a framework for EPT in nested VMX testing, including a set of functions to construct and read EPT paging structures and a simple read/write test of EPT remapping from guest to host.
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 132 -- x86/vmx.h | 76 +++ x86/vmx_tests.c | 156 +++ 3 files changed, 360 insertions(+), 4 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..a156b71 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -143,6 +143,132 @@ asm( call hypercall\n\t ); +/* EPT paging structure related functions */ +/* install_ept_entry : Install a page to a given level in EPT + @pml4 : addr of pml4 table + @pte_level : level of PTE to set + @guest_addr : physical address of guest + @pte : pte value to set + @pt_page : address of page table, NULL for a new page + */ +void install_ept_entry(unsigned long *pml4, + int pte_level, + unsigned long guest_addr, + unsigned long pte, + unsigned long *pt_page) +{ + int level; + unsigned long *pt = pml4; + unsigned offset; + + for (level = EPT_PAGE_LEVEL; level pte_level; --level) { + offset = (guest_addr ((level-1) * EPT_PGDIR_WIDTH + 12)) +EPT_PGDIR_MASK; + if (!(pt[offset] (EPT_RA | EPT_WA | EPT_EA))) { + unsigned long *new_pt = pt_page; + if (!new_pt) + new_pt = alloc_page(); + else + pt_page = 0; + memset(new_pt, 0, PAGE_SIZE); + pt[offset] = virt_to_phys(new_pt) + | EPT_RA | EPT_WA | EPT_EA; + } + pt = phys_to_virt(pt[offset] 0xff000ull); + } + offset = ((unsigned long)guest_addr ((level-1) * + EPT_PGDIR_WIDTH + 12)) EPT_PGDIR_MASK; + pt[offset] = pte; +} + +/* Map a page, @perm is the permission of the page */ +void install_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 1, guest_addr, (phys PAGE_MASK) | perm, 0); +} + +/* Map a 1G-size page */ +void install_1g_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 3, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* Map a 2M-size page */ +void install_2m_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 2, 
guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* setup_ept_range : Setup a range of 1:1 mapped page to EPT paging structure. + @start : start address of guest page + @len : length of address to be mapped + @map_1g : whether 1G page map is used + @map_2m : whether 2M page map is used + @perm : permission for every page + */ +int setup_ept_range(unsigned long *pml4, unsigned long start, + unsigned long len, int map_1g, int map_2m, u64 perm) +{ + u64 phys = start; + u64 max = (u64)len + (u64)start; + + if (map_1g) { + while (phys + PAGE_SIZE_1G = max) { + install_1g_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_1G; + } + } + if (map_2m) { + while (phys + PAGE_SIZE_2M = max) { + install_2m_ept(pml4, phys, phys, perm); + phys
Re: Corner cases of I/O bitmap
On Tue, Sep 3, 2013 at 7:19 PM, Gleb Natapov g...@redhat.com wrote: On Mon, Aug 12, 2013 at 08:35:57PM +0800, Arthur Chunqi Li wrote: Hi Gleb and Paolo, There are some corner cases when testing I/O bitmaps, and I don't know the exact behavior of the hardware. A little bit late but... A little early mail, but you are warming up quickly; maybe it was a tough time in the past week ;)

1. If we set the bit for 0x4000 in the bitmap and call inl(0x3) or inl(0x4000) in the guest, what exit information will we get? The spec says: execution of an I/O instruction causes a VM exit if any bit in the I/O bitmaps corresponding to a port it accesses is 1. Note "any" here. The exit will have the address that the instruction used, otherwise how could we emulate it properly?

2. What will we get when calling inl(0xFFFF) in the guest with/without the “unconditional I/O exiting” VM-execution control and the “use I/O bitmaps” VM-execution control? In other words, are you asking what happens if you do inl(0xFFFF) on real HW? "The result of an attempt to address beyond the I/O address space limit of FFFFH is implementation-specific."

I tested the two cases in a nested env. For the first one, I got a normal exit if any of the ports accessed is masked in the bitmap. For the second, it acts the same as other ports. And the SDM says: "If an I/O operation ‘wraps around’ the 16-bit I/O-port space (accesses ports FFFFH and 0000H), the I/O instruction causes a VM exit." I cannot find the exact reaction for this case. What do you mean by exact reaction? To my understanding, any wrap-around access to 0xFFFF will cause a VM exit even though the mask of 0xFFFF is not set, but this is only my guess. I cannot find what inl(0xFFFF) results in described in the SDM. But as you said above, we do not need to test inl(0xFFFF) because we are not expected to get a deterministic result. Arthur

Do you have any ideas about these? Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb.
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Corner cases of I/O bitmap
On Tue, Sep 3, 2013 at 7:48 PM, Gleb Natapov g...@redhat.com wrote: On Tue, Sep 03, 2013 at 07:45:47PM +0800, Arthur Chunqi Li wrote: On Tue, Sep 3, 2013 at 7:19 PM, Gleb Natapov g...@redhat.com wrote: On Mon, Aug 12, 2013 at 08:35:57PM +0800, Arthur Chunqi Li wrote: Hi Gleb and Paolo, There are some corner cases when testing I/O bitmaps, and I don't know the exact behavior of the hardware. A little bit late but... A little early mail, but you are warming up quickly; maybe it was a tough time in the past week ;)

1. If we set the bit for 0x4000 in the bitmap and call inl(0x3) or inl(0x4000) in the guest, what exit information will we get? The spec says: execution of an I/O instruction causes a VM exit if any bit in the I/O bitmaps corresponding to a port it accesses is 1. Note "any" here. The exit will have the address that the instruction used, otherwise how could we emulate it properly?

2. What will we get when calling inl(0xFFFF) in the guest with/without the “unconditional I/O exiting” VM-execution control and the “use I/O bitmaps” VM-execution control? In other words, are you asking what happens if you do inl(0xFFFF) on real HW? "The result of an attempt to address beyond the I/O address space limit of FFFFH is implementation-specific."

I tested the two cases in a nested env. For the first one, I got a normal exit if any of the ports accessed is masked in the bitmap. For the second, it acts the same as other ports. And the SDM says: "If an I/O operation ‘wraps around’ the 16-bit I/O-port space (accesses ports FFFFH and 0000H), the I/O instruction causes a VM exit." I cannot find the exact reaction for this case. What do you mean by exact reaction? To my understanding, any wrap-around access to 0xFFFF will cause a VM exit even though the mask of 0xFFFF is not set, but this is only my guess. I cannot find what inl(0xFFFF) results in described in the SDM. But as you said above, we do not need to test inl(0xFFFF) because we are not expected to get a deterministic result. Implementation-specific behaviour only covers what happens on real HW.

In non-root operation the spec says a VM exit should happen, and we should test for that. I have reread the patch I committed and found that I did test inl(0xFFFF). Does an access to 0x0 also cause a VM exit in any case of non-root operation? Arthur -- Gleb.
Information of EPT violation VMEXIT
Hi there, When I test EPT violation VMEXITs, I am confused by bits 7 and 8 in the Exit Qualification for EPT Violations (Table 27-7 in the SDM). Bit 7 means "set if the guest linear-address field is valid". On what occasion will bit 7 be clear? I don't quite understand the following statement in the SDM: "The guest linear-address field is valid for all EPT violations except those resulting from an attempt to load the guest PDPTEs as part of the execution of the MOV CR instruction." Bit 8 describes the cause of the EPT violation, but I don't understand what it means when set versus clear. I always get the exit qualification with this bit set; how can I construct a violation with this bit clear? Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH v2 0/4] kvm-unit-tests: Add a series of test cases
Hi Gleb, Paolo and Jan, Would you please review this series when you can spare the time? Jan has reviewed it and, of course, further suggestions are welcome. Arthur On Thu, Aug 15, 2013 at 7:45 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add a series of test cases for nested VMX to kvm-unit-tests. Arthur Chunqi Li (4): kvm-unit-tests: VMX: Add test cases for PAT and EFER kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing kvm-unit-tests: VMX: Add test cases for I/O bitmaps kvm-unit-tests: VMX: Add test cases for instruction interception lib/x86/vm.h | 4 + x86/vmx.c | 3 +- x86/vmx.h | 20 +- x86/vmx_tests.c | 714 +++ 4 files changed, 736 insertions(+), 5 deletions(-) -- 1.7.9.5
[PATCH] kvm-unit-tests: VMX: Add the framework of EPT
Add a framework for EPT in nested VMX testing, including a set of functions to construct and read EPT paging structures and a simple read/write test of EPT remapping from guest to host.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/vmx.c       | 132 ++--
 x86/vmx.h       |  76 +++
 x86/vmx_tests.c | 156 +++
 3 files changed, 360 insertions(+), 4 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index ca36d35..a156b71 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -143,6 +143,132 @@ asm(
 	"call hypercall\n\t"
 );
 
+/* EPT paging structure related functions */
+/* install_ept_entry : Install a page to a given level in EPT
+		@pml4 : addr of pml4 table
+		@pte_level : level of PTE to set
+		@guest_addr : physical address of guest
+		@pte : pte value to set
+		@pt_page : address of page table, NULL for a new page
+ */
+void install_ept_entry(unsigned long *pml4,
+		int pte_level,
+		unsigned long guest_addr,
+		unsigned long pte,
+		unsigned long *pt_page)
+{
+	int level;
+	unsigned long *pt = pml4;
+	unsigned offset;
+
+	for (level = EPT_PAGE_LEVEL; level > pte_level; --level) {
+		offset = (guest_addr >> (((level-1) * EPT_PGDIR_WIDTH) + 12))
+				& EPT_PGDIR_MASK;
+		if (!(pt[offset] & (EPT_RA | EPT_WA | EPT_EA))) {
+			unsigned long *new_pt = pt_page;
+			if (!new_pt)
+				new_pt = alloc_page();
+			else
+				pt_page = 0;
+			memset(new_pt, 0, PAGE_SIZE);
+			pt[offset] = virt_to_phys(new_pt)
+					| EPT_RA | EPT_WA | EPT_EA;
+		}
+		pt = phys_to_virt(pt[offset] & 0xffffffffff000ull);
+	}
+	offset = ((unsigned long)guest_addr >> (((level-1) *
+			EPT_PGDIR_WIDTH) + 12)) & EPT_PGDIR_MASK;
+	pt[offset] = pte;
+}
+
+/* Map a page, @perm is the permission of the page */
+void install_ept(unsigned long *pml4,
+		unsigned long phys,
+		unsigned long guest_addr,
+		u64 perm)
+{
+	install_ept_entry(pml4, 1, guest_addr, (phys & PAGE_MASK) | perm, 0);
+}
+
+/* Map a 1G-size page */
+void install_1g_ept(unsigned long *pml4,
+		unsigned long phys,
+		unsigned long guest_addr,
+		u64 perm)
+{
+	install_ept_entry(pml4, 3, guest_addr,
+			(phys & PAGE_MASK) | perm | EPT_LARGE_PAGE, 0);
+}
+
+/* Map a 2M-size page */
+void install_2m_ept(unsigned long *pml4,
+		unsigned long phys,
+		unsigned long guest_addr,
+		u64 perm)
+{
+	install_ept_entry(pml4, 2, guest_addr,
+			(phys & PAGE_MASK) | perm | EPT_LARGE_PAGE, 0);
+}
+
+/* setup_ept_range : Setup a range of 1:1 mapped pages in the EPT
+   paging structure.
+		@start : start address of guest page
+		@len : length of address to be mapped
+		@map_1g : whether 1G page map is used
+		@map_2m : whether 2M page map is used
+		@perm : permission for every page
+ */
+int setup_ept_range(unsigned long *pml4, unsigned long start,
+		unsigned long len, int map_1g, int map_2m, u64 perm)
+{
+	u64 phys = start;
+	u64 max = (u64)len + (u64)start;
+
+	if (map_1g) {
+		while (phys + PAGE_SIZE_1G <= max) {
+			install_1g_ept(pml4, phys, phys, perm);
+			phys += PAGE_SIZE_1G;
+		}
+	}
+	if (map_2m) {
+		while (phys + PAGE_SIZE_2M <= max) {
+			install_2m_ept(pml4, phys, phys, perm);
+			phys += PAGE_SIZE_2M;
+		}
+	}
+	while (phys + PAGE_SIZE <= max) {
+		install_ept(pml4, phys, phys, perm);
+		phys += PAGE_SIZE;
+	}
+	return 0;
+}
+
+/* get_ept_pte : Get the PTE of a given level in EPT,
+   @level == 1 means get the last level */
+unsigned long get_ept_pte(unsigned long *pml4,
+		unsigned long guest_addr, int level)
+{
+	int l;
+	unsigned long *pt = pml4, pte;
+	unsigned offset;
+
+	for (l = EPT_PAGE_LEVEL; l > 1; --l) {
+		offset = (guest_addr >> (((l-1) * EPT_PGDIR_WIDTH) + 12))
+				& EPT_PGDIR_MASK;
+		pte = pt[offset];
+		if (!(pte & (EPT_RA | EPT_WA | EPT_EA)))
+			return 0;
+		if (l == level)
+			return pte;
+		if (l < 4 && (pte & EPT_LARGE_PAGE))
+			return pte;
+		pt = (unsigned long
Re: [PATCH] kvm-unit-tests: VMX: Add the framework of EPT
There must have some minor revisions to be done in this patch, so this is mainly a RFC mail. Besides, I'm not quite clear what we should test in nested EPT modules, and I bet writers of nested EPT must have ideas to continue and refine this testing part. Any suggestions of which part and how to test nested EPT is welcome. Please help me CC EPT-related guys if anyone knows. Thanks, Arthur On Mon, Sep 2, 2013 at 5:26 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add a framework of EPT in nested VMX testing, including a set of functions to construct and read EPT paging structures and a simple read/write test of EPT remapping from guest to host. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 132 -- x86/vmx.h | 76 +++ x86/vmx_tests.c | 156 +++ 3 files changed, 360 insertions(+), 4 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..a156b71 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -143,6 +143,132 @@ asm( call hypercall\n\t ); +/* EPT paging structure related functions */ +/* install_ept_entry : Install a page to a given level in EPT + @pml4 : addr of pml4 table + @pte_level : level of PTE to set + @guest_addr : physical address of guest + @pte : pte value to set + @pt_page : address of page table, NULL for a new page + */ +void install_ept_entry(unsigned long *pml4, + int pte_level, + unsigned long guest_addr, + unsigned long pte, + unsigned long *pt_page) +{ + int level; + unsigned long *pt = pml4; + unsigned offset; + + for (level = EPT_PAGE_LEVEL; level pte_level; --level) { + offset = (guest_addr ((level-1) * EPT_PGDIR_WIDTH + 12)) +EPT_PGDIR_MASK; + if (!(pt[offset] (EPT_RA | EPT_WA | EPT_EA))) { + unsigned long *new_pt = pt_page; + if (!new_pt) + new_pt = alloc_page(); + else + pt_page = 0; + memset(new_pt, 0, PAGE_SIZE); + pt[offset] = virt_to_phys(new_pt) + | EPT_RA | EPT_WA | EPT_EA; + } + pt = phys_to_virt(pt[offset] 0xff000ull); + } + offset = ((unsigned long)guest_addr ((level-1) * + EPT_PGDIR_WIDTH + 12)) EPT_PGDIR_MASK; + 
pt[offset] = pte; +} + +/* Map a page, @perm is the permission of the page */ +void install_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 1, guest_addr, (phys PAGE_MASK) | perm, 0); +} + +/* Map a 1G-size page */ +void install_1g_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 3, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* Map a 2M-size page */ +void install_2m_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 2, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* setup_ept_range : Setup a range of 1:1 mapped page to EPT paging structure. + @start : start address of guest page + @len : length of address to be mapped + @map_1g : whether 1G page map is used + @map_2m : whether 2M page map is used + @perm : permission for every page + */ +int setup_ept_range(unsigned long *pml4, unsigned long start, + unsigned long len, int map_1g, int map_2m, u64 perm) +{ + u64 phys = start; + u64 max = (u64)len + (u64)start; + + if (map_1g) { + while (phys + PAGE_SIZE_1G = max) { + install_1g_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_1G; + } + } + if (map_2m) { + while (phys + PAGE_SIZE_2M = max) { + install_2m_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_2M; + } + } + while (phys + PAGE_SIZE = max) { + install_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE; + } + return 0; +} + +/* get_ept_pte : Get the PTE of a given level in EPT, +@level == 1 means get the latest level*/ +unsigned long get_ept_pte(unsigned long *pml4, + unsigned
Some questions about nested EPT
Hi there, When I tested nested EPT (enabling EPT for L2->L1 address translation), some questions came up when querying IA32_VMX_EPT_VPID_CAP. 1. It shows that bits 16 and 17 (support for 1G and 2M pages) are disabled in the nested IA32_VMX_EPT_VPID_CAP. Why does nested EPT fail to support these? Are there any difficulties? 2. Can bit 6 (support for a page-walk length of 4) of IA32_VMX_EPT_VPID_CAP be 0? That is to say, can I design a paging structure with more or fewer than 4 levels? Since I don't know who the original author of nested EPT is, I'm sending this mail to the whole list. If anyone knows, please tell me and CC the authors for more detailed discussion. Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer
On Mon, Aug 26, 2013 at 3:23 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 17:26, Arthur Chunqi Li wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 49 - 1 file changed, 44 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..6aa320e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high PIN_BASED_VMX_PREEMPTION_TIMER)) + nested_vmx_exit_ctls_high = + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + if (!(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) + nested_vmx_pinbased_ctls_high = + (~PIN_BASED_VMX_PREEMPTION_TIMER); nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_fix_preempt(struct kvm_vcpu *vcpu) nested_adjust_preemption_timer - just preempt can be misleading. +{ + u64 delta_guest_tsc; + u32 preempt_val, preempt_bit, delta_preempt_val; + + preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) 0x1F; This is rather preemption_timer_scale. And if there is no symbolic value for the bitmask, please introduce one. 
+ delta_guest_tsc = kvm_x86_ops-read_l1_tsc(vcpu, + native_read_tsc()) - vcpu-arch.last_guest_tsc; + delta_preempt_val = delta_guest_tsc preempt_bit; + preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + if (preempt_val - delta_preempt_val 0) + preempt_val = 0; + else + preempt_val -= delta_preempt_val; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val); The rest unfortunately wrong. It has to be split into two parts: Part one, the calculation of L1's TSC value and its storing in nested_vmx, has to be done on vmexit. Part two, reading the current TSC, calculating the time spent in L0 and converting it into L1 TSC time, this has to be done right before vmentry of L2. As what we discussed yesterday, the calculation of L1's TSC value is not saved in nested_vmx, however, to avoid adding codes to the hot patch of vmexit. Instead, we use vcpu-arch.last_guest_tsc as the value stored on vmexit (which has been done already). And the value of part two is calculated in nested_fix_preempt() above (see variant delta_guest_tsc, which stores the consumed TSC value in L0). Since vmx_handle_exit is the last function called in vmexit path, I think it's OK to put part two here. Arthur, please make sure that your test case detects the current breakage of preemption timer emulation properly, both /wrt to missing save/restore and also regarding missing L0 time compensation, and then check that your KVM patch fixes it based on the unit test results. OK, I will commit a patch of kvm-unit-tests to test these changes. Arthur Jan +} /* * The guest has exited. See if we can fix it or if we need userspace * assistance. 
@@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) else vmx-nested.nested_run_pending = 0; - if (is_guest_mode(vcpu) nested_vmx_exit_handled(vcpu)) { - nested_vmx_vmexit(vcpu); - return 1; + if (is_guest_mode(vcpu)) { + if (nested_vmx_exit_handled(vcpu)) { + nested_vmx_vmexit(vcpu); + return 1; + } else + nested_fix_preempt(vcpu); } if (exit_reason VMX_EXIT_REASONS_FAILED_VMENTRY) { @@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 exec_control; + u32 exit_control; vmcs_write16(GUEST_ES_SELECTOR, vmcs12-guest_es_selector); vmcs_write16(GUEST_CS_SELECTOR, vmcs12-guest_cs_selector); @@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. 
The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. /* * Whether page-faults are trapped is determined by a combination of @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + else + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs. /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) } /* + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. + */ + if ((vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); + return 1; + } Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. that it will always reprogram it on entry. Here !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware support save preemption timer feature, which means if L2 supports pinbased vmx preemption timer, host must support save preemption timer feature. Though nested_vmx_exit_ctls_* is used for nested env, but it can also used to reflect the host's feature. 
Here is what I discuss with you yesterday, and we can also get the feature via rdmsr here to avoid the confusion. Arthur + + /* * We're finally done with prerequisite checking, and can start with * the nested entry. */ Jan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. 
@@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. /* * Whether page-faults are trapped is determined by a combination of @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + else + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs. /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) } /* + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. 
+ */
+ if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
+     !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+     nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+     return 1;
+ }
Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. always reprogramming the timer on entry. Here !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect whether the hardware supports the save-preemption-timer feature; the intent is that if L2 uses the pin-based VMX preemption timer, the host must support saving the timer value. Sorry, I parsed the code incorrectly. Although nested_vmx_exit_ctls_* is used for the nested environment, it also reflects the host's features. This is what I discussed with you yesterday; we could also read the feature via rdmsr here to avoid the confusion. Yes. The point is that we
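The host-capability check being debated above boils down to decoding the VM-exit controls capability MSR: its low 32 bits report allowed-0 settings, its high 32 bits report allowed-1 settings, and a control is supported iff its allowed-1 bit is set. A minimal userspace sketch of that decode (constants and helper names are illustrative, not KVM's actual code):

```c
#include <stdint.h>

/* Bit 22 of the VM-exit controls: "save VMX-preemption timer value". */
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER (1u << 22)

/* msr_val is the raw value read from IA32_VMX_EXIT_CTLS (or its TRUE
 * variant): bits 63:32 are the allowed-1 settings. A control can be set
 * to 1 only if its allowed-1 bit is 1, i.e. the hardware supports it. */
int exit_ctl_supported(uint64_t msr_val, uint32_t ctl_bit)
{
    uint32_t allowed1 = (uint32_t)(msr_val >> 32);
    return (allowed1 & ctl_bit) != 0;
}
```

This is why nested_vmx_exit_ctls_high, which is derived from that MSR, can double as a host-feature test in the discussion above.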
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:37 PM, Abel Gordon ab...@il.ibm.com wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. 
That means that we may need to subtract the cycles spent in L0 from the preemption timer, or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. My solution is to enable the save-preemption-timer-value feature for L2 whenever L2 enables the VMX preemption timer. Then exits such as external interrupts will save the exact timer value into L2's vmcs, and resuming L2 will reload that value. This way, the cycles L0 spends handling such vmexits do not affect L2's preemption timer value. Arthur
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:37, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. If the SDM guarantees this for us, we don't need such a safety measure. Otherwise, it should be added, yes. The SDM has such description (see 26.2.1.2): If “activate VMX-preemption timer” VM-execution control is 0, the “save VMX-preemption timer value” VM-exit control must also be 0. 
It doesn't tell us whether these two flags are guaranteed to be consistent when read from the related MSRs (IA32_VMX_PINBASED_CTLS and IA32_VMX_EXIT_CTLS), so I think the check is needed here. Arthur
Jan
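The consistency check argued for here (SDM 26.2.1.2: "save VMX-preemption timer value" must be 0 when "activate VMX-preemption timer" is 0) can be enforced by masking the two exposed capability words against each other. A hedged sketch of that masking, with illustrative constants rather than KVM's internals:

```c
#include <stdint.h>

#define PIN_BASED_VMX_PREEMPTION_TIMER    (1u << 6)   /* pin-based ctrl bit */
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER (1u << 22)  /* VM-exit ctrl bit   */

/* Make the two feature bits mutually consistent before exposing them to
 * the L1 guest: advertise the preemption timer only together with the
 * save-on-exit control, and vice versa. */
void fixup_preemption_ctls(uint32_t *pinbased_high, uint32_t *exit_high)
{
    if (!(*pinbased_high & PIN_BASED_VMX_PREEMPTION_TIMER))
        *exit_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
    if (!(*exit_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
        *pinbased_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
}
```

The v2 patch later in this thread applies exactly this kind of two-way masking in nested_vmx_setup_ctls_msrs().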
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:50 PM, Abel Gordon ab...@il.ibm.com wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:43 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. 
The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else may be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or EPT violations), then we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits, and from L1's perspective L2 was running on the CPU. That means that we may need to subtract these cycles spent in L0 from the preemption timer, or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry. But don't you think we should also subtract the cycles spent in L0 from the preemption timer? I mean, if we spent X cycles in L0 handling an L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be (previous_value_on_exit - X). If (previous_value_on_exit - X) < 0, then we should force (emulate) a preemption timer exit between L2 and L1. Sorry, I previously misunderstood your comments. But why should we exclude the cycles spent in L0 from the L2 preemption value? These cycles are not spent by L2, so they should not be charged to L2. Arthur
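Abel's "(previous_value_on_exit - X)" accounting can be sketched as plain arithmetic. The preemption timer counts down at the TSC rate divided by 2^rate_shift, where rate_shift is reported in bits 4:0 of IA32_VMX_MISC; the helper below (hypothetical name, not KVM code) charges an L0 TSC delta to the timer and flags when an L2->L1 preemption timer exit should be emulated instead:

```c
#include <stdint.h>

/* Charge the TSC cycles L0 spent handling an exit against the saved
 * preemption timer value. Returns the adjusted timer; sets *expired when
 * the timer would have run out, meaning L0 should emulate an L2->L1
 * preemption timer exit rather than resume L2. */
uint32_t charge_l0_cycles(uint32_t timer, uint64_t tsc_delta,
                          unsigned rate_shift, int *expired)
{
    uint64_t ticks = tsc_delta >> rate_shift;  /* TSC cycles -> timer ticks */

    if (ticks >= timer) {
        *expired = 1;
        return 0;
    }
    *expired = 0;
    return timer - (uint32_t)ticks;
}
```

This mirrors the nested_fix_preempt() helper added in the v2 patch below, which reads the rate shift from IA32_VMX_MISC and the delta from the L1 TSC.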
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:54 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:50, Abel Gordon wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:43 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. 
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. That means that we may need to reduce these cycles spent at L0 from the preemtion timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry. But don't you think we should also reduce the cycles spent at L0 from the preemption timer ? I mean, if we spent X cycles at L0 handling a L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be: (previous_value_on_exit - X). If (previous_value_on_exit - X) 0, then we should force (emulate) a preemption timer exit between L2 and L1. We ask the hardware to save the value of the preemption on L2 exit. 
This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted. I think you are suggesting the following, please correct me if I am wrong:
1) L1 resumes L2 with the preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1)
4) L0 saves the preemption timer (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y
Note that in this case X is ignored. I was suggesting to do the following:
6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates an L2->L1 preemption timer exit (resumes L1)
Yes, your description is right. But I'm still thinking about my previous point: why should we count such X cycles as time spent by L2? For nested VMX, the external interrupt is not provided by L1; it is triggered from L0 and want to cause periodically exit to L1, L2
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 10:41, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:54 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:50, Abel Gordon wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:43 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. 
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. That means that we may need to reduce these cycles spent at L0 from the preemtion timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry. But don't you think we should also reduce the cycles spent at L0 from the preemption timer ? I mean, if we spent X cycles at L0 handling a L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be: (previous_value_on_exit - X). If (previous_value_on_exit - X) 0, then we should force (emulate) a preemption timer exit between L2 and L1. We ask the hardware to save the value of the preemption on L2 exit. 
This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted. I think you are suggesting the following, please correct me if I am wrong:
1) L1 resumes L2 with the preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1)
4) L0 saves the preemption timer (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y
Note that in this case X is ignored. I was suggesting to do the following:
6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates an L2->L1 preemption timer exit (resumes L1)
Yes, your description is right. But I'm still thinking about my previous point: why should we count such X cycles as time spent by L2? For nested VMX, the external interrupt is not provided by L1; it is triggered from L0 and want
[PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs for a reason not emulated by L1, the preemption timer value should be saved on such exits.
2. Add support for the "Save VMX-preemption timer value" VM-Exit control to nVMX.
With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..6aa320e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 				      VM_EXIT_LOAD_IA32_EFER);
@@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }

+static void nested_fix_preempt(struct kvm_vcpu *vcpu)
+{
+	u64 delta_guest_tsc;
+	u32 preempt_val, preempt_bit, delta_preempt_val;
+
+	preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) & 0x1F;
+	delta_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	delta_preempt_val = delta_guest_tsc >> preempt_bit;
+	preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	if (preempt_val < delta_preempt_val)
+		preempt_val = 0;
+	else
+		preempt_val -= delta_preempt_val;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val);
+}
+
 /*
  * The guest has exited.
See if we can fix it or if we need userspace
  * assistance.
  */
@@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	else
 		vmx->nested.nested_run_pending = 0;

-	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
-		nested_vmx_vmexit(vcpu);
-		return 1;
+	if (is_guest_mode(vcpu)) {
+		if (nested_vmx_exit_handled(vcpu)) {
+			nested_vmx_vmexit(vcpu);
+			return 1;
+		} else
+			nested_fix_preempt(vcpu);
 	}

 	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
@@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);

 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 	 * emulated by vmx_set_efer(), below.
@@ -8089,6 +8119,15 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);

+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) {
+		if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+			vmcs12->vmx_preemption_timer_value =
+				vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+		else
+			vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
+				vmcs12->vmx_preemption_timer_value);
+	}
+
 	/*
 	 * In some cases (usually, nested EPT), L2 is allowed to change its
 	 * own CR3 without exiting. If it has changed it, we must keep it.
--
1.7.9.5
Re: [PATCH 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
On Thu, Aug 15, 2013 at 3:30 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add testing for CR0/4 shadowing. A few sentences on the test strategy would be good. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..44be3f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,18 @@ u64 ia32_pat; u64 ia32_efer; +u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + asm volatile(mov %0, stage\n\t::r(s):memory, cc); +} + Why do we need state = s as assembler instruction? This is due to assembler optimization. If we simply use state = s, assembler will sometimes optimize it and state may not be set indeed. 
void basic_init() { } @@ -257,6 +263,216 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 == guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write 
shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 11) + report(Write shadowing different X86_CR4_TSD, 0); + else + report(Write shadowing different X86_CR4_TSD, 1); + set_stage(11); + tmp = guest_cr4 ^ X86_CR4_DE; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 12) + report(Write shadowing different X86_CR4_DE, 0); + else + report(Write shadowing different X86_CR4_DE, 1
Re: [PATCH 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
On Thu, Aug 15, 2013 at 3:17 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} Better safe this function for the test case where you actually stress the bitmap. 
What do you mean by safe? Arthur Jan + +static void test_ctrl_pat_init() +{ + u64 ctrl_ent; + u64 ctrl_exi; + + msr_bmp_init(); + ctrl_ent = vmcs_read(ENT_CONTROLS); + ctrl_exi = vmcs_read(EXI_CONTROLS); + vmcs_write(ENT_CONTROLS, ctrl_ent | ENT_LOAD_PAT); + vmcs_write(EXI_CONTROLS, ctrl_exi | (EXI_SAVE_PAT | EXI_LOAD_PAT)); + ia32_pat = rdmsr(MSR_IA32_CR_PAT); + vmcs_write(GUEST_PAT, 0x0); + vmcs_write(HOST_PAT, ia32_pat); +} + +static void test_ctrl_pat_main() +{ + u64 guest_ia32_pat; + + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (!(ctrl_enter_rev.clr ENT_LOAD_PAT)) + printf(\tENT_LOAD_PAT is not supported.\n); + else { + if (guest_ia32_pat != 0) { + report(Entry load PAT, 0); + return; + } + } + wrmsr(MSR_IA32_CR_PAT, 0x6); + vmcall(); + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (ctrl_enter_rev.clr ENT_LOAD_PAT) { + if (guest_ia32_pat != ia32_pat) { + report(Entry load PAT, 0); + return; + } + report(Entry load PAT, 1); + } +} + +static int test_ctrl_pat_exit_handler() +{ + u64 guest_rip; + ulong reason; + u64 guest_pat; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + switch (reason) { + case VMX_VMCALL: + guest_pat = vmcs_read(GUEST_PAT); + if (!(ctrl_exit_rev.clr EXI_SAVE_PAT)) { + printf(\tEXI_SAVE_PAT is not supported\n); + vmcs_write(GUEST_PAT, 0x6); + } else { + if (guest_pat == 0x6) + report(Exit save PAT, 1); + else + report(Exit save PAT, 0); + } + if (!(ctrl_exit_rev.clr EXI_LOAD_PAT)) + printf(\tEXI_LOAD_PAT is not supported\n); + else { + if (rdmsr(MSR_IA32_CR_PAT) == ia32_pat) + report(Exit load PAT, 1); + else + report(Exit load PAT, 0); + } + vmcs_write(GUEST_PAT, ia32_pat); + vmcs_write(GUEST_RIP
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? 
+ // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. + memset(io_bitmap_b, 0x0, PAGE_SIZE); Note that you still have io_bitmap_a[0] != 0 here. You probably want to clear it in order to have a clean setup. 
+ if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0); + + return; +} + +static int iobmp_exit_handler() +{ + u64 guest_rip; + ulong reason, exit_qual; + u32 insn_len; + //u32 ctrl_cpu0; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) & 0xff; + exit_qual = vmcs_read(EXI_QUALIFICATION); + insn_len = vmcs_read(EXI_INST_LEN); + switch (reason) { + case VMX_IO: + switch (stage) { + case 2: + if ((exit_qual & VMX_IO_SIZE_MASK) != _VMX_IO_BYTE
Re: [PATCH 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
On Thu, Aug 15, 2013 at 3:47 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:40, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:30 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add testing for CR0/4 shadowing. A few sentences on the test strategy would be good. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..44be3f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,18 @@ u64 ia32_pat; u64 ia32_efer; +u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + asm volatile(mov %0, stage\n\t::r(s):memory, cc); +} + Why do we need state = s as assembler instruction? This is due to assembler optimization. If we simply use state = s, assembler will sometimes optimize it and state may not be set indeed. volatile u32 stage? And we have barrier() to avoid reordering. Reordering here is not a big deal here, though it is actually needed here. I occurred the following problem: stage = 1; do something that causes vmexit; stage = 2; Then the compiler will optimize stage = 1 and stage = 2 to one instruction stage =2, since instructions between them don't use stage. Can volatile solve this problem? 
Arthur void basic_init() { } @@ -257,6 +263,216 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 == guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write 
shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc
Re: [PATCH 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
On Thu, Aug 15, 2013 at 3:48 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:41, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:17 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + 
vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} Better safe this function for the test case where you actually stress the bitmap. What do you mean by safe? I meant the other save: This function serves no purpose here. Let's only introduce it when that changes, i.e. when you actually test the MSR bitmap. No, the function is meaningful here. We need directly access to MSRs in guest and if msr bitmap is not set, any access to MSRs will cause vmexit. Here we just let all rdmsr/wrmsr pass in guest. Arthur Jan -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 3:58 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:51, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + 
inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. Right, there is an i/o instruction missing below after the second memset - or I cannot follow what you are trying to test. The above inl would always trigger, independent of the wrap-around. Only if you clear both bitmaps, we get to the interesting scenario. So something is still wrong here, no? Yes, we need to memset io_bit_map_a to 0 here. The above inl and the test if (stage == 9) are cooperatively used to test I/O overrun: test 4 bits width in to 0x. Arthur + memset(io_bitmap_b, 0x0, PAGE_SIZE); Note that you still have io_bitmap_a[0] != 0 here. 
You probably want to clear it in order to have a clean setup. + if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0);
Re: [PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
On Thu, Aug 15, 2013 at 4:06 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for instruction interception, including three types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag (CPUID/INVD) Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 x86/vmx_tests.c | 117 +++ 3 files changed, 125 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..d81d25d 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,6 +354,9 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, @@ -361,6 +364,8 @@ enum Ctrl0 { CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, CPU_MSR_BITMAP = 1ul 28, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index ad28c4c..66187f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -20,6 +20,13 @@ static inline void set_stage(u32 s) asm volatile(mov %0, stage\n\t::r(s):memory, cc); } +static inline u32 get_stage() +{ + u32 s; + 
asm volatile(mov stage, %0\n\t:=r(s)::memory, cc); + return s; +} Tagging stage volatile will obsolete this special assembly. + void basic_init() { } @@ -638,6 +645,114 @@ static int iobmp_exit_handler() return VMX_TEST_VMEXIT; } +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; What do the type values mean? For intercepted instructions we have three type: controlled by Primary Processor-Based VM-Execution Controls, controlled by Secondary Controls and always intercepted. The testing process is different for different types. + u32 reason; + ulong exit_qual; + u32 insn_info; For none of the instructions you test, EXI_INST_INFO will have valid content on exit. So you must not check it anyway. Actually , RDRAND uses EXI_INST_INFO though it is not supported now. Since for all intercepts these three vmcs fields are enough to determine everything, I put it here for future use. 
+}; + +static struct insn_table insn_table[] = { + // Flags for Primary Processor-Based VM-Execution Controls + {HLT, CPU_HLT, insn_hlt, 0, 12, 0, 0}, + {INVLPG, CPU_INVLPG, insn_invlpg, 0, 14, 0x12345678, 0}, + {MWAIT, CPU_MWAIT, insn_mwait, 0, 36, 0, 0}, + {RDPMC, CPU_RDPMC, insn_rdpmc, 0, 15, 0, 0}, + {RDTSC, CPU_RDTSC, insn_rdtsc, 0, 16, 0, 0}, + {MONITOR, CPU_MONITOR, insn_monitor, 0, 39, 0, 0}, + {PAUSE, CPU_PAUSE, insn_pause, 0, 40, 0, 0}, + // Flags for Secondary Processor-Based VM-Execution Controls + {WBINVD, CPU_WBINVD, insn_wbinvd, 1, 54, 0, 0}, + // Flags for Non-Processor-Based + {CPUID
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 4:13 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:09, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:58 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:51, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void 
iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. Right, there is an i/o instruction missing below after the second memset - or I cannot follow what you are trying to test. The above inl would always trigger, independent of the wrap-around. Only if you clear both bitmaps, we get to the interesting scenario. So something is still wrong here, no? Yes, we need to memset io_bit_map_a to 0 here. The above inl and the test if (stage == 9) are cooperatively used to test I/O overrun: test 4 bits width in to 0x. 
The point is that, according to our understanding of the SDM, we should even see a trap in this wrap-around scenario if both bitmaps are completely cleared.
Re: [PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
On Thu, Aug 15, 2013 at 4:20 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:16, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:06 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for instruction interception, including three types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag (CPUID/INVD) Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 x86/vmx_tests.c | 117 +++ 3 files changed, 125 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..d81d25d 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,6 +354,9 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, @@ -361,6 +364,8 @@ enum Ctrl0 { CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, CPU_MSR_BITMAP = 1ul 28, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index ad28c4c..66187f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -20,6 +20,13 @@ static inline void 
set_stage(u32 s) asm volatile(mov %0, stage\n\t::r(s):memory, cc); } +static inline u32 get_stage() +{ + u32 s; + asm volatile(mov stage, %0\n\t:=r(s)::memory, cc); + return s; +} Tagging stage volatile will obsolete this special assembly. + void basic_init() { } @@ -638,6 +645,114 @@ static int iobmp_exit_handler() return VMX_TEST_VMEXIT; } +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; What do the type values mean? For intercepted instructions we have three type: controlled by Primary Processor-Based VM-Execution Controls, controlled by Secondary Controls and always intercepted. The testing process is different for different types. This was a rhetorical questions. ;) Could you make the values symbolic? OK. It's better to rename it to ctrl_field and define some macros such as CTRL_CPU0, CTRL_CPU1, CTRL_NONE to make it more readable. + u32 reason; + ulong exit_qual; + u32 insn_info; For none of the instructions you test, EXI_INST_INFO will have valid content on exit. So you must not check it anyway. Actually , RDRAND uses EXI_INST_INFO though it is not supported now. Since for all intercepts these three vmcs fields are enough to determine everything, I put it here for future use. OK, but don't test its value when it's undefined - like in all cases implemented here. 
Testing only the fields that are actually used would make it more complex, because we would need to define which fields are valid in insn_table. Besides, if any of these three fields is unused, it will be set to 0; I think writing it like this is OK since we are just writing a test case
Re: [PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
On Thu, Aug 15, 2013 at 4:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:35, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:20 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:16, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:06 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for instruction interception, including three types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag (CPUID/INVD) Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 x86/vmx_tests.c | 117 +++ 3 files changed, 125 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..d81d25d 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,6 +354,9 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, @@ -361,6 +364,8 @@ enum Ctrl0 { CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, CPU_MSR_BITMAP = 1ul 28, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c 
index ad28c4c..66187f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -20,6 +20,13 @@ static inline void set_stage(u32 s) asm volatile(mov %0, stage\n\t::r(s):memory, cc); } +static inline u32 get_stage() +{ + u32 s; + asm volatile(mov stage, %0\n\t:=r(s)::memory, cc); + return s; +} Tagging stage volatile will obsolete this special assembly. + void basic_init() { } @@ -638,6 +645,114 @@ static int iobmp_exit_handler() return VMX_TEST_VMEXIT; } +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; What do the type values mean? For intercepted instructions we have three type: controlled by Primary Processor-Based VM-Execution Controls, controlled by Secondary Controls and always intercepted. The testing process is different for different types. This was a rhetorical questions. ;) Could you make the values symbolic? OK. It's better to rename it to ctrl_field and define some macros such as CTRL_CPU0, CTRL_CPU1, CTRL_NONE to make it more readable. + u32 reason; + ulong exit_qual; + u32 insn_info; For none of the instructions you test, EXI_INST_INFO will have valid content on exit. So you must not check it anyway. Actually , RDRAND uses EXI_INST_INFO though it is not supported now. Since for all intercepts these three vmcs fields are enough to determine everything, I put it here for future use. 
OK, but don't test its value when it's undefined - like in all cases implemented here. Testing only the fields that are actually used would make it more complex, because we would need to define which fields are valid in insn_table. Besides, if any of these three fields is unused, it will be set to 0.
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 4:23 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:20, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:13 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:09, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:58 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:51, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + 
vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. Right, there is an i/o instruction missing below after the second memset - or I cannot follow what you are trying to test. The above inl would always trigger, independent of the wrap-around. Only if you clear both bitmaps, we get to the interesting scenario. So something is still wrong here, no? Yes, we need to memset io_bit_map_a to 0 here. 
The above inl and the if (stage == 9) check are used together to test I/O overrun: a 4-byte-wide in to port 0x. The point
[PATCH v2 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
Add test cases for I/O bitmaps, including corner cases. Test includes: pass trap, in out, different I/O width, low high I/O bitmap, partial I/O pass, overrun (inl 0x). Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +-- x86/vmx_tests.c | 159 +++ 2 files changed, 162 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK (1ul 3) #define VMX_IO_IN (1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING (1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM (1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT 16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index a5cc353..cd4dd99 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; volatile u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,160 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); 
+ if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); + if (stage != 5) + report(I/O bitmap - I/O width, long, 0); + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + inl(0x); + if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0); + + return; +} + +static int iobmp_exit_handler() +{ + u64 guest_rip; + ulong reason, exit_qual; + u32 insn_len; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + exit_qual = vmcs_read(EXI_QUALIFICATION); + insn_len = vmcs_read(EXI_INST_LEN); + switch (reason) { + case VMX_IO: + switch (stage) { + case 2: + if ((exit_qual VMX_IO_SIZE_MASK) != _VMX_IO_BYTE) + report(I/O bitmap - I/O width, byte, 0); + else + report(I/O bitmap - I/O width, byte, 1); + if (!(exit_qual VMX_IO_IN)) + report(I/O bitmap - I/O direction, in, 0); + else + report(I/O bitmap - I/O direction, in, 1); + set_stage(stage + 1
[PATCH v2 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} + +static void test_ctrl_pat_init() +{ + u64 ctrl_ent; + u64 ctrl_exi; + + msr_bmp_init(); + ctrl_ent = vmcs_read(ENT_CONTROLS); + ctrl_exi = vmcs_read(EXI_CONTROLS); + vmcs_write(ENT_CONTROLS, ctrl_ent 
| ENT_LOAD_PAT); + vmcs_write(EXI_CONTROLS, ctrl_exi | (EXI_SAVE_PAT | EXI_LOAD_PAT)); + ia32_pat = rdmsr(MSR_IA32_CR_PAT); + vmcs_write(GUEST_PAT, 0x0); + vmcs_write(HOST_PAT, ia32_pat); +} + +static void test_ctrl_pat_main() +{ + u64 guest_ia32_pat; + + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (!(ctrl_enter_rev.clr ENT_LOAD_PAT)) + printf(\tENT_LOAD_PAT is not supported.\n); + else { + if (guest_ia32_pat != 0) { + report(Entry load PAT, 0); + return; + } + } + wrmsr(MSR_IA32_CR_PAT, 0x6); + vmcall(); + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (ctrl_enter_rev.clr ENT_LOAD_PAT) { + if (guest_ia32_pat != ia32_pat) { + report(Entry load PAT, 0); + return; + } + report(Entry load PAT, 1); + } +} + +static int test_ctrl_pat_exit_handler() +{ + u64 guest_rip; + ulong reason; + u64 guest_pat; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + switch (reason) { + case VMX_VMCALL: + guest_pat = vmcs_read(GUEST_PAT); + if (!(ctrl_exit_rev.clr EXI_SAVE_PAT)) { + printf(\tEXI_SAVE_PAT is not supported\n); + vmcs_write(GUEST_PAT, 0x6); + } else { + if (guest_pat == 0x6) + report(Exit save PAT, 1); + else + report(Exit save PAT, 0); + } + if (!(ctrl_exit_rev.clr EXI_LOAD_PAT)) + printf(\tEXI_LOAD_PAT is not supported\n); + else { + if (rdmsr(MSR_IA32_CR_PAT) == ia32_pat) + report(Exit load PAT, 1); + else + report(Exit load PAT, 0); + } + vmcs_write(GUEST_PAT, ia32_pat); + vmcs_write(GUEST_RIP, guest_rip + 3); + return VMX_TEST_RESUME; + default: + printf(ERROR : Undefined exit reason, reason = %d.\n, reason); + break; + } + return
[PATCH v2 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
Add testing for CR0/4 shadowing. Two types of flags in CR0/4 are tested: flags owned and shadowed by L1. They are treated differently in KVM. We test one flag of both types in CR0 (TS and MP) and CR4 (DE and TSD) with read through, read shadow, write through, write shadow (same as and different from shadowed value). Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..a5cc353 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,20 @@ u64 ia32_pat; u64 ia32_efer; +volatile u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + void basic_init() { } @@ -257,6 +265,214 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 
0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 == guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 11) + report(Write shadowing different X86_CR4_TSD, 0); + else + report(Write shadowing different X86_CR4_TSD, 1); + set_stage(11); + tmp = guest_cr4 ^ X86_CR4_DE; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 12) + report(Write shadowing different X86_CR4_DE, 0); + else + report(Write shadowing different X86_CR4_DE, 1); +} + +static int
[PATCH v2 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
Add test cases for instruction interception, including four types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag, always trap (CPUID/INVD) 4. Instructions always pass Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 +++ x86/vmx_tests.c | 152 +++ 3 files changed, 160 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..2784ac6 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,12 +354,17 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index cd4dd99..be3e3b4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -22,6 +22,16 @@ static inline void set_stage(u32 s) barrier(); } +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + void basic_init() { } @@ -630,6 +640,146 @@ static int 
iobmp_exit_handler() return VMX_TEST_VMEXIT; } +#define INSN_CPU0 0 +#define INSN_CPU1 1 +#define INSN_ALWAYS_TRAP 2 +#define INSN_NEVER_TRAP3 + +#define FIELD_EXIT_QUAL0 +#define FIELD_INSN_INFO1 + +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; + u32 reason; + ulong exit_qual; + u32 insn_info; + // Use FIELD_EXIT_QUAL and FIELD_INSN_INFO to efines + // which field need to be tested, reason is always tested + u32 test_field; +}; + +static struct insn_table insn_table[] = { + // Flags for Primary Processor-Based VM-Execution Controls + {HLT, CPU_HLT, insn_hlt, INSN_CPU0, 12, 0, 0, 0}, + {INVLPG, CPU_INVLPG, insn_invlpg, INSN_CPU0, 14, + 0x12345678, 0, FIELD_EXIT_QUAL}, + {MWAIT, CPU_MWAIT, insn_mwait, INSN_CPU0, 36, 0, 0, 0}, + {RDPMC, CPU_RDPMC, insn_rdpmc, INSN_CPU0, 15, 0, 0, 0}, + {RDTSC, CPU_RDTSC, insn_rdtsc, INSN_CPU0, 16, 0, 0, 0}, + {MONITOR, CPU_MONITOR, insn_monitor, INSN_CPU0, 39, 0, 0, 0}, + {PAUSE, CPU_PAUSE, insn_pause, INSN_CPU0, 40, 0, 0, 0}, + // Flags for Secondary Processor-Based VM-Execution Controls + {WBINVD, CPU_WBINVD, insn_wbinvd, INSN_CPU1, 54, 0, 0, 0}, + // Instructions always trap + {CPUID, 0, insn_cpuid, INSN_ALWAYS_TRAP, 10, 0, 0, 0}, + {INVD, 0, insn_invd, INSN_ALWAYS_TRAP, 13, 0, 0, 0}, + // Instructions never trap + {NULL}, +}; + +static void insn_intercept_init() +{ + u32 ctrl_cpu[2
[PATCH v2 0/4] kvm-unit-tests: Add a series of test cases
Add a series of test cases for nested VMX in kvm-unit-tests. Arthur Chunqi Li (4): kvm-unit-tests: VMX: Add test cases for PAT and EFER kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing kvm-unit-tests: VMX: Add test cases for I/O bitmaps kvm-unit-tests: VMX: Add test cases for instruction interception lib/x86/vm.h|4 + x86/vmx.c |3 +- x86/vmx.h | 20 +- x86/vmx_tests.c | 714 +++ 4 files changed, 736 insertions(+), 5 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] kvm-unit-tests: Add a series of test cases
Add a series of test cases for nested VMX in kvm-unit-tests. Arthur Chunqi Li (4): kvm-unit-tests: VMX: Add test cases for PAT and EFER kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing kvm-unit-tests: VMX: Add test cases for I/O bitmaps kvm-unit-tests: VMX: Add test cases for instruction interception lib/x86/vm.h|4 + x86/vmx.c |3 +- x86/vmx.h | 20 +- x86/vmx_tests.c | 687 +++ 4 files changed, 709 insertions(+), 5 deletions(-) -- 1.7.9.5
[PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
Add test cases for I/O bitmaps, including corner cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK (1ul 3) #define VMX_IO_IN (1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING (1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM (1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT 16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + 
set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0); + + return; +} + +static int iobmp_exit_handler() +{ + u64 guest_rip; + ulong reason, exit_qual; + u32 insn_len; + //u32 ctrl_cpu0; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + exit_qual = vmcs_read(EXI_QUALIFICATION); + insn_len = vmcs_read(EXI_INST_LEN); + switch (reason) { + case VMX_IO: + switch (stage) { + case 2: + if ((exit_qual VMX_IO_SIZE_MASK) != _VMX_IO_BYTE) + report(I/O bitmap - I/O width, byte, 0); + else + report(I/O bitmap - I/O width, byte, 1); + if (!(exit_qual VMX_IO_IN)) + report(I/O bitmap - I/O direction, in, 0); + else + report(I/O bitmap - I/O direction, in, 1); + set_stage(stage + 1
[PATCH 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} + +static void test_ctrl_pat_init() +{ + u64 ctrl_ent; + u64 ctrl_exi; + + msr_bmp_init(); + ctrl_ent = vmcs_read(ENT_CONTROLS); + ctrl_exi = vmcs_read(EXI_CONTROLS); + vmcs_write(ENT_CONTROLS, ctrl_ent 
| ENT_LOAD_PAT); + vmcs_write(EXI_CONTROLS, ctrl_exi | (EXI_SAVE_PAT | EXI_LOAD_PAT)); + ia32_pat = rdmsr(MSR_IA32_CR_PAT); + vmcs_write(GUEST_PAT, 0x0); + vmcs_write(HOST_PAT, ia32_pat); +} + +static void test_ctrl_pat_main() +{ + u64 guest_ia32_pat; + + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (!(ctrl_enter_rev.clr ENT_LOAD_PAT)) + printf(\tENT_LOAD_PAT is not supported.\n); + else { + if (guest_ia32_pat != 0) { + report(Entry load PAT, 0); + return; + } + } + wrmsr(MSR_IA32_CR_PAT, 0x6); + vmcall(); + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (ctrl_enter_rev.clr ENT_LOAD_PAT) { + if (guest_ia32_pat != ia32_pat) { + report(Entry load PAT, 0); + return; + } + report(Entry load PAT, 1); + } +} + +static int test_ctrl_pat_exit_handler() +{ + u64 guest_rip; + ulong reason; + u64 guest_pat; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + switch (reason) { + case VMX_VMCALL: + guest_pat = vmcs_read(GUEST_PAT); + if (!(ctrl_exit_rev.clr EXI_SAVE_PAT)) { + printf(\tEXI_SAVE_PAT is not supported\n); + vmcs_write(GUEST_PAT, 0x6); + } else { + if (guest_pat == 0x6) + report(Exit save PAT, 1); + else + report(Exit save PAT, 0); + } + if (!(ctrl_exit_rev.clr EXI_LOAD_PAT)) + printf(\tEXI_LOAD_PAT is not supported\n); + else { + if (rdmsr(MSR_IA32_CR_PAT) == ia32_pat) + report(Exit load PAT, 1); + else + report(Exit load PAT, 0); + } + vmcs_write(GUEST_PAT, ia32_pat); + vmcs_write(GUEST_RIP, guest_rip + 3); + return VMX_TEST_RESUME; + default: + printf(ERROR : Undefined exit reason, reason = %d.\n, reason); + break; + } + return
[PATCH 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
Add testing for CR0/4 shadowing. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..44be3f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,18 @@ u64 ia32_pat; u64 ia32_efer; +u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + asm volatile(mov %0, stage\n\t::r(s):memory, cc); +} + void basic_init() { } @@ -257,6 +263,216 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 
== guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 11) + report(Write shadowing different X86_CR4_TSD, 0); + else + report(Write shadowing different X86_CR4_TSD, 1); + set_stage(11); + tmp = guest_cr4 ^ X86_CR4_DE; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 12) + report(Write shadowing different X86_CR4_DE, 0); + else + report(Write shadowing different X86_CR4_DE, 1); +} + +static int cr_shadowing_exit_handler() +{ + u64 guest_rip; + ulong reason; + u32 insn_len; + u32 exit_qual; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + insn_len = vmcs_read(EXI_INST_LEN); + exit_qual = vmcs_read
[PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
Add test cases for instruction interception, including three types:
1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/
   RDPMC/RDTSC/MONITOR/PAUSE)
2. Secondary Processor-Based VM-Execution Controls (WBINVD)
3. No control flag (CPUID/INVD)

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
 x86/vmx.c       |    3 +-
 x86/vmx.h       |    7 ++++
 x86/vmx_tests.c |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index ca36d35..c346070 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -336,8 +336,7 @@ static void init_vmx(void)
 			: MSR_IA32_VMX_ENTRY_CTLS);
 	ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC
 			: MSR_IA32_VMX_PROCBASED_CTLS);
-	if (ctrl_cpu_rev[0].set & CPU_SECONDARY)
-		ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
+	ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
 	if (ctrl_cpu_rev[1].set & CPU_EPT || ctrl_cpu_rev[1].set & CPU_VPID)
 		ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);
 
diff --git a/x86/vmx.h b/x86/vmx.h
index dba8b20..d81d25d 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -354,6 +354,9 @@ enum Ctrl0 {
 	CPU_INTR_WINDOW	= 1ul << 2,
 	CPU_HLT		= 1ul << 7,
 	CPU_INVLPG	= 1ul << 9,
+	CPU_MWAIT	= 1ul << 10,
+	CPU_RDPMC	= 1ul << 11,
+	CPU_RDTSC	= 1ul << 12,
 	CPU_CR3_LOAD	= 1ul << 15,
 	CPU_CR3_STORE	= 1ul << 16,
 	CPU_TPR_SHADOW	= 1ul << 21,
@@ -361,6 +364,8 @@ enum Ctrl0 {
 	CPU_IO		= 1ul << 24,
 	CPU_IO_BITMAP	= 1ul << 25,
 	CPU_MSR_BITMAP	= 1ul << 28,
+	CPU_MONITOR	= 1ul << 29,
+	CPU_PAUSE	= 1ul << 30,
 	CPU_SECONDARY	= 1ul << 31,
 };
 
@@ -368,6 +373,8 @@ enum Ctrl1 {
 	CPU_EPT		= 1ul << 1,
 	CPU_VPID	= 1ul << 5,
 	CPU_URG		= 1ul << 7,
+	CPU_WBINVD	= 1ul << 6,
+	CPU_RDRAND	= 1ul << 11,
 };
 
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index ad28c4c..66187f4 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -20,6 +20,13 @@ static inline void set_stage(u32 s)
 	asm volatile("mov %0, stage\n\t"::"r"(s):"memory", "cc");
 }
 
+static inline u32 get_stage()
+{
+	u32 s;
+	asm volatile("mov stage, %0\n\t":"=r"(s)::"memory", "cc");
+	return s;
+}
+
 void basic_init()
 {
 }
@@ -638,6 +645,114 @@ static int iobmp_exit_handler()
 	return VMX_TEST_VMEXIT;
 }
 
+asm(
+	"insn_hlt: hlt;ret\n\t"
+	"insn_invlpg: invlpg 0x12345678;ret\n\t"
+	"insn_mwait: mwait;ret\n\t"
+	"insn_rdpmc: rdpmc;ret\n\t"
+	"insn_rdtsc: rdtsc;ret\n\t"
+	"insn_monitor: monitor;ret\n\t"
+	"insn_pause: pause;ret\n\t"
+	"insn_wbinvd: wbinvd;ret\n\t"
+	"insn_cpuid: cpuid;ret\n\t"
+	"insn_invd: invd;ret\n\t"
+);
+extern void insn_hlt();
+extern void insn_invlpg();
+extern void insn_mwait();
+extern void insn_rdpmc();
+extern void insn_rdtsc();
+extern void insn_monitor();
+extern void insn_pause();
+extern void insn_wbinvd();
+extern void insn_cpuid();
+extern void insn_invd();
+
+u32 cur_insn;
+
+struct insn_table {
+	const char *name;
+	u32 flag;
+	void (*insn_func)();
+	u32 type;
+	u32 reason;
+	ulong exit_qual;
+	u32 insn_info;
+};
+
+static struct insn_table insn_table[] = {
+	// Flags for Primary Processor-Based VM-Execution Controls
+	{"HLT", CPU_HLT, insn_hlt, 0, 12, 0, 0},
+	{"INVLPG", CPU_INVLPG, insn_invlpg, 0, 14, 0x12345678, 0},
+	{"MWAIT", CPU_MWAIT, insn_mwait, 0, 36, 0, 0},
+	{"RDPMC", CPU_RDPMC, insn_rdpmc, 0, 15, 0, 0},
+	{"RDTSC", CPU_RDTSC, insn_rdtsc, 0, 16, 0, 0},
+	{"MONITOR", CPU_MONITOR, insn_monitor, 0, 39, 0, 0},
+	{"PAUSE", CPU_PAUSE, insn_pause, 0, 40, 0, 0},
+	// Flags for Secondary Processor-Based VM-Execution Controls
+	{"WBINVD", CPU_WBINVD, insn_wbinvd, 1, 54, 0, 0},
+	// Flags for Non-Processor-Based
+	{"CPUID", 0, insn_cpuid, 2, 10, 0, 0},
+	{"INVD", 0, insn_invd, 2, 13, 0, 0},
+	{NULL},
+};
+
+static void insn_intercept_init()
+{
+	u32 ctrl_cpu[2];
+
+	ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0);
+	ctrl_cpu[0] |= CPU_HLT | CPU_INVLPG | CPU_MWAIT | CPU_RDPMC |
+		CPU_RDTSC | CPU_MONITOR | CPU_PAUSE | CPU_SECONDARY;
+	ctrl_cpu[0] &= ctrl_cpu_rev[0].clr;
+	vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
+	ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1);
+	ctrl_cpu[1] |= CPU_WBINVD | CPU_RDRAND;
+	ctrl_cpu[1] &= ctrl_cpu_rev[1].clr;
+	vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]);
+}
+
+static void insn_intercept_main()
+{
+	cur_insn = 0;
+	while (insn_table[cur_insn
Corner cases of I/O bitmap
Hi Gleb and Paolo,

There are some corner cases when testing I/O bitmaps, and I don't know
the exact hardware behavior.

1. If we set the bit for port 0x4000 in the bitmap and call inl(0x3fff)
or inl(0x4000) in the guest, what exit information will we get?

2. What will we get when calling inl(0xffff) in the guest with/without
the "unconditional I/O exiting" and "use I/O bitmaps" VM-execution
controls?

I tested the two cases in a nested environment. For the first one, I got
a normal exit if any of the ports accessed is masked in the bitmap. For
the second, it acts the same as for other ports. But the SDM says:

    If an I/O operation "wraps around" the 16-bit I/O-port space
    (accesses ports FFFFH and 0000H), the I/O instruction causes a
    VM exit.

I cannot find the exact behavior specified for this case. Do you have
any ideas about these?

Arthur
[PATCH v3] KVM: nVMX: Advertise IA32_PAT in VM exit control
Advertise VM_EXIT_SAVE_IA32_PAT and VM_EXIT_LOAD_IA32_PAT.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
 arch/x86/kvm/vmx.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 396572d..c45adea 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2191,14 +2191,17 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	 * If bit 55 of VMX_BASIC is off, bits 0-8 and 10, 11, 13, 14, 16 and
 	 * 17 must be 1.
 	 */
+	rdmsr(MSR_IA32_VMX_EXIT_CTLS,
+		nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high);
 	nested_vmx_exit_ctls_low = VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
+	nested_vmx_exit_ctls_high &=
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
+		VM_EXIT_HOST_ADDR_SPACE_SIZE;
 	/* Note that guest use of VM_EXIT_ACK_INTR_ON_EXIT is not supported. */
-#ifdef CONFIG_X86_64
-	nested_vmx_exit_ctls_high = VM_EXIT_HOST_ADDR_SPACE_SIZE;
-#else
-	nested_vmx_exit_ctls_high = 0;
+#ifndef CONFIG_X86_64
+	nested_vmx_exit_ctls_high &= (~VM_EXIT_HOST_ADDR_SPACE_SIZE);
 #endif
-	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
 
 	/* entry controls */
 	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
-- 
1.7.9.5
Re: [PATCH] nVMX: Keep arch.pat in sync on L1-L2 switches
On Sun, Aug 4, 2013 at 11:17 PM, Jan Kiszka <jan.kis...@web.de> wrote:
> From: Jan Kiszka <jan.kis...@siemens.com>
>
> When asking vmx to load the PAT MSR for us while switching from L1 to L2
> or vice versa, we have to update arch.pat as well as it may later be
> used again to load or read out the MSR content.
>
> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Tested-by: Arthur Chunqi Li <yzt...@gmail.com>

This should cooperate with patch
http://www.mail-archive.com/kvm@vger.kernel.org/msg94349.html,
where VM_EXIT_SAVE_IA32_PAT and VM_EXIT_LOAD_IA32_PAT are advertised.

> ---
> Arthur, please add your tested-by also officially.
>
>  arch/x86/kvm/vmx.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 45fd70c..396572d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7535,9 +7535,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
>  		(vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE));
> -	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)
> +	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) {
>  		vmcs_write64(GUEST_IA32_PAT, vmcs12->guest_ia32_pat);
> -	else if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
> +		vcpu->arch.pat = vmcs12->guest_ia32_pat;
> +	} else if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
>  		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
>
> @@ -8025,8 +8026,10 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  	vmcs_writel(GUEST_IDTR_BASE, vmcs12->host_idtr_base);
>  	vmcs_writel(GUEST_GDTR_BASE, vmcs12->host_gdtr_base);
>
> -	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT)
> +	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
>  		vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
> +		vcpu->arch.pat = vmcs12->host_ia32_pat;
> +	}
>  	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
>  		vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL,
>  			vmcs12->host_ia32_perf_global_ctrl);
> --
> 1.7.3.4