Handle multiple interrupts injection in one vmexit
Hi there,

External interrupts are injected in the function vcpu_enter_guest after checking KVM_REQ_EVENT. If more than one event is pending at a vmexit (e.g. an NMI and an external interrupt arrive concurrently), KVM will handle only the interrupt of the highest priority (e.g. the NMI), right? So only the NMI is injected on this vmexit; when will the other external events be injected? I don't see any extra setting of KVM_REQ_EVENT in KVM to handle injection of the lower-priority interrupts.

Thanks,
Arthur

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handle multiple interrupts injection in one vmexit
Thanks Jan.

On Mon, May 26, 2014 at 6:44 PM, Jan Kiszka jan.kis...@web.de wrote:
On 2014-05-26 15:51, Arthur Chunqi Li wrote:
Hi there, External interrupts are injected in the function vcpu_enter_guest after checking KVM_REQ_EVENT. If more than one event is pending at a vmexit (e.g. an NMI and an external interrupt arrive concurrently), KVM will handle only the interrupt of the highest priority (e.g. the NMI), right? So only the NMI is injected on this vmexit; when will the other external events be injected? I don't see any extra setting of KVM_REQ_EVENT in KVM to handle injection of the lower-priority interrupts.

[you should mention that you are talking about x86 here]

If both events are pending, inject_pending_event will try to inject the NMI. vcpu_enter_guest will then notice that there are still pending interrupts and request the interrupt-window vmexit. If the NMI should be blocked, an NMI-window exit is requested. But on NMI injection, another KVM_REQ_EVENT is sent (see e.g. handle_nmi_window in vmx.c).

Yes, I see that bit 2 (interrupt-window exiting) and bit 22 (NMI-window exiting) of the Primary Processor-Based VM-Execution Controls are used to handle simultaneous (or back-to-back) interrupt/NMI injection. See Intel SDM chapter 24.6.2 for anyone who needs this information.

Arthur

Jan
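The scheme Jan describes (inject at most one event per entry, request a window exit for whatever is left) can be sketched as a toy userspace model. This is NOT the kernel code; all names and the struct are invented for illustration, under the assumption that NMIs outrank external interrupts as the thread says.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of one injection pass before vmentry (names invented). */
struct vcpu_model {
    bool nmi_pending;
    bool irq_pending;
    bool nmi_blocked;          /* e.g. still inside an NMI handler */
    bool irq_window_requested; /* models "interrupt-window exiting" (bit 2) */
    bool nmi_window_requested; /* models "NMI-window exiting" (bit 22) */
};

/* Inject at most one event, highest priority first; anything left
 * pending causes a window-exit request so the next vmexit gives KVM
 * another KVM_REQ_EVENT-style chance. Returns what was injected. */
const char *inject_pending_event(struct vcpu_model *v)
{
    const char *injected = "none";

    if (v->nmi_pending && !v->nmi_blocked) {
        v->nmi_pending = false;
        injected = "NMI";          /* NMI has the higher priority */
    } else if (v->irq_pending) {
        v->irq_pending = false;
        injected = "IRQ";
    }

    /* Leftover events request the corresponding window exit. */
    v->irq_window_requested = v->irq_pending;
    v->nmi_window_requested = v->nmi_pending;
    return injected;
}
```

With both an NMI and an IRQ pending, the first pass injects only the NMI and leaves the interrupt-window request set; the IRQ is injected on the next pass, matching the two-step behavior discussed above.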
Re: How to disable IDE DMA in KVM or in guest OS
On Thu, May 15, 2014 at 2:39 PM, Jan Kiszka jan.kis...@web.de wrote:
On 2014-05-15 07:54, Arthur Chunqi Li wrote:
Hi Jan and there, I want to disable IDE BMDMA in Qemu/KVM and let the guest OS use only PIO mode. Are there any configurations in Qemu or KVM to disable the hardware support of DMA?

Not that I know. These features are built into the chipsets we emulate, and there seems to be no option to disable them. Maybe the isapc will not expose DMA capabilities - but it will also lack a lot of other things like PCI...

Well, if I boot guest Linux with ide-core.nodma=0.0 libata.dma=0 ide=nodma ide0=nodma, why are bmdma irqs (14 and 15) still triggered? I think the guest OS should only use PIO in this situation.

Arthur

Jan

I have tried to disable IDE DMA in the guest OS boot params as follows:
ide-core.nodma=0.0 libata.dma=0 ide=nodma ide0=nodma
But I still get the following in dmesg:
[0.533276] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14
[0.533641] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15
and I did track irq 14 and irq 15 in ioapic_deliver when reading/writing the disk. How can I totally disable IDE BMDMA from the guest's boot time?

Thanks,
Arthur
How to disable IDE DMA in KVM or in guest OS
Hi Jan and there,

I want to disable IDE BMDMA in Qemu/KVM and let the guest OS use only PIO mode. Are there any configurations in Qemu or KVM to disable the hardware support of DMA?

I have tried to disable IDE DMA in the guest OS boot params as follows:
ide-core.nodma=0.0 libata.dma=0 ide=nodma ide0=nodma
But I still get the following in dmesg:
[0.533276] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14
[0.533641] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15
and I did track irq 14 and irq 15 in ioapic_deliver when reading/writing the disk. How can I totally disable IDE BMDMA from the guest's boot time?

Thanks,
Arthur
CPUs support APIC virtualization
Hi there,

I have noticed in the Intel SDM that some CPUs support APIC virtualization (e.g. virtual-interrupt delivery). I checked all my Intel CPUs' MSRs and found that none of them support this. So does anybody know which Intel CPUs support APIC virtualization? Or where can I get the related information?

Thanks,
Arthur
The action of Accessed and Dirty bit for EPT
Hi there,

I wrote a piece of code to test the behavior of the Accessed and Dirty bits of EPT on an Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz. First I build a completely new EPT paging structure with A/D logging on, then run some operating system code and log all the EPT violations (the trap log). At some point I pause the OS, parse the EPT paging structure, and log all the entries built during that period (the A/D log). Here I get some interesting points:

1. Some EPT entries are built with neither the Accessed nor the Dirty bit set. Does this mean that the CPU only constructed these entries but didn't touch them?

2. Some entries only exist in the A/D log. Does the A/D logging module have some bias or mistake? These two logs (trap log and A/D log) should be the same according to my understanding, and when I tried on an older CPU without A/D bit support, the two logs were exactly the same, though there I could only parse the EPT paging structure and could not distinguish Accessed from Dirty in it.

Thanks ahead,
Arthur
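For anyone parsing EPT entries the way described above, a minimal sketch of the bit tests follows. It assumes the SDM layout for EPT leaf entries: read/write/execute permissions in bits 2:0, Accessed in bit 8 and Dirty in bit 9 when A/D logging is enabled; this is a toy decoder, not the author's tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT entry bits per the Intel SDM (EPT accessed/dirty flags). */
#define EPT_READ     (1ULL << 0)
#define EPT_WRITE    (1ULL << 1)
#define EPT_EXEC     (1ULL << 2)
#define EPT_ACCESSED (1ULL << 8)
#define EPT_DIRTY    (1ULL << 9)

/* An EPT entry is present if any of R/W/X is set. */
static bool ept_present(uint64_t pte)
{
    return pte & (EPT_READ | EPT_WRITE | EPT_EXEC);
}

static bool ept_accessed(uint64_t pte) { return (pte & EPT_ACCESSED) != 0; }
static bool ept_dirty(uint64_t pte)    { return (pte & EPT_DIRTY) != 0; }
```

An entry like 0x7 (R/W/X set, A and D clear) corresponds to point 1 above: the entry was constructed (e.g. by prefetching) but never used for a translation, so the CPU left A/D clear.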
Guest VMs access a strange address
Hi there,

I have tried to log the EPT construction status at VM startup; that is, I added some code in the function __direct_map (arch/x86/kvm/mmu.c). __direct_map constructs the EPT paging structure when a guest page is first touched, and I can get the related gfn and pfn there. But I tracked a strange address: vcpu 0, pfn 0x8000, gfn 0xfebf1. Here pfn and gfn are the values of the params of __direct_map. How can pfn be 0x8000? Besides, I searched for 0xfebf1 in the kvm memslots and cannot find it in any memslot, but __direct_map catches this memory access and builds the mapping. Why does this happen?

Thanks ahead,
Arthur

A1. Here's my code in __direct_map:

	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
		if (iterator.level == level) {
			printk(KERN_NOTICE "vcpu %d\tpfn 0x%llx\tgfn 0x%llx\n",
			       vcpu->vcpu_id, pfn, gfn);
			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
				     write, emulate, level, gfn, pfn,
				     prefault, map_writable);
			direct_pte_prefetch(vcpu, iterator.sptep);
			++vcpu->stat.pf_fixed;
			break;
		}
	}
Re: How to get to know vcpu status from outside
Hi Paolo,

On Tue, Dec 17, 2013 at 8:28 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 17/12/2013 12:43, Arthur Chunqi Li wrote:
Hi Paolo, Thanks very much. And... (see below)
On Tue, Dec 17, 2013 at 7:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 17/12/2013 07:11, Arthur Chunqi Li wrote:
Hi Paolo, Since a VCPU is managed the same as a process in the kernel, how can I know the status (running, sleeping, etc.) of a vcpu in the kernel? Is there a variable in struct kvm_vcpu or something else that indicates this?

waitqueue_active(&vcpu->wq) means that the VCPU is sleeping in the kernel (i.e. in a halted state). vcpu->mode == IN_GUEST_MODE means that the VCPU is running. Anything else means that the host is running some kind of glue code (either kernel or userspace).

Another question about the scheduler. When I have 4 vcpus and the workload of the VM is low, I noticed that it tends to activate only 1 or 2 vcpus. Does this mean the other 2 vcpus are scheduled out or put into sleeping status?

This depends on what the guest scheduler is doing. The other 2 VCPUs are probably running for so little time (a few microseconds every 1/100th of a second) that you do not see them, and they stay halted the rest of the time. Remember that KVM has no scheduler of its own. What you see is the combined result of the guest and host schedulers.

Besides, if vcpu1 is running on pcpu1, and a kernel thread is running on pcpu0, can the kernel thread send a message to force vcpu1 to trap to the VMM? How can I do this?

Yes, with kvm_vcpu_kick. KVM tracks internally which pcpu will run the vcpu in vcpu->cpu, and kvm_vcpu_kick sends either a wakeup (if the vcpu is sleeping) or an IPI (if it is running).

What is the vcpu's action on kvm_vcpu_kick(vcpu)? What is the exit_reason of the kicked vcpu?

No exit reason, you just get a lightweight exit to the host kernel.
If you want a userspace exit, you'd need to set a bit in vcpu->requests before kvm_vcpu_kick (which you can do best with kvm_make_request), and change that to a userspace exit in vcpu_enter_guest. There's already an example of that: search arch/x86/kvm/x86.c for KVM_REQ_TRIPLE_FAULT.

I failed to kvm_vcpu_kick inactive vcpus at the beginning of the boot time (from power-up to grub) of a VM. I think this may be because the other vcpus are not yet activated by the SMP system at boot time, right? How can I distinguish vcpus in such a status?

Thanks,
Arthur

Paolo
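Paolo's three-way rule for reading a vcpu's state can be written down as a tiny classifier. This is a sketch only; the enum and function names are invented, and the two booleans stand in for the real checks (waitqueue_active(&vcpu->wq) and vcpu->mode == IN_GUEST_MODE).

```c
#include <assert.h>
#include <stdbool.h>

/* Invented names; models the rule described in the thread. */
enum vcpu_state { VCPU_HALTED, VCPU_IN_GUEST, VCPU_GLUE };

enum vcpu_state classify_vcpu(bool wq_active, bool in_guest_mode)
{
    if (wq_active)
        return VCPU_HALTED;    /* sleeping in the kernel, i.e. halted */
    if (in_guest_mode)
        return VCPU_IN_GUEST;  /* vcpu->mode == IN_GUEST_MODE */
    return VCPU_GLUE;          /* host glue code, kernel or userspace */
}
```

A vcpu that has not yet been started by the guest's SMP bring-up would show up as halted under this rule, which matches the boot-time observation in the mail.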
Re: How to trace every memory access
Hi Paolo,

When using EPT in KVM, does every vcpu have its own EPT paging structure, or do all vcpus share one?

Thanks,
Arthur

On Wed, Nov 20, 2013 at 6:41 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 20/11/2013 08:55, Arthur Chunqi Li wrote:
Hi Paolo, Currently I can trap every first write/read to a memory page from the guest VM (by adding code in tdp_page_fault). If I want to trace every memory access to a page, how can I achieve that in KVM?

You don't. :) If you are looking for something like this, a dynamic recompilation engine (such as QEMU's TCG) probably ends up being faster.

Paolo
Re: How to trace every memory access
Hi Paolo,

I want to rebuild the EPT paging structure, so I use kvm_mmu_unload() followed by kvm_mmu_reload(). But it seems to fail, because I cannot trap the EPT_VIOLATIONs I expect after the rebuild. How can I completely rebuild the EPT paging structure?

Arthur

On Fri, Dec 20, 2013 at 7:58 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 20/12/2013 10:15, Arthur Chunqi Li wrote:
Hi Paolo, When using EPT in KVM, does every vcpu have its own EPT paging structure, or do all vcpus share one?

All MMU structures are in vcpu->arch.mmu and vcpu->arch.nested_mmu, so they're per-VCPU.

Paolo
Re: How to trace every memory access
On Fri, Dec 20, 2013 at 7:58 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 20/12/2013 10:15, Arthur Chunqi Li wrote:
Hi Paolo, When using EPT in KVM, does every vcpu have its own EPT paging structure, or do all vcpus share one?

All MMU structures are in vcpu->arch.mmu and vcpu->arch.nested_mmu, so they're per-VCPU.

If an EPT entry is built by one VCPU, will this entry be propagated to the other VCPUs' EPT paging structures?

Arthur

Paolo
About preemption timer
Hi Jan and Paolo,

I've tried to use the preemption timer in KVM to trap vcpus regularly, but something unexpected happens. I run a VM with 4 vcpus and give them the same preemption timer value (e.g. 100) with all the relevant bits set (the activate/save bits), then reset the value in the preemption time-out handler. Thus I expected these vcpus to trap regularly in some rotation. But I found that when the VM is not busy, some vcpus trap much less frequently than others. In the Intel SDM, I noticed that the preemption timer is only related to the TSC, so I think all the vcpus should trap at a similar frequency. Could you help me explain this phenomenon?

Thanks,
Arthur
Re: About preemption timer
Hi Jan,

On Tue, Dec 17, 2013 at 7:21 PM, Jan Kiszka jan.kis...@siemens.com wrote:
On 2013-12-17 10:32, Arthur Chunqi Li wrote:
Hi Jan and Paolo, I've tried to use the preemption timer in KVM to trap vcpus regularly, but something unexpected happens. I run a VM with 4 vcpus and give them the same preemption timer value (e.g. 100) with all the relevant bits set (the activate/save bits), then reset the value in the preemption time-out handler. Thus I expected these vcpus to trap regularly in some rotation. But I found that when the VM is not busy, some vcpus trap much less frequently than others. In the Intel SDM, I noticed that the preemption timer is only related to the TSC, so I think all the vcpus should trap at a similar frequency. Could you help me explain this phenomenon?

Are you on a CPU that has non-broken preemption timer support? Anything prior to Haswell is known to tick with arbitrary frequencies.

My CPU is an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz. Besides, what do you mean by arbitrary frequencies?

Arthur

BTW, we will have to re-implement preemption timer support with the help of a regular host timer due to the breakage when halting L2 (see my test case).

Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Re: How to get to know vcpu status from outside
Hi Paolo,

Thanks very much. And... (see below)

On Tue, Dec 17, 2013 at 7:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 17/12/2013 07:11, Arthur Chunqi Li wrote:
Hi Paolo, Since a VCPU is managed the same as a process in the kernel, how can I know the status (running, sleeping, etc.) of a vcpu in the kernel? Is there a variable in struct kvm_vcpu or something else that indicates this?

waitqueue_active(&vcpu->wq) means that the VCPU is sleeping in the kernel (i.e. in a halted state). vcpu->mode == IN_GUEST_MODE means that the VCPU is running. Anything else means that the host is running some kind of glue code (either kernel or userspace).

Another question about the scheduler. When I have 4 vcpus and the workload of the VM is low, I noticed that it tends to activate only 1 or 2 vcpus. Does this mean the other 2 vcpus are scheduled out or put into sleeping status?

Besides, if vcpu1 is running on pcpu1, and a kernel thread is running on pcpu0, can the kernel thread send a message to force vcpu1 to trap to the VMM? How can I do this?

Yes, with kvm_vcpu_kick. KVM tracks internally which pcpu will run the vcpu in vcpu->cpu, and kvm_vcpu_kick sends either a wakeup (if the vcpu is sleeping) or an IPI (if it is running).

What is the vcpu's action on kvm_vcpu_kick(vcpu)? What is the exit_reason of the kicked vcpu?

Paolo

Besides, can I pin a vcpu to a pcpu? That is to say, can I assign a pcpu exclusively to a vcpu so that the pcpu only runs this vcpu?

Thanks,
Arthur
Re: About preemption timer
On Tue, Dec 17, 2013 at 8:43 PM, Jan Kiszka jan.kis...@siemens.com wrote:
On 2013-12-17 12:31, Arthur Chunqi Li wrote:
Hi Jan,
On Tue, Dec 17, 2013 at 7:21 PM, Jan Kiszka jan.kis...@siemens.com wrote:
On 2013-12-17 10:32, Arthur Chunqi Li wrote:
Hi Jan and Paolo, I've tried to use the preemption timer in KVM to trap vcpus regularly, but something unexpected happens. I run a VM with 4 vcpus and give them the same preemption timer value (e.g. 100) with all the relevant bits set (the activate/save bits), then reset the value in the preemption time-out handler. Thus I expected these vcpus to trap regularly in some rotation. But I found that when the VM is not busy, some vcpus trap much less frequently than others. In the Intel SDM, I noticed that the preemption timer is only related to the TSC, so I think all the vcpus should trap at a similar frequency. Could you help me explain this phenomenon?

Are you on a CPU that has non-broken preemption timer support? Anything prior to Haswell is known to tick with arbitrary frequencies.

My CPU is an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz.

Hmm, this one seems unaffected. Didn't find a specification update. Just like Paolo asked: does your original test case pass?

Besides, what do you mean by arbitrary frequencies?

On older CPUs, the tick rate of the preemption timer does not correlate with the TSC, definitely not in the way the spec defines it. Back to your original question: are we talking about native use of the preemption timer via a patched KVM, or nested use inside a KVM virtual machine?

It is about native use. I think it may be due to scheduling. When a vcpu is scheduled out of a pcpu, will the preemption timer still run? Oh, another thing: I use the released kernel 3.11, not the latest one. Does this matter?
Arthur

Jan
How to get to know vcpu status from outside
Hi Paolo,

Since a VCPU is managed the same as a process in the kernel, how can I know the status (running, sleeping, etc.) of a vcpu in the kernel? Is there a variable in struct kvm_vcpu or something else that indicates this?

Besides, if vcpu1 is running on pcpu1, and a kernel thread is running on pcpu0, can the kernel thread send a message to force vcpu1 to trap to the VMM? How can I do this?

Thanks very much,
Arthur
PMU in KVM
Hi Gleb,

I noticed that arch/x86/kvm/pmu.c is maintained by you, and I have some questions about the PMU in KVM. Thanks ahead if you can spare the time to answer them.

1. How does the PMU cooperate with Intel VT? For example, I only find flags in the IA32_PERFEVTSELx MSRs to count in OS and USER mode (ring 0 and the other rings). What happens when I execute VMXON with the PMU enabled? Can I distinguish the counts in root and non-root mode? I cannot find the related description in the Intel manual.

2. What is the current status of the vPMU in KVM? Is it enabled automatically? And how can I use (or enable/disable) it?

Thanks,
Arthur
How to trace every memory access
Hi Paolo,

Currently I can trap every first write/read to a memory page from the guest VM (by adding code in tdp_page_fault). If I want to trace every memory access to a page, how can I achieve that in KVM?

Thanks,
Arthur
Re: EPT page fault procedure
Hi Paolo,

On Thu, Oct 31, 2013 at 6:54 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 31/10/2013 10:07, Arthur Chunqi Li wrote:
Sorry to disturb you with so many trivial questions about KVM EPT memory management, and thanks for your patience.

No problem, please remain on-list though. Adding back kvm@vger.kernel.org.

I got confused by the EPT page fault processing function (tdp_page_fault). I think that when Qemu registers the memory region for a VM, the physical memory mapped to this PVA region isn't actually allocated yet. So the page fault procedure for an EPT violation, which maps a GFN to a PFN, should allocate the real physical memory and establish the real mapping from PVA to PFA in Qemu's page table.

Do you mean HVA to PFN? If so, you can look at the function hva_to_pfn. :)

I mean, in this procedure, how is physical memory actually allocated? When qemu first initializes the mapping of its userspace memory region to the VM, the physical memory corresponding to this region is not actually allocated. So I think KVM should do this allocation somewhere.

What is the point in tdp_page_fault() that handles such a mapping from PVA to PFA?

The EPT page table entry is created in __direct_map using the pfn returned by try_async_pf. try_async_pf itself gets the pfn from gfn_to_pfn_async and gfn_to_pfn_prot. Both of them call __gfn_to_pfn with different arguments. __gfn_to_pfn first goes from GFN to HVA using the memslots (gfn_to_memslot and, in __gfn_to_pfn_memslot, __gfn_to_hva_many), then it calls hva_to_pfn. Ultimately, hva_to_pfn_fast and hva_to_pfn_slow are where KVM calls functions from the kernel's get_user_pages family.

What will KVM do if get_user_page() returns a page that does not really exist in physical memory?
Thanks,
Arthur

Paolo
Re: Calling to kvm_mmu_load
Hi Paolo,

On Tue, Oct 29, 2013 at 8:55 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 29/10/2013 06:39, Arthur Chunqi Li wrote:
What is the dirty page tracking code path? I found an obsolete flag dirty_page_log_all in very old code, but I cannot find the most recent version of dirty page tracking.

Basically everything that accesses the dirty_bitmap field of struct kvm_memory_slot is involved. It all starts when the KVM_SET_USER_MEMORY_REGION ioctl is called with the KVM_MEM_LOG_DIRTY_PAGES flag set.

I see that the mechanism here is to set all pages read-only in order to track the dirty pages. But EPT provides such a dirty bit in the EPT paging structures. Why don't we use this?

Arthur

Besides, I noticed that memory management in KVM uses the mechanism based on struct kvm_memory_slot. How is kvm_memory_slot used in cooperation with Linux memory management?

kvm_memory_slot just maps a host userspace address range to a guest physical address range. Cooperation with Linux memory management is done with the Linux MMU notifiers. MMU notifiers let KVM know that a page has been swapped out, and KVM reacts by invalidating the shadow page tables for the corresponding guest physical address.

Paolo
Re: Calling to kvm_mmu_load
On Tue, Oct 29, 2013 at 8:55 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 29/10/2013 06:39, Arthur Chunqi Li wrote:
What is the dirty page tracking code path? I found an obsolete flag dirty_page_log_all in very old code, but I cannot find the most recent version of dirty page tracking.

Basically everything that accesses the dirty_bitmap field of struct kvm_memory_slot is involved. It all starts when the KVM_SET_USER_MEMORY_REGION ioctl is called with the KVM_MEM_LOG_DIRTY_PAGES flag set.

Besides, I noticed that memory management in KVM uses the mechanism based on struct kvm_memory_slot. How is kvm_memory_slot used in cooperation with Linux memory management?

kvm_memory_slot just maps a host userspace address range to a guest physical address range. Cooperation with Linux memory management is done with the Linux MMU notifiers. MMU notifiers let KVM know that a page has been swapped out, and KVM reacts by invalidating the shadow page tables for the corresponding guest physical address.

So for each VM, qemu needs to register its memory region, KVM stores this GPA-to-HVA mapping in kvm_memory_slot, and at the first page fault KVM uses EPT to map the GPA to an HPA. Am I right? In this design, how is the ballooning mechanism implemented in the KVM memory management module?

Thanks,
Arthur

Paolo
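The GPA-to-HVA step discussed above (a memslot mapping a guest-frame-number range onto a host userspace address) can be sketched as a toy lookup. The struct fields mimic, but are not, the real struct kvm_memory_slot, and the search is a simplified stand-in for gfn_to_memslot plus __gfn_to_hva_many.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Toy memslot: maps [base_gfn, base_gfn + npages) to host userspace
 * addresses starting at userspace_addr (field names mimic KVM's). */
struct memslot {
    uint64_t base_gfn;
    uint64_t npages;
    uint64_t userspace_addr;
};

/* Linear search over the slots, then offset into the slot, as the
 * gfn -> hva translation described in the thread. Returns 0 when the
 * gfn falls outside every slot (i.e. it is unmapped guest memory). */
uint64_t gfn_to_hva_model(const struct memslot *slots, size_t n, uint64_t gfn)
{
    for (size_t i = 0; i < n; i++) {
        const struct memslot *s = &slots[i];
        if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
            return s->userspace_addr + ((gfn - s->base_gfn) << PAGE_SHIFT);
    }
    return 0;
}
```

After this step, the HVA would be turned into a host PFN (hva_to_pfn in the real code) and only then written into the EPT entry, which is the GPA-to-HPA mapping built at the first fault.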
Re: Calling to kvm_mmu_load
Hi Paolo,

On Fri, Oct 25, 2013 at 8:43 AM, Paolo Bonzini pbonz...@redhat.com wrote:
On 24/10/2013 08:55, Arthur Chunqi Li wrote:
Hi Paolo, Thanks for your reply.
On Wed, Oct 23, 2013 at 2:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 21/10/2013 08:56, Arthur Chunqi Li wrote:
Hi there, I noticed that kvm_mmu_reload() is called every time on vcpu entry, and kvm_mmu_load() is called in that function when root_hpa is INVALID_PAGE. I am confused about why and when root_hpa can be set to INVALID_PAGE. I found one condition: if the vcpu gets the request KVM_REQ_MMU_RELOAD, kvm_mmu_unload() is called to invalidate root_hpa, but this condition cannot cover all occasions.

Look also at mmu_free_roots, kvm_mmu_unload and kvm_mmu_reset_context. In normal cases and without EPT, it should be called when CR3 changes or when the paging mode changes (32-bit, PAE, 64-bit, no paging). With EPT, this kind of change won't reset the MMU (CR3 changes won't cause a vmexit at all, in fact).

When EPT is enabled, why will root_hpa be set to INVALID_PAGE when a VM boots?

Because EPT page tables are only built lazily. The EPT page tables start all-invalid, and are built as the guest accesses pages at new guest physical addresses (instead, shadow page tables are built as the guest accesses pages at new guest virtual addresses).

I find that Qemu resets root_hpa with the KVM_REQ_MMU_RELOAD request several times when booting a VM. Why?

This happens when the memory map changes. A previously-valid guest physical address might become invalid now, and the EPT page tables have to be emptied.

And will the VM use EPT from the very beginning when booting?

Yes. But it's not the VM, it's KVM that uses EPT. The VM only uses EPT if you're using nested virtualization and EPT is enabled. L1's KVM uses EPT, L2 doesn't (because it doesn't run KVM).
With nested virtualization, roots are invalidated whenever kvm->arch.mmu changes meaning from L1->L0 to L2->L0 or vice versa (in the special case where EPT is disabled on L0, this is trivially because vmentry loads CR3 from the vmcs02).

Besides, in the function tdp_page_fault(), I find two different execution flows which may not reach __direct_map() (which I think is the normal path to handle a PF): fast_page_fault() and try_async_pf(). When will these two paths be taken when handling an EPT page fault?

fast_page_fault() is called if you're using dirty page tracking. It checks if we have a read-only page that is in a writeable memory slot (SPTE_HOST_WRITEABLE) and whose PTE allows writes (SPTE_MMU_WRITEABLE). If these conditions are satisfied, the page was read-only because of dirty page tracking; it is made read-write with a single cmpxchg, and the bit for the page is set in the dirty bitmap.

What is the dirty page tracking code path? I found an obsolete flag dirty_page_log_all in very old code, but I cannot find the most recent version of dirty page tracking.

Besides, I noticed that memory management in KVM uses the mechanism based on struct kvm_memory_slot. How is kvm_memory_slot used in cooperation with Linux memory management?

Thanks,
Arthur

try_async_pf will inject a dummy pagefault instead of creating the EPT page table, and create the page table in the background. The guest will do something else (run another task) until the EPT page table has been created; then a second dummy pagefault is injected. kvm_arch_async_page_not_present signals the first page fault, kvm_arch_async_page_present signals the second. For this to happen, the guest must have enabled the asynchronous page fault feature with a write to a KVM-specific MSR.

Paolo
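The write-protect dirty-logging scheme Paolo describes (pages start read-only; the first write faults, marks the dirty bitmap, and re-enables writes) can be modeled in a few lines. This is a toy single-threaded sketch with invented names; the real fast_page_fault uses a cmpxchg on the spte precisely because it runs concurrently with hardware A/D updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 64
static uint64_t spte_writable; /* one bit per page: write allowed */
static uint64_t dirty_bitmap;  /* one bit per page: dirtied since last harvest */

/* Modeled after the fast_page_fault idea: if the page is write-protected
 * only for dirty logging, record it as dirty and re-allow writes.
 * Returns true when a dirty-logging fault was handled. */
bool handle_write_fault(unsigned page)
{
    if (spte_writable & (1ULL << page))
        return false;              /* already writable: not our fault to fix */
    spte_writable |= 1ULL << page; /* make the spte read-write again */
    dirty_bitmap  |= 1ULL << page; /* set the page's bit in the dirty bitmap */
    return true;
}

/* Harvesting (as KVM_GET_DIRTY_LOG does conceptually) returns the bitmap
 * and write-protects the dirtied pages so the next write faults again. */
uint64_t get_and_clear_dirty(void)
{
    uint64_t d = dirty_bitmap;
    dirty_bitmap = 0;
    spte_writable &= ~d;
    return d;
}
```

Only the first write to a page between two harvests takes a fault; subsequent writes go straight through, which is what makes the scheme cheap for write-hot pages.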
Re: Calling to kvm_mmu_load
Hi Paolo,

Thanks for your reply.

On Wed, Oct 23, 2013 at 2:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:
On 21/10/2013 08:56, Arthur Chunqi Li wrote:
Hi there, I noticed that kvm_mmu_reload() is called every time on vcpu entry, and kvm_mmu_load() is called in that function when root_hpa is INVALID_PAGE. I am confused about why and when root_hpa can be set to INVALID_PAGE. I found one condition: if the vcpu gets the request KVM_REQ_MMU_RELOAD, kvm_mmu_unload() is called to invalidate root_hpa, but this condition cannot cover all occasions.

Look also at mmu_free_roots, kvm_mmu_unload and kvm_mmu_reset_context. In normal cases and without EPT, it should be called when CR3 changes or when the paging mode changes (32-bit, PAE, 64-bit, no paging). With EPT, this kind of change won't reset the MMU (CR3 changes won't cause a vmexit at all, in fact).

When EPT is enabled, why will root_hpa be set to INVALID_PAGE when a VM boots? I find that Qemu resets root_hpa with the KVM_REQ_MMU_RELOAD request several times when booting a VM. Why? And will the VM use EPT from the very beginning when booting?

With nested virtualization, roots are invalidated whenever kvm->arch.mmu changes meaning from L1->L0 to L2->L0 or vice versa (in the special case where EPT is disabled on L0, this is trivially because vmentry loads CR3 from the vmcs02).

Besides, in the function tdp_page_fault(), I find two different execution flows which may not reach __direct_map() (which I think is the normal path to handle a PF): fast_page_fault() and try_async_pf(). When will these two paths be taken when handling an EPT page fault?

Thanks,
Arthur

Paolo
Calling to kvm_mmu_load
Hi there, I noticed that kvm_mmu_reload() is called every time on vcpu entry, and kvm_mmu_load() is called from this function when root_hpa is INVALID_PAGE. I am confused about why and when root_hpa can be set to INVALID_PAGE. I found one condition: if the vcpu gets the request KVM_REQ_MMU_RELOAD, kvm_mmu_unload() is called to invalidate root_hpa, but this condition cannot cover all occasions. Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Hi Jan, On Fri, Oct 11, 2013 at 12:12 AM, Jan Kiszka jan.kis...@siemens.com wrote: On 2013-10-02 20:47, Jan Kiszka wrote: On 2013-09-30 11:08, Jan Kiszka wrote: On 2013-09-26 17:04, Paolo Bonzini wrote: Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs for reasons not emulated by L1, the preemption timer value should be saved in such exits. 2. Add support of the Save VMX-preemption timer value VM-Exit control to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v4: Format changes and remove a flag in nested_vmx. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 44 +++-- 2 files changed, 43 insertions(+), 2 deletions(-) Hi all, the test fails for me if the preemption timer value is set to a value that is above ~2000 (which means ~65000 TSC cycles on this machine). The preemption timer seems to count faster than what is expected, for example only up to 4 million cycles if you set it to one million. So, I am leaving the patch out of kvm/queue for now, until I can test it on more processors. I've done some measurements with the help of ftrace on the time it takes to let the preemption timer trigger (no adjustments via Arthur's patch were involved): On my Core i7-620M, the preemption timer seems to tick almost 10 times faster than the spec and scale value (5) suggest. I've loaded a value of 10, and it took about 130 µs until I got a vmexit with reason PREEMPTION_TIMER (no other exits in between).

qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 10
qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0

That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz * 2^5 is 24.6 GHz.
I've read the spec several times, but it seems pretty clear on this. It just doesn't match reality. Very strange. ...but documented: I found a related erratum for my processor (AAT59) and also for Xeon 5500 (AAK139). At least the current Haswell generation is not affected. I can test the patch on a Haswell board I have at work later this week. To complete this story: Arthur's patch works fine on a non-broken CPU (here: i7-4770S). Arthur, find some fix-ups for your test case below. It avoids printing from within L2 as this could deadlock when the timer fires and L1 then tries to print something. Also, it disables the preemption timer on leave so that it cannot fire later on again. If you want to fold this into your patch, feel free. Otherwise I can post a separate patch on top. I think this can be treated as a separate patch to our test suite. You can post it on top. I have tested it and it works fine. Arthur Jan

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 4372878..66a4201 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -141,6 +141,9 @@ void preemption_timer_init()
 	preempt_val = 1000;
 	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
 	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
 }

 void preemption_timer_main()
@@ -150,9 +153,7 @@ void preemption_timer_main()
 		printf("\tPreemption timer is not supported\n");
 		return;
 	}
-	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
-		printf("\tSave preemption value is not supported\n");
-	else {
+	if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
 		set_stage(0);
 		vmcall();
 		if (get_stage() == 1)
@@ -161,8 +162,8 @@
 	while (1) {
 		if (((rdtsc() - tsc_val) >> preempt_scale)
 				> 10 * preempt_val) {
-			report("Preemption timer", 0);
-			break;
+			set_stage(2);
+			vmcall();
 		}
 	}
 }
@@ -183,7 +184,7 @@ int preemption_timer_exit_handler()
 			report("Preemption timer", 0);
 		else
 			report("Preemption timer", 1);
-		return VMX_TEST_VMEXIT;
+		break;
 	case
VMX_VMCALL: switch (get_stage()) { case 0: @@ -195,24 +196,29 @@ int preemption_timer_exit_handler() EXI_SAVE_PREEMPT) ctrl_exit_rev.clr; vmcs_write(EXI_CONTROLS, ctrl_exit); } - break
[PATCH v2] kvm-unit-tests: VMX: Comments on the framework and writing test cases
Add some comments on the framework of nested VMX testing, and guides of how to write new test cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 30 ++ x86/vmx_tests.c | 13 + 2 files changed, 43 insertions(+) diff --git a/x86/vmx.c b/x86/vmx.c index 9db4ef4..d5ae609 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -1,3 +1,33 @@ +/* + * x86/vmx.c : Framework for testing nested virtualization + * This is a framework to test nested VMX for KVM, which + * started as a project of GSoC 2013. All test cases should + * be located in x86/vmx_tests.c and framework related + * functions should be in this file. + * + * How to write test cases? + * Add callbacks of test suite in variant vmx_tests. You can + * write: + * 1. init function used for initializing test suite + * 2. main function for codes running in L2 guest, + * 3. exit_handler to handle vmexit of L2 to L1 + * 4. syscall handler to handle L2 syscall vmexit + * 5. vmenter fail handler to handle direct failure of vmenter + * 6. guest_regs is loaded when vmenter and saved when + * vmexit, you can read and set it in exit_handler + * If no special function is needed for a test suite, use + * coressponding basic_* functions as callback. More handlers + * can be added to vmx_tests, see details of struct vmx_test + * and function test_run(). + * + * Currently, vmx test framework only set up one VCPU and one + * concurrent guest test environment with same paging for L2 and + * L1. For usage of EPT, only 1:1 mapped paging is used from VFN + * to PFN. 
+ * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ + #include libcflat.h #include processor.h #include vm.h diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 0759e10..5fc16a3 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,3 +1,8 @@ +/* + * All test cases of nested virtualization should be in this file + * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ #include vmx.h #include msr.h #include processor.h @@ -782,6 +787,14 @@ struct insn_table { u32 test_field; }; +/* + * Add more test cases of instruction intercept here. Elements in this + * table are: + * name/control flag/insn function/type/exit reason/exit qualification/ + * instruction info/field to test + * The last field defines which fields (exit_qual and insn_info) need to be + * tested in the exit handler. If set to 0, only the reason is checked. + */ static struct insn_table insn_table[] = { // Flags for Primary Processor-Based VM-Execution Controls {"HLT", CPU_HLT, insn_hlt, INSN_CPU0, 12, 0, 0, 0}, -- 1.7.9.5
[PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v4: Format changes and remove a flag in nested_vmx. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 44 +++-- 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index bb04650..b93e09a 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -536,6 +536,7 @@ /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL 29) +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f1da43..e1fa13a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high PIN_BASED_VMX_PREEMPTION_TIMER) || + !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_exit_ctls_high = ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + nested_vmx_pinbased_ctls_high = ~PIN_BASED_VMX_PREEMPTION_TIMER; + } nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) 
+{ + u64 delta_tsc_l1; + u32 preempt_val_l1, preempt_val_l2, preempt_scale; + + if (!(get_vmcs12(vcpu)-pin_based_vm_exec_control + PIN_BASED_VMX_PREEMPTION_TIMER)) + return; + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc()) + - vcpu-arch.last_guest_tsc; + preempt_val_l1 = delta_tsc_l1 preempt_scale; + if (preempt_val_l2 = preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); +} + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. @@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) atomic_switch_perf_msrs(vmx); debugctlmsr = get_debugctlmsr(); + if (is_guest_mode(vcpu) !(vmx-nested.nested_run_pending)) + nested_adjust_preemption_timer(vcpu); vmx-__launched = vmx-loaded_vmcs-launched; asm( /* Store host registers */ @@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 exec_control; + u32 exit_control; vmcs_write16(GUEST_ES_SELECTOR, vmcs12-guest_es_selector); vmcs_write16(GUEST_CS_SELECTOR, vmcs12-guest_cs_selector); @@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + exit_control = vmcs_config.vmexit_ctrl; + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + vmcs_write32(VM_EXIT_CONTROLS, exit_control); /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. 
@@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs12-guest_pending_dbg_exceptions = vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); + if ((vmcs12-pin_based_vm_exec_control + PIN_BASED_VMX_PREEMPTION_TIMER) + (vmcs12-vm_exit_controls + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
[PATCH] kvm-unit-tests: VMX: Comments on the framework and writing test cases
Add some comments on the framework of nested VMX testing, and guides of how to write new test cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 25 + x86/vmx_tests.c | 13 + 2 files changed, 38 insertions(+) diff --git a/x86/vmx.c b/x86/vmx.c index 9db4ef4..3aa8600 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -1,3 +1,28 @@ +/* + * x86/vmx.c : Framework for testing nested virtualization + * This is a framework to test nested VMX for KVM, which is + * a project of GSoC 2013. All test cases are located in + * vmx_tests, which is defined in x86/vmx_tests.c. All test + * cases should be located in x86/vmx_tests.c and framework + * related functions should be in this file. + * + * How to write test suite? + * Add functions of test suite in variant vmx_tests. You can + * write: + * init function used for initializing test suite + * main function for codes running in L2 guest, + * exit_handler to handle vmexit of L2 to L1 (framework) + * syscall handler to handle L2 syscall vmexit + * vmenter fail handler to handle direct failure of vmenter + * init registers used to store register value in initialization + * If no special function is needed for a test suite, you can use + * basic_* series of functions. More handlers can be added to + * vmx_tests, see details of struct vmx_test and function + * test_run(). + * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ + #include libcflat.h #include processor.h #include vm.h diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 0759e10..5fc16a3 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,3 +1,8 @@ +/* + * All test cases of nested virtualization should be in this file + * + * Author : Arthur Chunqi Li yzt...@gmail.com + */ #include vmx.h #include msr.h #include processor.h @@ -782,6 +787,14 @@ struct insn_table { u32 test_field; }; +/* + * Add more test cases of instruction intercept here. 
Elements in this + * table are: + * name/control flag/insn function/type/exit reason/exit qualification/ + * instruction info/field to test + * The last field defines which fields (exit_qual and insn_info) need to be + * tested in the exit handler. If set to 0, only the reason is checked. + */ static struct insn_table insn_table[] = { // Flags for Primary Processor-Based VM-Execution Controls {"HLT", CPU_HLT, insn_hlt, INSN_CPU0, 12, 0, 0, 0}, -- 1.7.9.5
Re: [PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
On Sat, Sep 14, 2013 at 3:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-09-13 19:15, Paolo Bonzini wrote: Il 06/09/2013 04:04, Arthur Chunqi Li ha scritto: + preempt_val_l1 = delta_tsc_l1 >> preempt_scale; + if (preempt_val_l2 <= preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); Did you test that a value of 0 triggers an immediate exit, rather than counting down by 2^32? Perhaps it's safer to limit the value to 1 instead of 0. In my experience, 0 triggers an immediate exit when the preemption timer is enabled. Yes, the L2 VM will exit immediately when the value is 0 with my patch. Arthur Jan
Re: [PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
On Sat, Sep 14, 2013 at 1:15 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 06/09/2013 04:04, Arthur Chunqi Li ha scritto: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v3: Move nested_adjust_preemption_timer to the latest place just before vmenter. Some minor changes. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 49 +++-- 2 files changed, 48 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index bb04650..b93e09a 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -536,6 +536,7 @@ /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL 29) +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f1da43..f364d16 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -374,6 +374,8 @@ struct nested_vmx { */ struct page *apic_access_page; u64 msr_ia32_feature_control; + /* Set if vmexit is L2-L1 */ + bool nested_vmx_exit; }; #define POSTED_INTR_ON 0 @@ -2204,7 +2206,17 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high + PIN_BASED_VMX_PREEMPTION_TIMER) || + !(nested_vmx_exit_ctls_high + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { Align this under the other !. 
Also, I prefer to have one long line for the whole !(... ...) || (and likewise below), but I don't know if Gleb agrees + nested_vmx_exit_ctls_high = + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); Please remove parentheses around ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, and likewise elsewhere in the patch. + nested_vmx_pinbased_ctls_high = + (~PIN_BASED_VMX_PREEMPTION_TIMER); + } nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6707,6 +6719,24 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) +{ + u64 delta_tsc_l1; + u32 preempt_val_l1, preempt_val_l2, preempt_scale; Should this exit immediately if the preemption timer pin-based control is disabled? Hi Paolo, How can I get pin-based control here from struct kvm_vcpu *vcpu? Arthur + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + delta_tsc_l1 = kvm_x86_ops-read_l1_tsc(vcpu, + native_read_tsc()) - vcpu-arch.last_guest_tsc; Please format this like: delta_tsc_l1 = kvm_x86_ops-read_l1_tsc(vcpu, native_read_tsc()) - vcpu-arch.last_guest_tsc; + preempt_val_l1 = delta_tsc_l1 preempt_scale; + if (preempt_val_l2 = preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); Did you test that a value of 0 triggers an immediate exit, rather than counting down by 2^32? Perhaps it's safer to limit the value to 1 instead of 0. +} + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. 
@@ -6736,9 +6766,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) vmx-nested.nested_run_pending = 0; if (is_guest_mode(vcpu) nested_vmx_exit_handled(vcpu)) { + vmx-nested.nested_vmx_exit = true; I think this assignment should be in nested_vmx_vmexit, since it is called from other places as well. nested_vmx_vmexit(vcpu); return 1; } + vmx-nested.nested_vmx_exit = false; if (exit_reason VMX_EXIT_REASONS_FAILED_VMENTRY) { vcpu-run-exit_reason = KVM_EXIT_FAIL_ENTRY; @@ -7132,6 +7164,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu
Re: [PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Sep 15, 2013 at 8:31 PM, Gleb Natapov g...@redhat.com wrote: On Fri, Sep 06, 2013 at 10:04:51AM +0800, Arthur Chunqi Li wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- ChangeLog to v3: Move nested_adjust_preemption_timer to the latest place just before vmenter. Some minor changes. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 49 +++-- 2 files changed, 48 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index bb04650..b93e09a 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -536,6 +536,7 @@ /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL 29) +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f1da43..f364d16 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -374,6 +374,8 @@ struct nested_vmx { */ struct page *apic_access_page; u64 msr_ia32_feature_control; + /* Set if vmexit is L2-L1 */ + bool nested_vmx_exit; Do not see why it is needed, see bellow. 
}; #define POSTED_INTR_ON 0 @@ -2204,7 +2206,17 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high + PIN_BASED_VMX_PREEMPTION_TIMER) || + !(nested_vmx_exit_ctls_high + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_exit_ctls_high = + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + nested_vmx_pinbased_ctls_high = + (~PIN_BASED_VMX_PREEMPTION_TIMER); + } nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6707,6 +6719,24 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) +{ + u64 delta_tsc_l1; + u32 preempt_val_l1, preempt_val_l2, preempt_scale; + + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + delta_tsc_l1 = kvm_x86_ops-read_l1_tsc(vcpu, + native_read_tsc()) - vcpu-arch.last_guest_tsc; + preempt_val_l1 = delta_tsc_l1 preempt_scale; + if (preempt_val_l2 = preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); +} + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. 
@@ -6736,9 +6766,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) vmx->nested.nested_run_pending = 0; if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) { + vmx->nested.nested_vmx_exit = true; nested_vmx_vmexit(vcpu); return 1; } + vmx->nested.nested_vmx_exit = false; if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) { vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY; @@ -7132,6 +7164,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) debugctlmsr = get_debugctlmsr(); vmx->__launched = vmx->loaded_vmcs->launched; + if (is_guest_mode(vcpu) && !(vmx->nested.nested_vmx_exit)) How can is_guest_mode() and nested_vmx_exit both be true? The only place nested_vmx_exit is set to true is just before the call to nested_vmx_vmexit(). The first thing nested_vmx_vmexit() does is make is_guest_mode() false. To enter guest mode again, at least one other vmexit from L1 to L0 is needed, at which point nested_vmx_exit will be reset to false again. If you want to avoid calling nested_adjust_preemption_timer() during vmlaunch/vmresume emulation (and it looks like this is what you are trying to achieve here) you can check nested_run_pending. Besides vmlaunch/vmresume emulation, every exit from L2->L1 should not call
[RFC PATCH 1/2] kvm-unit-tests: VMX: Add vmentry failed handler to framework
Add vmentry failed handler to vmx framework to catch direct fail of vmentry. When vmlaunch/vmresume directly fail to the next instruction, a entry failed handler is used to handle this failure. Resume failure from entry failed handler will cause entry double fail and directly exit to L1. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|3 +++ x86/vmx.c | 34 -- x86/vmx.h | 15 +-- x86/vmx_tests.c | 31 +-- 4 files changed, 57 insertions(+), 26 deletions(-) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index 6e0ce2b..c8565b5 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -19,7 +19,10 @@ #define X86_CR0_PE 0x0001 #define X86_CR0_MP 0x0002 #define X86_CR0_TS 0x0008 +#define X86_CR0_ET 0x0010 #define X86_CR0_WP 0x0001 +#define X86_CR0_NW 0x2000 +#define X86_CR0_CD 0x4000 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 #define X86_CR4_TSD 0x0004 diff --git a/x86/vmx.c b/x86/vmx.c index 9db4ef4..6a2bf44 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -44,14 +44,6 @@ void report(const char *name, int result) } } -static int make_vmcs_current(struct vmcs *vmcs) -{ - bool ret; - - asm volatile (vmptrld %1; setbe %0 : =q (ret) : m (vmcs) : cc); - return ret; -} - /* entry_sysenter */ asm( .align 4, 0x90\n\t @@ -631,6 +623,7 @@ static int exit_handler() static int vmx_run() { u32 ret = 0, fail = 0; + bool entry_double_fail = false; while (1) { asm volatile ( @@ -657,28 +650,41 @@ static int vmx_run() ); if (fail) - ret = launched ? VMX_TEST_RESUME_ERR : - VMX_TEST_LAUNCH_ERR; + if (entry_double_fail) + ret = launched ? 
VMX_TEST_RESUME_ERR : + VMX_TEST_LAUNCH_ERR; + else { + ret = current-entry_failed_handler(launched); + if (ret == VMX_TEST_RESUME) { + entry_double_fail = true; + host_rflags = ~(X86_EFLAGS_ZF | + X86_EFLAGS_CF); + } + } else { launched = 1; + entry_double_fail = false; ret = exit_handler(); } if (ret != VMX_TEST_RESUME) break; + ret = fail = 0; } launched = 0; switch (ret) { case VMX_TEST_VMEXIT: return 0; case VMX_TEST_LAUNCH_ERR: - printf(%s : vmlaunch failed.\n, __func__); + printf(%s : vmlaunch failed, entry_double_fail=%d.\n, + __func__, entry_double_fail); if ((!(host_rflags X86_EFLAGS_CF) !(host_rflags X86_EFLAGS_ZF)) || ((host_rflags X86_EFLAGS_CF) (host_rflags X86_EFLAGS_ZF))) printf(\tvmlaunch set wrong flags\n); report(test vmlaunch, 0); break; case VMX_TEST_RESUME_ERR: - printf(%s : vmresume failed.\n, __func__); + printf(%s : vmresume failed, entry_double_fail=%d.\n, + __func__, entry_double_fail); if ((!(host_rflags X86_EFLAGS_CF) !(host_rflags X86_EFLAGS_ZF)) || ((host_rflags X86_EFLAGS_CF) (host_rflags X86_EFLAGS_ZF))) printf(\tvmresume set wrong flags\n); @@ -700,12 +706,12 @@ static int test_run(struct vmx_test *test) return 1; } init_vmcs((test-vmcs)); + current = test; /* Directly call test-init is ok here, init_vmcs has done vmcs init, vmclear and vmptrld*/ if (test-init) - test-init(test-vmcs); + test-init(); test-exits = 0; - current = test; regs = test-guest_regs; vmcs_write(GUEST_RFLAGS, regs.rflags | 0x2); launched = 0; diff --git a/x86/vmx.h b/x86/vmx.h index dc1ebdf..469b4dc 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -4,7 +4,8 @@ #include libcflat.h struct vmcs { - u32 revision_id; /* vmcs revision identifier */ + u32 revision_id:31, /* vmcs revision identifier */ + shadow:1; /* shadow-VMCS indicator */ u32 abort; /* VMX-abort indicator */ /* VMCS data */ char data[0]; @@ -32,10 +33,11 @@ struct regs { struct vmx_test { const char *name; - void (*init)(struct vmcs *vmcs); + void (*init)(); void (*guest_main)(); int (*exit_handler)(); 
void
[RFC PATCH 2/2] kvm-unit-tests: VMX: Add test cases for vmentry checks
This patch design a framwork to check vmentry fields, then test all supported features in Intel SDM 26.1 and 26.2. Unsupported features are not tested, but listed in the code. To add new tests for vmentry checks, just write functions for test initialization and add related item in vmentry_cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h | 22 +- x86/vmx_tests.c | 647 +++ 2 files changed, 668 insertions(+), 1 deletion(-) diff --git a/x86/vmx.h b/x86/vmx.h index 469b4dc..aeee602 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -344,6 +344,8 @@ enum Ctrl_exi { enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_ENT_SMM = 1UL 10, + ENT_DEATV_DM= 1UL 11, ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -375,10 +377,13 @@ enum Ctrl0 { }; enum Ctrl1 { + CPU_VIRT_APIC = 1ul 0, CPU_EPT = 1ul 1, + CPU_VIRT_X2APIC = 1ul 4, CPU_VPID= 1ul 5, - CPU_URG = 1ul 7, CPU_WBINVD = 1ul 6, + CPU_URG = 1ul 7, + CPU_VIRT_INTR = 1ul 9, CPU_RDRAND = 1ul 11, CPU_SHADOW = 1ul 14, }; @@ -453,6 +458,7 @@ enum Ctrl1 { #define HYPERCALL_VMEXIT 0x1 #define EPTP_PG_WALK_LEN_SHIFT 3ul +#define EPTP_PG_WALK_LEN_MASK 0x31 #define EPTP_AD_FLAG (1ul 6) #define EPT_MEM_TYPE_UC0ul @@ -460,6 +466,7 @@ enum Ctrl1 { #define EPT_MEM_TYPE_WT4ul #define EPT_MEM_TYPE_WP5ul #define EPT_MEM_TYPE_WB6ul +#define EPT_MEM_TYPE_MASK 0x7 #define EPT_RA 1ul #define EPT_WA 2ul @@ -506,6 +513,19 @@ enum Ctrl1 { #define INVEPT_SINGLE 1 #define INVEPT_GLOBAL 2 +#define INTR_INFO_TYPE_MASK0x0700 +#define INTR_INFO_TYPE_SHIFT 8 +#define INTR_INFO_TYPE_EXT 0 +#define INTR_INFO_TYPE_REV 1 +#define INTR_INFO_TYPE_NMI 2 +#define INTR_INFO_TYPE_HARD_EXP3 +#define INTR_INFO_TYPE_SOFT_INTR 4 +#define INTR_INFO_TYPE_PSE 5 +#define INTR_INFO_TYPE_SOFT_EXP6 +#define INTR_INFO_TYPE_OTHER 7 +#define INTR_INFO_DELIVER_ERR 0x0800 +#define INTR_INFO_VALID0x8000 + extern struct regs regs; extern union vmx_basic basic; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index e95e6b8..4372878 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c 
@@ -15,6 +15,13 @@ bool init_fail; unsigned long *pml4; u64 eptp; void *data_page1, *data_page2; +static u32 cur_test; +volatile static bool test_success; +static u32 phy_addr_width; + +extern struct vmx_test *current; +extern u64 host_rflags; +extern bool launched; static inline void vmcall() { @@ -1113,6 +1120,643 @@ static int ept_exit_handler() return VMX_TEST_VMEXIT; } +static int reset_vmstat(struct vmcs *vmcs) +{ + if (vmcs_clear(current-vmcs)) { + printf(\tERROR : %s : vmcs_clear failed.\n, __func__); + return -1; + } + if (make_vmcs_current(current-vmcs)) { + printf(\tERROR : %s : make_vmcs_current failed.\n, __func__); + return -1; + } + launched = 0; + return 0; +} + +static int vmentry_vmcs_absence() +{ + vmcs_clear(current-vmcs); + return 0; +} + +static int vmentry_vmlaunch_err() +{ + launched = 0; + return 0; +} + +static int vmentry_vmresume_err() +{ + if (reset_vmstat(current-vmcs)) + return -1; + launched = 1; + return 0; +} + +static int vmentry_pin_ctrl() +{ + vmcs_write(PIN_CONTROLS, ~(ctrl_pin_rev.clr)); + return 0; +} + +static int vmentry_cpu0_ctrl() +{ + vmcs_write(CPU_EXEC_CTRL0, ~(ctrl_cpu_rev[0].clr)); + return 0; +} + +static int vmentry_cpu1_ctrl() +{ + u32 ctrl_cpu[2]; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY)) { + printf(\t%s : Features are not supported for nested.\n, __func__); + test_success = true; + return 0; + } + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0] | CPU_SECONDARY); + vmcs_write(CPU_EXEC_CTRL1, ~(ctrl_cpu_rev[1].clr)); + return 0; +} + +static int vmentry_cr3_target_count() +{ + vmcs_write(CR3_TARGET_COUNT, 5); + return 0; +} + +static int vmentry_iobmp_invalid1() +{ + u32 ctrl_cpu0; + if (!(ctrl_cpu_rev[0].clr CPU_IO_BITMAP)) { + printf(\t%s : Features are not supported for nested.\n, __func__); + test_success = true; + return 0; + } + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO
[RFC PATCH 0/2] kvm-unit-tests: VMX: vmentry checks
This series implements a framework to capture early vmentry failures, meaning vmlaunch/vmresume falls through to the next instruction instead of causing a vmexit. All supported checks referred to in Intel SDM 26.1 and 26.2 are then tested. Some test cases are commented out since they may cause a fatal error in KVM, which would crash the test environment and affect the following tests. They should be uncommented once the related bugs are fixed. Arthur Chunqi Li (2): kvm-unit-tests: VMX: Add vmentry failed handler to framework kvm-unit-tests: VMX: Add test cases for vmentry checks lib/x86/vm.h|3 + x86/vmx.c | 34 +-- x86/vmx.h | 37 ++- x86/vmx_tests.c | 678 ++- 4 files changed, 725 insertions(+), 27 deletions(-) -- 1.7.9.5
Re: [RFC PATCH 0/2] kvm-unit-tests: VMX: vmentry checks
Hi Gleb, Paolo and Jan, There are indeed many vmentry checks that KVM currently fails to perform; the results on my machine are:

Test suite : vmentry check
PASS: No current VMCS vmenter
PASS: VMLAUNCH with state not clear
PASS: VMRESUME with state not launched
PASS: Reserved bits in PIN_CONTROLS field
PASS: Reserved bits in primary CPU CONTROLS field
PASS: Reserved bits in secondary CPU CONTROLS field
FAIL: CR3 target count greater than 4
FAIL: I/O bitmap address invalid (aligned)
FAIL: I/O bitmap address invalid (exceed)
PASS: MSR bitmap address invalid (aligned)
FAIL: MSR bitmap address invalid (exceed)
FAIL: Consistency of NMI exiting and virtual NMIs
PASS: APIC-accesses address invalid (aligned)
FAIL: APIC-accesses address invalid (exceed)
FAIL: EPTP memory type
FAIL: EPTP page walk length
FAIL: EPTP page reserved bits (11:7)
PASS: Reserved bits in EXI_CONTROLS field
FAIL: Consistency of VMX-preemption timer (activate and save)
PASS: Reserved bits in ENT_CONTROLS field
PASS: Entry to SMM with processor not in SMM
PASS: Deactivate dual-monitor treatment with processor not in SMM
PASS: Invalid bits in host CR0
FAIL: Invalid bits in host CR4
FAIL: Invalid bits in host CR3
FAIL: Invalid host sysenter esp addr
FAIL: Invalid host sysenter eip addr
FAIL: Invalid CS selector field - TI flag
FAIL: Invalid TR selector field - TI flag
FAIL: Invalid CS selector field - H
FAIL: Invalid TR selector field - H
FAIL: Invalid base address of FS
FAIL: Invalid base address of GS
FAIL: Invalid base address of GDTR
FAIL: Invalid base address of IDTR
FAIL: Invalid base address of TR
FAIL: Consistency of EXI_HOST_64 and CR4.PAE
SUMMARY: 90 tests, 24 failures

Besides, all commented-out cases also fail; they are:
EPTP page reserved bits (63:N)
Invalid host PAT
Invalid host EFER - bits reserved
Invalid host EFER - LMA LME
Invalid CS selector field - RPL
Invalid TR selector field - RPL

You can find detailed descriptions of these cases in Intel SDM 26.1 and 26.2.
Arthur On Fri, Sep 13, 2013 at 2:35 PM, Arthur Chunqi Li yzt...@gmail.com wrote: This series implement a framework to capture early exit in vmenter, means vmenter fails to next instruction instead of causing vmexit. Then all supported features referred in Intel SDM 26.1 and 26.2 are tested. Some test cases are commented since they may cause fatal error of KVM, thus will crash test environment and affect the following tests. They are hoped to uncomment after related bugs are fixed. Arthur Chunqi Li (2): kvm-unit-tests: VMX: Add vmentry failed handler to framework kvm-unit-tests: VMX: Add test cases for vmentry checks lib/x86/vm.h|3 + x86/vmx.c | 34 +-- x86/vmx.h | 37 ++- x86/vmx_tests.c | 678 ++- 4 files changed, 725 insertions(+), 27 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
The state of vmexit/vmenter MSR store/load in nested vmx
Hi Jan and the mailing list, Does nested VMX support VM-exit MSR store/load and VM-entry MSR load now? I tried to set the VM-exit MSR-store address to a valid address and the VM-exit MSR-store count to 1, but then the vmentry fails. Is there anything else I should set to use these features? Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
How to recreate MOV-SS blocking vmentry fail
Hi Gleb, Paolo and related folks, I was trying to recreate a MOV-SS-blocking vmentry failure (Intel SDM 26.1, check 5.a). The manual refers to Table 24-3 there, and 26.3.1.5 also describes it. I am confused about how this scenario can be recreated. Do you have any ideas? Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: How to recreate MOV-SS blocking vmentry fail
On Wed, Sep 11, 2013 at 8:53 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Sep 11, 2013 at 08:49:28PM +0800, Arthur Chunqi Li wrote: Hi Gleb, Paolo and related folks, I was trying to recreate a MOV-SS-blocking vmentry failure (Intel SDM 26.1, check 5.a). Here the manual refers to Table 24-3, but 26.3.1.5 also describes it. I got confused about how this scenario can be recreated. Do you have any ideas? mov $0, %ss vmlaunch Should these two instructions execute sequentially? Thanks, Arthur -- Gleb.
Re: How to recreate MOV-SS blocking vmentry fail
On Wed, Sep 11, 2013 at 9:03 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Sep 11, 2013 at 03:01:07PM +0200, Paolo Bonzini wrote: On 11/09/2013 14:53, Gleb Natapov wrote: I was trying to recreate a MOV-SS-blocking vmentry failure (Intel SDM 26.1, check 5.a). Here the manual refers to Table 24-3, but 26.3.1.5 also describes it. I got confused about how this scenario can be recreated. Do you have any ideas? mov $0, %ss vmlaunch Probably better to save %ss somewhere around these instructions... :) Details, details :) It can be: mov %ss, tmp mov tmp, %ss vmlaunch Well, this seems hard to test in our framework ;( vmlaunch is surrounded by many instructions, and we cannot add a vmlaunch in the exit handler. Thanks, Arthur -- Gleb.
Re: [PATCH v3 1/6] KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in load_vmcs12_host_state
On Mon, Sep 2, 2013 at 4:21 PM, Gleb Natapov g...@redhat.com wrote: On Thu, Aug 08, 2013 at 04:26:28PM +0200, Jan Kiszka wrote: Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the Not a typo :) That what Avi asked for do during initial nested VMX review: http://markmail.org/message/hhidqyhbo2mrgxxc But there is at least one transition check that kvm_set_cr0() does that should not be done during vmexit emulation, namely CS.L bit check, so I tend to agree that kvm_set_cr0() is not appropriate here, at lest not as it is. But can we skip other checks kvm_set_cr0() does? For instance what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0 during nested vmexit? What _should_ prevent it is vmentry check from 26.2.4 If the host address-space size VM-exit control is 1, the following must hold: - Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1. Hi Jan and Gleb, Our nested VMX testing framework may not support such testing modes. Here we need to catch the failed result (ZF flag) close to vmresume, but vmresume/vmlaunch is well encapsulated in our framework. If we simply write a vmresume inline function, the VMX will act unexpectedly when it doesn't cause vmresume fail. Do you have any ideas about this? Arthur But I do not see that we do that check on vmentry. What about NW/CD bit checks, or reserved bits checks? 27.5.1 says: The following bits are not modified: For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64 architecture), 28:19, 17, and 15:6; and any bits that are fixed in VMX operation (see Section 23.8). But again current vmexit code does not emulate this properly and just sets everything from host_cr0. vmentry should also preserve all those bit but it looks like it doesn't too. state transition that may prevent loading L1's cr0. 
Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- arch/x86/kvm/vmx.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..d001b019 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8185,7 +8185,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, * fpu_active (which may have changed). * Note that vmx_set_cr0 refers to efer set above. */ - kvm_set_cr0(vcpu, vmcs12-host_cr0); + vmx_set_cr0(vcpu, vmcs12-host_cr0); /* * If we did fpu_activate()/fpu_deactivate() during L2's run, we need * to apply the same changes to L1's vmcs. We just set cr0 correctly, -- 1.7.3.4 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm-unit-tests: VMX: Fix two minor bugs
This patch just contains two minor changes to the EPT framework. 1. Reorder macro definitions. 2. Fix bug of setting CPU_EPT without check. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |2 +- x86/vmx_tests.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index e02183f..dc1ebdf 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -366,9 +366,9 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul << 22, CPU_IO = 1ul << 24, CPU_IO_BITMAP = 1ul << 25, + CPU_MSR_BITMAP = 1ul << 28, CPU_MONITOR = 1ul << 29, CPU_PAUSE = 1ul << 30, - CPU_MSR_BITMAP = 1ul << 28, CPU_SECONDARY = 1ul << 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index e891a9f..0759e10 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -925,7 +925,7 @@ static void ept_init() ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) & ctrl_cpu_rev[1].clr; vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); - vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]); if (setup_ept()) init_fail = true; data_page1 = alloc_page(); -- 1.7.9.5
Re: [PATCH] kvm-unit-tests: VMX: Fix two minor bugs
Hi Paolo, Sorry but I should trouble you merging these two minor changes to vmx branch. Until now, all the commits in vmx branch seems fine (if others have no comments). Because I have some patches to commit based on vmx branch, should we merge this branch to master or I just commit patches based on vmx? Thanks, Arthur On Wed, Sep 11, 2013 at 11:11 AM, Arthur Chunqi Li yzt...@gmail.com wrote: This patch just contains two minor changes to EPT framwork. 1. Reorder macro definition 2. Fix bug of setting CPU_EPT without check. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |2 +- x86/vmx_tests.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index e02183f..dc1ebdf 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -366,9 +366,9 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_MONITOR = 1ul 29, CPU_PAUSE = 1ul 30, - CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index e891a9f..0759e10 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -925,7 +925,7 @@ static void ept_init() ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) ctrl_cpu_rev[1].clr; vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); - vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]); if (setup_ept()) init_fail = true; data_page1 = alloc_page(); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm-unit-tests: VMX: Test nested EPT features
On Mon, Sep 9, 2013 at 3:17 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-09-09 06:57, Arthur Chunqi Li wrote: This series of patches provides the framework of nested EPT and some test cases for nested EPT features. Arthur Chunqi Li (2): kvm-unit-tests: VMX: The framework of EPT for nested VMX testing kvm-unit-tests: VMX: Test cases for nested EPT x86/vmx.c | 159 - x86/vmx.h | 76 x86/vmx_tests.c | 266 +++ 3 files changed, 497 insertions(+), 4 deletions(-) I suppose this is v2 of the previous patch? What is the delta? A meta changelog could go here. Yes, v1 just provided the framework of EPT (similar to the first patch of this series), and some more tests for nested EPT are added in this series (the second patch). Arthur Jan
Re: [PATCH] kvm-unit-tests: VMX: Test suite for preemption timer
On Mon, Sep 9, 2013 at 8:51 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 05/09/2013 11:22, Arthur Chunqi Li wrote: Hi Jan, Gleb and Paolo, It suddenly occurred to me that, if the guest's PIN_PREEMPT is disabled while EXI_SAVE_PREEMPT_VALUE is enabled, what will happen? The preemption timer value in the vmcs will not be affected, yes? Indeed. The VMX preemption timer will not count down, so it will remain set to the same value. Saving it on exit will actually do nothing. This case is not covered by this patch. You can add it as a follow-up. Yep. Because this patch was submitted several weeks ago, I have sent a second version; you can just review that patch. Arthur Paolo -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT
On Mon, Sep 9, 2013 at 9:56 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 09/09/2013 06:57, Arthur Chunqi Li wrote: Some test cases for nested EPT features, including: 1. EPT basic framework tests: read, write and remap. 2. EPT misconfiguration test cases: page permission misconfiguration and memory type misconfiguration. 3. EPT violation test cases: page permission violation and paging structure violation. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx_tests.c | 266 +++ 1 file changed, 266 insertions(+) diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..a0b9824 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,36 @@ #include "vmx.h" +#include "processor.h" +#include "vm.h" +#include "msr.h" +#include "fwcfg.h" + +volatile u32 stage; +volatile bool init_fail; Why volatile? Because init_fail is only set but not used later in ept_init(), and if I don't add volatile, the compiler may optimize the store to init_fail away. This first occurred to me when I wrote set_stage/get_stage: if a variable is set in a function but not used later, the compiler usually treats the store as a redundant assignment and removes it. Arthur The patch looks good.
+unsigned long *pml4; +u64 eptp; +void *data_page1, *data_page2; + +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + +static inline void vmcall() +{ + asm volatile (vmcall); +} void basic_init() { @@ -76,6 +108,238 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +static int setup_ept() +{ + int support_2m; + unsigned long end_of_memory; + + if (!(ept_vpid.val EPT_CAP_UC) + !(ept_vpid.val EPT_CAP_WB)) { + printf(\tEPT paging-structure memory type + UCWB are not supported\n); + return 1; + } + if (ept_vpid.val EPT_CAP_UC) + eptp = EPT_MEM_TYPE_UC; + else + eptp = EPT_MEM_TYPE_WB; + if (!(ept_vpid.val EPT_CAP_PWL4)) { + printf(\tPWL4 is not supported\n); + return 1; + } + eptp |= (3 EPTP_PG_WALK_LEN_SHIFT); + pml4 = alloc_page(); + memset(pml4, 0, PAGE_SIZE); + eptp |= virt_to_phys(pml4); + vmcs_write(EPTP, eptp); + support_2m = !!(ept_vpid.val EPT_CAP_2M_PAGE); + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE); + if (end_of_memory (1ul 32)) + end_of_memory = (1ul 32); + if (setup_ept_range(pml4, 0, end_of_memory, + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) { + printf(\tSet ept tables failed.\n); + return 1; + } + return 0; +} + +static void ept_init() +{ + u32 ctrl_cpu[2]; + + init_fail = false; + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1); + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY) + ctrl_cpu_rev[0].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) + ctrl_cpu_rev[1].clr; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + if (setup_ept()) + init_fail = true; + data_page1 = alloc_page(); + data_page2 = alloc_page(); + memset(data_page1, 0x0, PAGE_SIZE); + memset(data_page2, 0x0, PAGE_SIZE); + *((u32 *)data_page1) = MAGIC_VAL_1; + *((u32 *)data_page2) = MAGIC_VAL_2; + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2, + 
EPT_RA | EPT_WA | EPT_EA); +} + +static void ept_main() +{ + if (init_fail) + return; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY) + !(ctrl_cpu_rev[1].clr CPU_EPT)) { + printf(\tEPT is not supported); + return; + } + set_stage(0); + if (*((u32 *)data_page2) != MAGIC_VAL_1 + *((u32 *)data_page1) != MAGIC_VAL_1) + report(EPT basic framework - read, 0); + else { + *((u32 *)data_page2) = MAGIC_VAL_3; + vmcall(); + if (get_stage() == 1) { + if (*((u32 *)data_page1) == MAGIC_VAL_3 + *((u32 *)data_page2) == MAGIC_VAL_2) + report(EPT basic framework, 1); + else + report(EPT basic framework - remap, 1); + } + } + // Test EPT Misconfigurations + set_stage(1); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 2) { + report(EPT misconfigurations, 0); + goto t1; + } + set_stage(2); + vmcall
Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT
On Mon, Sep 9, 2013 at 12:57 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Some test cases for nested EPT features, including: 1. EPT basic framework tests: read, write and remap. 2. EPT misconfigurations test cases: page permission mieconfiguration and memory type misconfiguration 3. EPT violations test cases: page permission violation and paging structure violation Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx_tests.c | 266 +++ 1 file changed, 266 insertions(+) diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..a0b9824 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,36 @@ #include vmx.h +#include processor.h +#include vm.h +#include msr.h +#include fwcfg.h + +volatile u32 stage; +volatile bool init_fail; +unsigned long *pml4; +u64 eptp; +void *data_page1, *data_page2; + +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + +static inline void vmcall() +{ + asm volatile (vmcall); +} void basic_init() { @@ -76,6 +108,238 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +static int setup_ept() +{ + int support_2m; + unsigned long end_of_memory; + + if (!(ept_vpid.val EPT_CAP_UC) + !(ept_vpid.val EPT_CAP_WB)) { + printf(\tEPT paging-structure memory type + UCWB are not supported\n); + return 1; + } + if (ept_vpid.val EPT_CAP_UC) + eptp = EPT_MEM_TYPE_UC; + else + eptp = EPT_MEM_TYPE_WB; + if (!(ept_vpid.val EPT_CAP_PWL4)) { + printf(\tPWL4 is not supported\n); + return 1; + } + eptp |= (3 EPTP_PG_WALK_LEN_SHIFT); + pml4 = alloc_page(); + memset(pml4, 0, PAGE_SIZE); + eptp |= virt_to_phys(pml4); + vmcs_write(EPTP, eptp); + support_2m = !!(ept_vpid.val EPT_CAP_2M_PAGE); + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE); + if (end_of_memory (1ul 32)) + end_of_memory = (1ul 32); + if (setup_ept_range(pml4, 0, end_of_memory, + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) { + printf(\tSet ept tables 
failed.\n); + return 1; + } + return 0; +} + +static void ept_init() +{ + u32 ctrl_cpu[2]; + + init_fail = false; + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1); + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY) +ctrl_cpu_rev[0].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) +ctrl_cpu_rev[1].clr; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + if (setup_ept()) + init_fail = true; + data_page1 = alloc_page(); + data_page2 = alloc_page(); + memset(data_page1, 0x0, PAGE_SIZE); + memset(data_page2, 0x0, PAGE_SIZE); + *((u32 *)data_page1) = MAGIC_VAL_1; + *((u32 *)data_page2) = MAGIC_VAL_2; + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2, + EPT_RA | EPT_WA | EPT_EA); +} + +static void ept_main() +{ + if (init_fail) + return; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY) +!(ctrl_cpu_rev[1].clr CPU_EPT)) { + printf(\tEPT is not supported); + return; + } + set_stage(0); + if (*((u32 *)data_page2) != MAGIC_VAL_1 + *((u32 *)data_page1) != MAGIC_VAL_1) + report(EPT basic framework - read, 0); + else { + *((u32 *)data_page2) = MAGIC_VAL_3; + vmcall(); + if (get_stage() == 1) { + if (*((u32 *)data_page1) == MAGIC_VAL_3 + *((u32 *)data_page2) == MAGIC_VAL_2) + report(EPT basic framework, 1); + else + report(EPT basic framework - remap, 1); + } + } + // Test EPT Misconfigurations + set_stage(1); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 2) { + report(EPT misconfigurations, 0); + goto t1; + } + set_stage(2); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 3) { + report(EPT misconfigurations, 0); + goto t1; + } + report(EPT misconfigurations, 1); +t1: + // Test EPT violation
[PATCH] kvm-unit-tests: VMX: Fix some nested EPT related bugs
This patch fixes 3 bugs in the VMX framework and EPT framework: 1. Fix bug of setting the default value of CPU_SECONDARY. 2. Fix bug of reading MSR_IA32_VMX_PROCBASED_CTLS2 and MSR_IA32_VMX_EPT_VPID_CAP without checking support. 3. For EPT violation and misconfiguration reduced vmexits, the vmcs field VM-exit instruction length is not defined and will return an unexpected value when read. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 13 ++--- x86/vmx_tests.c |2 -- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index 87d1d55..9db4ef4 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -304,7 +304,8 @@ static void init_vmcs_ctrl(void) /* Disable VMEXIT of IO instruction */ vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); if (ctrl_cpu_rev[0].set & CPU_SECONDARY) { - ctrl_cpu[1] |= ctrl_cpu_rev[1].set & ctrl_cpu_rev[1].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | ctrl_cpu_rev[1].set) & ctrl_cpu_rev[1].clr; vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]); } vmcs_write(CR3_TARGET_COUNT, 0); @@ -489,8 +490,14 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ?
MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); - ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); + if ((ctrl_cpu_rev[0].clr & CPU_SECONDARY) != 0) + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + else + ctrl_cpu_rev[1].val = 0; + if ((ctrl_cpu_rev[1].clr & (CPU_EPT | CPU_VPID)) != 0) + ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); + else + ept_vpid.val = 0; write_cr0((read_cr0() & fix_cr0_clr) | fix_cr0_set); write_cr4((read_cr4() & fix_cr4_clr) | fix_cr4_set | X86_CR4_VMXE); diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 6d972c0..e891a9f 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1075,7 +1075,6 @@ static int ept_exit_handler() print_vmexit_info(); return VMX_TEST_VMEXIT; } - vmcs_write(GUEST_RIP, guest_rip + insn_len); return VMX_TEST_RESUME; case VMX_EPT_VIOLATION: switch(get_stage()) { @@ -1100,7 +1099,6 @@ static int ept_exit_handler() print_vmexit_info(); return VMX_TEST_VMEXIT; } - vmcs_write(GUEST_RIP, guest_rip + insn_len); return VMX_TEST_RESUME; default: printf(Unknown exit reason, %d\n, reason); -- 1.7.9.5
Re: Correct way of tracking reads on given gfn ?
On Mon, Sep 9, 2013 at 8:29 PM, Gleb Natapov g...@redhat.com wrote: On Mon, Sep 09, 2013 at 12:53:02PM +0200, Paolo Bonzini wrote: On 09/09/2013 12:22, SPA wrote: Thanks Paolo. Is there a way where reads would trap? I explored a bit on PM_PRESENT_MASK. Though it's not a READ bit but a PRESENT bit, it looks like it should generate traps on reads if this bit is cleared. From the code, it looks like an rmap_write_protect()-like function, as I stated in the previous mail, should do. Would this approach work? Are there any glaring problems with this approach? I cannot say right away. Another way could be to set reserved bits to generate EPT misconfigurations. See ept_set_mmio_spte_mask and is_mmio_spte. This would trap both reads and writes. Dropping all sptes will also work, but trapping each read access will be dog slow. QEMU emulation will be much faster. Hi Gleb, I'm interested in this topic: what do you mean by QEMU emulation? Do you mean the functions in arch/x86/kvm/emulate.c? In what scenario will KVM call these functions? Thanks, Arthur -- Gleb.
[PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT
Some test cases for nested EPT features, including: 1. EPT basic framework tests: read, write and remap. 2. EPT misconfigurations test cases: page permission mieconfiguration and memory type misconfiguration 3. EPT violations test cases: page permission violation and paging structure violation Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx_tests.c | 266 +++ 1 file changed, 266 insertions(+) diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..a0b9824 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,36 @@ #include vmx.h +#include processor.h +#include vm.h +#include msr.h +#include fwcfg.h + +volatile u32 stage; +volatile bool init_fail; +unsigned long *pml4; +u64 eptp; +void *data_page1, *data_page2; + +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + +static inline void vmcall() +{ + asm volatile (vmcall); +} void basic_init() { @@ -76,6 +108,238 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +static int setup_ept() +{ + int support_2m; + unsigned long end_of_memory; + + if (!(ept_vpid.val EPT_CAP_UC) + !(ept_vpid.val EPT_CAP_WB)) { + printf(\tEPT paging-structure memory type + UCWB are not supported\n); + return 1; + } + if (ept_vpid.val EPT_CAP_UC) + eptp = EPT_MEM_TYPE_UC; + else + eptp = EPT_MEM_TYPE_WB; + if (!(ept_vpid.val EPT_CAP_PWL4)) { + printf(\tPWL4 is not supported\n); + return 1; + } + eptp |= (3 EPTP_PG_WALK_LEN_SHIFT); + pml4 = alloc_page(); + memset(pml4, 0, PAGE_SIZE); + eptp |= virt_to_phys(pml4); + vmcs_write(EPTP, eptp); + support_2m = !!(ept_vpid.val EPT_CAP_2M_PAGE); + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE); + if (end_of_memory (1ul 32)) + end_of_memory = (1ul 32); + if (setup_ept_range(pml4, 0, end_of_memory, + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) { + printf(\tSet ept tables failed.\n); + return 1; + } + return 0; +} + +static void ept_init() +{ + u32 
ctrl_cpu[2]; + + init_fail = false; + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1); + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY) +ctrl_cpu_rev[0].clr; + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT) +ctrl_cpu_rev[1].clr; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]); + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT); + if (setup_ept()) + init_fail = true; + data_page1 = alloc_page(); + data_page2 = alloc_page(); + memset(data_page1, 0x0, PAGE_SIZE); + memset(data_page2, 0x0, PAGE_SIZE); + *((u32 *)data_page1) = MAGIC_VAL_1; + *((u32 *)data_page2) = MAGIC_VAL_2; + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2, + EPT_RA | EPT_WA | EPT_EA); +} + +static void ept_main() +{ + if (init_fail) + return; + if (!(ctrl_cpu_rev[0].clr CPU_SECONDARY) +!(ctrl_cpu_rev[1].clr CPU_EPT)) { + printf(\tEPT is not supported); + return; + } + set_stage(0); + if (*((u32 *)data_page2) != MAGIC_VAL_1 + *((u32 *)data_page1) != MAGIC_VAL_1) + report(EPT basic framework - read, 0); + else { + *((u32 *)data_page2) = MAGIC_VAL_3; + vmcall(); + if (get_stage() == 1) { + if (*((u32 *)data_page1) == MAGIC_VAL_3 + *((u32 *)data_page2) == MAGIC_VAL_2) + report(EPT basic framework, 1); + else + report(EPT basic framework - remap, 1); + } + } + // Test EPT Misconfigurations + set_stage(1); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 2) { + report(EPT misconfigurations, 0); + goto t1; + } + set_stage(2); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() != 3) { + report(EPT misconfigurations, 0); + goto t1; + } + report(EPT misconfigurations, 1); +t1: + // Test EPT violation + set_stage(3); + vmcall(); + *((u32 *)data_page1) = MAGIC_VAL_1; + if (get_stage() == 4) + report(EPT violation - page permission, 1); + else + report(EPT violation - page permission, 0
[PATCH 1/2] kvm-unit-tests: VMX: The framework of EPT for nested VMX testing
The framework of EPT for nested VMX, including functions to build up EPT paging structures, read/set EPT PTEs and setup a range of 1:1 map EPT. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 159 +++-- x86/vmx.h | 76 + 2 files changed, 231 insertions(+), 4 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..87d1d55 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -143,6 +143,159 @@ asm( call hypercall\n\t ); +/* EPT paging structure related functions */ +/* install_ept_entry : Install a page to a given level in EPT + @pml4 : addr of pml4 table + @pte_level : level of PTE to set + @guest_addr : physical address of guest + @pte : pte value to set + @pt_page : address of page table, NULL for a new page + */ +void install_ept_entry(unsigned long *pml4, + int pte_level, + unsigned long guest_addr, + unsigned long pte, + unsigned long *pt_page) +{ + int level; + unsigned long *pt = pml4; + unsigned offset; + + for (level = EPT_PAGE_LEVEL; level pte_level; --level) { + offset = (guest_addr ((level-1) * EPT_PGDIR_WIDTH + 12)) +EPT_PGDIR_MASK; + if (!(pt[offset] (EPT_PRESENT))) { + unsigned long *new_pt = pt_page; + if (!new_pt) + new_pt = alloc_page(); + else + pt_page = 0; + memset(new_pt, 0, PAGE_SIZE); + pt[offset] = virt_to_phys(new_pt) + | EPT_RA | EPT_WA | EPT_EA; + } + pt = phys_to_virt(pt[offset] 0xff000ull); + } + offset = ((unsigned long)guest_addr ((level-1) * + EPT_PGDIR_WIDTH + 12)) EPT_PGDIR_MASK; + pt[offset] = pte; +} + +/* Map a page, @perm is the permission of the page */ +void install_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 1, guest_addr, (phys PAGE_MASK) | perm, 0); +} + +/* Map a 1G-size page */ +void install_1g_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 3, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* Map a 2M-size page */ +void install_2m_ept(unsigned long 
*pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 2, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* setup_ept_range : Setup a range of 1:1 mapped page to EPT paging structure. + @start : start address of guest page + @len : length of address to be mapped + @map_1g : whether 1G page map is used + @map_2m : whether 2M page map is used + @perm : permission for every page + */ +int setup_ept_range(unsigned long *pml4, unsigned long start, + unsigned long len, int map_1g, int map_2m, u64 perm) +{ + u64 phys = start; + u64 max = (u64)len + (u64)start; + + if (map_1g) { + while (phys + PAGE_SIZE_1G = max) { + install_1g_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_1G; + } + } + if (map_2m) { + while (phys + PAGE_SIZE_2M = max) { + install_2m_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_2M; + } + } + while (phys + PAGE_SIZE = max) { + install_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE; + } + return 0; +} + +/* get_ept_pte : Get the PTE of a given level in EPT, +@level == 1 means get the latest level*/ +unsigned long get_ept_pte(unsigned long *pml4, + unsigned long guest_addr, int level) +{ + int l; + unsigned long *pt = pml4, pte; + unsigned offset; + + for (l = EPT_PAGE_LEVEL; l 1; --l) { + offset = (guest_addr (((l-1) * EPT_PGDIR_WIDTH) + 12)) +EPT_PGDIR_MASK; + pte = pt[offset]; + if (!(pte (EPT_PRESENT))) + return 0; + if (l == level) + return pte; + if (l 4 (pte EPT_LARGE_PAGE)) + return pte; + pt = (unsigned long *)(pte 0xff000ull); + } + offset = (guest_addr (((l-1) * EPT_PGDIR_WIDTH) + 12)) +EPT_PGDIR_MASK
[PATCH 0/2] kvm-unit-tests: VMX: Test nested EPT features
This series of patches provides the framework of nested EPT and some test cases for nested EPT features.

Arthur Chunqi Li (2):
  kvm-unit-tests: VMX: The framework of EPT for nested VMX testing
  kvm-unit-tests: VMX: Test cases for nested EPT

 x86/vmx.c       | 159 -
 x86/vmx.h       |  76
 x86/vmx_tests.c | 266 +++
 3 files changed, 497 insertions(+), 4 deletions(-)

--
1.7.9.5
Re: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
> Arthur Chunqi Li wrote on 2013-09-04:
>> This patch contains the following two changes:
>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>> occurs with some reason not emulated by L1, the preemption timer value
>> should be saved in such exits.
>> 2. Add support of "Save VMX-preemption timer value" VM-Exit control to
>> nVMX. With this patch, nested VMX preemption timer features are fully
>> supported.
>>
>> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
>> ---
>> This series depends on queue.
>>
>>  arch/x86/include/uapi/asm/msr-index.h |  1 +
>>  arch/x86/kvm/vmx.c                    | 51 ++---
>>  2 files changed, 48 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>> index bb04650..b93e09a 100644
>> --- a/arch/x86/include/uapi/asm/msr-index.h
>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>> @@ -536,6 +536,7 @@
>>  /* MSR_IA32_VMX_MISC bits */
>>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
>> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
>>
>>  /* AMD-V MSRs */
>>  #define MSR_VM_CR 0xc0010114
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 1f1da43..870caa8 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>  #ifdef CONFIG_X86_64
>>  		VM_EXIT_HOST_ADDR_SPACE_SIZE |
>>  #endif
>> -		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
>> +		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>> +		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>> +	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
>> +		nested_vmx_exit_ctls_high &=
>> +			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>> +	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>> +		nested_vmx_pinbased_ctls_high &=
>> +			(~PIN_BASED_VMX_PREEMPTION_TIMER);
>
> The following logic is more clear:
>
> if (nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)
> 	nested_vmx_exit_ctls_high |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER

Here I have the following consideration: this logic is wrong if the CPU supports PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know whether that ever occurs. So the code above reads the MSR and keeps the features it supports, and here I just check whether these two features are supported simultaneously.

You remind me that this piece of code can be written like this:

if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
    !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
	nested_vmx_exit_ctls_high &= (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
	nested_vmx_pinbased_ctls_high &= (~PIN_BASED_VMX_PREEMPTION_TIMER);
}

This reflects the logic described above, that these two flags should be supported simultaneously, and brings less confusion.

> BTW: I don't see nested_vmx_setup_ctls_msrs() considering the hardware's
> capability when exposing those vmx features (not just the preemption
> timer) to L1.

In the code just above, when setting the pin-based controls for nested VMX, it first rdmsrs MSR_IA32_VMX_PINBASED_CTLS, then uses the result to mask out the features the hardware does not support. So do the other control fields.

>>  	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>  		VM_EXIT_LOAD_IA32_EFER);
>>
>> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>>  	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>>  }
>>
>> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
>> +{
>> +	u64 delta_tsc_l1;
>> +	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
>> +
>> +	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
>> +			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
>> +	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>> +	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
>> +			native_read_tsc()) - vcpu->arch.last_guest_tsc;
>> +	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
>> +	if (preempt_val_l2 - preempt_val_l1 < 0)
>> +		preempt_val_l2 = 0;
>> +	else
>> +		preempt_val_l2 -= preempt_val_l1;
>> +	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
>> +}
>>
>>  /*
>>   * The guest has exited. See if we can fix it or if we need userspace
>>   * assistance.
>> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>  	u32 exit_reason = vmx->exit_reason;
>>  	u32 vectoring_info = vmx->idt_vectoring_info;
>> +	int ret;
>>
>>  	/* If guest state is invalid, start emulating */
>>  	if (vmx->emulation_required)
>> @@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>  	if (exit_reason < kvm_vmx_max_exit_handlers
Re: [PATCH] kvm-unit-tests: VMX: Test suite for preemption timer
Hi Jan, Gleb and Paolo,

It suddenly occurred to me: if the guest's PIN_PREEMPT is disabled while EXI_SAVE_PREEMPT is enabled, what will happen? The preemption timer value in the vmcs will not be affected, yes? This case is not tested in this patch.

Arthur

On Wed, Sep 4, 2013 at 11:26 PM, Arthur Chunqi Li <yzt...@gmail.com> wrote:
> Test cases for preemption timer in nested VMX. Two aspects are tested:
> 1. Save preemption timer on VMEXIT if the relevant bit is set in
> EXIT_CONTROL
> 2. Test a relevant bug of KVM. The bug will not save the preemption
> timer value if we exit L2->L0 for some reason and then enter L0->L2.
> Thus the preemption timer will never trigger if the value is large
> enough.
>
> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
> ---
>  x86/vmx.h       |   3 ++
>  x86/vmx_tests.c | 117 +++
>  2 files changed, 120 insertions(+)
>
> diff --git a/x86/vmx.h b/x86/vmx.h
> index 28595d8..ebc8cfd 100644
> --- a/x86/vmx.h
> +++ b/x86/vmx.h
> @@ -210,6 +210,7 @@ enum Encoding {
>  	GUEST_ACTV_STATE	= 0x4826ul,
>  	GUEST_SMBASE		= 0x4828ul,
>  	GUEST_SYSENTER_CS	= 0x482aul,
> +	PREEMPT_TIMER_VALUE	= 0x482eul,
>
>  	/* 32-Bit Host State Fields */
>  	HOST_SYSENTER_CS	= 0x4c00ul,
> @@ -331,6 +332,7 @@ enum Ctrl_exi {
>  	EXI_LOAD_PERF	= 1UL << 12,
>  	EXI_INTA	= 1UL << 15,
>  	EXI_LOAD_EFER	= 1UL << 21,
> +	EXI_SAVE_PREEMPT= 1UL << 22,
>  };
>
>  enum Ctrl_ent {
> @@ -342,6 +344,7 @@ enum Ctrl_pin {
>  	PIN_EXTINT	= 1ul << 0,
>  	PIN_NMI		= 1ul << 3,
>  	PIN_VIRT_NMI	= 1ul << 5,
> +	PIN_PREEMPT	= 1ul << 6,
>  };
>
>  enum Ctrl0 {
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index c1b39f4..d358148 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -1,4 +1,30 @@
>  #include "vmx.h"
> +#include "msr.h"
> +#include "processor.h"
> +
> +volatile u32 stage;
> +
> +static inline void vmcall()
> +{
> +	asm volatile("vmcall");
> +}
> +
> +static inline void set_stage(u32 s)
> +{
> +	barrier();
> +	stage = s;
> +	barrier();
> +}
> +
> +static inline u32 get_stage()
> +{
> +	u32 s;
> +
> +	barrier();
> +	s = stage;
> +	barrier();
> +	return s;
> +}
>
>  void basic_init()
>  {
> @@ -76,6 +102,95 @@ int vmenter_exit_handler()
>  	return VMX_TEST_VMEXIT;
>  }
>
> +u32 preempt_scale;
> +volatile unsigned long long tsc_val;
> +volatile u32 preempt_val;
> +
> +void preemption_timer_init()
> +{
> +	u32 ctrl_pin;
> +
> +	ctrl_pin = vmcs_read(PIN_CONTROLS) | PIN_PREEMPT;
> +	ctrl_pin &= ctrl_pin_rev.clr;
> +	vmcs_write(PIN_CONTROLS, ctrl_pin);
> +	preempt_val = 1000;
> +	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
> +	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
> +}
> +
> +void preemption_timer_main()
> +{
> +	tsc_val = rdtsc();
> +	if (!(ctrl_pin_rev.clr & PIN_PREEMPT)) {
> +		printf("\tPreemption timer is not supported\n");
> +		return;
> +	}
> +	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> +		printf("\tSave preemption value is not supported\n");
> +	else {
> +		set_stage(0);
> +		vmcall();
> +		if (get_stage() == 1)
> +			vmcall();
> +	}
> +	while (1) {
> +		if (((rdtsc() - tsc_val) >> preempt_scale)
> +				> 10 * preempt_val) {
> +			report("Preemption timer", 0);
> +			break;
> +		}
> +	}
> +}
> +
> +int preemption_timer_exit_handler()
> +{
> +	u64 guest_rip;
> +	ulong reason;
> +	u32 insn_len;
> +	u32 ctrl_exit;
> +
> +	guest_rip = vmcs_read(GUEST_RIP);
> +	reason = vmcs_read(EXI_REASON) & 0xff;
> +	insn_len = vmcs_read(EXI_INST_LEN);
> +	switch (reason) {
> +	case VMX_PREEMPT:
> +		if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
> +			report("Preemption timer", 0);
> +		else
> +			report("Preemption timer", 1);
> +		return VMX_TEST_VMEXIT;
> +	case VMX_VMCALL:
> +		switch (get_stage()) {
> +		case 0:
> +			if (vmcs_read(PREEMPT_TIMER_VALUE) != preempt_val)
> +				report("Save preemption value", 0);
> +			else {
> +				set_stage(get_stage() + 1);
> +				ctrl_exit = (vmcs_read(EXI_CONTROLS) |
> +					EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
> +				vmcs_write(EXI_CONTROLS, ctrl_exit);
> +			}
> +			break;
> +		case 1:
> +			if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
> +				report("Save preemption value", 0);
> +			else
Re: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
On Thu, Sep 5, 2013 at 5:24 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
> Arthur Chunqi Li wrote on 2013-09-05:
>> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
>>> Arthur Chunqi Li wrote on 2013-09-04:
>>>> This patch contains the following two changes:
>>>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>>>> occurs with some reason not emulated by L1, the preemption timer
>>>> value should be saved in such exits.
>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit control
>>>> to nVMX. With this patch, nested VMX preemption timer features are
>>>> fully supported.
>>>>
>>>> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
>>>> ---
>>>> This series depends on queue.
>>>>
>>>>  arch/x86/include/uapi/asm/msr-index.h |  1 +
>>>>  arch/x86/kvm/vmx.c                    | 51 ++---
>>>>  2 files changed, 48 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>>>> index bb04650..b93e09a 100644
>>>> --- a/arch/x86/include/uapi/asm/msr-index.h
>>>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>>>> @@ -536,6 +536,7 @@
>>>>  /* MSR_IA32_VMX_MISC bits */
>>>>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
>>>> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
>>>>
>>>>  /* AMD-V MSRs */
>>>>  #define MSR_VM_CR 0xc0010114
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index 1f1da43..870caa8 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>  #ifdef CONFIG_X86_64
>>>>  		VM_EXIT_HOST_ADDR_SPACE_SIZE |
>>>>  #endif
>>>> -		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
>>>> +		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>>>> +		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>>>> +	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
>>>> +		nested_vmx_exit_ctls_high &=
>>>> +			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>>>> +	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>>>> +		nested_vmx_pinbased_ctls_high &=
>>>> +			(~PIN_BASED_VMX_PREEMPTION_TIMER);
>>>
>>> The following logic is more clear:
>>>
>>> if (nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)
>>> 	nested_vmx_exit_ctls_high |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
>>
>> Here I have the following consideration: this logic is wrong if the
>> CPU supports PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support
>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know whether that
>> ever occurs. So the code above reads the MSR and keeps the features it
>> supports, and here I just check whether these two features are
>> supported simultaneously.
>
> No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on
> PIN_BASED_VMX_PREEMPTION_TIMER. PIN_BASED_VMX_PREEMPTION_TIMER is an
> independent feature.
>
>> You remind me that this piece of code can be written like this:
>>
>> if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
>>     !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
>> 	nested_vmx_exit_ctls_high &= (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>> 	nested_vmx_pinbased_ctls_high &= (~PIN_BASED_VMX_PREEMPTION_TIMER);
>> }
>>
>> This reflects the logic described above, that these two flags should
>> be supported simultaneously, and brings less confusion.
>>
>>> BTW: I don't see nested_vmx_setup_ctls_msrs() considering the
>>> hardware's capability when exposing those vmx features (not just the
>>> preemption timer) to L1.
>>
>> In the code just above, when setting the pin-based controls for nested
>> VMX, it first rdmsrs MSR_IA32_VMX_PINBASED_CTLS, then uses the result
>> to mask out the features the hardware does not support. So do the
>> other control fields.
>
> Yes, I saw it.
>
>>>>  	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>  		VM_EXIT_LOAD_IA32_EFER);
>>>>
>>>> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>>>>  	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>>>>  }
>>>>
>>>> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	u64 delta_tsc_l1;
>>>> +	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
>>>> +
>>>> +	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
>>>> +			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
>>>> +	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>>>> +	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
>>>> +			native_read_tsc()) - vcpu->arch.last_guest_tsc;
>>>> +	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
>>>> +	if (preempt_val_l2 - preempt_val_l1 < 0)
>>>> +		preempt_val_l2 = 0;
>>>> +	else
>>>> +		preempt_val_l2 -= preempt_val_l1;
>>>> +	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
>>>> +}
>>>>
>>>>  /*
>>>>   * The guest has exited. See if we can fix it or if we need userspace
>>>>   * assistance.
>>>> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>>>  	struct vcpu_vmx *vmx
Re: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
On Thu, Sep 5, 2013 at 7:05 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
> Arthur Chunqi Li wrote on 2013-09-05:
>> Arthur Chunqi Li wrote on 2013-09-05:
>>> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zh...@intel.com> wrote:
>>>> Arthur Chunqi Li wrote on 2013-09-04:
>>>>> This patch contains the following two changes:
>>>>> 1. Fix the bug in nested preemption timer support. If a vmexit
>>>>> L2->L0 occurs with some reason not emulated by L1, the preemption
>>>>> timer value should be saved in such exits.
>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit control
>>>>> to nVMX. With this patch, nested VMX preemption timer features are
>>>>> fully supported.
>>>>>
>>>>> Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
>>>>> ---
>>>>> This series depends on queue.
>>>>>
>>>>>  arch/x86/include/uapi/asm/msr-index.h |  1 +
>>>>>  arch/x86/kvm/vmx.c                    | 51 ++---
>>>>>  2 files changed, 48 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>>>>> index bb04650..b93e09a 100644
>>>>> --- a/arch/x86/include/uapi/asm/msr-index.h
>>>>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>>>>> @@ -536,6 +536,7 @@
>>>>>  /* MSR_IA32_VMX_MISC bits */
>>>>>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
>>>>> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
>>>>>
>>>>>  /* AMD-V MSRs */
>>>>>  #define MSR_VM_CR 0xc0010114
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 1f1da43..870caa8 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>>  #ifdef CONFIG_X86_64
>>>>>  		VM_EXIT_HOST_ADDR_SPACE_SIZE |
>>>>>  #endif
>>>>> -		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
>>>>> +		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>>>>> +		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>>>>> +	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
>>>>> +		nested_vmx_exit_ctls_high &=
>>>>> +			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>>>>> +	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>>>>> +		nested_vmx_pinbased_ctls_high &=
>>>>> +			(~PIN_BASED_VMX_PREEMPTION_TIMER);
>>>>
>>>> The following logic is more clear:
>>>>
>>>> if (nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)
>>>> 	nested_vmx_exit_ctls_high |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
>>>
>>> Here I have the following consideration: this logic is wrong if the
>>> CPU supports PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support
>>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know whether that
>>> ever occurs. So the code above reads the MSR and keeps the features
>>> it supports, and here I just check whether these two features are
>>> supported simultaneously.
>>
>> No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on
>> PIN_BASED_VMX_PREEMPTION_TIMER. PIN_BASED_VMX_PREEMPTION_TIMER is an
>> independent feature.
>>
>>> You remind me that this piece of code can be written like this:
>>>
>>> if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
>>>     !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
>>> 	nested_vmx_exit_ctls_high &= (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>>> 	nested_vmx_pinbased_ctls_high &= (~PIN_BASED_VMX_PREEMPTION_TIMER);
>>> }
>>>
>>> This reflects the logic described above, that these two flags should
>>> be supported simultaneously, and brings less confusion.
>>>
>>>> BTW: I don't see nested_vmx_setup_ctls_msrs() considering the
>>>> hardware's capability when exposing those vmx features (not just the
>>>> preemption timer) to L1.
>>>
>>> In the code just above, when setting the pin-based controls for
>>> nested VMX, it first rdmsrs MSR_IA32_VMX_PINBASED_CTLS, then uses the
>>> result to mask out the features the hardware does not support. So do
>>> the other control fields.
>>
>> Yes, I saw it.
>>
>>>>>  	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>>  		VM_EXIT_LOAD_IA32_EFER);
>>>>>
>>>>> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>>>>>  	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>>>>>  }
>>>>>
>>>>> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +	u64 delta_tsc_l1;
>>>>> +	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
>>>>> +
>>>>> +	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
>>>>> +			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
>>>>> +	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>>>>> +	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
>>>>> +			native_read_tsc()) - vcpu->arch.last_guest_tsc;
>>>>> +	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
>>>>> +	if (preempt_val_l2 - preempt_val_l1 < 0)
>>>>> +		preempt_val_l2 = 0;
>>>>> +	else
>>>>> +		preempt_val_l2 -= preempt_val_l1;
>>>>> +	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
>>>>> +}
>>>>>
>>>>>  /*
>>>>>   * The guest has exited. See if we can fix it or if we need userspace
[PATCH v4] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs with some reason not emulated by L1, the preemption timer value should be saved in such exits.
2. Add support of "Save VMX-preemption timer value" VM-Exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
ChangeLog to v3:
	Move nested_adjust_preemption_timer to the latest place just before vmenter.
	Some minor changes.

 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 49 +++++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F

 /* AMD-V MSRs */
 #define MSR_VM_CR 0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..f364d16 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -374,6 +374,8 @@ struct nested_vmx {
 	 */
 	struct page *apic_access_page;
 	u64 msr_ia32_feature_control;
+	/* Set if vmexit is L2->L1 */
+	bool nested_vmx_exit;
 };

 #define POSTED_INTR_ON 0
@@ -2204,7 +2206,17 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high &
+			PIN_BASED_VMX_PREEMPTION_TIMER) ||
+	    !(nested_vmx_exit_ctls_high &
+			VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
+	}
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER);

@@ -6707,6 +6719,24 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }

+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 delta_tsc_l1;
+	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+	if (preempt_val_l2 <= preempt_val_l1)
+		preempt_val_l2 = 0;
+	else
+		preempt_val_l2 -= preempt_val_l1;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
+
 /*
  * The guest has exited. See if we can fix it or if we need userspace
  * assistance.
@@ -6736,9 +6766,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	vmx->nested.nested_run_pending = 0;

 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
+		vmx->nested.nested_vmx_exit = true;
 		nested_vmx_vmexit(vcpu);
 		return 1;
 	}
+	vmx->nested.nested_vmx_exit = false;

 	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
 		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
@@ -7132,6 +7164,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	debugctlmsr = get_debugctlmsr();

 	vmx->__launched = vmx->loaded_vmcs->launched;
+	if (is_guest_mode(vcpu) && !(vmx->nested.nested_vmx_exit))
+		nested_adjust_preemption_timer(vcpu);
 	asm(
 		/* Store host registers */
 		"push %%" _ASM_DX "; push %%" _ASM_BP ";"
@@ -7518,6 +7552,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7726,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control
[PATCH v2] kvm-unit-tests: VMX: Test suite for preemption timer
Test cases for the preemption timer in nested VMX. The following aspects are tested:
1. Save the preemption timer on VMEXIT if the relevant bit is set in EXIT_CONTROL.
2. Test a relevant bug of KVM. The bug will not save the preemption timer value if we exit L2->L0 for some reason and then enter L0->L2. Thus the preemption timer will never trigger if the value is large enough.
3. Some other aspects are tested, e.g. preempt without save, preempt when the value is 0.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
ChangeLog to v1:
1. Add test of EXI_SAVE_PREEMPT enabled and PIN_PREEMPT disabled
2. Add test of PIN_PREEMPT enabled and EXI_SAVE_PREEMPT enabled/disabled
3. Add test of preemption value being 0

 x86/vmx.h       |   3 +
 x86/vmx_tests.c | 175 +++
 2 files changed, 178 insertions(+)

diff --git a/x86/vmx.h b/x86/vmx.h
index 28595d8..ebc8cfd 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -210,6 +210,7 @@ enum Encoding {
 	GUEST_ACTV_STATE	= 0x4826ul,
 	GUEST_SMBASE		= 0x4828ul,
 	GUEST_SYSENTER_CS	= 0x482aul,
+	PREEMPT_TIMER_VALUE	= 0x482eul,

 	/* 32-Bit Host State Fields */
 	HOST_SYSENTER_CS	= 0x4c00ul,
@@ -331,6 +332,7 @@ enum Ctrl_exi {
 	EXI_LOAD_PERF	= 1UL << 12,
 	EXI_INTA	= 1UL << 15,
 	EXI_LOAD_EFER	= 1UL << 21,
+	EXI_SAVE_PREEMPT= 1UL << 22,
 };

 enum Ctrl_ent {
@@ -342,6 +344,7 @@ enum Ctrl_pin {
 	PIN_EXTINT	= 1ul << 0,
 	PIN_NMI		= 1ul << 3,
 	PIN_VIRT_NMI	= 1ul << 5,
+	PIN_PREEMPT	= 1ul << 6,
 };

 enum Ctrl0 {
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index c1b39f4..2e32031 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1,4 +1,30 @@
 #include "vmx.h"
+#include "msr.h"
+#include "processor.h"
+
+volatile u32 stage;
+
+static inline void vmcall()
+{
+	asm volatile("vmcall");
+}
+
+static inline void set_stage(u32 s)
+{
+	barrier();
+	stage = s;
+	barrier();
+}
+
+static inline u32 get_stage()
+{
+	u32 s;
+
+	barrier();
+	s = stage;
+	barrier();
+	return s;
+}

 void basic_init()
 {
@@ -76,6 +102,153 @@ int vmenter_exit_handler()
 	return VMX_TEST_VMEXIT;
 }

+u32 preempt_scale;
+volatile unsigned long long tsc_val;
+volatile u32 preempt_val;
+
+void preemption_timer_init()
+{
+	u32 ctrl_exit;
+
+	// Enable EXI_SAVE_PREEMPT with PIN_PREEMPT disabled
+	ctrl_exit = (vmcs_read(EXI_CONTROLS) |
+			EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
+	vmcs_write(EXI_CONTROLS, ctrl_exit);
+	preempt_val = 1000;
+	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+	set_stage(0);
+	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+}
+
+void preemption_timer_main()
+{
+	int i, j;
+
+	if (!(ctrl_pin_rev.clr & PIN_PREEMPT)) {
+		printf("\tPreemption timer is not supported\n");
+		return;
+	}
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
+	else {
+		// Test EXI_SAVE_PREEMPT enabled and PIN_PREEMPT disabled
+		set_stage(0);
+		// Consume enough time to let L2->L0->L2 occur
+		for (i = 0; i < 10; i++)
+			for (j = 0; j < 1; j++);
+		vmcall();
+		// Test PIN_PREEMPT enabled and EXI_SAVE_PREEMPT enabled/disabled
+		set_stage(1);
+		vmcall();
+		// Test both enabled
+		if (get_stage() == 2)
+			vmcall();
+	}
+	// Test the bug of resetting the preempt value when L2->L0->L2
+	set_stage(3);
+	vmcall();
+	tsc_val = rdtsc();
+	while (1) {
+		if (((rdtsc() - tsc_val) >> preempt_scale)
+				> 10 * preempt_val) {
+			report("Preemption timer timeout", 0);
+			break;
+		}
+		if (get_stage() == 4)
+			break;
+	}
+	// Test preempt val is 0
+	set_stage(4);
+	report("Preemption timer, val=0", 0);
+}
+
+int preemption_timer_exit_handler()
+{
+	u64 guest_rip;
+	ulong reason;
+	u32 insn_len;
+	u32 ctrl_exit;
+	u32 ctrl_pin;
+
+	guest_rip = vmcs_read(GUEST_RIP);
+	reason = vmcs_read(EXI_REASON) & 0xff;
+	insn_len = vmcs_read(EXI_INST_LEN);
+	switch (reason) {
+	case VMX_PREEMPT:
+		switch (get_stage()) {
+		case 3:
+			if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
+				report("Preemption timer timeout", 0);
+			else
+				report("Preemption timer timeout", 1);
+			set_stage(get_stage() + 1);
+			break;
+		case 4
[PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs with some reason not emulated by L1, the preemption timer value should be saved in such exits.
2. Add support of "Save VMX-preemption timer value" VM-Exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
This series depends on queue.

 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 51 ++---
 2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F

 /* AMD-V MSRs */
 #define MSR_VM_CR 0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..870caa8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER);

@@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }

+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 delta_tsc_l1;
+	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+	if (preempt_val_l2 - preempt_val_l1 < 0)
+		preempt_val_l2 = 0;
+	else
+		preempt_val_l2 -= preempt_val_l1;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
+
 /*
  * The guest has exited. See if we can fix it or if we need userspace
  * assistance.
@@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
+	int ret;

 	/* If guest state is invalid, start emulating */
 	if (vmx->emulation_required)
@@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	if (exit_reason < kvm_vmx_max_exit_handlers
 	    && kvm_vmx_exit_handlers[exit_reason])
-		return kvm_vmx_exit_handlers[exit_reason](vcpu);
+		ret = kvm_vmx_exit_handlers[exit_reason](vcpu);
 	else {
 		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
 		vcpu->run->hw.hardware_exit_reason = exit_reason;
+		ret = 0;
 	}
-	return 0;
+	if (is_guest_mode(vcpu))
+		nested_adjust_preemption_timer(vcpu);
+	return ret;
 }

 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7518,6 +7546,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7720,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);

 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER
[PATCH] kvm-unit-tests: VMX: Test suite for preemption timer
Test cases for the preemption timer in nested VMX. Two aspects are tested:
1. Save the preemption timer on VMEXIT if the relevant bit is set in EXIT_CONTROL.
2. Test a relevant bug of KVM. The bug will not save the preemption timer value if we exit L2->L0 for some reason and then enter L0->L2. Thus the preemption timer will never trigger if the value is large enough.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
 x86/vmx.h       |   3 ++
 x86/vmx_tests.c | 117 +++
 2 files changed, 120 insertions(+)

diff --git a/x86/vmx.h b/x86/vmx.h
index 28595d8..ebc8cfd 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -210,6 +210,7 @@ enum Encoding {
 	GUEST_ACTV_STATE	= 0x4826ul,
 	GUEST_SMBASE		= 0x4828ul,
 	GUEST_SYSENTER_CS	= 0x482aul,
+	PREEMPT_TIMER_VALUE	= 0x482eul,

 	/* 32-Bit Host State Fields */
 	HOST_SYSENTER_CS	= 0x4c00ul,
@@ -331,6 +332,7 @@ enum Ctrl_exi {
 	EXI_LOAD_PERF	= 1UL << 12,
 	EXI_INTA	= 1UL << 15,
 	EXI_LOAD_EFER	= 1UL << 21,
+	EXI_SAVE_PREEMPT= 1UL << 22,
 };

 enum Ctrl_ent {
@@ -342,6 +344,7 @@ enum Ctrl_pin {
 	PIN_EXTINT	= 1ul << 0,
 	PIN_NMI		= 1ul << 3,
 	PIN_VIRT_NMI	= 1ul << 5,
+	PIN_PREEMPT	= 1ul << 6,
 };

 enum Ctrl0 {
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index c1b39f4..d358148 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1,4 +1,30 @@
 #include "vmx.h"
+#include "msr.h"
+#include "processor.h"
+
+volatile u32 stage;
+
+static inline void vmcall()
+{
+	asm volatile("vmcall");
+}
+
+static inline void set_stage(u32 s)
+{
+	barrier();
+	stage = s;
+	barrier();
+}
+
+static inline u32 get_stage()
+{
+	u32 s;
+
+	barrier();
+	s = stage;
+	barrier();
+	return s;
+}

 void basic_init()
 {
@@ -76,6 +102,95 @@ int vmenter_exit_handler()
 	return VMX_TEST_VMEXIT;
 }

+u32 preempt_scale;
+volatile unsigned long long tsc_val;
+volatile u32 preempt_val;
+
+void preemption_timer_init()
+{
+	u32 ctrl_pin;
+
+	ctrl_pin = vmcs_read(PIN_CONTROLS) | PIN_PREEMPT;
+	ctrl_pin &= ctrl_pin_rev.clr;
+	vmcs_write(PIN_CONTROLS, ctrl_pin);
+	preempt_val = 1000;
+	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+}
+
+void preemption_timer_main()
+{
+	tsc_val = rdtsc();
+	if (!(ctrl_pin_rev.clr & PIN_PREEMPT)) {
+		printf("\tPreemption timer is not supported\n");
+		return;
+	}
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
+	else {
+		set_stage(0);
+		vmcall();
+		if (get_stage() == 1)
+			vmcall();
+	}
+	while (1) {
+		if (((rdtsc() - tsc_val) >> preempt_scale)
+				> 10 * preempt_val) {
+			report("Preemption timer", 0);
+			break;
+		}
+	}
+}
+
+int preemption_timer_exit_handler()
+{
+	u64 guest_rip;
+	ulong reason;
+	u32 insn_len;
+	u32 ctrl_exit;
+
+	guest_rip = vmcs_read(GUEST_RIP);
+	reason = vmcs_read(EXI_REASON) & 0xff;
+	insn_len = vmcs_read(EXI_INST_LEN);
+	switch (reason) {
+	case VMX_PREEMPT:
+		if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
+			report("Preemption timer", 0);
+		else
+			report("Preemption timer", 1);
+		return VMX_TEST_VMEXIT;
+	case VMX_VMCALL:
+		switch (get_stage()) {
+		case 0:
+			if (vmcs_read(PREEMPT_TIMER_VALUE) != preempt_val)
+				report("Save preemption value", 0);
+			else {
+				set_stage(get_stage() + 1);
+				ctrl_exit = (vmcs_read(EXI_CONTROLS) |
+					EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
+				vmcs_write(EXI_CONTROLS, ctrl_exit);
+			}
+			break;
+		case 1:
+			if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
+				report("Save preemption value", 0);
+			else
+				report("Save preemption value", 1);
+			break;
+		default:
+			printf("Invalid stage.\n");
+			print_vmexit_info();
+			return VMX_TEST_VMEXIT;
+		}
+		vmcs_write(GUEST_RIP, guest_rip + insn_len);
+		return VMX_TEST_RESUME;
+	default:
+		printf("Unknown exit reason, %d\n", reason);
+		print_vmexit_info
Re: [PATCH] kvm-unit-tests: VMX: Add the framework of EPT
Hi Xiao Guangrong, Jun Nakajima, Yang Zhang, Gleb and Paolo, If you have any ideas about how nested EPT should be tested and which aspects to cover, please tell me and I will write the relevant test cases. Besides, I would be glad if you could help review this patch or propose other suggestions. Thanks very much, Arthur On Mon, Sep 2, 2013 at 5:38 PM, Arthur Chunqi Li yzt...@gmail.com wrote: There must be some minor revisions to be done in this patch, so this is mainly an RFC mail. Besides, I'm not quite clear about what we should test in the nested EPT module, and I bet the writers of nested EPT have ideas on how to continue and refine this testing part. Any suggestions on which parts of nested EPT to test, and how, are welcome. Please help me CC EPT-related people if anyone knows them. Thanks, Arthur On Mon, Sep 2, 2013 at 5:26 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add a framework for EPT in nested VMX testing, including a set of functions to construct and read EPT paging structures and a simple read/write test of EPT remapping from guest to host.
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 132 -- x86/vmx.h | 76 +++ x86/vmx_tests.c | 156 +++ 3 files changed, 360 insertions(+), 4 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..a156b71 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -143,6 +143,132 @@ asm( call hypercall\n\t ); +/* EPT paging structure related functions */ +/* install_ept_entry : Install a page to a given level in EPT + @pml4 : addr of pml4 table + @pte_level : level of PTE to set + @guest_addr : physical address of guest + @pte : pte value to set + @pt_page : address of page table, NULL for a new page + */ +void install_ept_entry(unsigned long *pml4, + int pte_level, + unsigned long guest_addr, + unsigned long pte, + unsigned long *pt_page) +{ + int level; + unsigned long *pt = pml4; + unsigned offset; + + for (level = EPT_PAGE_LEVEL; level pte_level; --level) { + offset = (guest_addr ((level-1) * EPT_PGDIR_WIDTH + 12)) +EPT_PGDIR_MASK; + if (!(pt[offset] (EPT_RA | EPT_WA | EPT_EA))) { + unsigned long *new_pt = pt_page; + if (!new_pt) + new_pt = alloc_page(); + else + pt_page = 0; + memset(new_pt, 0, PAGE_SIZE); + pt[offset] = virt_to_phys(new_pt) + | EPT_RA | EPT_WA | EPT_EA; + } + pt = phys_to_virt(pt[offset] 0xff000ull); + } + offset = ((unsigned long)guest_addr ((level-1) * + EPT_PGDIR_WIDTH + 12)) EPT_PGDIR_MASK; + pt[offset] = pte; +} + +/* Map a page, @perm is the permission of the page */ +void install_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 1, guest_addr, (phys PAGE_MASK) | perm, 0); +} + +/* Map a 1G-size page */ +void install_1g_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 3, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* Map a 2M-size page */ +void install_2m_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 2, 
guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* setup_ept_range : Setup a range of 1:1 mapped page to EPT paging structure. + @start : start address of guest page + @len : length of address to be mapped + @map_1g : whether 1G page map is used + @map_2m : whether 2M page map is used + @perm : permission for every page + */ +int setup_ept_range(unsigned long *pml4, unsigned long start, + unsigned long len, int map_1g, int map_2m, u64 perm) +{ + u64 phys = start; + u64 max = (u64)len + (u64)start; + + if (map_1g) { + while (phys + PAGE_SIZE_1G = max) { + install_1g_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_1G; + } + } + if (map_2m) { + while (phys + PAGE_SIZE_2M = max) { + install_2m_ept(pml4, phys, phys, perm); + phys
Re: Corner cases of I/O bitmap
On Tue, Sep 3, 2013 at 7:19 PM, Gleb Natapov g...@redhat.com wrote: On Mon, Aug 12, 2013 at 08:35:57PM +0800, Arthur Chunqi Li wrote: Hi Gleb and Paolo, There are some corner cases when testing I/O bitmaps, and I don't know the exact behavior of the hardware. A little bit late but... A little early mail, but you are warming up quickly; maybe it was a tough time in the past week ;)

1. If we set the bit for 0x4000 in the bitmap and call inl(0x3) or inl(0x4000) in the guest, what exit information will we get? The spec says: execution of an I/O instruction causes a VM exit if any bit in the I/O bitmaps corresponding to a port it accesses is 1. Note "any" here. The exit will have the address that the instruction used, otherwise how could we emulate it properly?

2. What will we get when calling inl(0xFFFF) in the guest with/without the “unconditional I/O exiting” VM-execution control and the “use I/O bitmaps” VM-execution control? In other words, are you asking what happens if you do inl(0xFFFF) on real HW? "The result of an attempt to address beyond the I/O address space limit of FFFFH is implementation-specific."

I tested the two cases in a nested env. For the first one, I got a normal exit if any of the ports accessed is masked in the bitmap. For the second, it acts the same as other ports. And the SDM says: "If an I/O operation ‘wraps around’ the 16-bit I/O-port space (accesses ports FFFFH and 0000H), the I/O instruction causes a VM exit." I cannot find the exact reaction for this case. What do you mean by exact reaction? To my understanding, any wrap-around access to 0xFFFF will cause a VM exit even though the mask of 0xFFFF is not set, but this is only my guess. I cannot find what inl(0xFFFF) results in described in the SDM. But as you said above, we do not need to test inl(0xFFFF) because we are not expected to get a deterministic result. Arthur

Do you have any ideas about these? Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb.
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Corner cases of I/O bitmap
On Tue, Sep 3, 2013 at 7:48 PM, Gleb Natapov g...@redhat.com wrote: On Tue, Sep 03, 2013 at 07:45:47PM +0800, Arthur Chunqi Li wrote: On Tue, Sep 3, 2013 at 7:19 PM, Gleb Natapov g...@redhat.com wrote: On Mon, Aug 12, 2013 at 08:35:57PM +0800, Arthur Chunqi Li wrote: Hi Gleb and Paolo, There are some corner cases when testing I/O bitmaps, and I don't know the exact behavior of the hardware. A little bit late but... A little early mail, but you are warming up quickly; maybe it was a tough time in the past week ;)

1. If we set the bit for 0x4000 in the bitmap and call inl(0x3) or inl(0x4000) in the guest, what exit information will we get? The spec says: execution of an I/O instruction causes a VM exit if any bit in the I/O bitmaps corresponding to a port it accesses is 1. Note "any" here. The exit will have the address that the instruction used, otherwise how could we emulate it properly?

2. What will we get when calling inl(0xFFFF) in the guest with/without the “unconditional I/O exiting” VM-execution control and the “use I/O bitmaps” VM-execution control? In other words, are you asking what happens if you do inl(0xFFFF) on real HW? "The result of an attempt to address beyond the I/O address space limit of FFFFH is implementation-specific."

I tested the two cases in a nested env. For the first one, I got a normal exit if any of the ports accessed is masked in the bitmap. For the second, it acts the same as other ports. And the SDM says: "If an I/O operation ‘wraps around’ the 16-bit I/O-port space (accesses ports FFFFH and 0000H), the I/O instruction causes a VM exit." I cannot find the exact reaction for this case. What do you mean by exact reaction? To my understanding, any wrap-around access to 0xFFFF will cause a VM exit even though the mask of 0xFFFF is not set, but this is only my guess. I cannot find what inl(0xFFFF) results in described in the SDM. But as you said above, we do not need to test inl(0xFFFF) because we are not expected to get a deterministic result. Implementation-specific behaviour only covers what happens on real HW.

In non-root operation the spec says a VM exit should happen, and we should test for that. I have reread the patch I committed and found that I did test inl(0xFFFF). Does an access to 0x0 also cause a VM exit in any case of non-root operation? Arthur -- Gleb.
Information of EPT violation VMEXIT
Hi there, When I test EPT violation VMEXITs, I am confused by bits 7 and 8 in the Exit Qualification for EPT Violations (Table 27-7 in the SDM). Bit 7 means "set if the guest linear-address field is valid". On what occasion will bit 7 be clear? I don't quite understand the following statement in the SDM: "The guest linear-address field is valid for all EPT violations except those resulting from an attempt to load the guest PDPTEs as part of the execution of the MOV CR instruction." Bit 8 describes the cause of the EPT violation, but I don't understand what it means when set versus clear. I always get the exit qualification with this bit set; how can I construct a violation with this bit clear? Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH v2 0/4] kvm-unit-tests: Add a series of test cases
Hi Gleb, Paolo and Jan, Would you please review this series when you can spare the time? Jan has reviewed it and, of course, further suggestions are welcome. Arthur On Thu, Aug 15, 2013 at 7:45 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add a series of test cases for nested VMX to kvm-unit-tests. Arthur Chunqi Li (4): kvm-unit-tests: VMX: Add test cases for PAT and EFER kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing kvm-unit-tests: VMX: Add test cases for I/O bitmaps kvm-unit-tests: VMX: Add test cases for instruction interception lib/x86/vm.h | 4 + x86/vmx.c | 3 +- x86/vmx.h | 20 +- x86/vmx_tests.c | 714 +++ 4 files changed, 736 insertions(+), 5 deletions(-) -- 1.7.9.5
[PATCH] kvm-unit-tests: VMX: Add the framework of EPT
Add a framework for EPT in nested VMX testing, including a set of functions to construct and read EPT paging structures and a simple read/write test of EPT remapping from guest to host.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/vmx.c       | 132 ++--
 x86/vmx.h       |  76 +++
 x86/vmx_tests.c | 156 +++
 3 files changed, 360 insertions(+), 4 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index ca36d35..a156b71 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -143,6 +143,132 @@ asm(
 	"call hypercall\n\t"
 );
 
+/* EPT paging structure related functions */
+/* install_ept_entry : Install a page to a given level in EPT
+		@pml4 : addr of pml4 table
+		@pte_level : level of PTE to set
+		@guest_addr : physical address of guest
+		@pte : pte value to set
+		@pt_page : address of page table, NULL for a new page
+ */
+void install_ept_entry(unsigned long *pml4,
+		int pte_level,
+		unsigned long guest_addr,
+		unsigned long pte,
+		unsigned long *pt_page)
+{
+	int level;
+	unsigned long *pt = pml4;
+	unsigned offset;
+
+	for (level = EPT_PAGE_LEVEL; level > pte_level; --level) {
+		offset = (guest_addr >> (((level-1) * EPT_PGDIR_WIDTH) + 12))
+				& EPT_PGDIR_MASK;
+		if (!(pt[offset] & (EPT_RA | EPT_WA | EPT_EA))) {
+			unsigned long *new_pt = pt_page;
+			if (!new_pt)
+				new_pt = alloc_page();
+			else
+				pt_page = 0;
+			memset(new_pt, 0, PAGE_SIZE);
+			pt[offset] = virt_to_phys(new_pt)
+					| EPT_RA | EPT_WA | EPT_EA;
+		}
+		pt = phys_to_virt(pt[offset] & 0xffffffffff000ull);
+	}
+	offset = ((unsigned long)guest_addr >> (((level-1) *
+			EPT_PGDIR_WIDTH) + 12)) & EPT_PGDIR_MASK;
+	pt[offset] = pte;
+}
+
+/* Map a page, @perm is the permission of the page */
+void install_ept(unsigned long *pml4,
+		unsigned long phys,
+		unsigned long guest_addr,
+		u64 perm)
+{
+	install_ept_entry(pml4, 1, guest_addr, (phys & PAGE_MASK) | perm, 0);
+}
+
+/* Map a 1G-size page */
+void install_1g_ept(unsigned long *pml4,
+		unsigned long phys,
+		unsigned long guest_addr,
+		u64 perm)
+{
+	install_ept_entry(pml4, 3, guest_addr,
+			(phys & PAGE_MASK) | perm | EPT_LARGE_PAGE, 0);
+}
+
+/* Map a 2M-size page */
+void install_2m_ept(unsigned long *pml4,
+		unsigned long phys,
+		unsigned long guest_addr,
+		u64 perm)
+{
+	install_ept_entry(pml4, 2, guest_addr,
+			(phys & PAGE_MASK) | perm | EPT_LARGE_PAGE, 0);
+}
+
+/* setup_ept_range : Setup a range of 1:1 mapped pages in the EPT
+   paging structure.
+		@start : start address of guest page
+		@len : length of address to be mapped
+		@map_1g : whether 1G page map is used
+		@map_2m : whether 2M page map is used
+		@perm : permission for every page
+ */
+int setup_ept_range(unsigned long *pml4, unsigned long start,
+		unsigned long len, int map_1g, int map_2m, u64 perm)
+{
+	u64 phys = start;
+	u64 max = (u64)len + (u64)start;
+
+	if (map_1g) {
+		while (phys + PAGE_SIZE_1G <= max) {
+			install_1g_ept(pml4, phys, phys, perm);
+			phys += PAGE_SIZE_1G;
+		}
+	}
+	if (map_2m) {
+		while (phys + PAGE_SIZE_2M <= max) {
+			install_2m_ept(pml4, phys, phys, perm);
+			phys += PAGE_SIZE_2M;
+		}
+	}
+	while (phys + PAGE_SIZE <= max) {
+		install_ept(pml4, phys, phys, perm);
+		phys += PAGE_SIZE;
+	}
+	return 0;
+}
+
+/* get_ept_pte : Get the PTE of a given level in EPT,
+   @level == 1 means get the last level */
+unsigned long get_ept_pte(unsigned long *pml4,
+		unsigned long guest_addr, int level)
+{
+	int l;
+	unsigned long *pt = pml4, pte;
+	unsigned offset;
+
+	for (l = EPT_PAGE_LEVEL; l > 1; --l) {
+		offset = (guest_addr >> (((l-1) * EPT_PGDIR_WIDTH) + 12))
+				& EPT_PGDIR_MASK;
+		pte = pt[offset];
+		if (!(pte & (EPT_RA | EPT_WA | EPT_EA)))
+			return 0;
+		if (l == level)
+			return pte;
+		if (l < 4 && (pte & EPT_LARGE_PAGE))
+			return pte;
+		pt = (unsigned long
Re: [PATCH] kvm-unit-tests: VMX: Add the framework of EPT
There must have some minor revisions to be done in this patch, so this is mainly a RFC mail. Besides, I'm not quite clear what we should test in nested EPT modules, and I bet writers of nested EPT must have ideas to continue and refine this testing part. Any suggestions of which part and how to test nested EPT is welcome. Please help me CC EPT-related guys if anyone knows. Thanks, Arthur On Mon, Sep 2, 2013 at 5:26 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add a framework of EPT in nested VMX testing, including a set of functions to construct and read EPT paging structures and a simple read/write test of EPT remapping from guest to host. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c | 132 -- x86/vmx.h | 76 +++ x86/vmx_tests.c | 156 +++ 3 files changed, 360 insertions(+), 4 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..a156b71 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -143,6 +143,132 @@ asm( call hypercall\n\t ); +/* EPT paging structure related functions */ +/* install_ept_entry : Install a page to a given level in EPT + @pml4 : addr of pml4 table + @pte_level : level of PTE to set + @guest_addr : physical address of guest + @pte : pte value to set + @pt_page : address of page table, NULL for a new page + */ +void install_ept_entry(unsigned long *pml4, + int pte_level, + unsigned long guest_addr, + unsigned long pte, + unsigned long *pt_page) +{ + int level; + unsigned long *pt = pml4; + unsigned offset; + + for (level = EPT_PAGE_LEVEL; level pte_level; --level) { + offset = (guest_addr ((level-1) * EPT_PGDIR_WIDTH + 12)) +EPT_PGDIR_MASK; + if (!(pt[offset] (EPT_RA | EPT_WA | EPT_EA))) { + unsigned long *new_pt = pt_page; + if (!new_pt) + new_pt = alloc_page(); + else + pt_page = 0; + memset(new_pt, 0, PAGE_SIZE); + pt[offset] = virt_to_phys(new_pt) + | EPT_RA | EPT_WA | EPT_EA; + } + pt = phys_to_virt(pt[offset] 0xff000ull); + } + offset = ((unsigned long)guest_addr ((level-1) * + EPT_PGDIR_WIDTH + 12)) EPT_PGDIR_MASK; + 
pt[offset] = pte; +} + +/* Map a page, @perm is the permission of the page */ +void install_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 1, guest_addr, (phys PAGE_MASK) | perm, 0); +} + +/* Map a 1G-size page */ +void install_1g_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 3, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* Map a 2M-size page */ +void install_2m_ept(unsigned long *pml4, + unsigned long phys, + unsigned long guest_addr, + u64 perm) +{ + install_ept_entry(pml4, 2, guest_addr, + (phys PAGE_MASK) | perm | EPT_LARGE_PAGE, 0); +} + +/* setup_ept_range : Setup a range of 1:1 mapped page to EPT paging structure. + @start : start address of guest page + @len : length of address to be mapped + @map_1g : whether 1G page map is used + @map_2m : whether 2M page map is used + @perm : permission for every page + */ +int setup_ept_range(unsigned long *pml4, unsigned long start, + unsigned long len, int map_1g, int map_2m, u64 perm) +{ + u64 phys = start; + u64 max = (u64)len + (u64)start; + + if (map_1g) { + while (phys + PAGE_SIZE_1G = max) { + install_1g_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_1G; + } + } + if (map_2m) { + while (phys + PAGE_SIZE_2M = max) { + install_2m_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE_2M; + } + } + while (phys + PAGE_SIZE = max) { + install_ept(pml4, phys, phys, perm); + phys += PAGE_SIZE; + } + return 0; +} + +/* get_ept_pte : Get the PTE of a given level in EPT, +@level == 1 means get the latest level*/ +unsigned long get_ept_pte(unsigned long *pml4, + unsigned
Some questions about nested EPT
Hi there, When I tested nested EPT (enabling EPT for L2->L1 address translation), some questions came up when querying IA32_VMX_EPT_VPID_CAP. 1. It shows that bits 16 and 17 (support for 1G and 2M pages) are disabled in the nested IA32_VMX_EPT_VPID_CAP. Why does nested EPT fail to support these? Are there any difficulties? 2. Can bit 6 (support for a page-walk length of 4) of IA32_VMX_EPT_VPID_CAP be 0? That is to say, can I design a paging structure with more or fewer than 4 levels? Since I don't know who the original author of nested EPT is, I'm sending this mail to the whole list. If anyone knows, please tell me and CC the authors for more detailed discussion. Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
Re: [PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer
On Mon, Aug 26, 2013 at 3:23 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 17:26, Arthur Chunqi Li wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 49 - 1 file changed, 44 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..6aa320e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high PIN_BASED_VMX_PREEMPTION_TIMER)) + nested_vmx_exit_ctls_high = + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + if (!(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) + nested_vmx_pinbased_ctls_high = + (~PIN_BASED_VMX_PREEMPTION_TIMER); nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_fix_preempt(struct kvm_vcpu *vcpu) nested_adjust_preemption_timer - just preempt can be misleading. +{ + u64 delta_guest_tsc; + u32 preempt_val, preempt_bit, delta_preempt_val; + + preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) 0x1F; This is rather preemption_timer_scale. And if there is no symbolic value for the bitmask, please introduce one. 
+ delta_guest_tsc = kvm_x86_ops-read_l1_tsc(vcpu, + native_read_tsc()) - vcpu-arch.last_guest_tsc; + delta_preempt_val = delta_guest_tsc preempt_bit; + preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + if (preempt_val - delta_preempt_val 0) + preempt_val = 0; + else + preempt_val -= delta_preempt_val; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val); The rest unfortunately wrong. It has to be split into two parts: Part one, the calculation of L1's TSC value and its storing in nested_vmx, has to be done on vmexit. Part two, reading the current TSC, calculating the time spent in L0 and converting it into L1 TSC time, this has to be done right before vmentry of L2. As what we discussed yesterday, the calculation of L1's TSC value is not saved in nested_vmx, however, to avoid adding codes to the hot patch of vmexit. Instead, we use vcpu-arch.last_guest_tsc as the value stored on vmexit (which has been done already). And the value of part two is calculated in nested_fix_preempt() above (see variant delta_guest_tsc, which stores the consumed TSC value in L0). Since vmx_handle_exit is the last function called in vmexit path, I think it's OK to put part two here. Arthur, please make sure that your test case detects the current breakage of preemption timer emulation properly, both /wrt to missing save/restore and also regarding missing L0 time compensation, and then check that your KVM patch fixes it based on the unit test results. OK, I will commit a patch of kvm-unit-tests to test these changes. Arthur Jan +} /* * The guest has exited. See if we can fix it or if we need userspace * assistance. 
@@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) else vmx-nested.nested_run_pending = 0; - if (is_guest_mode(vcpu) nested_vmx_exit_handled(vcpu)) { - nested_vmx_vmexit(vcpu); - return 1; + if (is_guest_mode(vcpu)) { + if (nested_vmx_exit_handled(vcpu)) { + nested_vmx_vmexit(vcpu); + return 1; + } else + nested_fix_preempt(vcpu); } if (exit_reason VMX_EXIT_REASONS_FAILED_VMENTRY) { @@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 exec_control; + u32 exit_control; vmcs_write16(GUEST_ES_SELECTOR, vmcs12-guest_es_selector); vmcs_write16(GUEST_CS_SELECTOR, vmcs12-guest_cs_selector); @@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. 
The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. /* * Whether page-faults are trapped is determined by a combination of @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + else + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs. /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) } /* + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. + */ + if ((vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); + return 1; + } Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. that it will always reprogram it on entry. Here !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware support save preemption timer feature, which means if L2 supports pinbased vmx preemption timer, host must support save preemption timer feature. Though nested_vmx_exit_ctls_* is used for nested env, but it can also used to reflect the host's feature. 
Here is what I discuss with you yesterday, and we can also get the feature via rdmsr here to avoid the confusion. Arthur + + /* * We're finally done with prerequisite checking, and can start with * the nested entry. */ Jan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. 
@@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. /* * Whether page-faults are trapped is determined by a combination of @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + else + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs. /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) } /* + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. 
+ */
+ if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
+     !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+     nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+     return 1;
+ }
Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. always reprogramming the timer on entry. Here !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect whether the hardware supports the save-preemption-timer feature; the intent is that if L2 uses the pin-based VMX preemption timer, the host must support saving the timer value. Sorry, I parsed the code incorrectly. Although nested_vmx_exit_ctls_* is used for the nested environment, it also reflects the host's features. This is what I discussed with you yesterday; we could also read the feature via rdmsr here to avoid the confusion. Yes. The point is that we
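The host-capability check being debated above boils down to decoding the VM-exit controls capability MSR: its low 32 bits report allowed-0 settings, its high 32 bits report allowed-1 settings, and a control is supported iff its allowed-1 bit is set. A minimal userspace sketch of that decode (constants and helper names are illustrative, not KVM's actual code):

```c
#include <stdint.h>

/* Bit 22 of the VM-exit controls: "save VMX-preemption timer value". */
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER (1u << 22)

/* msr_val is the raw value read from IA32_VMX_EXIT_CTLS (or its TRUE
 * variant): bits 63:32 are the allowed-1 settings. A control can be set
 * to 1 only if its allowed-1 bit is 1, i.e. the hardware supports it. */
int exit_ctl_supported(uint64_t msr_val, uint32_t ctl_bit)
{
    uint32_t allowed1 = (uint32_t)(msr_val >> 32);
    return (allowed1 & ctl_bit) != 0;
}
```

This is why nested_vmx_exit_ctls_high, which is derived from that MSR, can double as a host-feature test in the discussion above.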
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:37 PM, Abel Gordon ab...@il.ibm.com wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. 
That means that we may need to subtract the cycles spent in L0 from the preemption timer, or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. My solution is to enable the save-preemption-timer-value feature for L2 whenever L2 enables the VMX preemption timer. Then exits such as external interrupts will save the exact timer value into L2's vmcs, and resuming L2 will reload that value. This way, the cycles L0 spends handling such vmexits do not affect L2's preemption timer value. Arthur
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:37, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. If the SDM guarantees this for us, we don't need such a safety measure. Otherwise, it should be added, yes. The SDM has such description (see 26.2.1.2): If “activate VMX-preemption timer” VM-execution control is 0, the “save VMX-preemption timer value” VM-exit control must also be 0. 
It doesn't tell us whether these two flags are guaranteed to be consistent when read from the related MSRs (IA32_VMX_PINBASED_CTLS and IA32_VMX_EXIT_CTLS), so I think the check is needed here. Arthur
Jan
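The consistency check argued for here (SDM 26.2.1.2: "save VMX-preemption timer value" must be 0 when "activate VMX-preemption timer" is 0) can be enforced by masking the two exposed capability words against each other. A hedged sketch of that masking, with illustrative constants rather than KVM's internals:

```c
#include <stdint.h>

#define PIN_BASED_VMX_PREEMPTION_TIMER    (1u << 6)   /* pin-based ctrl bit */
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER (1u << 22)  /* VM-exit ctrl bit   */

/* Make the two feature bits mutually consistent before exposing them to
 * the L1 guest: advertise the preemption timer only together with the
 * save-on-exit control, and vice versa. */
void fixup_preemption_ctls(uint32_t *pinbased_high, uint32_t *exit_high)
{
    if (!(*pinbased_high & PIN_BASED_VMX_PREEMPTION_TIMER))
        *exit_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
    if (!(*exit_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
        *pinbased_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
}
```

The v2 patch later in this thread applies exactly this kind of two-way masking in nested_vmx_setup_ctls_msrs().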
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:50 PM, Abel Gordon ab...@il.ibm.com wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:43 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. 
The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else may be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or EPT violations), then we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits, and from L1's perspective L2 was running on the CPU. That means that we may need to subtract these cycles spent in L0 from the preemption timer, or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry. But don't you think we should also subtract the cycles spent in L0 from the preemption timer? I mean, if we spent X cycles in L0 handling an L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be (previous_value_on_exit - X). If (previous_value_on_exit - X) < 0, then we should force (emulate) a preemption timer exit between L2 and L1. Sorry, I previously misunderstood your comments. But why should we exclude the cycles spent in L0 from the L2 preemption value? These cycles are not spent by L2, so they should not be charged to L2. Arthur
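Abel's "(previous_value_on_exit - X)" accounting can be sketched as plain arithmetic. The preemption timer counts down at the TSC rate divided by 2^rate_shift, where rate_shift is reported in bits 4:0 of IA32_VMX_MISC; the helper below (hypothetical name, not KVM code) charges an L0 TSC delta to the timer and flags when an L2->L1 preemption timer exit should be emulated instead:

```c
#include <stdint.h>

/* Charge the TSC cycles L0 spent handling an exit against the saved
 * preemption timer value. Returns the adjusted timer; sets *expired when
 * the timer would have run out, meaning L0 should emulate an L2->L1
 * preemption timer exit rather than resume L2. */
uint32_t charge_l0_cycles(uint32_t timer, uint64_t tsc_delta,
                          unsigned rate_shift, int *expired)
{
    uint64_t ticks = tsc_delta >> rate_shift;  /* TSC cycles -> timer ticks */

    if (ticks >= timer) {
        *expired = 1;
        return 0;
    }
    *expired = 0;
    return timer - (uint32_t)ticks;
}
```

This mirrors the nested_fix_preempt() helper added in the v2 patch below, which reads the rate shift from IA32_VMX_MISC and the delta from the L1 TSC.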
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:54 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:50, Abel Gordon wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:43 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. 
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. That means that we may need to reduce these cycles spent at L0 from the preemtion timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry. But don't you think we should also reduce the cycles spent at L0 from the preemption timer ? I mean, if we spent X cycles at L0 handling a L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be: (previous_value_on_exit - X). If (previous_value_on_exit - X) 0, then we should force (emulate) a preemption timer exit between L2 and L1. We ask the hardware to save the value of the preemption on L2 exit. 
This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted. I think you are suggesting the following, please correct me if I am wrong:
1) L1 resumes L2 with the preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1)
4) L0 saves the preemption timer (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y
Note that in this case X is ignored. I was suggesting to do the following:
6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates an L2->L1 preemption timer exit (resumes L1)
Yes, your description is right. But I'm still thinking about my previous point: why should we count such X cycles as time spent by L2? For nested VMX, the external interrupt is not provided by L1; it is triggered from L0 and want to cause periodically exit to L1, L2
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 10:41, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:54 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:50, Abel Gordon wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:43 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. 
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. That means that we may need to reduce these cycles spent at L0 from the preemtion timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry. But don't you think we should also reduce the cycles spent at L0 from the preemption timer ? I mean, if we spent X cycles at L0 handling a L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be: (previous_value_on_exit - X). If (previous_value_on_exit - X) 0, then we should force (emulate) a preemption timer exit between L2 and L1. We ask the hardware to save the value of the preemption on L2 exit. 
This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted. I think you are suggesting the following, please correct me if I am wrong:
1) L1 resumes L2 with the preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1)
4) L0 saves the preemption timer (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y
Note that in this case X is ignored. I was suggesting to do the following:
6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates an L2->L1 preemption timer exit (resumes L1)
Yes, your description is right. But I'm still thinking about my previous point: why should we count such X cycles as time spent by L2? For nested VMX, the external interrupt is not provided by L1; it is triggered from L0 and want
[PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 occurs for a reason not emulated by L1, the preemption timer value should be saved on such exits.
2. Add support for the "Save VMX-preemption timer value" VM-Exit control to nVMX.
With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..6aa320e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 				      VM_EXIT_LOAD_IA32_EFER);
@@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }

+static void nested_fix_preempt(struct kvm_vcpu *vcpu)
+{
+	u64 delta_guest_tsc;
+	u32 preempt_val, preempt_bit, delta_preempt_val;
+
+	preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) & 0x1F;
+	delta_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	delta_preempt_val = delta_guest_tsc >> preempt_bit;
+	preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	if (preempt_val < delta_preempt_val)
+		preempt_val = 0;
+	else
+		preempt_val -= delta_preempt_val;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val);
+}
+
 /*
  * The guest has exited.
See if we can fix it or if we need userspace
  * assistance.
  */
@@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	else
 		vmx->nested.nested_run_pending = 0;

-	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
-		nested_vmx_vmexit(vcpu);
-		return 1;
+	if (is_guest_mode(vcpu)) {
+		if (nested_vmx_exit_handled(vcpu)) {
+			nested_vmx_vmexit(vcpu);
+			return 1;
+		} else
+			nested_fix_preempt(vcpu);
 	}

 	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
@@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);

 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 	 * emulated by vmx_set_efer(), below.
@@ -8089,6 +8119,15 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);

+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) {
+		if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+			vmcs12->vmx_preemption_timer_value =
+				vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+		else
+			vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
+				vmcs12->vmx_preemption_timer_value);
+	}
+
 	/*
 	 * In some cases (usually, nested EPT), L2 is allowed to change its
 	 * own CR3 without exiting. If it has changed it, we must keep it.
--
1.7.9.5
Re: [PATCH 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
On Thu, Aug 15, 2013 at 3:30 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add testing for CR0/4 shadowing. A few sentences on the test strategy would be good. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..44be3f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,18 @@ u64 ia32_pat; u64 ia32_efer; +u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + asm volatile(mov %0, stage\n\t::r(s):memory, cc); +} + Why do we need state = s as assembler instruction? This is due to assembler optimization. If we simply use state = s, assembler will sometimes optimize it and state may not be set indeed. 
void basic_init() { } @@ -257,6 +263,216 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 == guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write 
shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 11) + report(Write shadowing different X86_CR4_TSD, 0); + else + report(Write shadowing different X86_CR4_TSD, 1); + set_stage(11); + tmp = guest_cr4 ^ X86_CR4_DE; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 12) + report(Write shadowing different X86_CR4_DE, 0); + else + report(Write shadowing different X86_CR4_DE, 1
Re: [PATCH 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
On Thu, Aug 15, 2013 at 3:17 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} Better safe this function for the test case where you actually stress the bitmap. 
What do you mean by safe? Arthur Jan + +static void test_ctrl_pat_init() +{ + u64 ctrl_ent; + u64 ctrl_exi; + + msr_bmp_init(); + ctrl_ent = vmcs_read(ENT_CONTROLS); + ctrl_exi = vmcs_read(EXI_CONTROLS); + vmcs_write(ENT_CONTROLS, ctrl_ent | ENT_LOAD_PAT); + vmcs_write(EXI_CONTROLS, ctrl_exi | (EXI_SAVE_PAT | EXI_LOAD_PAT)); + ia32_pat = rdmsr(MSR_IA32_CR_PAT); + vmcs_write(GUEST_PAT, 0x0); + vmcs_write(HOST_PAT, ia32_pat); +} + +static void test_ctrl_pat_main() +{ + u64 guest_ia32_pat; + + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (!(ctrl_enter_rev.clr ENT_LOAD_PAT)) + printf(\tENT_LOAD_PAT is not supported.\n); + else { + if (guest_ia32_pat != 0) { + report(Entry load PAT, 0); + return; + } + } + wrmsr(MSR_IA32_CR_PAT, 0x6); + vmcall(); + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (ctrl_enter_rev.clr ENT_LOAD_PAT) { + if (guest_ia32_pat != ia32_pat) { + report(Entry load PAT, 0); + return; + } + report(Entry load PAT, 1); + } +} + +static int test_ctrl_pat_exit_handler() +{ + u64 guest_rip; + ulong reason; + u64 guest_pat; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + switch (reason) { + case VMX_VMCALL: + guest_pat = vmcs_read(GUEST_PAT); + if (!(ctrl_exit_rev.clr EXI_SAVE_PAT)) { + printf(\tEXI_SAVE_PAT is not supported\n); + vmcs_write(GUEST_PAT, 0x6); + } else { + if (guest_pat == 0x6) + report(Exit save PAT, 1); + else + report(Exit save PAT, 0); + } + if (!(ctrl_exit_rev.clr EXI_LOAD_PAT)) + printf(\tEXI_LOAD_PAT is not supported\n); + else { + if (rdmsr(MSR_IA32_CR_PAT) == ia32_pat) + report(Exit load PAT, 1); + else + report(Exit load PAT, 0); + } + vmcs_write(GUEST_PAT, ia32_pat); + vmcs_write(GUEST_RIP
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? 
+ // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. + memset(io_bitmap_b, 0x0, PAGE_SIZE); Note that you still have io_bitmap_a[0] != 0 here. You probably want to clear it in order to have a clean setup. 
+ if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0); + + return; +} + +static int iobmp_exit_handler() +{ + u64 guest_rip; + ulong reason, exit_qual; + u32 insn_len; + //u32 ctrl_cpu0; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) & 0xff; + exit_qual = vmcs_read(EXI_QUALIFICATION); + insn_len = vmcs_read(EXI_INST_LEN); + switch (reason) { + case VMX_IO: + switch (stage) { + case 2: + if ((exit_qual & VMX_IO_SIZE_MASK) != _VMX_IO_BYTE
Re: [PATCH 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
On Thu, Aug 15, 2013 at 3:47 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:40, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:30 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add testing for CR0/4 shadowing. A few sentences on the test strategy would be good. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..44be3f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,18 @@ u64 ia32_pat; u64 ia32_efer; +u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + asm volatile(mov %0, stage\n\t::r(s):memory, cc); +} + Why do we need state = s as assembler instruction? This is due to assembler optimization. If we simply use state = s, assembler will sometimes optimize it and state may not be set indeed. volatile u32 stage? And we have barrier() to avoid reordering. Reordering here is not a big deal here, though it is actually needed here. I occurred the following problem: stage = 1; do something that causes vmexit; stage = 2; Then the compiler will optimize stage = 1 and stage = 2 to one instruction stage =2, since instructions between them don't use stage. Can volatile solve this problem? 
Arthur void basic_init() { } @@ -257,6 +263,216 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 == guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write 
shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc
Re: [PATCH 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
On Thu, Aug 15, 2013 at 3:48 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:41, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:17 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + 
vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} Better safe this function for the test case where you actually stress the bitmap. What do you mean by safe? I meant the other save: This function serves no purpose here. Let's only introduce it when that changes, i.e. when you actually test the MSR bitmap. No, the function is meaningful here. We need directly access to MSRs in guest and if msr bitmap is not set, any access to MSRs will cause vmexit. Here we just let all rdmsr/wrmsr pass in guest. Arthur Jan -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 3:58 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:51, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + 
inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. Right, there is an i/o instruction missing below after the second memset - or I cannot follow what you are trying to test. The above inl would always trigger, independent of the wrap-around. Only if you clear both bitmaps, we get to the interesting scenario. So something is still wrong here, no? Yes, we need to memset io_bit_map_a to 0 here. The above inl and the test if (stage == 9) are cooperatively used to test I/O overrun: test 4 bits width in to 0x. Arthur + memset(io_bitmap_b, 0x0, PAGE_SIZE); Note that you still have io_bitmap_a[0] != 0 here. 
You probably want to clear it in order to have a clean setup. + if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0);
Re: [PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
On Thu, Aug 15, 2013 at 4:06 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for instruction interception, including three types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag (CPUID/INVD) Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 x86/vmx_tests.c | 117 +++ 3 files changed, 125 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..d81d25d 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,6 +354,9 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, @@ -361,6 +364,8 @@ enum Ctrl0 { CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, CPU_MSR_BITMAP = 1ul 28, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index ad28c4c..66187f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -20,6 +20,13 @@ static inline void set_stage(u32 s) asm volatile(mov %0, stage\n\t::r(s):memory, cc); } +static inline u32 get_stage() +{ + u32 s; + 
asm volatile(mov stage, %0\n\t:=r(s)::memory, cc); + return s; +} Tagging stage volatile will obsolete this special assembly. + void basic_init() { } @@ -638,6 +645,114 @@ static int iobmp_exit_handler() return VMX_TEST_VMEXIT; } +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; What do the type values mean? For intercepted instructions we have three type: controlled by Primary Processor-Based VM-Execution Controls, controlled by Secondary Controls and always intercepted. The testing process is different for different types. + u32 reason; + ulong exit_qual; + u32 insn_info; For none of the instructions you test, EXI_INST_INFO will have valid content on exit. So you must not check it anyway. Actually , RDRAND uses EXI_INST_INFO though it is not supported now. Since for all intercepts these three vmcs fields are enough to determine everything, I put it here for future use. 
+}; + +static struct insn_table insn_table[] = { + // Flags for Primary Processor-Based VM-Execution Controls + {HLT, CPU_HLT, insn_hlt, 0, 12, 0, 0}, + {INVLPG, CPU_INVLPG, insn_invlpg, 0, 14, 0x12345678, 0}, + {MWAIT, CPU_MWAIT, insn_mwait, 0, 36, 0, 0}, + {RDPMC, CPU_RDPMC, insn_rdpmc, 0, 15, 0, 0}, + {RDTSC, CPU_RDTSC, insn_rdtsc, 0, 16, 0, 0}, + {MONITOR, CPU_MONITOR, insn_monitor, 0, 39, 0, 0}, + {PAUSE, CPU_PAUSE, insn_pause, 0, 40, 0, 0}, + // Flags for Secondary Processor-Based VM-Execution Controls + {WBINVD, CPU_WBINVD, insn_wbinvd, 1, 54, 0, 0}, + // Flags for Non-Processor-Based + {CPUID
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 4:13 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:09, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:58 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:51, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void 
iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. Right, there is an i/o instruction missing below after the second memset - or I cannot follow what you are trying to test. The above inl would always trigger, independent of the wrap-around. Only if you clear both bitmaps, we get to the interesting scenario. So something is still wrong here, no? Yes, we need to memset io_bit_map_a to 0 here. The above inl and the test if (stage == 9) are cooperatively used to test I/O overrun: test 4 bits width in to 0x. 
The point is that, according to our understanding of the SDM, we should even see a trap in this wrap-around scenario if both bitmaps are completely cleared.
Re: [PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
On Thu, Aug 15, 2013 at 4:20 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:16, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:06 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for instruction interception, including three types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag (CPUID/INVD) Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 x86/vmx_tests.c | 117 +++ 3 files changed, 125 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..d81d25d 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,6 +354,9 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, @@ -361,6 +364,8 @@ enum Ctrl0 { CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, CPU_MSR_BITMAP = 1ul 28, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index ad28c4c..66187f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -20,6 +20,13 @@ static inline void 
set_stage(u32 s) asm volatile(mov %0, stage\n\t::r(s):memory, cc); } +static inline u32 get_stage() +{ + u32 s; + asm volatile(mov stage, %0\n\t:=r(s)::memory, cc); + return s; +} Tagging stage volatile will obsolete this special assembly. + void basic_init() { } @@ -638,6 +645,114 @@ static int iobmp_exit_handler() return VMX_TEST_VMEXIT; } +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; What do the type values mean? For intercepted instructions we have three type: controlled by Primary Processor-Based VM-Execution Controls, controlled by Secondary Controls and always intercepted. The testing process is different for different types. This was a rhetorical questions. ;) Could you make the values symbolic? OK. It's better to rename it to ctrl_field and define some macros such as CTRL_CPU0, CTRL_CPU1, CTRL_NONE to make it more readable. + u32 reason; + ulong exit_qual; + u32 insn_info; For none of the instructions you test, EXI_INST_INFO will have valid content on exit. So you must not check it anyway. Actually , RDRAND uses EXI_INST_INFO though it is not supported now. Since for all intercepts these three vmcs fields are enough to determine everything, I put it here for future use. OK, but don't test its value when it's undefined - like in all cases implemented here. 
Testing only the fields that are actually used would make it more complex, because we would need to define which fields are valid in insn_table. Besides, if any of these three fields is unused, it will be set to 0; I think writing it like this is OK since we are just writing a test case
Re: [PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
On Thu, Aug 15, 2013 at 4:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:35, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:20 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:16, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:06 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for instruction interception, including three types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag (CPUID/INVD) Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 x86/vmx_tests.c | 117 +++ 3 files changed, 125 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..d81d25d 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,6 +354,9 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, @@ -361,6 +364,8 @@ enum Ctrl0 { CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, CPU_MSR_BITMAP = 1ul 28, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c 
index ad28c4c..66187f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -20,6 +20,13 @@ static inline void set_stage(u32 s) asm volatile(mov %0, stage\n\t::r(s):memory, cc); } +static inline u32 get_stage() +{ + u32 s; + asm volatile(mov stage, %0\n\t:=r(s)::memory, cc); + return s; +} Tagging stage volatile will obsolete this special assembly. + void basic_init() { } @@ -638,6 +645,114 @@ static int iobmp_exit_handler() return VMX_TEST_VMEXIT; } +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; What do the type values mean? For intercepted instructions we have three type: controlled by Primary Processor-Based VM-Execution Controls, controlled by Secondary Controls and always intercepted. The testing process is different for different types. This was a rhetorical questions. ;) Could you make the values symbolic? OK. It's better to rename it to ctrl_field and define some macros such as CTRL_CPU0, CTRL_CPU1, CTRL_NONE to make it more readable. + u32 reason; + ulong exit_qual; + u32 insn_info; For none of the instructions you test, EXI_INST_INFO will have valid content on exit. So you must not check it anyway. Actually , RDRAND uses EXI_INST_INFO though it is not supported now. Since for all intercepts these three vmcs fields are enough to determine everything, I put it here for future use. 
OK, but don't test its value when it's undefined - like in all cases implemented here. Testing only the fields that are actually used would make it more complex, because we would need to define which fields are valid in insn_table. Besides, if any of these three fields is unused, it will be set to 0.
Re: [PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
On Thu, Aug 15, 2013 at 4:23 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:20, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 4:13 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 10:09, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:58 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-15 09:51, Arthur Chunqi Li wrote: On Thu, Aug 15, 2013 at 3:40 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-13 17:56, Arthur Chunqi Li wrote: Add test cases for I/O bitmaps, including corner cases. Would be good to briefly list the corner cases here. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK(1ul 3) #define VMX_IO_IN(1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING(1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM(1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + 
vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ Forgotten debug code? + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); Forgot to check the progress? + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); Let's check the expected stage also here. The check is below if (stage == 9), the following memset is just used to prevent I/O mask to printf. Right, there is an i/o instruction missing below after the second memset - or I cannot follow what you are trying to test. The above inl would always trigger, independent of the wrap-around. Only if you clear both bitmaps, we get to the interesting scenario. So something is still wrong here, no? Yes, we need to memset io_bit_map_a to 0 here. 
The above inl and the if (stage == 9) check are used together to test I/O overrun: a 4-byte-wide in to port 0x. The point
[PATCH v2 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
Add test cases for I/O bitmaps, including corner cases. Test includes: pass trap, in out, different I/O width, low high I/O bitmap, partial I/O pass, overrun (inl 0x). Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +-- x86/vmx_tests.c | 159 +++ 2 files changed, 162 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK (1ul 3) #define VMX_IO_IN (1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING (1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM (1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT 16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index a5cc353..cd4dd99 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; volatile u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,160 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + set_stage(2); + inb(0x0); 
+ if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); + if (stage != 5) + report(I/O bitmap - I/O width, long, 0); + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + inl(0x); + if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0); + + return; +} + +static int iobmp_exit_handler() +{ + u64 guest_rip; + ulong reason, exit_qual; + u32 insn_len; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + exit_qual = vmcs_read(EXI_QUALIFICATION); + insn_len = vmcs_read(EXI_INST_LEN); + switch (reason) { + case VMX_IO: + switch (stage) { + case 2: + if ((exit_qual VMX_IO_SIZE_MASK) != _VMX_IO_BYTE) + report(I/O bitmap - I/O width, byte, 0); + else + report(I/O bitmap - I/O width, byte, 1); + if (!(exit_qual VMX_IO_IN)) + report(I/O bitmap - I/O direction, in, 0); + else + report(I/O bitmap - I/O direction, in, 1); + set_stage(stage + 1
[PATCH v2 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} + +static void test_ctrl_pat_init() +{ + u64 ctrl_ent; + u64 ctrl_exi; + + msr_bmp_init(); + ctrl_ent = vmcs_read(ENT_CONTROLS); + ctrl_exi = vmcs_read(EXI_CONTROLS); + vmcs_write(ENT_CONTROLS, ctrl_ent 
| ENT_LOAD_PAT); + vmcs_write(EXI_CONTROLS, ctrl_exi | (EXI_SAVE_PAT | EXI_LOAD_PAT)); + ia32_pat = rdmsr(MSR_IA32_CR_PAT); + vmcs_write(GUEST_PAT, 0x0); + vmcs_write(HOST_PAT, ia32_pat); +} + +static void test_ctrl_pat_main() +{ + u64 guest_ia32_pat; + + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (!(ctrl_enter_rev.clr ENT_LOAD_PAT)) + printf(\tENT_LOAD_PAT is not supported.\n); + else { + if (guest_ia32_pat != 0) { + report(Entry load PAT, 0); + return; + } + } + wrmsr(MSR_IA32_CR_PAT, 0x6); + vmcall(); + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (ctrl_enter_rev.clr ENT_LOAD_PAT) { + if (guest_ia32_pat != ia32_pat) { + report(Entry load PAT, 0); + return; + } + report(Entry load PAT, 1); + } +} + +static int test_ctrl_pat_exit_handler() +{ + u64 guest_rip; + ulong reason; + u64 guest_pat; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + switch (reason) { + case VMX_VMCALL: + guest_pat = vmcs_read(GUEST_PAT); + if (!(ctrl_exit_rev.clr EXI_SAVE_PAT)) { + printf(\tEXI_SAVE_PAT is not supported\n); + vmcs_write(GUEST_PAT, 0x6); + } else { + if (guest_pat == 0x6) + report(Exit save PAT, 1); + else + report(Exit save PAT, 0); + } + if (!(ctrl_exit_rev.clr EXI_LOAD_PAT)) + printf(\tEXI_LOAD_PAT is not supported\n); + else { + if (rdmsr(MSR_IA32_CR_PAT) == ia32_pat) + report(Exit load PAT, 1); + else + report(Exit load PAT, 0); + } + vmcs_write(GUEST_PAT, ia32_pat); + vmcs_write(GUEST_RIP, guest_rip + 3); + return VMX_TEST_RESUME; + default: + printf(ERROR : Undefined exit reason, reason = %d.\n, reason); + break; + } + return
[PATCH v2 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
Add testing for CR0/4 shadowing. Two types of flags in CR0/4 are tested: flags owned and shadowed by L1. They are treated differently in KVM. We test one flag of both types in CR0 (TS and MP) and CR4 (DE and TSD) with read through, read shadow, write through, write shadow (same as and different from shadowed value). Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..a5cc353 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,20 @@ u64 ia32_pat; u64 ia32_efer; +volatile u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + barrier(); + stage = s; + barrier(); +} + void basic_init() { } @@ -257,6 +265,214 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 
0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 == guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 11) + report(Write shadowing different X86_CR4_TSD, 0); + else + report(Write shadowing different X86_CR4_TSD, 1); + set_stage(11); + tmp = guest_cr4 ^ X86_CR4_DE; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 12) + report(Write shadowing different X86_CR4_DE, 0); + else + report(Write shadowing different X86_CR4_DE, 1); +} + +static int
[PATCH v2 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
Add test cases for instruction interception, including four types: 1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/ RDPMC/RDTSC/MONITOR/PAUSE) 2. Secondary Processor-Based VM-Execution Controls (WBINVD) 3. No control flag, always trap (CPUID/INVD) 4. Instructions always pass Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.c |3 +- x86/vmx.h |7 +++ x86/vmx_tests.c | 152 +++ 3 files changed, 160 insertions(+), 2 deletions(-) diff --git a/x86/vmx.c b/x86/vmx.c index ca36d35..c346070 100644 --- a/x86/vmx.c +++ b/x86/vmx.c @@ -336,8 +336,7 @@ static void init_vmx(void) : MSR_IA32_VMX_ENTRY_CTLS); ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC : MSR_IA32_VMX_PROCBASED_CTLS); - if (ctrl_cpu_rev[0].set CPU_SECONDARY) - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2); if (ctrl_cpu_rev[1].set CPU_EPT || ctrl_cpu_rev[1].set CPU_VPID) ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP); diff --git a/x86/vmx.h b/x86/vmx.h index dba8b20..2784ac6 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -354,12 +354,17 @@ enum Ctrl0 { CPU_INTR_WINDOW = 1ul 2, CPU_HLT = 1ul 7, CPU_INVLPG = 1ul 9, + CPU_MWAIT = 1ul 10, + CPU_RDPMC = 1ul 11, + CPU_RDTSC = 1ul 12, CPU_CR3_LOAD= 1ul 15, CPU_CR3_STORE = 1ul 16, CPU_TPR_SHADOW = 1ul 21, CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MONITOR = 1ul 29, + CPU_PAUSE = 1ul 30, CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; @@ -368,6 +373,8 @@ enum Ctrl1 { CPU_EPT = 1ul 1, CPU_VPID= 1ul 5, CPU_URG = 1ul 7, + CPU_WBINVD = 1ul 6, + CPU_RDRAND = 1ul 11, }; #define SAVE_GPR \ diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index cd4dd99..be3e3b4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -22,6 +22,16 @@ static inline void set_stage(u32 s) barrier(); } +static inline u32 get_stage() +{ + u32 s; + + barrier(); + s = stage; + barrier(); + return s; +} + void basic_init() { } @@ -630,6 +640,146 @@ static int 
iobmp_exit_handler() return VMX_TEST_VMEXIT; } +#define INSN_CPU0 0 +#define INSN_CPU1 1 +#define INSN_ALWAYS_TRAP 2 +#define INSN_NEVER_TRAP3 + +#define FIELD_EXIT_QUAL0 +#define FIELD_INSN_INFO1 + +asm( + insn_hlt: hlt;ret\n\t + insn_invlpg: invlpg 0x12345678;ret\n\t + insn_mwait: mwait;ret\n\t + insn_rdpmc: rdpmc;ret\n\t + insn_rdtsc: rdtsc;ret\n\t + insn_monitor: monitor;ret\n\t + insn_pause: pause;ret\n\t + insn_wbinvd: wbinvd;ret\n\t + insn_cpuid: cpuid;ret\n\t + insn_invd: invd;ret\n\t +); +extern void insn_hlt(); +extern void insn_invlpg(); +extern void insn_mwait(); +extern void insn_rdpmc(); +extern void insn_rdtsc(); +extern void insn_monitor(); +extern void insn_pause(); +extern void insn_wbinvd(); +extern void insn_cpuid(); +extern void insn_invd(); + +u32 cur_insn; + +struct insn_table { + const char *name; + u32 flag; + void (*insn_func)(); + u32 type; + u32 reason; + ulong exit_qual; + u32 insn_info; + // Use FIELD_EXIT_QUAL and FIELD_INSN_INFO to efines + // which field need to be tested, reason is always tested + u32 test_field; +}; + +static struct insn_table insn_table[] = { + // Flags for Primary Processor-Based VM-Execution Controls + {HLT, CPU_HLT, insn_hlt, INSN_CPU0, 12, 0, 0, 0}, + {INVLPG, CPU_INVLPG, insn_invlpg, INSN_CPU0, 14, + 0x12345678, 0, FIELD_EXIT_QUAL}, + {MWAIT, CPU_MWAIT, insn_mwait, INSN_CPU0, 36, 0, 0, 0}, + {RDPMC, CPU_RDPMC, insn_rdpmc, INSN_CPU0, 15, 0, 0, 0}, + {RDTSC, CPU_RDTSC, insn_rdtsc, INSN_CPU0, 16, 0, 0, 0}, + {MONITOR, CPU_MONITOR, insn_monitor, INSN_CPU0, 39, 0, 0, 0}, + {PAUSE, CPU_PAUSE, insn_pause, INSN_CPU0, 40, 0, 0, 0}, + // Flags for Secondary Processor-Based VM-Execution Controls + {WBINVD, CPU_WBINVD, insn_wbinvd, INSN_CPU1, 54, 0, 0, 0}, + // Instructions always trap + {CPUID, 0, insn_cpuid, INSN_ALWAYS_TRAP, 10, 0, 0, 0}, + {INVD, 0, insn_invd, INSN_ALWAYS_TRAP, 13, 0, 0, 0}, + // Instructions never trap + {NULL}, +}; + +static void insn_intercept_init() +{ + u32 ctrl_cpu[2
[PATCH v2 0/4] kvm-unit-tests: Add a series of test cases
Add a series of test cases for nested VMX in kvm-unit-tests. Arthur Chunqi Li (4): kvm-unit-tests: VMX: Add test cases for PAT and EFER kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing kvm-unit-tests: VMX: Add test cases for I/O bitmaps kvm-unit-tests: VMX: Add test cases for instruction interception lib/x86/vm.h|4 + x86/vmx.c |3 +- x86/vmx.h | 20 +- x86/vmx_tests.c | 714 +++ 4 files changed, 736 insertions(+), 5 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] kvm-unit-tests: Add a series of test cases
Add a series of test cases for nested VMX in kvm-unit-tests. Arthur Chunqi Li (4): kvm-unit-tests: VMX: Add test cases for PAT and EFER kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing kvm-unit-tests: VMX: Add test cases for I/O bitmaps kvm-unit-tests: VMX: Add test cases for instruction interception lib/x86/vm.h|4 + x86/vmx.c |3 +- x86/vmx.h | 20 +- x86/vmx_tests.c | 687 +++ 4 files changed, 709 insertions(+), 5 deletions(-) -- 1.7.9.5
[PATCH 3/4] kvm-unit-tests: VMX: Add test cases for I/O bitmaps
Add test cases for I/O bitmaps, including corner cases. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |6 +- x86/vmx_tests.c | 167 +++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/x86/vmx.h b/x86/vmx.h index 18961f1..dba8b20 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -417,15 +417,15 @@ enum Ctrl1 { popf\n\t #define VMX_IO_SIZE_MASK 0x7 -#define _VMX_IO_BYTE 1 -#define _VMX_IO_WORD 2 +#define _VMX_IO_BYTE 0 +#define _VMX_IO_WORD 1 #define _VMX_IO_LONG 3 #define VMX_IO_DIRECTION_MASK (1ul 3) #define VMX_IO_IN (1ul 3) #define VMX_IO_OUT 0 #define VMX_IO_STRING (1ul 4) #define VMX_IO_REP (1ul 5) -#define VMX_IO_OPRAND_DX (1ul 6) +#define VMX_IO_OPRAND_IMM (1ul 6) #define VMX_IO_PORT_MASK 0x #define VMX_IO_PORT_SHIFT 16 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 44be3f4..ad28c4c 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -2,10 +2,13 @@ #include msr.h #include processor.h #include vm.h +#include io.h u64 ia32_pat; u64 ia32_efer; u32 stage; +void *io_bitmap_a, *io_bitmap_b; +u16 ioport; static inline void vmcall() { @@ -473,6 +476,168 @@ static int cr_shadowing_exit_handler() return VMX_TEST_VMEXIT; } +static void iobmp_init() +{ + u32 ctrl_cpu0; + + io_bitmap_a = alloc_page(); + io_bitmap_a = alloc_page(); + memset(io_bitmap_a, 0x0, PAGE_SIZE); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO_BITMAP; + ctrl_cpu0 = (~CPU_IO); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(IO_BITMAP_A, (u64)io_bitmap_a); + vmcs_write(IO_BITMAP_B, (u64)io_bitmap_b); +} + +static void iobmp_main() +{ +/* + data = (u8 *)io_bitmap_b; + ioport = 0x; + data[(ioport - 0x8000) /8] |= (1 (ioport % 8)); + inb(ioport); + outb(0, ioport); +*/ + // stage 0, test IO pass + set_stage(0); + inb(0x5000); + outb(0x0, 0x5000); + if (stage != 0) + report(I/O bitmap - I/O pass, 0); + else + report(I/O bitmap - I/O pass, 1); + // test IO width, in/out + ((u8 *)io_bitmap_a)[0] = 0xFF; + 
set_stage(2); + inb(0x0); + if (stage != 3) + report(I/O bitmap - trap in, 0); + else + report(I/O bitmap - trap in, 1); + set_stage(3); + outw(0x0, 0x0); + if (stage != 4) + report(I/O bitmap - trap out, 0); + else + report(I/O bitmap - trap out, 1); + set_stage(4); + inl(0x0); + // test low/high IO port + set_stage(5); + ((u8 *)io_bitmap_a)[0x5000 / 8] = (1 (0x5000 % 8)); + inb(0x5000); + if (stage == 6) + report(I/O bitmap - I/O port, low part, 1); + else + report(I/O bitmap - I/O port, low part, 0); + set_stage(6); + ((u8 *)io_bitmap_b)[0x1000 / 8] = (1 (0x1000 % 8)); + inb(0x9000); + if (stage == 7) + report(I/O bitmap - I/O port, high part, 1); + else + report(I/O bitmap - I/O port, high part, 0); + // test partial pass + set_stage(7); + inl(0x4FFF); + if (stage == 8) + report(I/O bitmap - partial pass, 1); + else + report(I/O bitmap - partial pass, 0); + // test overrun + set_stage(8); + memset(io_bitmap_b, 0xFF, PAGE_SIZE); + inl(0x); + memset(io_bitmap_b, 0x0, PAGE_SIZE); + if (stage == 9) + report(I/O bitmap - overrun, 1); + else + report(I/O bitmap - overrun, 0); + + return; +} + +static int iobmp_exit_handler() +{ + u64 guest_rip; + ulong reason, exit_qual; + u32 insn_len; + //u32 ctrl_cpu0; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + exit_qual = vmcs_read(EXI_QUALIFICATION); + insn_len = vmcs_read(EXI_INST_LEN); + switch (reason) { + case VMX_IO: + switch (stage) { + case 2: + if ((exit_qual VMX_IO_SIZE_MASK) != _VMX_IO_BYTE) + report(I/O bitmap - I/O width, byte, 0); + else + report(I/O bitmap - I/O width, byte, 1); + if (!(exit_qual VMX_IO_IN)) + report(I/O bitmap - I/O direction, in, 0); + else + report(I/O bitmap - I/O direction, in, 1); + set_stage(stage + 1
[PATCH 1/4] kvm-unit-tests: VMX: Add test cases for PAT and EFER
Add test cases for ENT_LOAD_PAT, ENT_LOAD_EFER, EXI_LOAD_PAT, EXI_SAVE_PAT, EXI_LOAD_EFER, EXI_SAVE_PAT flags in enter/exit control fields. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/vmx.h |7 +++ x86/vmx_tests.c | 185 +++ 2 files changed, 192 insertions(+) diff --git a/x86/vmx.h b/x86/vmx.h index 28595d8..18961f1 100644 --- a/x86/vmx.h +++ b/x86/vmx.h @@ -152,10 +152,12 @@ enum Encoding { GUEST_DEBUGCTL = 0x2802ul, GUEST_DEBUGCTL_HI = 0x2803ul, GUEST_EFER = 0x2806ul, + GUEST_PAT = 0x2804ul, GUEST_PERF_GLOBAL_CTRL = 0x2808ul, GUEST_PDPTE = 0x280aul, /* 64-Bit Host State */ + HOST_PAT= 0x2c00ul, HOST_EFER = 0x2c02ul, HOST_PERF_GLOBAL_CTRL = 0x2c04ul, @@ -330,11 +332,15 @@ enum Ctrl_exi { EXI_HOST_64 = 1UL 9, EXI_LOAD_PERF = 1UL 12, EXI_INTA= 1UL 15, + EXI_SAVE_PAT= 1UL 18, + EXI_LOAD_PAT= 1UL 19, + EXI_SAVE_EFER = 1UL 20, EXI_LOAD_EFER = 1UL 21, }; enum Ctrl_ent { ENT_GUEST_64= 1UL 9, + ENT_LOAD_PAT= 1UL 14, ENT_LOAD_EFER = 1UL 15, }; @@ -354,6 +360,7 @@ enum Ctrl0 { CPU_NMI_WINDOW = 1ul 22, CPU_IO = 1ul 24, CPU_IO_BITMAP = 1ul 25, + CPU_MSR_BITMAP = 1ul 28, CPU_SECONDARY = 1ul 31, }; diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index c1b39f4..61b0cef 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -1,4 +1,15 @@ #include vmx.h +#include msr.h +#include processor.h +#include vm.h + +u64 ia32_pat; +u64 ia32_efer; + +static inline void vmcall() +{ + asm volatile(vmcall); +} void basic_init() { @@ -76,6 +87,176 @@ int vmenter_exit_handler() return VMX_TEST_VMEXIT; } +void msr_bmp_init() +{ + void *msr_bitmap; + u32 ctrl_cpu0; + + msr_bitmap = alloc_page(); + memset(msr_bitmap, 0x0, PAGE_SIZE); + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_MSR_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + vmcs_write(MSR_BITMAP, (u64)msr_bitmap); +} + +static void test_ctrl_pat_init() +{ + u64 ctrl_ent; + u64 ctrl_exi; + + msr_bmp_init(); + ctrl_ent = vmcs_read(ENT_CONTROLS); + ctrl_exi = vmcs_read(EXI_CONTROLS); + vmcs_write(ENT_CONTROLS, ctrl_ent 
| ENT_LOAD_PAT); + vmcs_write(EXI_CONTROLS, ctrl_exi | (EXI_SAVE_PAT | EXI_LOAD_PAT)); + ia32_pat = rdmsr(MSR_IA32_CR_PAT); + vmcs_write(GUEST_PAT, 0x0); + vmcs_write(HOST_PAT, ia32_pat); +} + +static void test_ctrl_pat_main() +{ + u64 guest_ia32_pat; + + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (!(ctrl_enter_rev.clr ENT_LOAD_PAT)) + printf(\tENT_LOAD_PAT is not supported.\n); + else { + if (guest_ia32_pat != 0) { + report(Entry load PAT, 0); + return; + } + } + wrmsr(MSR_IA32_CR_PAT, 0x6); + vmcall(); + guest_ia32_pat = rdmsr(MSR_IA32_CR_PAT); + if (ctrl_enter_rev.clr ENT_LOAD_PAT) { + if (guest_ia32_pat != ia32_pat) { + report(Entry load PAT, 0); + return; + } + report(Entry load PAT, 1); + } +} + +static int test_ctrl_pat_exit_handler() +{ + u64 guest_rip; + ulong reason; + u64 guest_pat; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + switch (reason) { + case VMX_VMCALL: + guest_pat = vmcs_read(GUEST_PAT); + if (!(ctrl_exit_rev.clr EXI_SAVE_PAT)) { + printf(\tEXI_SAVE_PAT is not supported\n); + vmcs_write(GUEST_PAT, 0x6); + } else { + if (guest_pat == 0x6) + report(Exit save PAT, 1); + else + report(Exit save PAT, 0); + } + if (!(ctrl_exit_rev.clr EXI_LOAD_PAT)) + printf(\tEXI_LOAD_PAT is not supported\n); + else { + if (rdmsr(MSR_IA32_CR_PAT) == ia32_pat) + report(Exit load PAT, 1); + else + report(Exit load PAT, 0); + } + vmcs_write(GUEST_PAT, ia32_pat); + vmcs_write(GUEST_RIP, guest_rip + 3); + return VMX_TEST_RESUME; + default: + printf(ERROR : Undefined exit reason, reason = %d.\n, reason); + break; + } + return
[PATCH 2/4] kvm-unit-tests: VMX: Add test cases for CR0/4 shadowing
Add testing for CR0/4 shadowing. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- lib/x86/vm.h|4 + x86/vmx_tests.c | 218 +++ 2 files changed, 222 insertions(+) diff --git a/lib/x86/vm.h b/lib/x86/vm.h index eff6f72..6e0ce2b 100644 --- a/lib/x86/vm.h +++ b/lib/x86/vm.h @@ -17,9 +17,13 @@ #define PTE_ADDR(0xff000ull) #define X86_CR0_PE 0x0001 +#define X86_CR0_MP 0x0002 +#define X86_CR0_TS 0x0008 #define X86_CR0_WP 0x0001 #define X86_CR0_PG 0x8000 #define X86_CR4_VMXE 0x0001 +#define X86_CR4_TSD 0x0004 +#define X86_CR4_DE 0x0008 #define X86_CR4_PSE 0x0010 #define X86_CR4_PAE 0x0020 #define X86_CR4_PCIDE 0x0002 diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 61b0cef..44be3f4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -5,12 +5,18 @@ u64 ia32_pat; u64 ia32_efer; +u32 stage; static inline void vmcall() { asm volatile(vmcall); } +static inline void set_stage(u32 s) +{ + asm volatile(mov %0, stage\n\t::r(s):memory, cc); +} + void basic_init() { } @@ -257,6 +263,216 @@ static int test_ctrl_efer_exit_handler() return VMX_TEST_VMEXIT; } +u32 guest_cr0, guest_cr4; + +static void cr_shadowing_main() +{ + u32 cr0, cr4, tmp; + + // Test read through + set_stage(0); + guest_cr0 = read_cr0(); + if (stage == 1) + report(Read through CR0, 0); + else + vmcall(); + set_stage(1); + guest_cr4 = read_cr4(); + if (stage == 2) + report(Read through CR4, 0); + else + vmcall(); + // Test write through + guest_cr0 = guest_cr0 ^ (X86_CR0_TS | X86_CR0_MP); + guest_cr4 = guest_cr4 ^ (X86_CR4_TSD | X86_CR4_DE); + set_stage(2); + write_cr0(guest_cr0); + if (stage == 3) + report(Write throuth CR0, 0); + else + vmcall(); + set_stage(3); + write_cr4(guest_cr4); + if (stage == 4) + report(Write through CR4, 0); + else + vmcall(); + // Test read shadow + set_stage(4); + vmcall(); + cr0 = read_cr0(); + if (stage != 5) { + if (cr0 == guest_cr0) + report(Read shadowing CR0, 1); + else + report(Read shadowing CR0, 0); + } + set_stage(5); + cr4 = read_cr4(); + if (stage != 6) { + if (cr4 
== guest_cr4) + report(Read shadowing CR4, 1); + else + report(Read shadowing CR4, 0); + } + // Test write shadow (same value with shadow) + set_stage(6); + write_cr0(guest_cr0); + if (stage == 7) + report(Write shadowing CR0 (same value with shadow), 0); + else + vmcall(); + set_stage(7); + write_cr4(guest_cr4); + if (stage == 8) + report(Write shadowing CR4 (same value with shadow), 0); + else + vmcall(); + // Test write shadow (different value) + set_stage(8); + tmp = guest_cr0 ^ X86_CR0_TS; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 9) + report(Write shadowing different X86_CR0_TS, 0); + else + report(Write shadowing different X86_CR0_TS, 1); + set_stage(9); + tmp = guest_cr0 ^ X86_CR0_MP; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr0\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 10) + report(Write shadowing different X86_CR0_MP, 0); + else + report(Write shadowing different X86_CR0_MP, 1); + set_stage(10); + tmp = guest_cr4 ^ X86_CR4_TSD; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 11) + report(Write shadowing different X86_CR4_TSD, 0); + else + report(Write shadowing different X86_CR4_TSD, 1); + set_stage(11); + tmp = guest_cr4 ^ X86_CR4_DE; + asm volatile(mov %0, %%rsi\n\t + mov %%rsi, %%cr4\n\t + ::m(tmp) + :rsi, memory, cc); + if (stage != 12) + report(Write shadowing different X86_CR4_DE, 0); + else + report(Write shadowing different X86_CR4_DE, 1); +} + +static int cr_shadowing_exit_handler() +{ + u64 guest_rip; + ulong reason; + u32 insn_len; + u32 exit_qual; + + guest_rip = vmcs_read(GUEST_RIP); + reason = vmcs_read(EXI_REASON) 0xff; + insn_len = vmcs_read(EXI_INST_LEN); + exit_qual = vmcs_read
[PATCH 4/4] kvm-unit-tests: VMX: Add test cases for instruction interception
Add test cases for instruction interception, including three types:
1. Primary Processor-Based VM-Execution Controls (HLT/INVLPG/MWAIT/
   RDPMC/RDTSC/MONITOR/PAUSE)
2. Secondary Processor-Based VM-Execution Controls (WBINVD)
3. No control flag (CPUID/INVD)

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
 x86/vmx.c       |    3 +-
 x86/vmx.h       |    7 ++++
 x86/vmx_tests.c |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index ca36d35..c346070 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -336,8 +336,7 @@ static void init_vmx(void)
 			: MSR_IA32_VMX_ENTRY_CTLS);
 	ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC
 			: MSR_IA32_VMX_PROCBASED_CTLS);
-	if (ctrl_cpu_rev[0].set & CPU_SECONDARY)
-		ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
+	ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
 	if (ctrl_cpu_rev[1].set & CPU_EPT || ctrl_cpu_rev[1].set & CPU_VPID)
 		ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);
 
diff --git a/x86/vmx.h b/x86/vmx.h
index dba8b20..d81d25d 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -354,6 +354,9 @@ enum Ctrl0 {
 	CPU_INTR_WINDOW	= 1ul << 2,
 	CPU_HLT		= 1ul << 7,
 	CPU_INVLPG	= 1ul << 9,
+	CPU_MWAIT	= 1ul << 10,
+	CPU_RDPMC	= 1ul << 11,
+	CPU_RDTSC	= 1ul << 12,
 	CPU_CR3_LOAD	= 1ul << 15,
 	CPU_CR3_STORE	= 1ul << 16,
 	CPU_TPR_SHADOW	= 1ul << 21,
@@ -361,6 +364,8 @@ enum Ctrl0 {
 	CPU_IO		= 1ul << 24,
 	CPU_IO_BITMAP	= 1ul << 25,
 	CPU_MSR_BITMAP	= 1ul << 28,
+	CPU_MONITOR	= 1ul << 29,
+	CPU_PAUSE	= 1ul << 30,
 	CPU_SECONDARY	= 1ul << 31,
 };
 
@@ -368,6 +373,8 @@ enum Ctrl1 {
 	CPU_EPT		= 1ul << 1,
 	CPU_VPID	= 1ul << 5,
 	CPU_URG		= 1ul << 7,
+	CPU_WBINVD	= 1ul << 6,
+	CPU_RDRAND	= 1ul << 11,
 };
 
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index ad28c4c..66187f4 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -20,6 +20,13 @@ static inline void set_stage(u32 s)
 	asm volatile("mov %0, stage\n\t"::"r"(s):"memory", "cc");
 }
 
+static inline u32 get_stage()
+{
+	u32 s;
+	asm volatile("mov stage, %0\n\t":"=r"(s)::"memory", "cc");
+	return s;
+}
+
 void basic_init()
 {
 }
@@ -638,6 +645,114 @@ static int iobmp_exit_handler()
 	return VMX_TEST_VMEXIT;
 }
 
+asm(
+	"insn_hlt: hlt;ret\n\t"
+	"insn_invlpg: invlpg 0x12345678;ret\n\t"
+	"insn_mwait: mwait;ret\n\t"
+	"insn_rdpmc: rdpmc;ret\n\t"
+	"insn_rdtsc: rdtsc;ret\n\t"
+	"insn_monitor: monitor;ret\n\t"
+	"insn_pause: pause;ret\n\t"
+	"insn_wbinvd: wbinvd;ret\n\t"
+	"insn_cpuid: cpuid;ret\n\t"
+	"insn_invd: invd;ret\n\t"
+);
+extern void insn_hlt();
+extern void insn_invlpg();
+extern void insn_mwait();
+extern void insn_rdpmc();
+extern void insn_rdtsc();
+extern void insn_monitor();
+extern void insn_pause();
+extern void insn_wbinvd();
+extern void insn_cpuid();
+extern void insn_invd();
+
+u32 cur_insn;
+
+struct insn_table {
+	const char *name;
+	u32 flag;
+	void (*insn_func)();
+	u32 type;
+	u32 reason;
+	ulong exit_qual;
+	u32 insn_info;
+};
+
+static struct insn_table insn_table[] = {
+	// Flags for Primary Processor-Based VM-Execution Controls
+	{"HLT", CPU_HLT, insn_hlt, 0, 12, 0, 0},
+	{"INVLPG", CPU_INVLPG, insn_invlpg, 0, 14, 0x12345678, 0},
+	{"MWAIT", CPU_MWAIT, insn_mwait, 0, 36, 0, 0},
+	{"RDPMC", CPU_RDPMC, insn_rdpmc, 0, 15, 0, 0},
+	{"RDTSC", CPU_RDTSC, insn_rdtsc, 0, 16, 0, 0},
+	{"MONITOR", CPU_MONITOR, insn_monitor, 0, 39, 0, 0},
+	{"PAUSE", CPU_PAUSE, insn_pause, 0, 40, 0, 0},
+	// Flags for Secondary Processor-Based VM-Execution Controls
+	{"WBINVD", CPU_WBINVD, insn_wbinvd, 1, 54, 0, 0},
+	// Flags for Non-Processor-Based
+	{"CPUID", 0, insn_cpuid, 2, 10, 0, 0},
+	{"INVD", 0, insn_invd, 2, 13, 0, 0},
+	{NULL},
+};
+
+static void insn_intercept_init()
+{
+	u32 ctrl_cpu[2];
+
+	ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0);
+	ctrl_cpu[0] |= CPU_HLT | CPU_INVLPG | CPU_MWAIT | CPU_RDPMC |
+		CPU_RDTSC | CPU_MONITOR | CPU_PAUSE | CPU_SECONDARY;
+	ctrl_cpu[0] &= ctrl_cpu_rev[0].clr;
+	vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
+	ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1);
+	ctrl_cpu[1] |= CPU_WBINVD | CPU_RDRAND;
+	ctrl_cpu[1] &= ctrl_cpu_rev[1].clr;
+	vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]);
+}
+
+static void insn_intercept_main()
+{
+	cur_insn = 0;
+	while (insn_table[cur_insn
Corner cases of I/O bitmap
Hi Gleb and Paolo,

There are some corner cases when testing I/O bitmaps, and I don't know
the exact hardware behavior.

1. If we set the bit for port 0x4000 in the bitmap and call inl(0x3fff)
or inl(0x4000) in the guest, what exit information will we get?

2. What will we get when calling inl(0xffff) in the guest with/without
the "unconditional I/O exiting" and "use I/O bitmaps" VM-execution
controls?

I tested the two cases in a nested environment. For the first one, I got
a normal exit if any of the ports accessed is masked in the bitmap. For
the second, it acts the same as for other ports. But the SDM says:

    If an I/O operation "wraps around" the 16-bit I/O-port space
    (accesses ports FFFFH and 0000H), the I/O instruction causes a
    VM exit.

I cannot find the exact behavior specified for this case. Do you have
any ideas about these?

Arthur
[PATCH v3] KVM: nVMX: Advertise IA32_PAT in VM exit control
Advertise VM_EXIT_SAVE_IA32_PAT and VM_EXIT_LOAD_IA32_PAT.

Signed-off-by: Arthur Chunqi Li <yzt...@gmail.com>
---
 arch/x86/kvm/vmx.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 396572d..c45adea 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2191,14 +2191,17 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	 * If bit 55 of VMX_BASIC is off, bits 0-8 and 10, 11, 13, 14, 16 and
 	 * 17 must be 1.
 	 */
+	rdmsr(MSR_IA32_VMX_EXIT_CTLS,
+		nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high);
 	nested_vmx_exit_ctls_low = VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
+	nested_vmx_exit_ctls_high &=
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
+		VM_EXIT_HOST_ADDR_SPACE_SIZE;
 	/* Note that guest use of VM_EXIT_ACK_INTR_ON_EXIT is not supported. */
-#ifdef CONFIG_X86_64
-	nested_vmx_exit_ctls_high = VM_EXIT_HOST_ADDR_SPACE_SIZE;
-#else
-	nested_vmx_exit_ctls_high = 0;
+#ifndef CONFIG_X86_64
+	nested_vmx_exit_ctls_high &= (~VM_EXIT_HOST_ADDR_SPACE_SIZE);
 #endif
-	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
 
 	/* entry controls */
 	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
-- 
1.7.9.5
Re: [PATCH] nVMX: Keep arch.pat in sync on L1-L2 switches
On Sun, Aug 4, 2013 at 11:17 PM, Jan Kiszka <jan.kis...@web.de> wrote:
> From: Jan Kiszka <jan.kis...@siemens.com>
>
> When asking vmx to load the PAT MSR for us while switching from L1 to L2
> or vice versa, we have to update arch.pat as well as it may later be
> used again to load or read out the MSR content.
>
> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Tested-by: Arthur Chunqi Li <yzt...@gmail.com>

This should cooperate with patch
http://www.mail-archive.com/kvm@vger.kernel.org/msg94349.html,
where VM_EXIT_SAVE_IA32_PAT and VM_EXIT_LOAD_IA32_PAT are advertised.

> ---
> Arthur, please add your tested-by also officially.
>
>  arch/x86/kvm/vmx.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 45fd70c..396572d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7535,9 +7535,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
>  		(vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE));
> -	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)
> +	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) {
>  		vmcs_write64(GUEST_IA32_PAT, vmcs12->guest_ia32_pat);
> -	else if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
> +		vcpu->arch.pat = vmcs12->guest_ia32_pat;
> +	} else if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
>  		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
>
> @@ -8025,8 +8026,10 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  	vmcs_writel(GUEST_IDTR_BASE, vmcs12->host_idtr_base);
>  	vmcs_writel(GUEST_GDTR_BASE, vmcs12->host_gdtr_base);
>
> -	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT)
> +	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
>  		vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
> +		vcpu->arch.pat = vmcs12->host_ia32_pat;
> +	}
>  	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
>  		vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL,
>  			vmcs12->host_ia32_perf_global_ctrl);
> --
> 1.7.3.4