Re: [PATCH 1/1] KVM: ioapic: Record edge-triggered interrupts delivery status.
2014-12-24 23:29 GMT+08:00 Jan Kiszka jan.kis...@web.de: On 2014-12-24 04:14, Wincy Van wrote: This patch fixes the bug discussed in https://www.mail-archive.com/kvm@vger.kernel.org/msg109813.html This patch uses a new field named irr_delivered to record the delivery status of edge-triggered interrupts, and clears the delivered interrupts in kvm_get_ioapic. So it has the same effect of commit 0bc830b05c667218d703f2026ec866c49df974fc (KVM: ioapic: clear IRR for edge-triggered interrupts at delivery) while avoids the bug of Windows guests. Signed-off-by: Wincy Van fanwenyi0...@gmail.com --- arch/x86/kvm/ioapic.c |7 ++- arch/x86/kvm/ioapic.h |1 + 2 files changed, 7 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c index b1947e0..a2e9d96 100644 --- a/arch/x86/kvm/ioapic.c +++ b/arch/x86/kvm/ioapic.c @@ -206,6 +206,8 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, old_irr = ioapic-irr; ioapic-irr |= mask; + if (edge) + ioapic-irr_delivered = ~mask; if ((edge old_irr == ioapic-irr) || (!edge entry.fields.remote_irr)) { ret = 0; @@ -349,7 +351,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status) irqe.shorthand = 0; if (irqe.trig_mode == IOAPIC_EDGE_TRIG) - ioapic-irr = ~(1 irq); + ioapic-irr_delivered |= 1 irq; if (irq == RTC_GSI line_status) { /* @@ -597,6 +599,7 @@ static void kvm_ioapic_reset(struct kvm_ioapic *ioapic) ioapic-base_address = IOAPIC_DEFAULT_BASE_ADDRESS; ioapic-ioregsel = 0; ioapic-irr = 0; + ioapic-irr_delivered = 0; ioapic-id = 0; memset(ioapic-irq_eoi, 0x00, IOAPIC_NUM_PINS); rtc_irq_eoi_tracking_reset(ioapic); @@ -654,6 +657,7 @@ int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state) spin_lock(ioapic-lock); memcpy(state, ioapic, sizeof(struct kvm_ioapic_state)); + state-irr = ~ioapic-irr_delivered; spin_unlock(ioapic-lock); return 0; } @@ -667,6 +671,7 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state) spin_lock(ioapic-lock); memcpy(ioapic, state, sizeof(struct kvm_ioapic_state)); ioapic-irr = 0; + ioapic-irr_delivered = 0; update_handled_vectors(ioapic); kvm_vcpu_request_scan_ioapic(kvm); kvm_ioapic_inject_all(ioapic, state-irr); diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h index 3c91955..a5cdfc0 100644 --- a/arch/x86/kvm/ioapic.h +++ b/arch/x86/kvm/ioapic.h @@ -77,6 +77,7 @@ struct kvm_ioapic { struct rtc_status rtc_status; struct delayed_work eoi_inject; u32 irq_eoi[IOAPIC_NUM_PINS]; + u32 irr_delivered; }; #ifdef DEBUG Does this introduce a state which requires save/restore on migration? If so, then you need to extend the existing interface - in a backward-compatible way. If not, please leave a remark on the reason. No, we do not need to save/restore irr_delivered. First of all, irr_delivered affects irr only when saving ioapic's state, it does not affect any of the ioapic's logic. Then, let's see what will happen if we do not save/restore that field : 1. If irr_delivered is 0 before saving, it is no difference at all. 2. If a bit of irr_delivered is 1 before saving, since irr_delivered only affects migration, we should check that if the 2nd+ migration is OK. There are 3 possibilities on the first destination : (1) The edge-triggered IRQ is masked, that bit will be cleared, it is no difference to do 2nd migration. (2) The edge-triggered IRQ is raised and not masked, that bit will be set again, it is no difference to do 2nd migration. (3) The edge-triggered IRQ is lowered, and the corresponding bit of irr is cleared, the result of state-irr = ~ioapic-irr_delivered in kvm_get_ioapic is not affected by irr_delivered. So it is OK to migrate with a stateless irr_delivered. Thanks, Wincy Jan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bisected] KVM in 3.19-rc1 is completely broken
On 2014/12/25 8:55, Chen, Tiejun wrote: On 2014/12/25 1:11, Andy Lutomirski wrote: On Wed, Dec 24, 2014 at 12:23 AM, Chen, Tiejun tiejun.c...@intel.com wrote: On 2014/12/24 5:29, Andy Lutomirski wrote: On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: I can reproduce it using the same steps on a Sandy Bridge laptop, with whatever QEMU is packaged in Fedora 21. I attached the config. I also submitted a virtme update for Fedora Rawhide and 21 (20 is still building) in case it helps. The build is here: http://koji.fedoraproject.org/koji/buildinfo?buildID=600732 The other reporter bisected it to 0e60b0799fedc495a5c57dbd669de3c10d72edd2. Can you try its parent? That's what I bisected it to. The parent works. Also, does it break with 3.18 host and 3.19-rc1 guest, or with 3.19-rc1 host and 3.18 guest? (Sorry I should do this myself but I'm a bit swamped due to vacation until Jan 6th). The breakage is with 3.17.7-something L0 and the same test kernel as L1 and L2. I think it breaks the same way with 3.19-rc1 as host and guest without any nesting, but that's awkward to test right now. Andy, Could you try this? Signed-off-by: Tiejun Chen tiejun.c...@intel.com I applied it by hand, and it survives extremely light testing. Tested-by: Andy Lutomirski l...@amacapital.net Looks good so thanks for your validation. I refine that I posted that fix in another thread since looks that will broken !next case. And I myself already run those test instructions you showed previously, now looks good. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
On 2014/12/25 8:52, Nadav Amit wrote: Although pop sreg updates RSP according to the operand size, only 2 bytes are read. The current behavior may result in incorrect #GP or #PF exceptions. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e5a84be..702da5e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt) unsigned long selector; int rc; Looks we just should do similar thing to em_push_sreg(), unsigned long selector; int rc; + if (ctxt-op_bytes == 4) { + rsp_increment(ctxt, -2); + ctxt-op_bytes = 2; + } rc = emulate_pop(ctxt, selector, ctxt-op_bytes); if (rc != X86EMUL_CONTINUE) return rc; Right? Thanks Tiejun - rc = emulate_pop(ctxt, selector, ctxt-op_bytes); + rc = emulate_pop(ctxt, selector, 2); if (rc != X86EMUL_CONTINUE) return rc; if (ctxt-modrm_reg == VCPU_SREG_SS) ctxt-interruptibility = KVM_X86_SHADOW_INT_MOV_SS; + if (ctxt-op_bytes 2) + rsp_increment(ctxt, ctxt-op_bytes - 2); rc = load_segment_descriptor(ctxt, (u16)selector, seg); return rc; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
Tiejun tiejun.c...@intel.com wrote: On 2014/12/25 8:52, Nadav Amit wrote: Although pop sreg updates RSP according to the operand size, only 2 bytes are read. The current behavior may result in incorrect #GP or #PF exceptions. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e5a84be..702da5e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt) unsigned long selector; int rc; Looks we just should do similar thing to em_push_sreg(), unsigned long selector; int rc; + if (ctxt-op_bytes == 4) { + rsp_increment(ctxt, -2); + ctxt-op_bytes = 2; + } rc = emulate_pop(ctxt, selector, ctxt-op_bytes); if (rc != X86EMUL_CONTINUE) return rc; Right? I don’t think so. It seems the behaviour of push and pop is a bit different. For push: “If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits ... all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified.” Therefore, for push in the case of op_bytes==8 we push zero-extended value. For pop the behaviour is not well-documented, but experimentally it appears only the first two bytes are accessed. I cannot see why it would be different when opsize is 8, since it is not like the push case, where the segment register value was zero extended. If you feel strongly about it, I’ll create a unit test. Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
Chen, Tiejun wrote: On 2014/12/24 19:02, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 15:26, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 9:50, Chen, Tiejun wrote: On 2014/12/22 17:23, Jamie Heilman wrote: KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Looks just now Andy found that commit, 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule from size to GFN, maybe you can try to revert this to try yours again. That doesn't revert cleanly for me, and I don't have much time to fiddle with it until the 24th---so checked out the commit before it (d4ae84a0), applied your patch, built, and yes, everything works fine at that point. I'll probably have time for another full bisection later, assuming things aren't ironed out already by then. 3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK 3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2 as well. Could you try this to fix your last error? Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works, my real (headless) kvm guests work, but this new patch makes running qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with Are you sure? From my test based on 3.19-rc1 that it owns top commit, aa39477b5692611b91ac9455ae588738852b3f60 just plus my previous patch, kvm: x86: vmx: reorder some msr writing I already can execute such a command successfully, qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img And your log below seems not to relate mem_slot issue we're discussing, I guess you need to update qemu as well. Yes, I'm sure. But I also found my new patch just work out Andy's next case, its really bringing a new issue in !next case. So I tried to refine that patch again as follows, This latest patch (again, after fixing all the whitespace so it actually applies), does the trick. Both qemu-system-x86_64 -machine pc,accel=kvm and qemu-system-x86_64 -machine pc,accel=kvm -nodefaults work for me now without any of the aforementioned warnings from the host. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- virt/kvm/kvm_main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..910bc48 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + ((new-base_gfn mslots[i - 1].base_gfn) || +(!new-base_gfn + !mslots[i - 1].base_gfn !mslots[i - 1].npages))) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; Tiejun errors in the host to the tune of: [ cut here ] WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]() Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd grace fscache sunrpc
[PATCH v2 2/2] vhost: relax used address alignment
virtio 1.0 only requires used address to be 4 byte aligned, vhost required 8 bytes (size of vring_used_elem). Fix up vhost to match that. Additionally, while vhost correctly requires 8 byte alignment for log, it's unconnected to used ring: it's a consequence that log has u64 entries. Tweak code to make that clearer. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/vhost.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index ed71b53..cb807d0 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -713,9 +713,13 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp) r = -EFAULT; break; } - if ((a.avail_user_addr (sizeof *vq-avail-ring - 1)) || - (a.used_user_addr (sizeof *vq-used-ring - 1)) || - (a.log_guest_addr (sizeof *vq-used-ring - 1))) { + + /* Make sure it's safe to cast pointers to vring types. */ + BUILD_BUG_ON(__alignof__ *vq-avail VRING_AVAIL_ALIGN_SIZE); + BUILD_BUG_ON(__alignof__ *vq-used VRING_USED_ALIGN_SIZE); + if ((a.avail_user_addr (VRING_AVAIL_ALIGN_SIZE - 1)) || + (a.used_user_addr (VRING_USED_ALIGN_SIZE - 1)) || + (a.log_guest_addr (sizeof(u64) - 1))) { r = -EINVAL; break; } -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/25 18:52, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/24 19:02, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 15:26, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 9:50, Chen, Tiejun wrote: On 2014/12/22 17:23, Jamie Heilman wrote: KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Looks just now Andy found that commit, 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule from size to GFN, maybe you can try to revert this to try yours again. That doesn't revert cleanly for me, and I don't have much time to fiddle with it until the 24th---so checked out the commit before it (d4ae84a0), applied your patch, built, and yes, everything works fine at that point. I'll probably have time for another full bisection later, assuming things aren't ironed out already by then. 3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK 3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2 as well. Could you try this to fix your last error? Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works, my real (headless) kvm guests work, but this new patch makes running qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with Are you sure? From my test based on 3.19-rc1 that it owns top commit, aa39477b5692611b91ac9455ae588738852b3f60 just plus my previous patch, kvm: x86: vmx: reorder some msr writing I already can execute such a command successfully, qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img And your log below seems not to relate mem_slot issue we're discussing, I guess you need to update qemu as well. Yes, I'm sure. But I also found my new patch just work out Andy's next case, its really bringing a new issue in !next case. So I tried to refine that patch again as follows, This latest patch (again, after fixing all the whitespace so it actually Next time I guess I need to post that as a attached file :) applies), does the trick. Both qemu-system-x86_64 -machine pc,accel=kvm and qemu-system-x86_64 -machine pc,accel=kvm -nodefaults work for me now without any of the aforementioned warnings from the host. Sounds great and thanks for your test again. Tiejun Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- virt/kvm/kvm_main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..910bc48 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + ((new-base_gfn mslots[i - 1].base_gfn) || +(!new-base_gfn + !mslots[i - 1].base_gfn !mslots[i - 1].npages))) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; Tiejun errors in the host to the tune of: [ cut here ] WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]() Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand
Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
On 2014/12/25 17:55, Nadav Amit wrote: Tiejun tiejun.c...@intel.com wrote: On 2014/12/25 8:52, Nadav Amit wrote: Although pop sreg updates RSP according to the operand size, only 2 bytes are read. The current behavior may result in incorrect #GP or #PF exceptions. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e5a84be..702da5e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt) unsigned long selector; int rc; Looks we just should do similar thing to em_push_sreg(), unsigned long selector; int rc; + if (ctxt-op_bytes == 4) { + rsp_increment(ctxt, -2); + ctxt-op_bytes = 2; + } rc = emulate_pop(ctxt, selector, ctxt-op_bytes); if (rc != X86EMUL_CONTINUE) return rc; Right? I don’t think so. It seems the behaviour of push and pop is a bit different. For push: “If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits ... all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified.” Therefore, for push in the case of op_bytes==8 we push zero-extended value. For pop the behaviour is not well-documented, but experimentally it appears only the first two bytes are accessed. I cannot see why it would be Maybe we can comment something here, like /* Just force 2 byte destination to already work well in most cases. */. different when opsize is 8, since it is not like the push case, where the segment register value was zero extended. Thanks for your explanation. If you feel strongly about it, I’ll create a unit test. Based on your description I think I can stand with you at this point. Tiejun Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: fix to update memslots properly
After commit, 0e60b0799fed, kvm: change memslot sorting rule from size to GFN is introduced, we're missing but need to consider such a case, (!new-base_gfn !mslots[i - 1].base_gfn !mslots[i - 1].npages), then re-sort kvm_memslots wrong in next case to issue the following, KVM internal error. Suberror: 1 emulation failure EAX=000dee58 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fc4 EIP=000f17f4 EFL=00010012 [A--] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6c58 0037 IDT= 000f6c96 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 75 fc ff ff 89 f2 a8 10 89 d8 75 0a b9 74 17 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff 57 56 53 8b 35 38 65 0f 00 85 f6 0f 88 be 00 00 00 0f b7 f6 And we also should set flag as 0 in case of (new-npages == 0) (new-base_gfn == 0). Reported-by: Jamie Heilman ja...@audible.transient.net Tested-by: Jamie Heilman ja...@audible.transient.net Reported-by: Andy Lutomirski l...@amacapital.net Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- I test this both in Andy' case and Jamie's case. virt/kvm/kvm_main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..6e52f3f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + ((new-base_gfn mslots[i - 1].base_gfn) || + (!new-base_gfn +!mslots[i - 1].base_gfn !mslots[i - 1].npages))) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
-Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Nadav Amit Sent: Thursday, December 25, 2014 5:55 PM To: Chen, Tiejun Cc: Paolo Bonzini; kvm list Subject: Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes Tiejun tiejun.c...@intel.com wrote: On 2014/12/25 8:52, Nadav Amit wrote: Although pop sreg updates RSP according to the operand size, only 2 bytes are read. The current behavior may result in incorrect #GP or #PF exceptions. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e5a84be..702da5e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt) unsigned long selector; int rc; Looks we just should do similar thing to em_push_sreg(), unsigned long selector; int rc; + if (ctxt-op_bytes == 4) { + rsp_increment(ctxt, -2); + ctxt-op_bytes = 2; + } rc = emulate_pop(ctxt, selector, ctxt-op_bytes); if (rc != X86EMUL_CONTINUE) return rc; Right? I don't think so. It seems the behaviour of push and pop is a bit different. For push: If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits ... all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified. Therefore, for push in the case of op_bytes==8 we push zero-extended value. For pop the behaviour is not well-documented, but experimentally it appears only the first two bytes are accessed. I cannot see why it would be different when opsize is 8, since it is not like the push case, where the segment register value was zero extended. Let's take 64-bits operand size as an example. When pushing a segment register, it uses zero-extended value, so 8 bytes will be pushed on the stack. When popping it, the current code return the top 8 bytes in the stack, and it only uses the lowest 2 bytes for load_segment_descriptor(). what is the issue here? Thanks, Feng If you feel strongly about it, I'll create a unit test. Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
-Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Nadav Amit Sent: Thursday, December 25, 2014 8:52 AM To: pbonz...@redhat.com Cc: kvm@vger.kernel.org; Nadav Amit Subject: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong When emulating an instruction that reads the destination memory operand (i.e., instructions without the Mov flag in the emulator), the operand is first read. If a page-fault is detected in this phase, the error-code which would be delivered to the VM does not indicate that the access that caused the exception is a write one. This does not conform with real hardware, and may cause the VM to enter the page-fault handler twice for no reason (once for read, once for write). Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/include/asm/kvm_host.h | 12 arch/x86/kvm/emulate.c | 6 +- arch/x86/kvm/mmu.h | 12 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 73ccb12..d6f90ca 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -162,6 +162,18 @@ enum { #define DR7_FIXED_1 0x0400 #define DR7_VOLATILE 0x2bff +#define PFERR_PRESENT_BIT 0 +#define PFERR_WRITE_BIT 1 +#define PFERR_USER_BIT 2 +#define PFERR_RSVD_BIT 3 +#define PFERR_FETCH_BIT 4 + +#define PFERR_PRESENT_MASK (1U PFERR_PRESENT_BIT) +#define PFERR_WRITE_MASK (1U PFERR_WRITE_BIT) +#define PFERR_USER_MASK (1U PFERR_USER_BIT) +#define PFERR_RSVD_MASK (1U PFERR_RSVD_BIT) +#define PFERR_FETCH_MASK (1U PFERR_FETCH_BIT) + /* apic attention bits */ #define KVM_APIC_CHECK_VAPIC 0 /* diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 7a9697f..e5a84be 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt) /* optimisation - avoid slow emulated read if Mov */ rc = segmented_read(ctxt, ctxt-dst.addr.mem, ctxt-dst.val, ctxt-dst.bytes); This is a write operation, do you know why we need to read the dst operand first here? Thanks, Feng - if (rc != X86EMUL_CONTINUE) + if (rc != X86EMUL_CONTINUE) { + if (rc == X86EMUL_PROPAGATE_FAULT + ctxt-exception.vector == PF_VECTOR) + ctxt-exception.error_code |= PFERR_WRITE_MASK; goto done; + } } ctxt-dst.orig_val = ctxt-dst.val; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 6b34876b..daae711 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -44,18 +44,6 @@ #define PT_DIRECTORY_LEVEL 2 #define PT_PAGE_TABLE_LEVEL 1 -#define PFERR_PRESENT_BIT 0 -#define PFERR_WRITE_BIT 1 -#define PFERR_USER_BIT 2 -#define PFERR_RSVD_BIT 3 -#define PFERR_FETCH_BIT 4 - -#define PFERR_PRESENT_MASK (1U PFERR_PRESENT_BIT) -#define PFERR_WRITE_MASK (1U PFERR_WRITE_BIT) -#define PFERR_USER_MASK (1U PFERR_USER_BIT) -#define PFERR_RSVD_MASK (1U PFERR_RSVD_BIT) -#define PFERR_FETCH_MASK (1U PFERR_FETCH_BIT) - static inline u64 rsvd_bits(int s, int e) { return ((1ULL (e - s + 1)) - 1) s; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html