Re: [PATCH 1/1] KVM: ioapic: Record edge-triggered interrupts delivery status.

2014-12-25 Thread Wincy Van
2014-12-24 23:29 GMT+08:00 Jan Kiszka jan.kis...@web.de:
 On 2014-12-24 04:14, Wincy Van wrote:
 This patch fixes the bug discussed in
 https://www.mail-archive.com/kvm@vger.kernel.org/msg109813.html

 This patch uses a new field named irr_delivered to record the
 delivery status of edge-triggered interrupts, and clears the
 delivered interrupts in kvm_get_ioapic. So it has the same effect
 of commit 0bc830b05c667218d703f2026ec866c49df974fc
 (KVM: ioapic: clear IRR for edge-triggered interrupts at delivery)
 while avoids the bug of Windows guests.

 Signed-off-by: Wincy Van fanwenyi0...@gmail.com
 ---
  arch/x86/kvm/ioapic.c |7 ++-
  arch/x86/kvm/ioapic.h |1 +
  2 files changed, 7 insertions(+), 1 deletions(-)

 diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
 index b1947e0..a2e9d96 100644
 --- a/arch/x86/kvm/ioapic.c
 +++ b/arch/x86/kvm/ioapic.c
 @@ -206,6 +206,8 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, 
 unsigned int irq,

   old_irr = ioapic-irr;
   ioapic-irr |= mask;
 + if (edge)
 + ioapic-irr_delivered = ~mask;
   if ((edge  old_irr == ioapic-irr) ||
   (!edge  entry.fields.remote_irr)) {
   ret = 0;
 @@ -349,7 +351,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int 
 irq, bool line_status)
   irqe.shorthand = 0;

   if (irqe.trig_mode == IOAPIC_EDGE_TRIG)
 - ioapic-irr = ~(1  irq);
 + ioapic-irr_delivered |= 1  irq;

   if (irq == RTC_GSI  line_status) {
   /*
 @@ -597,6 +599,7 @@ static void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
   ioapic-base_address = IOAPIC_DEFAULT_BASE_ADDRESS;
   ioapic-ioregsel = 0;
   ioapic-irr = 0;
 + ioapic-irr_delivered = 0;
   ioapic-id = 0;
   memset(ioapic-irq_eoi, 0x00, IOAPIC_NUM_PINS);
   rtc_irq_eoi_tracking_reset(ioapic);
 @@ -654,6 +657,7 @@ int kvm_get_ioapic(struct kvm *kvm, struct 
 kvm_ioapic_state *state)

   spin_lock(ioapic-lock);
   memcpy(state, ioapic, sizeof(struct kvm_ioapic_state));
 + state-irr = ~ioapic-irr_delivered;
   spin_unlock(ioapic-lock);
   return 0;
  }
 @@ -667,6 +671,7 @@ int kvm_set_ioapic(struct kvm *kvm, struct 
 kvm_ioapic_state *state)
   spin_lock(ioapic-lock);
   memcpy(ioapic, state, sizeof(struct kvm_ioapic_state));
   ioapic-irr = 0;
 + ioapic-irr_delivered = 0;
   update_handled_vectors(ioapic);
   kvm_vcpu_request_scan_ioapic(kvm);
   kvm_ioapic_inject_all(ioapic, state-irr);
 diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
 index 3c91955..a5cdfc0 100644
 --- a/arch/x86/kvm/ioapic.h
 +++ b/arch/x86/kvm/ioapic.h
 @@ -77,6 +77,7 @@ struct kvm_ioapic {
   struct rtc_status rtc_status;
   struct delayed_work eoi_inject;
   u32 irq_eoi[IOAPIC_NUM_PINS];
 + u32 irr_delivered;
  };

  #ifdef DEBUG


 Does this introduce a state which requires save/restore on migration? If
 so, then you need to extend the existing interface - in a
 backward-compatible way. If not, please leave a remark on the reason.


No, we do not need to save/restore irr_delivered.

First of all, irr_delivered affects irr only when saving ioapic's
state, it does not affect any of the ioapic's logic.

Then, let's see what will happen if we do not save/restore that field :

1. If irr_delivered is 0 before saving, it is no difference at all.
2. If a bit of irr_delivered is 1 before saving, since irr_delivered
only affects migration,
we should check that if the 2nd+ migration is OK.
There are 3 possibilities on the first destination :
(1) The edge-triggered IRQ is masked, that bit will be cleared, it
is no difference to do 2nd migration.
(2) The edge-triggered IRQ is raised and not masked, that bit will
be set again, it is no difference to do 2nd migration.
(3) The edge-triggered IRQ is lowered, and the corresponding bit
of irr is cleared,
  the result of state-irr = ~ioapic-irr_delivered in
kvm_get_ioapic is not affected by irr_delivered.

So it is OK to migrate with a stateless irr_delivered.


Thanks,

Wincy




 Jan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bisected] KVM in 3.19-rc1 is completely broken

2014-12-25 Thread Chen, Tiejun



On 2014/12/25 8:55, Chen, Tiejun wrote:



On 2014/12/25 1:11, Andy Lutomirski wrote:

On Wed, Dec 24, 2014 at 12:23 AM, Chen, Tiejun tiejun.c...@intel.com
wrote:

On 2014/12/24 5:29, Andy Lutomirski wrote:


On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com
wrote:




I can reproduce it using the same steps on a Sandy Bridge laptop,
with
whatever QEMU is packaged in Fedora 21.  I attached the config.

I also submitted a virtme update for Fedora Rawhide and 21 (20 is
still building) in case it helps.  The build is here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=600732



The other reporter bisected it to
0e60b0799fedc495a5c57dbd669de3c10d72edd2.  Can you try its parent?



That's what I bisected it to.  The parent works.



Also, does it break with 3.18 host and 3.19-rc1 guest, or with
3.19-rc1 host and 3.18 guest?  (Sorry I should do this myself
but I'm a bit swamped due to vacation until Jan 6th).



The breakage is with 3.17.7-something L0 and the same test kernel as
L1 and L2.  I think it breaks the same way with 3.19-rc1 as host and
guest without any nesting, but that's awkward to test right now.




Andy,

Could you try this?

Signed-off-by: Tiejun Chen tiejun.c...@intel.com


I applied it by hand, and it survives extremely light testing.

Tested-by: Andy Lutomirski l...@amacapital.net



Looks good so thanks for your validation.



I refine that I posted that fix in another thread since looks that will 
broken !next case. And I myself already run those test instructions you 
showed previously, now looks good.


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-25 Thread Chen, Tiejun

On 2014/12/25 8:52, Nadav Amit wrote:

Although pop sreg updates RSP according to the operand size, only 2 bytes are
read.  The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
  arch/x86/kvm/emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5a84be..702da5e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
unsigned long selector;
int rc;



Looks we just should do similar thing to em_push_sreg(),

unsigned long selector;
int rc;

+   if (ctxt-op_bytes == 4) {
+   rsp_increment(ctxt, -2);
+   ctxt-op_bytes = 2;
+   }
rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;

Right?

Thanks
Tiejun


-   rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
+   rc = emulate_pop(ctxt, selector, 2);
if (rc != X86EMUL_CONTINUE)
return rc;

if (ctxt-modrm_reg == VCPU_SREG_SS)
ctxt-interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
+   if (ctxt-op_bytes  2)
+   rsp_increment(ctxt, ctxt-op_bytes - 2);

rc = load_segment_descriptor(ctxt, (u16)selector, seg);
return rc;


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-25 Thread Nadav Amit
Tiejun tiejun.c...@intel.com wrote:

 On 2014/12/25 8:52, Nadav Amit wrote:
 Although pop sreg updates RSP according to the operand size, only 2 bytes are
 read.  The current behavior may result in incorrect #GP or #PF exceptions.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/kvm/emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index e5a84be..702da5e 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
  unsigned long selector;
  int rc;
 
 Looks we just should do similar thing to em_push_sreg(),
 
unsigned long selector;
int rc;
 
 +   if (ctxt-op_bytes == 4) {
 +   rsp_increment(ctxt, -2);
 +   ctxt-op_bytes = 2;
 +   }
rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
 
 Right?
I don’t think so. It seems the behaviour of push and pop is a bit different.
For push: “If the source operand is a segment register (16 bits) and the
operand size is 64-bits, a zero-extended value is pushed on the stack; if
the operand size is 32-bits ... all recent Core and Atom processors perform
a 16-bit move, leaving the upper portion of the stack location unmodified.”

Therefore, for push in the case of op_bytes==8 we push zero-extended value.

For pop the behaviour is not well-documented, but experimentally it appears
only the first two bytes are accessed. I cannot see why it would be
different when opsize is 8, since it is not like the push case, where the
segment register value was zero extended.

If you feel strongly about it, I’ll create a unit test.

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-25 Thread Jamie Heilman
Chen, Tiejun wrote:
 On 2014/12/24 19:02, Jamie Heilman wrote:
 Chen, Tiejun wrote:
 On 2014/12/23 15:26, Jamie Heilman wrote:
 Chen, Tiejun wrote:
 On 2014/12/23 9:50, Chen, Tiejun wrote:
 On 2014/12/22 17:23, Jamie Heilman wrote:
 KVM internal error. Suberror: 1
 emulation failure
 EAX=000de494 EBX= ECX= EDX=0cfd
 ESI=0059 EDI= EBP= ESP=6fb4
 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =0010   00c09300 DPL=0 DS   [-WA]
 CS =0008   00c09b00 DPL=0 CS32 [-RA]
 SS =0010   00c09300 DPL=0 DS   [-WA]
 DS =0010   00c09300 DPL=0 DS   [-WA]
 FS =0010   00c09300 DPL=0 DS   [-WA]
 GS =0010   00c09300 DPL=0 DS   [-WA]
 LDT=   8200 DPL=0 LDT
 TR =   8b00 DPL=0 TSS32-busy
 GDT= 000f6be8 0037
 IDT= 000f6c26 
 CR0=6011 CR2= CR3= CR4=
 DR0= DR1= DR2=
 DR3=
 DR6=0ff0 DR7=0400
 EFER=
 Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b
 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1
 b0 04 e6 21 b0 02
 
 FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
 two bugs, maybe there's more to this first one.  I can repro this
 
 So if my understanding is correct, this is probably another bug. And
 especially, I already saw the same log in another thread, Cleaning up
 the KVM clock. Maybe you can continue to `git bisect` to locate that
 bad commit.
 
 
 Looks just now Andy found that commit,
 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule
 from size to GFN, maybe you can try to revert this to try yours again.
 
 That doesn't revert cleanly for me, and I don't have much time to
 fiddle with it until the 24th---so checked out the commit before it
 (d4ae84a0), applied your patch, built, and yes, everything works fine
 at that point.  I'll probably have time for another full bisection
 later, assuming things aren't ironed out already by then.
 
 3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK
 3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure
 
 So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2
 as well.
 
 Could you try this to fix your last error?
 
 Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works,
 my real (headless) kvm guests work, but this new patch makes running
 qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with
 
 Are you sure? From my test based on 3.19-rc1 that it owns top commit,
 
 aa39477b5692611b91ac9455ae588738852b3f60
 
 just plus my previous patch, kvm: x86: vmx: reorder some msr writing
 
 I already can execute such a command successfully,
 
 qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img
 
 And your log below seems not to relate mem_slot issue we're discussing, I
 guess you need to update qemu as well.

Yes, I'm sure.

 But I also found my new patch just work out Andy's next case, its really
 bringing a new issue in !next case. So I tried to refine that patch again as
 follows,

This latest patch (again, after fixing all the whitespace so it actually
applies), does the trick.  Both
qemu-system-x86_64 -machine pc,accel=kvm and
qemu-system-x86_64 -machine pc,accel=kvm -nodefaults work for me
now without any of the aforementioned warnings from the host.


 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
  virt/kvm/kvm_main.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index f528343..910bc48 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
 WARN_ON(mslots[i].id != id);
 if (!new-npages) {
 new-base_gfn = 0;
 +   new-flags = 0;
 if (mslots[i].npages)
 slots-used_slots--;
 } else {
 @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots,
 i++;
 }
 while (i  0 
 -  new-base_gfn  mslots[i - 1].base_gfn) {
 +  ((new-base_gfn  mslots[i - 1].base_gfn) ||
 +(!new-base_gfn 
 + !mslots[i - 1].base_gfn  !mslots[i - 1].npages))) {
 mslots[i] = mslots[i - 1];
 slots-id_to_index[mslots[i].id] = i;
 i--;
 
 
 
 Tiejun
 
 errors in the host to the tune of:
 
 [ cut here ]
 WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 
 kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]()
 Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave 
 cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd 
 grace fscache sunrpc 

[PATCH v2 2/2] vhost: relax used address alignment

2014-12-25 Thread Michael S. Tsirkin
virtio 1.0 only requires used address to be 4 byte aligned,
vhost required 8 bytes (size of vring_used_elem).
Fix up vhost to match that.

Additionally, while vhost correctly requires 8 byte
alignment for log, it's unconnected to used ring:
it's a consequence that log has u64 entries.
Tweak code to make that clearer.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ed71b53..cb807d0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -713,9 +713,13 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, 
void __user *argp)
r = -EFAULT;
break;
}
-   if ((a.avail_user_addr  (sizeof *vq-avail-ring - 1)) ||
-   (a.used_user_addr  (sizeof *vq-used-ring - 1)) ||
-   (a.log_guest_addr  (sizeof *vq-used-ring - 1))) {
+
+   /* Make sure it's safe to cast pointers to vring types. */
+   BUILD_BUG_ON(__alignof__ *vq-avail  VRING_AVAIL_ALIGN_SIZE);
+   BUILD_BUG_ON(__alignof__ *vq-used  VRING_USED_ALIGN_SIZE);
+   if ((a.avail_user_addr  (VRING_AVAIL_ALIGN_SIZE - 1)) ||
+   (a.used_user_addr  (VRING_USED_ALIGN_SIZE - 1)) ||
+   (a.log_guest_addr  (sizeof(u64) - 1))) {
r = -EINVAL;
break;
}
-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-25 Thread Chen, Tiejun

On 2014/12/25 18:52, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/24 19:02, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 15:26, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 9:50, Chen, Tiejun wrote:

On 2014/12/22 17:23, Jamie Heilman wrote:

KVM internal error. Suberror: 1
emulation failure
EAX=000de494 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fb4
EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6be8 0037
IDT= 000f6c26 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1
b0 04 e6 21 b0 02

FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
two bugs, maybe there's more to this first one.  I can repro this


So if my understanding is correct, this is probably another bug. And
especially, I already saw the same log in another thread, Cleaning up
the KVM clock. Maybe you can continue to `git bisect` to locate that
bad commit.



Looks just now Andy found that commit,
0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule

from size to GFN, maybe you can try to revert this to try yours again.

That doesn't revert cleanly for me, and I don't have much time to
fiddle with it until the 24th---so checked out the commit before it
(d4ae84a0), applied your patch, built, and yes, everything works fine
at that point.  I'll probably have time for another full bisection
later, assuming things aren't ironed out already by then.


3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK
3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure

So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2
as well.


Could you try this to fix your last error?


Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works,
my real (headless) kvm guests work, but this new patch makes running
qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with


Are you sure? From my test based on 3.19-rc1 that it owns top commit,

aa39477b5692611b91ac9455ae588738852b3f60

just plus my previous patch, kvm: x86: vmx: reorder some msr writing

I already can execute such a command successfully,

qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img

And your log below seems not to relate mem_slot issue we're discussing, I
guess you need to update qemu as well.


Yes, I'm sure.


But I also found my new patch just work out Andy's next case, its really
bringing a new issue in !next case. So I tried to refine that patch again as
follows,


This latest patch (again, after fixing all the whitespace so it actually


Next time I guess I need to post that as a attached file :)


applies), does the trick.  Both
qemu-system-x86_64 -machine pc,accel=kvm and
qemu-system-x86_64 -machine pc,accel=kvm -nodefaults work for me
now without any of the aforementioned warnings from the host.


Sounds great and thanks for your test again.

Tiejun





Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  virt/kvm/kvm_main.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..910bc48 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
 WARN_ON(mslots[i].id != id);
 if (!new-npages) {
 new-base_gfn = 0;
+   new-flags = 0;
 if (mslots[i].npages)
 slots-used_slots--;
 } else {
@@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots,
 i++;
 }
 while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
+  ((new-base_gfn  mslots[i - 1].base_gfn) ||
+(!new-base_gfn 
+ !mslots[i - 1].base_gfn  !mslots[i - 1].npages))) {
 mslots[i] = mslots[i - 1];
 slots-id_to_index[mslots[i].id] = i;
 i--;



Tiejun


errors in the host to the tune of:

[ cut here ]
WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 
kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]()
Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave 
cpufreq_ondemand 

Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-25 Thread Chen, Tiejun

On 2014/12/25 17:55, Nadav Amit wrote:

Tiejun tiejun.c...@intel.com wrote:


On 2014/12/25 8:52, Nadav Amit wrote:

Although pop sreg updates RSP according to the operand size, only 2 bytes are
read.  The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
  arch/x86/kvm/emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5a84be..702da5e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
unsigned long selector;
int rc;


Looks we just should do similar thing to em_push_sreg(),

unsigned long selector;
int rc;

+   if (ctxt-op_bytes == 4) {
+   rsp_increment(ctxt, -2);
+   ctxt-op_bytes = 2;
+   }
rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;

Right?

I don’t think so. It seems the behaviour of push and pop is a bit different.
For push: “If the source operand is a segment register (16 bits) and the
operand size is 64-bits, a zero-extended value is pushed on the stack; if
the operand size is 32-bits ... all recent Core and Atom processors perform
a 16-bit move, leaving the upper portion of the stack location unmodified.”

Therefore, for push in the case of op_bytes==8 we push zero-extended value.

For pop the behaviour is not well-documented, but experimentally it appears
only the first two bytes are accessed. I cannot see why it would be


Maybe we can comment something here, like /* Just force 2 byte 
destination to already work well in most cases. */.



different when opsize is 8, since it is not like the push case, where the
segment register value was zero extended.


Thanks for your explanation.



If you feel strongly about it, I’ll create a unit test.


Based on your description I think I can stand with you at this point.

Tiejun



Nadav



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: fix to update memslots properly

2014-12-25 Thread Tiejun Chen
After commit, 0e60b0799fed, kvm: change memslot sorting rule from size to
GFN is introduced, we're missing but need to consider such a case,
(!new-base_gfn  !mslots[i - 1].base_gfn  !mslots[i - 1].npages), then
re-sort kvm_memslots wrong in next case to issue the following,

KVM internal error. Suberror: 1
emulation failure
EAX=000dee58 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fc4
EIP=000f17f4 EFL=00010012 [A--] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6c58 0037
IDT= 000f6c96 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 75 fc ff ff 89 f2 a8 10 89 d8 75 0a b9 74 17 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff 57 56 53 8b 35 38 65 0f 00 85 f6 0f 88 be
00 00 00 0f b7 f6

And we also should set flag as 0 in case of (new-npages == 0) 
(new-base_gfn == 0).

Reported-by: Jamie Heilman ja...@audible.transient.net
Tested-by: Jamie Heilman ja...@audible.transient.net
Reported-by: Andy Lutomirski l...@amacapital.net
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---

I test this both in Andy' case and Jamie's case.

 virt/kvm/kvm_main.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..6e52f3f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
WARN_ON(mslots[i].id != id);
if (!new-npages) {
new-base_gfn = 0;
+   new-flags = 0;
if (mslots[i].npages)
slots-used_slots--;
} else {
@@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots,
i++;
}
while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
+  ((new-base_gfn  mslots[i - 1].base_gfn) ||
+   (!new-base_gfn 
+!mslots[i - 1].base_gfn  !mslots[i - 1].npages))) {
mslots[i] = mslots[i - 1];
slots-id_to_index[mslots[i].id] = i;
i--;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-25 Thread Wu, Feng


 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Nadav Amit
 Sent: Thursday, December 25, 2014 5:55 PM
 To: Chen, Tiejun
 Cc: Paolo Bonzini; kvm list
 Subject: Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
 
 Tiejun tiejun.c...@intel.com wrote:
 
  On 2014/12/25 8:52, Nadav Amit wrote:
  Although pop sreg updates RSP according to the operand size, only 2 bytes
 are
  read.  The current behavior may result in incorrect #GP or #PF exceptions.
 
  Signed-off-by: Nadav Amit na...@cs.technion.ac.il
  ---
   arch/x86/kvm/emulate.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)
 
  diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
  index e5a84be..702da5e 100644
  --- a/arch/x86/kvm/emulate.c
  +++ b/arch/x86/kvm/emulate.c
  @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct
 x86_emulate_ctxt *ctxt)
 unsigned long selector;
 int rc;
 
  Looks we just should do similar thing to em_push_sreg(),
 
 unsigned long selector;
 int rc;
 
  +   if (ctxt-op_bytes == 4) {
  +   rsp_increment(ctxt, -2);
  +   ctxt-op_bytes = 2;
  +   }
 rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
 if (rc != X86EMUL_CONTINUE)
 return rc;
 
  Right?
 I don't think so. It seems the behaviour of push and pop is a bit different.
 For push: If the source operand is a segment register (16 bits) and the
 operand size is 64-bits, a zero-extended value is pushed on the stack; if
 the operand size is 32-bits ... all recent Core and Atom processors perform
 a 16-bit move, leaving the upper portion of the stack location unmodified.
 
 Therefore, for push in the case of op_bytes==8 we push zero-extended value.
 
 For pop the behaviour is not well-documented, but experimentally it appears
 only the first two bytes are accessed. I cannot see why it would be
 different when opsize is 8, since it is not like the push case, where the
 segment register value was zero extended.

Let's take 64-bits operand size as an example. When pushing a segment register, 
it
uses zero-extended value, so 8 bytes will be pushed on the stack. When popping 
it,
the current code return the top 8 bytes in the stack, and it only uses the 
lowest 2
bytes for load_segment_descriptor(). what is the issue here?

Thanks,
Feng

 
 If you feel strongly about it, I'll create a unit test.
 
 Nadav
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong

2014-12-25 Thread Wu, Feng


 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Nadav Amit
 Sent: Thursday, December 25, 2014 8:52 AM
 To: pbonz...@redhat.com
 Cc: kvm@vger.kernel.org; Nadav Amit
 Subject: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
 
 When emulating an instruction that reads the destination memory operand
 (i.e.,
 instructions without the Mov flag in the emulator), the operand is first read.
 If a page-fault is detected in this phase, the error-code which would be
 delivered to the VM does not indicate that the access that caused the
 exception
 is a write one. This does not conform with real hardware, and may cause the
 VM
 to enter the page-fault handler twice for no reason (once for read, once for
 write).
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/include/asm/kvm_host.h | 12 
  arch/x86/kvm/emulate.c  |  6 +-
  arch/x86/kvm/mmu.h  | 12 
  3 files changed, 17 insertions(+), 13 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h
 b/arch/x86/include/asm/kvm_host.h
 index 73ccb12..d6f90ca 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -162,6 +162,18 @@ enum {
  #define DR7_FIXED_1  0x0400
  #define DR7_VOLATILE 0x2bff
 
 +#define PFERR_PRESENT_BIT 0
 +#define PFERR_WRITE_BIT 1
 +#define PFERR_USER_BIT 2
 +#define PFERR_RSVD_BIT 3
 +#define PFERR_FETCH_BIT 4
 +
 +#define PFERR_PRESENT_MASK (1U  PFERR_PRESENT_BIT)
 +#define PFERR_WRITE_MASK (1U  PFERR_WRITE_BIT)
 +#define PFERR_USER_MASK (1U  PFERR_USER_BIT)
 +#define PFERR_RSVD_MASK (1U  PFERR_RSVD_BIT)
 +#define PFERR_FETCH_MASK (1U  PFERR_FETCH_BIT)
 +
  /* apic attention bits */
  #define KVM_APIC_CHECK_VAPIC 0
  /*
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 7a9697f..e5a84be 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt
 *ctxt)
   /* optimisation - avoid slow emulated read if Mov */
   rc = segmented_read(ctxt, ctxt-dst.addr.mem,
  ctxt-dst.val, ctxt-dst.bytes);

This is a write operation, do you know why we need to read the dst operand 
first here?

Thanks,
Feng

 - if (rc != X86EMUL_CONTINUE)
 + if (rc != X86EMUL_CONTINUE) {
 + if (rc == X86EMUL_PROPAGATE_FAULT 
 + ctxt-exception.vector == PF_VECTOR)
 + ctxt-exception.error_code |= PFERR_WRITE_MASK;
   goto done;
 + }
   }
   ctxt-dst.orig_val = ctxt-dst.val;
 
 diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
 index 6b34876b..daae711 100644
 --- a/arch/x86/kvm/mmu.h
 +++ b/arch/x86/kvm/mmu.h
 @@ -44,18 +44,6 @@
  #define PT_DIRECTORY_LEVEL 2
  #define PT_PAGE_TABLE_LEVEL 1
 
 -#define PFERR_PRESENT_BIT 0
 -#define PFERR_WRITE_BIT 1
 -#define PFERR_USER_BIT 2
 -#define PFERR_RSVD_BIT 3
 -#define PFERR_FETCH_BIT 4
 -
 -#define PFERR_PRESENT_MASK (1U  PFERR_PRESENT_BIT)
 -#define PFERR_WRITE_MASK (1U  PFERR_WRITE_BIT)
 -#define PFERR_USER_MASK (1U  PFERR_USER_BIT)
 -#define PFERR_RSVD_MASK (1U  PFERR_RSVD_BIT)
 -#define PFERR_FETCH_MASK (1U  PFERR_FETCH_BIT)
 -
  static inline u64 rsvd_bits(int s, int e)
  {
   return ((1ULL  (e - s + 1)) - 1)  s;
 --
 1.9.1
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html