Hello,

 This patch is meant to fix the problem observed during protected mode
transitions, which shows up for example while installing openSUSE 10.3.
Unfortunately, kvm-userspace still crashes with the patch applied. I'm
not sure whether the patch introduces the crash or whether the patch is
correct and merely exposes a new issue.

  Here is what the patch does:

 1) Remove the SS patching that modifies SS_SELECTOR in enter_pmode(),
so that the VM-entry failure becomes visible.
 2) Add a handler, handle_vmentry_failure(), that catches the VM-entry
failure.
 3) While CS.RPL != SS.RPL, emulate the guest's instructions one at a
time.
 4) Add emulation of "ljmp", "mov r, imm", "mov sreg, r/m16" and
"mov r/m16, sreg", whose opcodes are 0xea, 0xb8, 0x8e and 0x8c
respectively.

In principle this should be enough to boot openSUSE 10.3, because the
instructions that need to be emulated are:

  0x0000000000046e53:  ljmp   $0x18,$0x6e18
  0x0000000000046e58:  mov    $0x20,%ax
  0x0000000000046e5c:  mov    %eax,%ds
  0x0000000000046e5e:  mov    %ss,%eax
  0x0000000000046e60:  and    $0xffff,%esp
  0x0000000000046e66:  shl    $0x4,%eax
  0x0000000000046e69:  add    %eax,%esp
  0x0000000000046e6b:  mov    $0x8,%ax
  0x0000000000046e6f:  mov    %eax,%ss

After the last instruction, cs.rpl is equal to ss.rpl, so no further
emulation should be needed.

I added tracing to handle_vmentry_failure() and to writeback() to see
which instructions are emulated, and I observe:

[82766.614575] Failed vm entry (exit reason 0x21) invalid guest state
[82766.651046] emulation at (46e53) rip 6e13: ea 18 6e 18
[82766.682611]     writeback: dst.byte 0
[82766.706180]     writeback: dst.ptr  0x0000000000000000
[82766.734890]     writeback: dst.val  0x0
[82766.758591]     writeback: src.ptr  0x0000000000000000
[82766.790594]     writeback: src.val  0x0
[82766.855058] successfully emulated instruction
[82766.882695] Failed vm entry (exit reason 0x21) invalid guest state
[82766.923061] emulation at (46e58) rip 6e18: 66 b8 20 00
[82766.951079]     writeback: dst.byte 2
[82766.975074]     writeback: dst.ptr  0xffff810324d07400
[82767.003112]     writeback: dst.val  0x20
[82767.027100]     writeback: src.ptr  0x0000000000006e1a
[82767.059092]     writeback: src.val  0x20
[82767.127094] successfully emulated instruction
[82767.151111] Failed vm entry (exit reason 0x21) invalid guest state
[82767.191099] emulation at (46e5c) rip 6e1c: 8e d8 8c d0
[82767.219156]     writeback: dst.byte 4
[82767.243118]     writeback: dst.ptr  0xffff810324d07418
[82767.275091]     writeback: dst.val  0x800000
[82767.299122]     writeback: src.ptr  0x0000000000000000
[82767.331106]     writeback: src.val  0x20
[82767.395255] successfully emulated instruction
[82767.423135] Failed vm entry (exit reason 0x21) invalid guest state
[82767.459260] emulation at (46e5e) rip 6e1e: 8c d0 81 e4
[82767.491137]     writeback: dst.byte 2
[82767.515117]     writeback: dst.ptr  0xffff810324d07400
[82767.543138]     writeback: dst.val  0x53e1
[82767.567264]     writeback: src.ptr  0xffff810324d07410
[82767.599142]     writeback: src.val  0x20
[82767.667146] successfully emulated instruction
[82767.691277] Failed vm entry (exit reason 0x21) invalid guest state
[82767.731152] emulation at (46e60) rip 6e20: 81 e4 ff ff
[82767.763136]     writeback: dst.byte 0
[82767.783154]     writeback: dst.ptr  0x0000000000000000
[82767.815157]     writeback: dst.val  0x2004
[82767.839156]     writeback: src.ptr  0x0000000000006e22
[82767.871140]     writeback: src.val  0xffff
[82767.939170] successfully emulated instruction
[82767.963307] Failed vm entry (exit reason 0x21) invalid guest state
[82768.003174] emulation at (46e66) rip 6e26: c1 e0 04 01
[82768.035153]     writeback: dst.byte 0
[82768.055174]     writeback: dst.ptr  0x0000000000000000
[82768.087177]     writeback: dst.val  0x53e1
[82768.111178]     writeback: src.ptr  0x0000000000006e28
[82768.143157]     writeback: src.val  0x4
[82768.211151] successfully emulated instruction
[82768.235189] Failed vm entry (exit reason 0x21) invalid guest state
[82768.271311] emulation at (46e69) rip 6e29: 01 c4 66 b8
[82768.303214]     writeback: dst.byte 0
[82768.327213]     writeback: dst.ptr  0x0000000000000000
[82768.355238]     writeback: dst.val  0x2004
[82768.379316]     writeback: src.ptr  0xffff810324d07400
[82768.411227]     writeback: src.val  0x53e1
[82768.483168] successfully emulated instruction
[82768.507240] Failed vm entry (exit reason 0x21) invalid guest state
[82768.543329] emulation at (46e6b) rip 6e2b: 66 b8 08 00
[82768.575239]     writeback: dst.byte 2
[82768.599233]     writeback: dst.ptr  0xffff810324d07400
[82768.627257]     writeback: dst.val  0x8
[82768.651246]     writeback: src.ptr  0x0000000000006e2d
[82768.683245]     writeback: src.val  0x8
[82768.751250] successfully emulated instruction
[82768.775331] Failed vm entry (exit reason 0x21) invalid guest state
[82768.815256] emulation at (46e6f) rip 6e2f: 8e d0 8e c0
[82768.843348]     writeback: dst.byte 4
[82768.867268]     writeback: dst.ptr  0xffff810324d07410
[82768.899204]     writeback: dst.val  0x53e1
[82768.923259]     writeback: src.ptr  0x0000000000000000
[82768.951351]     writeback: src.val  0x8
[82769.019279] successfully emulated instruction

So everything seems OK, but after the "mov %eax,%ss" instruction has
been emulated, cs.rpl == ss.rpl and yet the guest still appears to be in
a VT-unfriendly state, because kvm-userspace reports the following error:

[EMAIL PROTECTED]/local/kvm-userspace.git/bin]$ ./qemu-system-x86_64
-hda ~/disk_images/hd_50G.qcow2
-cdrom /images_iso/openSUSE-10.3-GM-x86_64-mini.iso -boot d -s -m 1024

exception 13 (33) 
rax 0000000000000673 rbx 0000000000800000 rcx 0000000000000000 
rdx 00000000000013ca rsi 0000000000055e1c rdi 0000000000055e1d 
rsp 00000000fffa0080 rbp 000000000000200b r8 0000000000000000 
r9  0000000000000000 r10 0000000000000000 r11 0000000000000000 
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 
r15 0000000000000000 rip 000000000000b071 rflags 00033092 
cs 4004 (00040040/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) 
ds 4004 (00040040/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) 
es 00ff (00000ff0/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ss ff11 (000ff110/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) 
fs 3002 (00030020/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) 
gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) 
tr 0000 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0) 
ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0) 
gdt 40920/47 idt 0/ffff cr0 10 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
code: 17 06 29 4b 01 18 eb 18 a8 25 aa 19 28 4c 01 28 4d 01 01 17 -->
0f 17 0f 01 17 0f 17 12 01 17 2c 25 4b 19 21 00 02 17 1a 94 0a 76 67 61
3d 30 78 25 78 20 Aborted

This is strange because handle_vmentry_failure() is not called at that
point. I'm still trying to find where the problem is; any comments are
welcome.

Regards,
Guillaume



 arch/x86/kvm/vmx.c         |   68 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx.h         |    3 +
 arch/x86/kvm/x86.c         |   12 ++--
 arch/x86/kvm/x86_emulate.c |  112 +++++++++++++++++++++++++++++++++++++++++++--
 include/asm-x86/kvm_host.h |    4 +
 5 files changed, 190 insertions(+), 9 deletions(-)

---

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 79cdbe8..a0a13b8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1272,7 +1272,9 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
        fix_pmode_dataseg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
        fix_pmode_dataseg(VCPU_SREG_FS, &vcpu->arch.rmode.fs);
 
+#if 0
        vmcs_write16(GUEST_SS_SELECTOR, 0);
+#endif
        vmcs_write32(GUEST_SS_AR_BYTES, 0x93);
 
        vmcs_write16(GUEST_CS_SELECTOR,
@@ -2635,6 +2637,66 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
        return 1;
 }
 
+static int invalid_guest_state(struct kvm_vcpu *vcpu,
+               struct kvm_run *kvm_run, u32 failure_reason)
+{
+       u16 ss, cs;
+       u8 opcodes[4];
+       unsigned long rip = vcpu->arch.rip;
+       unsigned long rip_linear;
+
+       ss = vmcs_read16(GUEST_SS_SELECTOR);
+       cs = vmcs_read16(GUEST_CS_SELECTOR);
+
+       if ((ss & 0x03) != (cs & 0x03)) {
+               int err;
+               rip_linear = rip + vmx_get_segment_base(vcpu, VCPU_SREG_CS);
+               emulator_read_std(rip_linear, (void *)opcodes, 4, vcpu);
+               printk(KERN_INFO "emulation at (%lx) rip %lx: %02x %02x %02x %02x\n",
+                               rip_linear,
+                               rip, opcodes[0], opcodes[1], opcodes[2], opcodes[3]);
+               err = emulate_instruction(vcpu, kvm_run, 0, 0, 0);
+               switch (err) {
+                       case EMULATE_DONE:
+                               printk(KERN_INFO "successfully emulated instruction\n");
+                               return 1;
+                       case EMULATE_DO_MMIO:
+                               printk(KERN_INFO "mmio?\n");
+                               return 0;
+                       default:
+                               kvm_report_emulation_failure(vcpu, "vmentry failure");
+                               break;
+               }
+       }
+
+       kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
+       kvm_run->hw.hardware_exit_reason = failure_reason;
+       return 0;
+}
+
+static int handle_vmentry_failure(struct kvm_vcpu *vcpu,
+                                 struct kvm_run *kvm_run,
+                                 u32 failure_reason)
+{
+       unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+       printk(KERN_INFO "Failed vm entry (exit reason 0x%x) ", failure_reason);
+       switch (failure_reason) {
+               case EXIT_REASON_INVALID_GUEST_STATE:
+                       printk("invalid guest state \n");
+                       return invalid_guest_state(vcpu, kvm_run, failure_reason);
+               case EXIT_REASON_MSR_LOADING:
+                       printk("caused by MSR entry %ld loading.\n", exit_qualification);
+                       break;
+               case EXIT_REASON_MACHINE_CHECK:
+                       printk("caused by machine check.\n");
+                       break;
+               default:
+                       printk("reason not known yet!\n");
+                       break;
+       }
+       return 0;
+}
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -2696,6 +2758,12 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
                        exit_reason != EXIT_REASON_EPT_VIOLATION))
                printk(KERN_WARNING "%s: unexpected, valid vectoring info and "
                       "exit reason is 0x%x\n", __func__, exit_reason);
+
+       if ((exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
+               exit_reason &= ~VMX_EXIT_REASONS_FAILED_VMENTRY;
+               return handle_vmentry_failure(vcpu, kvm_run, exit_reason);
+       }
+
        if (exit_reason < kvm_vmx_max_exit_handlers
            && kvm_vmx_exit_handlers[exit_reason])
                return kvm_vmx_exit_handlers[exit_reason](vcpu, kvm_run);
diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index 79d94c6..2cebf48 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -238,7 +238,10 @@ enum vmcs_field {
 #define EXIT_REASON_IO_INSTRUCTION      30
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
+#define EXIT_REASON_INVALID_GUEST_STATE 33
+#define EXIT_REASON_MSR_LOADING         34
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_MACHINE_CHECK       41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
 #define EXIT_REASON_EPT_VIOLATION       48
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 578a0c1..9e5d687 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3027,8 +3027,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
        return 0;
 }
 
-static void get_segment(struct kvm_vcpu *vcpu,
-                       struct kvm_segment *var, int seg)
+void get_segment(struct kvm_vcpu *vcpu,
+                struct kvm_segment *var, int seg)
 {
        kvm_x86_ops->get_segment(vcpu, var, seg);
 }
@@ -3111,8 +3111,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
        return 0;
 }
 
-static void set_segment(struct kvm_vcpu *vcpu,
-                       struct kvm_segment *var, int seg)
+void set_segment(struct kvm_vcpu *vcpu,
+                struct kvm_segment *var, int seg)
 {
        kvm_x86_ops->set_segment(vcpu, var, seg);
 }
@@ -3270,8 +3270,8 @@ static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu,
        return 0;
 }
 
-static int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-                                  int type_bits, int seg)
+int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
+                           int type_bits, int seg)
 {
        struct kvm_segment kvm_seg;
 
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 2ca0838..f6b9dad 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -138,7 +138,8 @@ static u16 opcode_table[256] = {
        /* 0x88 - 0x8F */
        ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov,
        ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
-       0, ModRM | DstReg, 0, Group | Group1A,
+       DstMem | SrcReg | ModRM | Mov, ModRM | DstReg,
+       DstReg | SrcMem | ModRM | Mov, Group | Group1A,
        /* 0x90 - 0x9F */
        0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
@@ -152,7 +153,8 @@ static u16 opcode_table[256] = {
        ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
        ByteOp | ImplicitOps | String, ImplicitOps | String,
        /* 0xB0 - 0xBF */
-       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+       0, 0, 0, 0, 0, 0, 0, 0,
+       DstReg | SrcImm | Mov, 0, 0, 0, 0, 0, 0, 0,
        /* 0xC0 - 0xC7 */
        ByteOp | DstMem | SrcImm | ModRM, DstMem | SrcImmByte | ModRM,
        0, ImplicitOps | Stack, 0, 0,
@@ -168,7 +170,7 @@ static u16 opcode_table[256] = {
        /* 0xE0 - 0xE7 */
        0, 0, 0, 0, 0, 0, 0, 0,
        /* 0xE8 - 0xEF */
-       ImplicitOps | Stack, SrcImm|ImplicitOps, 0, SrcImmByte|ImplicitOps,
+       ImplicitOps | Stack, SrcImm | ImplicitOps, ImplicitOps, SrcImmByte | ImplicitOps,
        0, 0, 0, 0,
        /* 0xF0 - 0xF7 */
        0, 0, 0, 0,
@@ -1511,14 +1513,90 @@ special_insn:
                break;
        case 0x88 ... 0x8b:     /* mov */
                goto mov;
+       case 0x8c: { /* mov r/m, sreg */
+               struct kvm_segment segreg;
+
+               if (c->modrm_mod == 0x3)
+                       c->src.val = c->modrm_val;
+
+               switch ( c->modrm_reg ) {
+               case 0:
+                       get_segment(ctxt->vcpu, &segreg, VCPU_SREG_ES);
+                       break;
+               case 1:
+                       get_segment(ctxt->vcpu, &segreg, VCPU_SREG_CS);
+                       break;
+               case 2:
+                       get_segment(ctxt->vcpu, &segreg, VCPU_SREG_SS);
+                       break;
+               case 3:
+                       get_segment(ctxt->vcpu, &segreg, VCPU_SREG_DS);
+                       break;
+               case 4:
+                       get_segment(ctxt->vcpu, &segreg, VCPU_SREG_FS);
+                       break;
+               case 5:
+                       get_segment(ctxt->vcpu, &segreg, VCPU_SREG_GS);
+                       break;
+               default:
+                       printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n",
+                                        c->modrm);
+                       goto cannot_emulate;
+               }
+               c->dst.val = segreg.selector;
+               c->dst.bytes = 2;
+               c->dst.ptr = (unsigned long *)decode_register(c->modrm_rm, c->regs,
+                                                             c->d & ByteOp);
+               break;
+       }
        case 0x8d: /* lea r16/r32, m */
                c->dst.val = c->modrm_ea;
                break;
+       case 0x8e: { /* mov seg, r/m16 */
+               uint16_t sel;
+
+               sel = c->src.val;
+               switch ( c->modrm_reg ) {
+               case 0:
+                       if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_ES) < 0)
+                               goto cannot_emulate;
+                       break;
+               case 1:
+                       if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0)
+                               goto cannot_emulate;
+                       break;
+               case 2:
+                       if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_SS) < 0)
+                               goto cannot_emulate;
+                       break;
+               case 3:
+                       if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_DS) < 0)
+                               goto cannot_emulate;
+                       break;
+               case 4:
+                       if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_FS) < 0)
+                               goto cannot_emulate;
+                       break;
+               case 5:
+                       if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_GS) < 0)
+                               goto cannot_emulate;
+                       break;
+               default:
+                       printk(KERN_INFO "Invalid segreg in modrm byte 0x%02x\n",
+                                         c->modrm);
+                       goto cannot_emulate;
+               }
+
+               c->dst.type = OP_NONE;  /* Disable writeback. */
+               break;
+       }
        case 0x8f:              /* pop (sole member of Grp1a) */
                rc = emulate_grp1a(ctxt, ops);
                if (rc != 0)
                        goto done;
                break;
+       case 0xb8: /* mov r, imm */
+               goto mov;
        case 0x9c: /* pushf */
                c->src.val =  (unsigned long) ctxt->eflags;
                emulate_push(ctxt);
@@ -1657,6 +1735,34 @@ special_insn:
                break;
        }
        case 0xe9: /* jmp rel */
+               jmp_rel(c, c->src.val);
+               c->dst.type = OP_NONE; /* Disable writeback. */
+               break;
+       case 0xea: /* jmp far */ {
+               uint32_t eip;
+               uint16_t sel;
+
+               switch (c->op_bytes) {
+               case 2:
+                       eip = insn_fetch(u16, 2, c->eip);
+                       eip = eip & 0x0000FFFF; /* clear upper 16 bits */
+                       break;
+               case 4:
+                       eip = insn_fetch(u32, 4, c->eip);
+                       break;
+               default:
+                       DPRINTF("jmp far: Invalid op_bytes\n");
+                       goto cannot_emulate;
+               }
+               sel = insn_fetch(u16, 2, c->eip);
+               if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0) {
+                       DPRINTF("jmp far: Failed to load CS descriptor\n");
+                       goto cannot_emulate;
+               }
+
+               c->eip = eip;
+               break;
+       }
        case 0xeb: /* jmp rel short */
                jmp_rel(c, c->src.val);
                c->dst.type = OP_NONE; /* Disable writeback. */
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 4baa9c9..7a0846a 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -495,6 +495,10 @@ int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr,
 int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr,
                    unsigned long value);
 
+void set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
+void get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
+int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
+                           int type_bits, int seg);
 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason);
 
 void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
