[COMMIT master] device-assignment: Always use slow mapping for PCI option ROM

2010-08-17 Thread Avi Kivity
From: Alex Williamson <alex.william...@redhat.com>

KVM doesn't support read-only mappings for MMIO space.  Performance isn't
an issue for the option ROM mapping, so always use slow mapping.  kvm.git
cset b4f8c249 will make kvm hang with a "Bad address" fault without this.
We can also then drop the extraneous mprotects since the guest has no way
to write to these regions.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Chris Wright <chr...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index c26ff6d..0e82a16 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -541,6 +541,8 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     /* map physical memory */
     pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
     if (i == PCI_ROM_SLOT) {
+        /* KVM doesn't support read-only mappings, use slow map */
+        slow_map = 1;
         pci_dev->v_addrs[i].u.r_virtbase =
             mmap(NULL,
                  cur_region->size,
@@ -566,8 +568,6 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     if (i == PCI_ROM_SLOT) {
         memset(pci_dev->v_addrs[i].u.r_virtbase, 0,
                (cur_region->size + 0xFFF) & 0xFFFFF000);
-        mprotect(pci_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
-                 (cur_region->size + 0xFFF) & 0xFFFFF000, PROT_READ);
     }
 
     pci_dev->v_addrs[i].r_size = cur_region->size;
@@ -1691,12 +1691,8 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev)
     /* Copy ROM contents into the space backing the ROM BAR */
     if (dev->v_addrs[PCI_ROM_SLOT].r_size >= size &&
         dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase) {
-        mprotect(dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
-                 size, PROT_READ | PROT_WRITE);
         memcpy(dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
                buf, size);
-        mprotect(dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
-                 size, PROT_READ);
     }
 
     free(buf);
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] device-assignment: Fix slow option ROM mapping

2010-08-17 Thread Avi Kivity
From: Alex Williamson <alex.william...@redhat.com>

cpu_register_io_memory() supports individual function pointers
being NULL, not the structure itself.  Create and pass the
right thing.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Chris Wright <chr...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index c56870e..c26ff6d 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -233,6 +233,8 @@ static CPUReadMemoryFunc * const slow_bar_read[] = {
     slow_bar_readl
 };
 
+static CPUWriteMemoryFunc * const slow_bar_null_write[] = {NULL, NULL, NULL};
+
 static void assigned_dev_iomem_map_slow(PCIDevice *pci_dev, int region_num,
                                         pcibus_t e_phys, pcibus_t e_size,
                                         int type)
@@ -244,7 +246,7 @@ static void assigned_dev_iomem_map_slow(PCIDevice *pci_dev, int region_num,
 
     DEBUG("%s", "slow map\n");
     if (region_num == PCI_ROM_SLOT)
-        m = cpu_register_io_memory(slow_bar_read, NULL, region);
+        m = cpu_register_io_memory(slow_bar_read, slow_bar_null_write, region);
     else
         m = cpu_register_io_memory(slow_bar_read, slow_bar_write, region);
     cpu_register_physical_memory(e_phys, e_size, m);
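
The distinction the commit message draws (a table whose individual entries are NULL, as opposed to a NULL table pointer) can be illustrated with a small standalone sketch. Everything here is hypothetical: `write_fn`, `dispatch_write`, and the three-entry layout are stand-ins chosen for illustration and are not QEMU's actual types or dispatch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type: one table entry per access width (1, 2, 4 bytes). */
typedef void (*write_fn)(void *opaque, uint32_t addr, uint32_t val);

/* A valid table whose entries are all NULL: every write is dropped. */
static write_fn const null_write[3] = { NULL, NULL, NULL };

/*
 * Dispatch a write through a handler table. The table pointer itself must
 * be valid; only individual entries may be NULL.
 * Returns 1 if a handler ran, 0 if the write was ignored.
 */
static int dispatch_write(write_fn const *table, int width_idx,
                          void *opaque, uint32_t addr, uint32_t val)
{
    assert(table != NULL);
    assert(width_idx >= 0 && width_idx < 3);
    if (table[width_idx] == NULL)
        return 0;               /* read-only region: drop the write */
    table[width_idx](opaque, addr, val);
    return 1;
}

/* Example handler that records the last value written. */
static void record_write(void *opaque, uint32_t addr, uint32_t val)
{
    (void)addr;
    *(uint32_t *)opaque = val;
}
```

Passing `null_write` to `dispatch_write()` is safe and simply ignores the access, whereas passing a NULL table would trip the first assertion; that is the same shape of mistake the patch corrects.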


[COMMIT master] device-assignment: Byte-wise ROM read

2010-08-17 Thread Avi Kivity
From: Alex Williamson <alex.william...@redhat.com>

The host kernel filters the PCI option ROM, returning only bytes for
the actual ROM size, not for the whole BAR.  That means we typically
do a short read of the PCI sysfs ROM file.  Read it a byte at a time
so we know how much to actually copy and only skip the copy if we
get nothing.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Chris Wright <chr...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 0e82a16..3bb7f0b 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1680,19 +1680,20 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev)
         return;
     }
 
-    ret = fread(buf, size, 1, fp);
-    if (!feof(fp) || ferror(fp) || ret != 1) {
+    if (!(ret = fread(buf, 1, size, fp))) {
         free(buf);
         fclose(fp);
         return;
     }
     fclose(fp);
 
+    /* The number of bytes read is often much smaller than the BAR size */
+    size = ret;
+
     /* Copy ROM contents into the space backing the ROM BAR */
     if (dev->v_addrs[PCI_ROM_SLOT].r_size >= size &&
         dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase) {
-        memcpy(dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
-               buf, size);
+        memcpy(dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase, buf, size);
     }
 
     free(buf);
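
The change hinges on how fread() counts: with an element size of 1 the return value is a byte count, while with a single element of `size` bytes a short read returns 0. A minimal sketch of the two calling conventions follows; the helper names `read_up_to` and `read_whole_block` are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read up to `cap` bytes from `fp` into `buf` and return how many bytes
 * were actually read. With an element size of 1, fread()'s return value
 * is a byte count, so a short read is reported rather than lost.
 */
static size_t read_up_to(FILE *fp, unsigned char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL);
    return fread(buf, 1, cap, fp);
}

/*
 * For contrast: asking for one element of `cap` bytes returns 0 on a
 * short read, even though some bytes may already have been stored.
 */
static size_t read_whole_block(FILE *fp, unsigned char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL);
    return fread(buf, cap, 1, fp);
}
```

With a 7-byte file and a 64-byte buffer, the first helper reports 7 and the second reports 0, which is why the old code treated a filtered ROM as a failed read.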


[COMMIT master] KVM: PPC: fix leakage of error page in kvmppc_patch_dcbz()

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Add kvm_release_page_clean() after is_error_page() to avoid
leakage of error page.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index eee97b5..7656b6d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -455,8 +455,10 @@ static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
     int i;
 
     hpage = gfn_to_page(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
-    if (is_error_page(hpage))
+    if (is_error_page(hpage)) {
+        kvm_release_page_clean(hpage);
         return;
+    }
 
     hpage_offset = pte->raddr & ~PAGE_MASK;
     hpage_offset &= ~0xFFFULL;


[COMMIT master] KVM: destroy workqueue on kvm_create_pit() failures

2010-08-17 Thread Avi Kivity
From: Xiaotian Feng <df...@redhat.com>

The kernel needs to destroy the workqueue if kvm_create_pit() fails;
otherwise the workqueue is leaked once the pit is freed.

Signed-off-by: Xiaotian Feng <df...@redhat.com>
Cc: Avi Kivity <a...@redhat.com>
Cc: Marcelo Tosatti <mtosa...@redhat.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Gleb Natapov <g...@redhat.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Gregory Haskins <ghask...@novell.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 0fd6378..f539c3c 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -742,7 +742,7 @@ fail:
     kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
     kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
     kvm_free_irq_source_id(kvm, pit->irq_source_id);
-
+    destroy_workqueue(pit->wq);
     kfree(pit);
     return NULL;
 }


[COMMIT master] KVM: PIT: free irq source id in handling error path

2010-08-17 Thread Avi Kivity
From: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>

Free the irq source id if creating the pit workqueue fails.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index f539c3c..ddeb231 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -697,6 +697,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
     pit->wq = create_singlethread_workqueue("kvm-pit-wq");
     if (!pit->wq) {
         mutex_unlock(&pit->pit_state.lock);
+        kvm_free_irq_source_id(kvm, pit->irq_source_id);
         kfree(pit);
         return NULL;
     }
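
Both PIT fixes address the same kind of bug: an error path that frees the object but not everything acquired before the failure. A common way to keep such paths complete is to unwind in reverse order of acquisition. The sketch below uses made-up resources (`alloc_id`, `create_queue`, and global counters that exist only to make leaks observable); it is not the kvm_create_pit() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource counters, used only to make leaks observable. */
static int live_ids;
static int live_queues;

static int  alloc_id(void)         { live_ids++; return 1; }
static void free_id(int id)        { (void)id; assert(live_ids > 0); live_ids--; }
static int  create_queue(int fail) { if (fail) return 0; live_queues++; return 1; }
static void destroy_queue(int q)   { (void)q; assert(live_queues > 0); live_queues--; }

struct timer {
    int id;
    int queue;
};

/*
 * Create a timer. `fail_queue` and `fail_late` inject failures at the
 * queue-creation step and at a later step respectively. Each failure
 * path releases everything acquired before it, in reverse order.
 */
static struct timer *timer_create_sketch(int fail_queue, int fail_late)
{
    struct timer *t = malloc(sizeof(*t));
    if (!t)
        return NULL;

    t->id = alloc_id();

    t->queue = create_queue(fail_queue);
    if (!t->queue)
        goto fail_free_id;

    if (fail_late)
        goto fail_destroy_queue;

    return t;

fail_destroy_queue:
    destroy_queue(t->queue);
fail_free_id:
    free_id(t->id);
    free(t);
    return NULL;
}
```

Failing at queue creation must still release the id, and failing later must release both the queue and the id; those are the two cases the patches above cover.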


[COMMIT master] KVM: x86 emulator: remove useless label from x86_emulate_insn()

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fccbed6..42725df 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2788,16 +2788,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
         c->eip = ctxt->eip;
     }
 
-    if (c->src.type == OP_MEM) {
-        if (c->d & NoAccess)
-            goto no_fetch;
+    if ((c->src.type == OP_MEM) && !(c->d & NoAccess)) {
         rc = read_emulated(ctxt, ops, c->src.addr.mem,
                            c->src.valptr, c->src.bytes);
         if (rc != X86EMUL_CONTINUE)
             goto done;
         c->src.orig_val = c->src.val;
-    no_fetch:
-        ;
     }
 
     if (c->src2.type == OP_MEM) {


[COMMIT master] KVM: x86 emulator: add XADD instruction emulation

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1)

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d690daf..41ca98b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2386,7 +2386,8 @@ static struct opcode twobyte_table[256] = {
     D(DstReg | SrcMem | ModRM), D(DstReg | SrcMem | ModRM),
     D(ByteOp | DstReg | SrcMem | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
     /* 0xC0 - 0xCF */
-    N, N, N, D(DstMem | SrcReg | ModRM | Mov),
+    D(ByteOp | DstMem | SrcReg | ModRM | Lock), D(DstMem | SrcReg | ModRM | Lock),
+    N, D(DstMem | SrcReg | ModRM | Mov),
     N, N, N, GD(0, group9),
     N, N, N, N, N, N, N, N,
     /* 0xD0 - 0xDF */
@@ -3532,6 +3533,12 @@ twobyte_insn:
         c->dst.val = (c->d & ByteOp) ? (s8) c->src.val :
                                        (s16) c->src.val;
         break;
+    case 0xc0 ... 0xc1: /* xadd */
+        emulate_2op_SrcV("add", c->src, c->dst, ctxt->eflags);
+        /* Write back the register source. */
+        c->src.val = c->dst.orig_val;
+        write_register_operand(&c->src);
+        break;
     case 0xc3:  /* movnti */
         c->dst.bytes = c->op_bytes;
         c->dst.val = (c->op_bytes == 4) ? (u32) c->src.val :
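
For readers unfamiliar with XADD: the destination receives the sum and the source register receives the destination's old value. The standalone model below shows that data flow only; flags are not modelled, and `xadd_sketch` and `width_mask` are illustrative names, not emulator functions. It uses masks on 64-bit values instead of the emulator's operand structures.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `bytes` bytes of a 64-bit value (bytes = 1, 2, 4, 8). */
static uint64_t width_mask(int bytes)
{
    return bytes == 8 ? UINT64_MAX : ((UINT64_C(1) << (8 * bytes)) - 1);
}

/*
 * XADD sketch: *mem receives (*mem + *reg) and *reg receives the old *mem,
 * both truncated to `bytes`. The register write follows the x86-64 rule:
 * 1- and 2-byte writes preserve the upper bits, 4-byte writes zero-extend.
 */
static void xadd_sketch(uint64_t *mem, uint64_t *reg, int bytes)
{
    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
    uint64_t mask = width_mask(bytes);
    uint64_t old_mem = *mem & mask;
    uint64_t sum = (old_mem + (*reg & mask)) & mask;

    *mem = (*mem & ~mask) | sum;          /* memory destination gets the sum */
    if (bytes == 4)
        *reg = old_mem;                   /* 32-bit write zero-extends */
    else
        *reg = (*reg & ~mask) | old_mem;  /* other widths keep the upper bits */
}
```

For example, with memory holding 0x123456789abcdef and the register holding 0xfedcba9876543210, a 1-byte XADD leaves 0x123456789abcdff in memory and 0xfedcba98765432ef in the register.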


[COMMIT master] KVM: x86 emulator: add setcc instruction emulation

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Add setcc instruction emulation (opcode 0x0f 0x90~0x9f)

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 41ca98b..fccbed6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2363,7 +2363,7 @@ static struct opcode twobyte_table[256] = {
     /* 0x80 - 0x8F */
     X16(D(SrcImm)),
     /* 0x90 - 0x9F */
-    N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N,
+    X16(D(ByteOp | DstMem | SrcNone | ModRM| Mov)),
     /* 0xA0 - 0xA7 */
     D(ImplicitOps | Stack), D(ImplicitOps | Stack),
     N, D(DstMem | SrcReg | ModRM | BitOp),
@@ -3425,6 +3425,9 @@ twobyte_insn:
         if (test_cc(c->b, ctxt->eflags))
             jmp_rel(c, c->src.val);
         break;
+    case 0x90 ... 0x9f: /* setcc r/m8 */
+        c->dst.val = test_cc(c->b, ctxt->eflags);
+        break;
     case 0xa0:    /* push fs */
         emulate_push_sreg(ctxt, ops, VCPU_SREG_FS);
         break;
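
The new case stores the result of the condition test as the destination byte. As a rough standalone model of how the low nibble of the opcode selects a condition, the sketch below evaluates the sixteen x86 condition codes against a flags word. `cond_holds`, `setcc_value`, and the `FLAG_*` names are chosen for this example; this is not the kernel's test_cc().

```c
#include <assert.h>

/* EFLAGS bits consulted by the condition codes. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/*
 * Evaluate the x86 condition encoded in the low nibble of an opcode
 * (0x90..0x9f for setcc, 0x70..0x7f for short jcc). Returns 0 or 1.
 */
static int cond_holds(unsigned opcode, unsigned flags)
{
    int cf = !!(flags & FLAG_CF);
    int pf = !!(flags & FLAG_PF);
    int zf = !!(flags & FLAG_ZF);
    int sf = !!(flags & FLAG_SF);
    int of = !!(flags & FLAG_OF);
    int r;

    switch ((opcode & 0xf) >> 1) {
    case 0:  r = of; break;                 /* o  */
    case 1:  r = cf; break;                 /* b  */
    case 2:  r = zf; break;                 /* e  */
    case 3:  r = cf || zf; break;           /* be */
    case 4:  r = sf; break;                 /* s  */
    case 5:  r = pf; break;                 /* p  */
    case 6:  r = sf != of; break;           /* l  */
    default: r = zf || (sf != of); break;   /* le */
    }
    /* Odd encodings are the negated forms (no, ae, ne, a, ns, np, ge, g). */
    return r ^ (int)(opcode & 1);
}

/* setcc stores the condition result as a single byte: 0 or 1. */
static unsigned char setcc_value(unsigned opcode, unsigned flags)
{
    assert(opcode >= 0x90 && opcode <= 0x9f);
    return (unsigned char)cond_holds(opcode, flags);
}
```

So `sete` (0x0f 0x94) yields 1 when ZF is set, and `setne` (0x0f 0x95) yields 0 for the same flags.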


[COMMIT master] KVM: x86 emulator: change OUT instruction to use dst instead of src

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Change the OUT instruction to use dst instead of src, so we can
reuse that code for all OUT instructions.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dd2e398..c208315 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2322,12 +2322,12 @@ static struct opcode opcode_table[256] = {
     /* 0xE0 - 0xE7 */
     N, N, N, N,
     D(ByteOp | SrcImmUByte | DstAcc), D(SrcImmUByte | DstAcc),
-    D(ByteOp | SrcImmUByte | DstAcc), D(SrcImmUByte | DstAcc),
+    D(ByteOp | SrcAcc | DstImmUByte), D(SrcAcc | DstImmUByte),
     /* 0xE8 - 0xEF */
     D(SrcImm | Stack), D(SrcImm | ImplicitOps),
     D(SrcImmFAddr | No64), D(SrcImmByte | ImplicitOps),
     D(SrcNone | ByteOp | DstAcc), D(SrcNone | DstAcc),
-    D(SrcNone | ByteOp | DstAcc), D(SrcNone | DstAcc),
+    D(ByteOp | SrcAcc | ImplicitOps), D(SrcAcc | ImplicitOps),
     /* 0xF0 - 0xF7 */
     N, N, N, N,
     D(ImplicitOps | Priv), D(ImplicitOps), G(ByteOp, group3), G(0, group3),
@@ -3149,15 +3149,16 @@ special_insn:
         break;
     case 0xee: /* out dx,al */
     case 0xef: /* out dx,(e/r)ax */
-        c->src.val = c->regs[VCPU_REGS_RDX];
+        c->dst.val = c->regs[VCPU_REGS_RDX];
     do_io_out:
-        c->dst.bytes = min(c->dst.bytes, 4u);
-        if (!emulator_io_permited(ctxt, ops, c->src.val, c->dst.bytes)) {
+        c->src.bytes = min(c->src.bytes, 4u);
+        if (!emulator_io_permited(ctxt, ops, c->dst.val,
+                                  c->src.bytes)) {
             emulate_gp(ctxt, 0);
             goto done;
         }
-        ops->pio_out_emulated(c->dst.bytes, c->src.val, &c->dst.val, 1,
-                              ctxt->vcpu);
+        ops->pio_out_emulated(c->src.bytes, c->dst.val,
+                              &c->src.val, 1, ctxt->vcpu);
         c->dst.type = OP_NONE;  /* Disable writeback. */
         break;
     case 0xf4:  /* hlt */


[COMMIT master] KVM: x86 emulator: put register operand write back to a function

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Introduce function write_register_operand() to write back the
register operand.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c476a67..d690daf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1020,6 +1020,25 @@ exception:
     return X86EMUL_PROPAGATE_FAULT;
 }
 
+static void write_register_operand(struct operand *op)
+{
+    /* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
+    switch (op->bytes) {
+    case 1:
+        *(u8 *)op->addr.reg = (u8)op->val;
+        break;
+    case 2:
+        *(u16 *)op->addr.reg = (u16)op->val;
+        break;
+    case 4:
+        *op->addr.reg = (u32)op->val;
+        break;  /* 64b: zero-extend */
+    case 8:
+        *op->addr.reg = op->val;
+        break;
+    }
+}
+
 static inline int writeback(struct x86_emulate_ctxt *ctxt,
                             struct x86_emulate_ops *ops)
 {
@@ -1029,23 +1048,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
 
     switch (c->dst.type) {
     case OP_REG:
-        /* The 4-byte case *is* correct:
-         * in 64-bit mode we zero-extend.
-         */
-        switch (c->dst.bytes) {
-        case 1:
-            *(u8 *)c->dst.addr.reg = (u8)c->dst.val;
-            break;
-        case 2:
-            *(u16 *)c->dst.addr.reg = (u16)c->dst.val;
-            break;
-        case 4:
-            *c->dst.addr.reg = (u32)c->dst.val;
-            break;  /* 64b: zero-ext */
-        case 8:
-            *c->dst.addr.reg = c->dst.val;
-            break;
-        }
+        write_register_operand(&c->dst);
         break;
     case OP_MEM:
         if (c->lock_prefix)
@@ -2971,25 +2974,13 @@ special_insn:
     case 0x86 ... 0x87: /* xchg */
     xchg:
         /* Write back the register source. */
-        switch (c->dst.bytes) {
-        case 1:
-            *(u8 *) c->src.addr.reg = (u8) c->dst.val;
-            break;
-        case 2:
-            *(u16 *) c->src.addr.reg = (u16) c->dst.val;
-            break;
-        case 4:
-            *c->src.addr.reg = (u32) c->dst.val;
-            break;  /* 64b reg: zero-extend */
-        case 8:
-            *c->src.addr.reg = c->dst.val;
-            break;
-        }
+        c->src.val = c->dst.val;
+        write_register_operand(&c->src);
         /*
          * Write back the memory destination with implicit LOCK
          * prefix.
          */
-        c->dst.val = c->src.val;
+        c->dst.val = c->src.orig_val;
         c->lock_prefix = 1;
         break;
     case 0x88 ... 0x8b: /* mov */
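
The width rule that the new helper centralizes can be shown in isolation. The sketch below is a simplified stand-in: `struct reg_operand` and `write_reg_sketch` are illustrative names, and it uses masks on a 64-bit value rather than the pointer casts in the patch, so it does not depend on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the emulator's operand: value, width, target. */
struct reg_operand {
    uint64_t val;     /* value to write */
    unsigned bytes;   /* operand width: 1, 2, 4 or 8 */
    uint64_t *reg;    /* 64-bit register storage */
};

/*
 * Write an operand back to its register. 1- and 2-byte writes replace
 * only the low bits; a 4-byte write clears bits 63:32 (zero-extension);
 * an 8-byte write replaces the whole register.
 */
static void write_reg_sketch(const struct reg_operand *op)
{
    assert(op != NULL && op->reg != NULL);
    switch (op->bytes) {
    case 1:
        *op->reg = (*op->reg & ~UINT64_C(0xff)) | (op->val & 0xff);
        break;
    case 2:
        *op->reg = (*op->reg & ~UINT64_C(0xffff)) | (op->val & 0xffff);
        break;
    case 4:
        *op->reg = (uint32_t)op->val;   /* zero-extends to 64 bits */
        break;
    case 8:
        *op->reg = op->val;
        break;
    }
}
```

The 4-byte case is the one the original comment singles out: unlike the narrower widths, it discards the old upper half of the register.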


[COMMIT master] KVM: x86 emulator: remove dup code of in/out instruction

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c208315..ac13831 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2924,28 +2924,12 @@ special_insn:
         break;
     case 0x6c:  /* insb */
     case 0x6d:  /* insw/insd */
-        c->dst.bytes = min(c->dst.bytes, 4u);
-        if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
-                                  c->dst.bytes)) {
-            emulate_gp(ctxt, 0);
-            goto done;
-        }
-        if (!pio_in_emulated(ctxt, ops, c->dst.bytes,
-                             c->regs[VCPU_REGS_RDX], &c->dst.val))
-            goto done; /* IO is needed, skip writeback */
-        break;
+        c->src.val = c->regs[VCPU_REGS_RDX];
+        goto do_io_in;
     case 0x6e:  /* outsb */
     case 0x6f:  /* outsw/outsd */
-        c->src.bytes = min(c->src.bytes, 4u);
-        if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
-                                  c->src.bytes)) {
-            emulate_gp(ctxt, 0);
-            goto done;
-        }
-        ops->pio_out_emulated(c->src.bytes, c->regs[VCPU_REGS_RDX],
-                              &c->src.val, 1, ctxt->vcpu);
-
-        c->dst.type = OP_NONE; /* nothing to writeback */
+        c->dst.val = c->regs[VCPU_REGS_RDX];
+        goto do_io_out;
         break;
     case 0x70 ... 0x7f: /* jcc (short) */
         if (test_cc(c->b, ctxt->eflags))


[COMMIT master] KVM: x86: explain 'no-kvmclock' kernel parameter

2010-08-17 Thread Avi Kivity
From: Jiri Kosina <jkos...@suse.cz>

The no-kvmclock kernel parameter is missing its explanation in
Documentation/kernel-parameters.txt. Add it.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d9239d5..45cf67b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1760,6 +1760,8 @@ and is between 256 and 4096 characters. It is defined in the file
 
     nojitter        [IA64] Disables jitter checking for ITC timers.
 
+    no-kvmclock     [X86,KVM] Disable paravirtualized KVM clock driver
+
     nolapic         [X86-32,APIC] Do not enable or use the local APIC.
 
     nolapic_timer   [X86-32,APIC] Do not use the local APIC timer.


[COMMIT master] KVM: x86 emulator: introduce DstImmUByte for dst operand decode

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Introduce DstImmUByte for dst operand decode, which
will be used for out instruction.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 42725df..dd2e398 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -54,6 +54,7 @@
 #define DstAcc      (4<<1) /* Destination Accumulator */
 #define DstDI       (5<<1) /* Destination is in ES:(E)DI */
 #define DstMem64    (6<<1) /* 64bit memory operand */
+#define DstImmUByte (7<<1) /* 8-bit unsigned immediate operand */
 #define DstMask     (7<<1)
 /* Source operand type. */
 #define SrcNone     (0<<4) /* No source operand. */
@@ -2694,6 +2695,12 @@ done_prefixes:
         decode_register_operand(&c->dst, c,
              c->twobyte && (c->b == 0xb6 || c->b == 0xb7));
         break;
+    case DstImmUByte:
+        c->dst.type = OP_IMM;
+        c->dst.addr.mem = c->eip;
+        c->dst.bytes = 1;
+        c->dst.val = insn_fetch(u8, 1, c->eip);
+        break;
     case DstMem:
     case DstMem64:
         c->dst = memop;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] Add imul real mode test

2010-08-17 Thread Avi Kivity
From: Mohammed Gamal <m.gamal...@gmail.com>

Signed-off-by: Mohammed Gamal <m.gamal...@gmail.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/x86/realmode.c b/x86/realmode.c
index 3bbd630..14ea70f 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -733,6 +733,7 @@ void test_long_jmp()
     else
         print_serial("Long JMP Test: PASS\n");
 }
+
 void test_push_pop()
 {
     struct regs inregs = { 0 }, outregs;
@@ -975,6 +976,90 @@ void test_int()
         print_serial("int Test 1: PASS\n");
 }
 
+void test_imul()
+{
+    struct regs inregs = { 0 }, outregs;
+
+    MK_INSN(imul8_1, "mov $2, %al\n\t"
+                     "mov $-4, %cx\n\t"
+                     "imul %cl\n\t");
+
+    MK_INSN(imul16_1, "mov $2, %ax\n\t"
+                      "mov $-4, %cx\n\t"
+                      "imul %cx\n\t");
+
+    MK_INSN(imul32_1, "mov $2, %eax\n\t"
+                      "mov $-4, %ecx\n\t"
+                      "imul %ecx\n\t");
+
+    MK_INSN(imul8_2, "mov $0x12340002, %eax\n\t"
+                     "mov $4, %cx\n\t"
+                     "imul %cl\n\t");
+
+    MK_INSN(imul16_2, "mov $2, %ax\n\t"
+                      "mov $4, %cx\n\t"
+                      "imul %cx\n\t");
+
+    MK_INSN(imul32_2, "mov $2, %eax\n\t"
+                      "mov $4, %ecx\n\t"
+                      "imul %ecx\n\t");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_imul8_1,
+                          insn_imul8_1_end - insn_imul8_1);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || (outregs.eax & 0xff) != (u8)-8)
+        print_serial("imul Test 1: FAIL\n");
+    else
+        print_serial("imul Test 1: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_imul16_1,
+                          insn_imul16_1_end - insn_imul16_1);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != (u16)-8)
+        print_serial("imul Test 2: FAIL\n");
+    else
+        print_serial("imul Test 2: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_imul32_1,
+                          insn_imul32_1_end - insn_imul32_1);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != (u32)-8)
+        print_serial("imul Test 3: FAIL\n");
+    else
+        print_serial("imul Test 3: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_imul8_2,
+                          insn_imul8_2_end - insn_imul8_2);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || (outregs.eax & 0xffff) != 8 ||
+         (outregs.eax & 0xffff0000) != 0x12340000)
+        print_serial("imul Test 4: FAIL\n");
+    else
+        print_serial("imul Test 4: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_imul16_2,
+                          insn_imul16_2_end - insn_imul16_2);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != 8)
+        print_serial("imul Test 5: FAIL\n");
+    else
+        print_serial("imul Test 5: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_imul32_2,
+                          insn_imul32_2_end - insn_imul32_2);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != 8)
+        print_serial("imul Test 6: FAIL\n");
+    else
+        print_serial("imul Test 6: PASS\n");
+}
+
 void realmode_start(void)
 {
     test_null();
@@ -998,6 +1083,7 @@ void realmode_start(void)
     test_xchg();
     test_iret();
     test_int();
+    test_imul();
 
     exit(0);
 }


[COMMIT master] Add real mode test for mul instruction

2010-08-17 Thread Avi Kivity
From: Mohammed Gamal <m.gamal...@gmail.com>

Signed-off-by: Mohammed Gamal <m.gamal...@gmail.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/x86/realmode.c b/x86/realmode.c
index 14ea70f..3d8aed4 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1060,6 +1060,50 @@ void test_imul()
         print_serial("imul Test 6: PASS\n");
 }
 
+void test_mul()
+{
+    struct regs inregs = { 0 }, outregs;
+
+    MK_INSN(mul8, "mov $2, %al\n\t"
+                  "mov $4, %cx\n\t"
+                  "imul %cl\n\t");
+
+    MK_INSN(mul16, "mov $2, %ax\n\t"
+                   "mov $4, %cx\n\t"
+                   "imul %cx\n\t");
+
+    MK_INSN(mul32, "mov $2, %eax\n\t"
+                   "mov $4, %ecx\n\t"
+                   "imul %ecx\n\t");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_mul8,
+                          insn_mul8_end - insn_mul8);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || (outregs.eax & 0xff) != 8)
+        print_serial("mul Test 1: FAIL\n");
+    else
+        print_serial("mul Test 1: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_mul16,
+                          insn_mul16_end - insn_mul16);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != 8)
+        print_serial("mul Test 2: FAIL\n");
+    else
+        print_serial("mul Test 2: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_mul32,
+                          insn_mul32_end - insn_mul32);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != 8)
+        print_serial("mul Test 3: FAIL\n");
+    else
+        print_serial("mul Test 3: PASS\n");
+}
+
 void realmode_start(void)
 {
     test_null();
@@ -1084,6 +1128,7 @@ void realmode_start(void)
     test_iret();
     test_int();
     test_imul();
+    test_mul();
 
     exit(0);
 }


[COMMIT master] Add test for xadd instruction

2010-08-17 Thread Avi Kivity
From: Wei Yongjun <yj...@cn.fujitsu.com>

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/x86/emulator.c b/x86/emulator.c
index 571f48d..a302ffd 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -360,6 +360,56 @@ void test_xchg(void *mem)
            rax == 0x123456789abcdef && *memq == 0xfedcba9876543210);
 }
 
+void test_xadd(void *mem)
+{
+    unsigned long *memq = mem;
+    unsigned long rax;
+
+    asm volatile("mov $0x123456789abcdef, %%rax\n\t"
+                 "mov %%rax, (%[memq])\n\t"
+                 "mov $0xfedcba9876543210, %%rax\n\t"
+                 "xadd %%al, (%[memq])\n\t"
+                 "mov %%rax, %[rax]\n\t"
+                 : [rax]"=r"(rax)
+                 : [memq]"r"(memq)
+                 : "memory");
+    report("xadd reg, r/m (1)",
+           rax == 0xfedcba98765432ef && *memq == 0x123456789abcdff);
+
+    asm volatile("mov $0x123456789abcdef, %%rax\n\t"
+                 "mov %%rax, (%[memq])\n\t"
+                 "mov $0xfedcba9876543210, %%rax\n\t"
+                 "xadd %%ax, (%[memq])\n\t"
+                 "mov %%rax, %[rax]\n\t"
+                 : [rax]"=r"(rax)
+                 : [memq]"r"(memq)
+                 : "memory");
+    report("xadd reg, r/m (2)",
+           rax == 0xfedcba987654cdef && *memq == 0x123456789abffff);
+
+    asm volatile("mov $0x123456789abcdef, %%rax\n\t"
+                 "mov %%rax, (%[memq])\n\t"
+                 "mov $0xfedcba9876543210, %%rax\n\t"
+                 "xadd %%eax, (%[memq])\n\t"
+                 "mov %%rax, %[rax]\n\t"
+                 : [rax]"=r"(rax)
+                 : [memq]"r"(memq)
+                 : "memory");
+    report("xadd reg, r/m (3)",
+           rax == 0x89abcdef && *memq == 0x1234567ffffffff);
+
+    asm volatile("mov $0x123456789abcdef, %%rax\n\t"
+                 "mov %%rax, (%[memq])\n\t"
+                 "mov $0xfedcba9876543210, %%rax\n\t"
+                 "xadd %%rax, (%[memq])\n\t"
+                 "mov %%rax, %[rax]\n\t"
+                 : [rax]"=r"(rax)
+                 : [memq]"r"(memq)
+                 : "memory");
+    report("xadd reg, r/m (4)",
+           rax == 0x123456789abcdef && *memq == 0xffffffffffffffff);
+}
+
 void test_btc(void *mem)
 {
     unsigned int *a = mem;
@@ -461,6 +511,7 @@ int main()
     test_pop(mem);
 
     test_xchg(mem);
+    test_xadd(mem);
 
     test_cr8();
 


[COMMIT master] Add real mode tests for div and idiv

2010-08-17 Thread Avi Kivity
From: Mohammed Gamal <m.gamal...@gmail.com>

Signed-off-by: Mohammed Gamal <m.gamal...@gmail.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/x86/realmode.c b/x86/realmode.c
index 3d8aed4..35f6a16 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1104,6 +1104,96 @@ void test_mul()
         print_serial("mul Test 3: PASS\n");
 }
 
+void test_div()
+{
+    struct regs inregs = { 0 }, outregs;
+
+    MK_INSN(div8, "mov $257, %ax\n\t"
+                  "mov $2, %cl\n\t"
+                  "div %cl\n\t");
+
+    MK_INSN(div16, "mov $512, %ax\n\t"
+                   "mov $5, %cx\n\t"
+                   "div %cx\n\t");
+
+    MK_INSN(div32, "mov $512, %eax\n\t"
+                   "mov $5, %ecx\n\t"
+                   "div %ecx\n\t");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_div8,
+                          insn_div8_end - insn_div8);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != 384)
+        print_serial("div Test 1: FAIL\n");
+    else
+        print_serial("div Test 1: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_div16,
+                          insn_div16_end - insn_div16);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != 102 ||
+        outregs.edx != 2)
+        print_serial("div Test 2: FAIL\n");
+    else
+        print_serial("div Test 2: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_div32,
+                          insn_div32_end - insn_div32);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != 102 ||
+        outregs.edx != 2)
+        print_serial("div Test 3: FAIL\n");
+    else
+        print_serial("div Test 3: PASS\n");
+}
+
+void test_idiv()
+{
+    struct regs inregs = { 0 }, outregs;
+
+    MK_INSN(idiv8, "mov $256, %ax\n\t"
+                   "mov $-2, %cl\n\t"
+                   "idiv %cl\n\t");
+
+    MK_INSN(idiv16, "mov $512, %ax\n\t"
+                    "mov $-2, %cx\n\t"
+                    "idiv %cx\n\t");
+
+    MK_INSN(idiv32, "mov $512, %eax\n\t"
+                    "mov $-2, %ecx\n\t"
+                    "idiv %ecx\n\t");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_idiv8,
+                          insn_idiv8_end - insn_idiv8);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != (u8)-128)
+        print_serial("idiv Test 1: FAIL\n");
+    else
+        print_serial("idiv Test 1: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_idiv16,
+                          insn_idiv16_end - insn_idiv16);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != (u16)-256)
+        print_serial("idiv Test 2: FAIL\n");
+    else
+        print_serial("idiv Test 2: PASS\n");
+
+    exec_in_big_real_mode(&inregs, &outregs,
+                          insn_idiv32,
+                          insn_idiv32_end - insn_idiv32);
+
+    if (!regs_equal(&inregs, &outregs, R_AX | R_CX | R_DX) || outregs.eax != (u32)-256)
+        print_serial("idiv Test 3: FAIL\n");
+    else
+        print_serial("idiv Test 3: PASS\n");
+}
+
 void realmode_start(void)
 {
     test_null();
@@ -1129,6 +1219,8 @@ void realmode_start(void)
     test_int();
     test_imul();
     test_mul();
+    test_div();
+    test_idiv();
 
     exit(0);
 }
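
The expected value 384 in "div Test 1" follows from the 8-bit DIV register layout: AX is divided by the operand, the quotient lands in AL and the remainder in AH. So 257 / 2 gives AL = 128 and AH = 1, that is AX = 0x0180 = 384. The sketch below works this out in plain C. The function names are illustrative, and divide-error cases are ruled out by assertions rather than modelled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 8-bit unsigned DIV: AX / divisor. Quotient goes to AL, remainder to AH.
 * Returns the resulting AX. A zero divisor or a quotient that does not
 * fit in 8 bits would raise #DE on hardware; callers must avoid both.
 */
static uint16_t div8_sketch(uint16_t ax, uint8_t divisor)
{
    assert(divisor != 0 && ax / divisor <= 0xff);
    uint8_t quot = (uint8_t)(ax / divisor);
    uint8_t rem  = (uint8_t)(ax % divisor);
    return (uint16_t)((rem << 8) | quot);
}

/* 8-bit signed IDIV, same register layout, truncating toward zero. */
static uint16_t idiv8_sketch(int16_t ax, int8_t divisor)
{
    assert(divisor != 0);
    int q = ax / divisor;
    int r = ax % divisor;
    assert(q >= -128 && q <= 127);
    return (uint16_t)((((unsigned)r & 0xff) << 8) | ((unsigned)q & 0xff));
}
```

The same layout explains "idiv Test 1": 256 / -2 is -128 with remainder 0, so AL = 0x80 and AH = 0, which equals `(u8)-128`.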


Re: Intel VT-d and KVM

2010-08-17 Thread Nirmal Guhan
On Sat, Aug 7, 2010 at 7:05 PM, Nirmal Guhan <vavat...@gmail.com> wrote:
> On Sat, Aug 7, 2010 at 8:29 AM, Alex Williamson
> <alex.william...@redhat.com> wrote:
>> On Fri, 2010-08-06 at 17:29 -0700, Nirmal Guhan wrote:
>>> On Thu, Aug 5, 2010 at 10:44 PM, Alex Williamson
>>> <alex.william...@redhat.com> wrote:
>>> > On Thu, Aug 5, 2010 at 12:53 PM, Nirmal Guhan <vavat...@gmail.com> wrote:
>>> >> Hi,
>>> >>
>>> >> Am using Fedora 12 2.6.32.10-90.fc12.i686 on both host and guest. I
>>> >> see that the packets destined for a particular port (iperf/5001 if
>>> >> that matters) in guest can be captured using tcpdump on host whereas
>>> >> the reverse is not true i.e I run iperf server on host and tcpdump on
>>> >> guest can not read the packets sent to host. Is this expected behavior
>>> >> ?
>>> >
>>> > Yes.
>>>
>>> So all the packets are recd by the host kernel and sent to guest ? Is
>>> this the high level flow?
>>
>> Yes, when you used bridged/tap networking, all packets first go to the
>> host, the bridge, then the guests.
>>
>>> >>
>>> >> I have enabled VT-d (through intel_iommu=on) and so was thinking that
>>> >> guest will read the packets directly. If this is true, then wonder how
>>> >> tcpdump on host can read guest packets ? or is my understanding wrong
>>> >> ? Please clarify.
>>> >
>>> > Enabling VT-d on the host is only the first step, that doesn't
>>> > automatically change the behavior of the guest.  VT-d allows you to
>>> > make use of the -pcidevice (or preferably -device pci-assign) option
>>> > to kvm, which exposes a PCI device directly to the guest.  For
>>> > instance if you have a NIC that you want to dedicate to a guest at PCI
>>> > address 00:19.0, you can use -device pci-assign,host=00:19.0, which
>>> > should show up (more than likely at a different PCI address) in the
>>> > guest.  (you'll have to unbind the device from host drivers, but the
>>> > error messages will tell you how to do that) In this model, packets
>>> > destined for the guest are only seen by the guest.
>>>
>>> Thanks. This worked, surprisingly with a performance penalty. The
>>> guest ethernet device (eth2 in my case) came up with 10Mb/s speed. I
>>> changed the speed to 100Mb/s using ethtool but still the performance
>>> (Mbits/sec using iperf) did not improve. Any clues?
>>
>> What's the device? (lspci -vvv from the host)  Link speed shouldn't
>> depend on VM performance.  What are you using to measure performance?
>
> It is Intel e1000e driver.
> # lspci -vvv
> 00:19.0 Ethernet controller: Intel Corporation Device 10ef (rev 06)
>         Subsystem: Intel Corporation Device 
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 41
>         Region 0: Memory at ff50 (32-bit, non-prefetchable) [size=128K]
>         Region 1: Memory at ff57 (32-bit, non-prefetchable) [size=4K]
>         Region 2: I/O ports at f040 [size=32]
>         Capabilities: [c8] Power Management version 2
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>                 Address: fee0f00c  Data: 4182
>         Capabilities: [e0] PCI Advanced Features
>                 AFCap: TP+ FLR+
>                 AFCtrl: FLR-
>                 AFStatus: TP-
>         Kernel driver in use: e1000e
>         Kernel modules: e1000e
>
> Am using iperf to measure performance with identical invocation for
 both pci-passthrough(VT-d) and no pci-passthrough case. Command used :
 iperf -c addr -w 16000 - window size selected was 32K though 16K
 was requested.
  0.0-30.0 sec  33.6 MBytes  9.39 Mbits/sec - guest with pci pass-through + 
 VT-d
 0.0-30.0 sec    324 MBytes  90.6 Mbits/sec  -- guest without pci pass-through
  0.0-30.0 sec   334 MBytes  93.3 Mbits/sec --- host

 --Nirmal





 Thanks,

 Alex

Adding kvm forum back. Please help!

Thanks, Nirmal


Re: FW: intel_scu_ipc

2010-08-17 Thread Zhou Peng
Thanks :)

2010/8/17 Hong Liu hong@intel.com:
 On Tue, 2010-08-17 at 13:59 +0800, Zhou Peng wrote:
 After disabling intel_scu_ipc at compile time, the kernel builds as vmlinuz-2.6.35+.
 In Linux kernel 2.6.35, mrst.h is the same as in the kvm tree
 (http://lxr.free-electrons.com/source/arch/x86/include/asm/mrst.h)

 I've checked the latest Linus linux-2.6 tree, it should be fixed.

 2010/8/17 Zhou Peng ailvpen...@gmail.com:
  This problem seems to exist in other distributions as well, e.g.
  http://stackoverflow.com/questions/3434676/error-while-compiling-the-linux-kernel-2-6-35
  http://www.spinics.net/lists/linux-wireless/msg54197.html
  How can the kvm tree be compiled without disabling intel_scu_ipc?
 
  CC to KVM-ML
 
  2010/8/17 Hong Liu hong@intel.com:
  On Tue, 2010-08-17 at 13:12 +0800, Zhou Peng wrote:
  I don't know  :)
  I get the kernel(http://www.linux-kvm.org/page/Code) by
 
  git clone git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 
  Looks like it merged the intel_scu_ipc code while forgetting the mrst.h
  change.
 
 
  2010/8/17 Hong Liu hong@intel.com:
   On Tue, 2010-08-17 at 12:54 +0800, Zhou Peng wrote:
   Hi,
  
   But MRST_CPU_CHIP_PENWELL is missing in my source. The complete file 
   is below.
  
   So... which kernel are you using for compile?
  
   Thanks,
   Hong
  
  
   ---
   kvm$ find -name mrst.h
   ./include/config/x86/mrst.h
   ./arch/x86/include/asm/mrst.h
   kvm$ vim arch/x86/include/asm/mrst.h
   kvm$ vim include/config/x86/mrst.h
   kvm$ cat include/config/x86/mrst.h
   kvm$ cat arch/x86/include/asm/mrst.h
   /*
    * mrst.h: Intel Moorestown platform specific setup code
    *
    * (C) Copyright 2009 Intel Corporation
    *
    * This program is free software; you can redistribute it and/or
    * modify it under the terms of the GNU General Public License
    * as published by the Free Software Foundation; version 2
    * of the License.
    */
   #ifndef _ASM_X86_MRST_H
   #define _ASM_X86_MRST_H
   extern int pci_mrst_init(void);
   int __init sfi_parse_mrtc(struct sfi_table_header *table);
  
   #define SFI_MTMR_MAX_NUM 8
   #define SFI_MRTC_MAX  8
  
   #endif /* _ASM_X86_MRST_H */
  
  
   Thanks,
  
   2010/8/17 Hong Liu hong@intel.com:
On Tue, 2010-08-17 at 11:38 +0800, Ds, Sreedhara wrote:
   
-Original Message-
From: Zhou Peng [mailto:ailvpen...@gmail.com]
Sent: Tuesday, August 17, 2010 8:26 AM
To: Ds, Sreedhara
Subject: intel_scu_ipc
   
Hi Sreedhara DS,
   
Where is MRST_CPU_CHIP_PENWELL defined, please?
   
It is defined in asm/mrst.h, please check the
arch/x86/include/asm/mrst.h file, seems there is problem with the
sfi_table_header structure which defined in include/linux/sfi.h.
   
Thanks,
Hong
   
   
While compiling the kvm Linux kernel on my Ubuntu 10.04 (Linux
-laptop 2.6.32-24-generic #39-Ubuntu SMP Wed Jul 28 06:07:29 UTC
2010 i686 GNU/Linux), the errors below appear. How can I resolve
them, please?
   
=err msg=
drivers/platform/x86/intel_scu_ipc.c: In function 'pwr_reg_rdwr':
drivers/platform/x86/intel_scu_ipc.c:175: error: 'MRST_CPU_CHIP_PENWELL' undeclared (first use in this function)
drivers/platform/x86/intel_scu_ipc.c:175: error: (Each undeclared identifier is reported only once
drivers/platform/x86/intel_scu_ipc.c:175: error: for each function it appears in.)
drivers/platform/x86/intel_scu_ipc.c: In function 'intel_scu_ipc_init':
drivers/platform/x86/intel_scu_ipc.c:741: error: implicit declaration of function 'mrst_identify_cpu'
make[3]: *** [drivers/platform/x86/intel_scu_ipc.o] Error 1
make[2]: *** [drivers/platform/x86] Error 2
make[1]: *** [drivers/platform] Error 2
make: *** [drivers] Error 2
   
   
=search the kvm kernel tree=
kvm$ grep MRST_CPU_CHIP_PENWELL . -R
./drivers/platform/x86/intel_scu_ipc.c:    if (platform != MRST_CPU_CHIP_PENWELL) {
./drivers/platform/x86/intel_scu_ipc.c:        if (platform != MRST_CPU_CHIP_PENWELL) {
   
Thanks,
   
Best,

-- 
Zhou Peng


vhost-net unreleased?

2010-08-17 Thread Christian Theune

Hi,

I've been plugging through code and presentations trying to find out 
whether the KVM/qemu side of vhost-net has been released yet. The git 
archive seems to include the vhost-net code since about a year already, 
but I could not find any trace of it in 0.12.5.


Can you confirm, that vhost-net on the kvm/qemu side is not released 
yet? And if it is so, does anyone have a gut feeling of when it will be?


Christian

--
Christian Theune · c...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 0 · fax +49 345 1229889 1
Zope and Plone consulting and development



[PATCH RESEND] KVM: PIT: free irq source id in handling error path

2010-08-17 Thread Xiao Guangrong
Free the irq source id if creating the PIT workqueue fails.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/i8254.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 0fd6378..211716f 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -697,6 +697,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pit->wq = create_singlethread_workqueue("kvm-pit-wq");
 	if (!pit->wq) {
 		mutex_unlock(&pit->pit_state.lock);
+		kvm_free_irq_source_id(kvm, pit->irq_source_id);
 		kfree(pit);
 		return NULL;
 	}
-- 
1.6.1.2



Re: [PATCH] cgroups: fix API thinko

2010-08-17 Thread Li Zefan
(Just came back from vacation)

Michael S. Tsirkin wrote:
> The cgroup_attach_task_current_cg API that we have upstream is backwards: we
> really need an API to attach to the cgroups from another process A to
> the current one.
>
> In our case (vhost), a privileged user wants to attach its task to cgroups
> from a less privileged one; the API makes us run it in the other
> task's context, and this fails.
>
> So let's make the API generic and just pass in 'from' and 'to' tasks.
> Add an inline wrapper for cgroup_attach_task_current_cg to avoid
> breaking bisect.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com

Acked-by: Li Zefan l...@cn.fujitsu.com

I also don't like the name, but I'm not good at English or naming. ;)

> ---
>
> Paul, Li, Sridhar, could you please review the following
> patch?
>
> I only compile-tested it due to travel, but looks
> straight-forward to me.
> Alex Williamson volunteered to test and report the results.
> Sending out now for review as I might be offline for a bit.
> Will only try to merge when done, obviously.
>
> If OK, I would like to merge this through -net tree,
> together with the patch fixing vhost-net.
> Let me know if that sounds ok.
 

That's Ok.

...
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 43b2072..b38ec60 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -525,7 +525,11 @@ struct task_struct *cgroup_iter_next(struct cgroup *cgrp,
>  void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it);
>  int cgroup_scan_tasks(struct cgroup_scanner *scan);
>  int cgroup_attach_task(struct cgroup *, struct task_struct *);
> -int cgroup_attach_task_current_cg(struct task_struct *);
> +int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);

a nitpick:

better add a blank line here.

> +static inline int cgroup_attach_task_current_cg(struct task_struct *tsk)
> +{
> +	return cgroup_attach_task_all(current, tsk);
> +}


Re: vhost-net unreleased?

2010-08-17 Thread Avi Kivity

 On 08/17/2010 09:58 AM, Christian Theune wrote:

Hi,

I've been plugging through code and presentations trying to find out 
whether the KVM/qemu side of vhost-net has been released yet. The git 
archive seems to include the vhost-net code since about a year 
already, but I could not find any trace of it in 0.12.5.


Can you confirm, that vhost-net on the kvm/qemu side is not released 
yet? And if it is so, does anyone have a gut feeling of when it will be?


vhost-net will be supported by qemu-kvm 0.13 which is on track for 
release soon.


--
error compiling committee.c: too many arguments to function



[PATCH 0/4] Emulator INTn and SCAS fixes

2010-08-17 Thread Avi Kivity
The following patchset makes INTn work and implements SCAS (used by vgabios).
With the patchset, vgabios is able to display its splash screen but gets
confused shortly afterwards.

Based on the non-atomic-injection branch.

Avi Kivity (4):
  KVM: Initialize operand and address sizes before emulating interrupts
  KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS
  KVM: x86 emulator: implement SCAS (opcodes AE, AF)
  KVM: x86 emulator: fix REPZ/REPNZ termination condition

 arch/x86/kvm/emulate.c |   60 +--
 arch/x86/kvm/x86.c |2 +
 2 files changed, 39 insertions(+), 23 deletions(-)



[PATCH 2/4] KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS

2010-08-17 Thread Avi Kivity
emulate_push() only schedules a push; it doesn't actually push anything.
Call writeback() to flush out the write.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |   13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f1ec023..0e8f25e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1228,7 +1228,7 @@ int emulate_int_real(struct x86_emulate_ctxt *ctxt,
 			       struct x86_emulate_ops *ops, int irq)
 {
 	struct decode_cache *c = &ctxt->decode;
-	int rc = X86EMUL_CONTINUE;
+	int rc;
 	struct desc_ptr dt;
 	gva_t cs_addr;
 	gva_t eip_addr;
@@ -1238,14 +1238,25 @@ int emulate_int_real(struct x86_emulate_ctxt *ctxt,
 	/* TODO: Add limit checks */
 	c->src.val = ctxt->eflags;
 	emulate_push(ctxt, ops);
+	rc = writeback(ctxt, ops);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
 
 	ctxt->eflags &= ~(EFLG_IF | EFLG_TF | EFLG_AC);
 
 	c->src.val = ops->get_segment_selector(VCPU_SREG_CS, ctxt->vcpu);
 	emulate_push(ctxt, ops);
+	rc = writeback(ctxt, ops);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
 
 	c->src.val = c->eip;
 	emulate_push(ctxt, ops);
+	rc = writeback(ctxt, ops);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	c->dst.type = OP_NONE;
 
 	ops->get_idt(&dt, ctxt->vcpu);
 
-- 
1.7.1
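
[Archive note] The commit message above says that emulate_push() only schedules a push and that writeback() is what actually stores the value. The following is a minimal toy model of that behaviour, not the KVM emulator: every name in it (toy_ctxt, toy_push, toy_writeback, toy_int) is hypothetical, and the single trailing flush in toy_int() is an assumption made only so the difference is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: one pending destination, as in "schedule now, store later". */
struct toy_ctxt {
	uint16_t stack[8];	/* guest stack, indexed in 16-bit words */
	unsigned sp;		/* index of the current top of stack */
	uint16_t src_val;	/* value the next push should store */
	int dst_valid;		/* non-zero while a write is pending */
	unsigned dst_slot;	/* where the pending write goes */
	uint16_t dst_val;	/* what the pending write stores */
};

static void toy_init(struct toy_ctxt *c)
{
	memset(c, 0, sizeof(*c));
	c->sp = 8;		/* empty stack, grows downwards */
}

/* Records the write but does not touch the stack. */
static void toy_push(struct toy_ctxt *c)
{
	assert(c->sp > 0);
	c->sp--;
	c->dst_slot = c->sp;
	c->dst_val = c->src_val;
	c->dst_valid = 1;	/* overwrites any earlier pending write */
}

/* Commits the pending write, if there is one. */
static void toy_writeback(struct toy_ctxt *c)
{
	if (c->dst_valid) {
		c->stack[c->dst_slot] = c->dst_val;
		c->dst_valid = 0;
	}
}

/*
 * Push flags, CS and IP. With flush != 0 each push is written back at
 * once; otherwise only one trailing writeback happens. Returns how many
 * of the three values actually reached the stack.
 */
static int toy_int(struct toy_ctxt *c, int flush)
{
	const uint16_t vals[3] = { 0x0202, 0xf000, 0x1234 };
	int i, stored = 0;

	for (i = 0; i < 3; i++) {
		c->src_val = vals[i];
		toy_push(c);
		if (flush)
			toy_writeback(c);
	}
	toy_writeback(c);	/* trailing flush (assumption of this toy) */

	for (i = 0; i < 3; i++)
		if (c->stack[7 - i] == vals[i])
			stored++;
	return stored;
}
```

Calling toy_int() on a freshly initialised context with flush set stores all three values; without it, only the last scheduled push survives, which is the symptom the patch title describes (EFLAGS and CS not pushed).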



[PATCH 4/4] KVM: x86 emulator: fix REPZ/REPNZ termination condition

2010-08-17 Thread Avi Kivity
EFLAGS.ZF needs to be checked after each iteration, not before.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |   42 +++---
 1 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0c0ada9..a2edfb1 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2747,6 +2747,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	int rc = X86EMUL_CONTINUE;
 	int saved_dst_type = c->dst.type;
 	int irq; /* Used for int 3, int, and into */
+	ulong old_eip;
 
 	ctxt->decode.mem_read.pos = 0;
 
@@ -2771,28 +2772,10 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		ctxt->restart = true;
 		/* All REP prefixes have the same first termination condition */
 		if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
-		string_done:
 			ctxt->restart = false;
 			ctxt->eip = c->eip;
 			goto done;
 		}
-		/* The second termination condition only applies for REPE
-		 * and REPNE. Test if the repeat string operation prefix is
-		 * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
-		 * corresponding termination condition according to:
-		 *  - if REPE/REPZ and ZF = 0 then done
-		 *  - if REPNE/REPNZ and ZF = 1 then done
-		 */
-		if ((c->b == 0xa6) || (c->b == 0xa7) ||
-		    (c->b == 0xae) || (c->b == 0xaf)) {
-			if ((c->rep_prefix == REPE_PREFIX) &&
-			    ((ctxt->eflags & EFLG_ZF) == 0))
-				goto string_done;
-			if ((c->rep_prefix == REPNE_PREFIX) &&
-			    ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))
-				goto string_done;
-		}
-		c->eip = ctxt->eip;
 	}
 
 	if (c->src.type == OP_MEM) {
@@ -3229,6 +3212,7 @@ special_insn:
 	}
 
 writeback:
+	old_eip = c->eip;
 	rc = writeback(ctxt, ops);
 	if (rc != X86EMUL_CONTINUE)
 		goto done;
@@ -3250,13 +3234,33 @@ writeback:
 	if (c->rep_prefix && (c->d & String)) {
 		struct read_cache *rc = &ctxt->decode.io_read;
 		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
+		/* The second termination condition only applies for REPE
+		 * and REPNE. Test if the repeat string operation prefix is
+		 * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
+		 * corresponding termination condition according to:
+		 *  - if REPE/REPZ and ZF = 0 then done
+		 *  - if REPNE/REPNZ and ZF = 1 then done
+		 */
+		if ((c->b == 0xa6) || (c->b == 0xa7) ||
+		    (c->b == 0xae) || (c->b == 0xaf)) {
+			trace_printk("c->eip %lx ctxt->eip %lx\n",
+				     c->eip, ctxt->eip);
+			if (((c->rep_prefix == REPE_PREFIX) &&
+			     ((ctxt->eflags & EFLG_ZF) == 0))
+			    || ((c->rep_prefix == REPNE_PREFIX) &&
+				((ctxt->eflags & EFLG_ZF) == EFLG_ZF))) {
+				ctxt->restart = false;
+			}
+		}
 		/*
 		 * Re-enter guest when pio read ahead buffer is empty or,
 		 * if it is not used, after each 1024 iteration.
 		 */
 		if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
-		    (rc->end != 0 && rc->end == rc->pos))
+		    (rc->end != 0 && rc->end == rc->pos)) {
 			ctxt->restart = false;
+			c->eip = ctxt->eip;
+		}
 	}
 	/*
 	 * reset read cache here in case string instruction is restared
-- 
1.7.1
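
[Archive note] The point of the commit message ("EFLAGS.ZF needs to be checked after each iteration, not before") can be illustrated with a small stand-alone model of REPE SCASB. This is a sketch under stated assumptions, not emulator code: scas_state and repe_scasb are hypothetical names, the count models (E)CX, the index models (E)DI, and the direction flag is assumed clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy register state for a REPE SCASB loop. */
struct scas_state {
	size_t count;	/* models (E)CX */
	size_t index;	/* models (E)DI as an offset into the buffer */
	int zf;		/* models EFLAGS.ZF, may hold a stale value on entry */
};

/* Runs REPE SCASB against buf and returns the number of bytes scanned. */
static size_t repe_scasb(struct scas_state *s, const uint8_t *buf, uint8_t al)
{
	size_t iterations = 0;

	assert(s != NULL && buf != NULL);

	/* First termination condition: count == 0, tested before an iteration. */
	while (s->count != 0) {
		/* One SCASB: compare AL with the byte at DI, set ZF, advance DI. */
		s->zf = (buf[s->index] == al);
		s->index++;
		s->count--;
		iterations++;

		/*
		 * Second termination condition: tested after the iteration,
		 * because only now does ZF describe this comparison.
		 */
		if (!s->zf)
			break;
	}
	return iterations;
}
```

If ZF were tested before the first iteration instead, a stale ZF = 0 left over from an earlier instruction would end the scan before a single byte had been compared.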



[PATCH 3/4] KVM: x86 emulator: implement SCAS (opcodes AE, AF)

2010-08-17 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0e8f25e..0c0ada9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2308,7 +2308,7 @@ static struct opcode opcode_table[256] = {
 	D(DstAcc | SrcImmByte | ByteOp), D(DstAcc | SrcImm),
 	D(ByteOp | SrcAcc | DstDI | Mov | String), D(SrcAcc | DstDI | Mov | String),
 	D(ByteOp | SrcSI | DstAcc | Mov | String), D(SrcSI | DstAcc | Mov | String),
-	D(ByteOp | DstDI | String), D(DstDI | String),
+	D(ByteOp | SrcAcc | DstDI | String), D(SrcAcc | DstDI | String),
 	/* 0xB0 - 0xB7 */
 	X8(D(ByteOp | DstReg | SrcImm | Mov)),
 	/* 0xB8 - 0xBF */
@@ -3068,8 +3068,7 @@ special_insn:
 	case 0xac ... 0xad:	/* lods */
 		goto mov;
 	case 0xae ... 0xaf:	/* scas */
-		DPRINTF("Urk! I don't handle SCAS.\n");
-		goto cannot_emulate;
+		goto cmp;
 	case 0xb0 ... 0xbf: /* mov r, imm */
 		goto mov;
 	case 0xc0 ... 0xc1:
-- 
1.7.1



[PATCH 1/4] KVM: Initialize operand and address sizes before emulating interrupts

2010-08-17 Thread Avi Kivity
The emulator needs the operand and address sizes to be valid.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6a77fa1..f6a31a1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3965,6 +3965,8 @@ int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq)
 
 	init_emulate_ctxt(vcpu);
 
+	vcpu->arch.emulate_ctxt.decode.op_bytes = 2;
+	vcpu->arch.emulate_ctxt.decode.ad_bytes = 2;
 	ret = emulate_int_real(&vcpu->arch.emulate_ctxt, &emulate_ops, irq);
 
 	if (ret != X86EMUL_CONTINUE)
-- 
1.7.1



Re: vhost-net unreleased?

2010-08-17 Thread Christian Theune

On 08/17/2010 10:13 AM, Avi Kivity wrote:

On 08/17/2010 09:58 AM, Christian Theune wrote:

Hi,

I've been plugging through code and presentations trying to find out
whether the KVM/qemu side of vhost-net has been released yet. The git
archive seems to include the vhost-net code since about a year
already, but I could not find any trace of it in 0.12.5.

Can you confirm, that vhost-net on the kvm/qemu side is not released
yet? And if it is so, does anyone have a gut feeling of when it will be?


vhost-net will be supported by qemu-kvm 0.13 which is on track for
release soon.


Ok, thanks for the clarification. I'll hold my breath then. :)


--
Christian Theune · c...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 0 · fax +49 345 1229889 1
Zope and Plone consulting and development



Re: [PATCH RESEND] KVM: PIT: free irq source id in handling error path

2010-08-17 Thread Avi Kivity

 On 08/17/2010 10:02 AM, Xiao Guangrong wrote:

Free irq source id if create pit workqueue fail



Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RESEND] KVM: PPC: fix leakage of error page in kvmppc_patch_dcbz()

2010-08-17 Thread Avi Kivity
 On 08/17/2010 05:08 AM, Wei Yongjun wrote:
 Add kvm_release_page_clean() after is_error_page() to avoid
 leakage of error page.

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2 v4] KVM: x86 emulator: put register operand write back to a function

2010-08-17 Thread Avi Kivity
 On 08/17/2010 04:17 AM, Wei Yongjun wrote:
 Introduce function write_register_operand() to write back the
 register operand.

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Mon, Aug 16, 2010 at 03:34:12PM -0500, Anthony Liguori wrote:
> On 08/16/2010 01:42 PM, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote:
>>>> Also, ext4 is _very_ slow on O_SYNC writes (which is
>>>> used in kvm with default cache).
>>> Yeah, we probably need to switch to sync_file_range() to avoid the
>>> journal commit on every write.
>>
>> No, we don't.  sync_file_range does not actually provide any data
>> integrity.
>
> What do you mean by data integrity?

sync_file_range only does pagecache-level writeout of the file data.
It never calls into the actual filesystem; that means any block
allocations (for filling holes / converting preallocated space in normal
filesystems, or every write in COW-based filesystems like qcow2) never
get flushed to disk, and even more importantly the disk write cache is
never flushed.

In short it's completely worthless for any real filesystem.
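
[Archive note] For contrast with the pagecache-only writeout described above, here is a minimal sketch of a write that goes through the filesystem by calling fdatasync() after the data has been written. The helper names write_durable and demo_write_durable and the /tmp path are hypothetical and only serve the illustration; whether the disk write cache is flushed as well still depends on the filesystem and its mount options, so this is not a statement about what qemu should do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write len bytes to fd, then ask the filesystem to commit the data and
 * the metadata needed to read it back. Returns 0 on success, -1 on error.
 */
static int write_durable(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL);

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	/* Unlike sync_file_range(), this call enters the filesystem. */
	return fdatasync(fd);
}

/* Exercise write_durable() on a throwaway temporary file. */
static int demo_write_durable(void)
{
	char path[] = "/tmp/durable-XXXXXX";
	int fd = mkstemp(path);
	int rc;

	if (fd < 0)
		return -1;
	rc = write_durable(fd, "guest data\n", 11);
	close(fd);
	unlink(path);
	return rc;
}
```

The cost raised earlier in the thread, a journal commit on every write, is the price of this path; the sketch only shows which call provides the integrity guarantee, not how to make it cheap.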



Re: [PATCH] KVM: x86: explain 'no-kvmclock' kernel parameter

2010-08-17 Thread Avi Kivity

 On 08/16/2010 06:51 PM, Jiri Kosina wrote:

no-kvmclock kernel parameter is missing its explanation in
Documentation/kernel-parameters.txt. Add it.


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/4] KVM: x86 emulator: fix REPZ/REPNZ termination condition

2010-08-17 Thread Gleb Natapov
On Tue, Aug 17, 2010 at 11:26:43AM +0300, Avi Kivity wrote:
> EFLAGS.ZF needs to be checked after each iteration, not before.
>
> Signed-off-by: Avi Kivity a...@redhat.com
> ---
>  arch/x86/kvm/emulate.c |   42 +++---
>  1 files changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 0c0ada9..a2edfb1 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2747,6 +2747,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
>  	int rc = X86EMUL_CONTINUE;
>  	int saved_dst_type = c->dst.type;
>  	int irq; /* Used for int 3, int, and into */
> +	ulong old_eip;
Is never used.

>
>  	ctxt->decode.mem_read.pos = 0;
>
> @@ -2771,28 +2772,10 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
>  		ctxt->restart = true;
>  		/* All REP prefixes have the same first termination condition */
>  		if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
> -		string_done:
>  			ctxt->restart = false;
>  			ctxt->eip = c->eip;
>  			goto done;
>  		}
> -		/* The second termination condition only applies for REPE
> -		 * and REPNE. Test if the repeat string operation prefix is
> -		 * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
> -		 * corresponding termination condition according to:
> -		 *  - if REPE/REPZ and ZF = 0 then done
> -		 *  - if REPNE/REPNZ and ZF = 1 then done
> -		 */
> -		if ((c->b == 0xa6) || (c->b == 0xa7) ||
> -		    (c->b == 0xae) || (c->b == 0xaf)) {
> -			if ((c->rep_prefix == REPE_PREFIX) &&
> -			    ((ctxt->eflags & EFLG_ZF) == 0))
> -				goto string_done;
> -			if ((c->rep_prefix == REPNE_PREFIX) &&
> -			    ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))
> -				goto string_done;
> -		}
> -		c->eip = ctxt->eip;
>  	}
>
>  	if (c->src.type == OP_MEM) {
> @@ -3229,6 +3212,7 @@ special_insn:
>  	}
>
>  writeback:
> +	old_eip = c->eip;
>  	rc = writeback(ctxt, ops);
>  	if (rc != X86EMUL_CONTINUE)
>  		goto done;
> @@ -3250,13 +3234,33 @@ writeback:
>  	if (c->rep_prefix && (c->d & String)) {
>  		struct read_cache *rc = &ctxt->decode.io_read;
>  		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
> +		/* The second termination condition only applies for REPE
> +		 * and REPNE. Test if the repeat string operation prefix is
> +		 * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
> +		 * corresponding termination condition according to:
> +		 *  - if REPE/REPZ and ZF = 0 then done
> +		 *  - if REPNE/REPNZ and ZF = 1 then done
> +		 */
> +		if ((c->b == 0xa6) || (c->b == 0xa7) ||
> +		    (c->b == 0xae) || (c->b == 0xaf)) {
> +			trace_printk("c->eip %lx ctxt->eip %lx\n",
> +				     c->eip, ctxt->eip);
> +			if (((c->rep_prefix == REPE_PREFIX) &&
> +			     ((ctxt->eflags & EFLG_ZF) == 0))
> +			    || ((c->rep_prefix == REPNE_PREFIX) &&
> +				((ctxt->eflags & EFLG_ZF) == EFLG_ZF))) {
> +				ctxt->restart = false;
Why not jump to string_done label here?

> +			}
> +		}
>  		/*
>  		 * Re-enter guest when pio read ahead buffer is empty or,
>  		 * if it is not used, after each 1024 iteration.
>  		 */
>  		if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
> -		    (rc->end != 0 && rc->end == rc->pos))
> +		    (rc->end != 0 && rc->end == rc->pos)) {
>  			ctxt->restart = false;
> +			c->eip = ctxt->eip;
We can get here when instruction is completed by above if, so same
instruction will reexecute once again.

> +		}
>  	}
>  	/*
>  	 * reset read cache here in case string instruction is restared
> -- 
> 1.7.1

--
Gleb.


Re: [PATCH 4/4] KVM: x86 emulator: fix REPZ/REPNZ termination condition

2010-08-17 Thread Avi Kivity

 On 08/17/2010 12:13 PM, Gleb Natapov wrote:

> On Tue, Aug 17, 2010 at 11:26:43AM +0300, Avi Kivity wrote:
>> EFLAGS.ZF needs to be checked after each iteration, not before.
>>
>> Signed-off-by: Avi Kivity a...@redhat.com
>> ---
>>  arch/x86/kvm/emulate.c |   42 +++---
>>  1 files changed, 23 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 0c0ada9..a2edfb1 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -2747,6 +2747,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
>>  	int rc = X86EMUL_CONTINUE;
>>  	int saved_dst_type = c->dst.type;
>>  	int irq; /* Used for int 3, int, and into */
>> +	ulong old_eip;
> Is never used.

Whoops.

>> @@ -3250,13 +3234,33 @@ writeback:
>>  	if (c->rep_prefix && (c->d & String)) {
>>  		struct read_cache *rc = &ctxt->decode.io_read;
>>  		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
>> +		/* The second termination condition only applies for REPE
>> +		 * and REPNE. Test if the repeat string operation prefix is
>> +		 * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
>> +		 * corresponding termination condition according to:
>> +		 *  - if REPE/REPZ and ZF = 0 then done
>> +		 *  - if REPNE/REPNZ and ZF = 1 then done
>> +		 */
>> +		if ((c->b == 0xa6) || (c->b == 0xa7) ||
>> +		    (c->b == 0xae) || (c->b == 0xaf)) {
>> +			trace_printk("c->eip %lx ctxt->eip %lx\n",
>> +				     c->eip, ctxt->eip);
>> +			if (((c->rep_prefix == REPE_PREFIX) &&
>> +			     ((ctxt->eflags & EFLG_ZF) == 0))
>> +			    || ((c->rep_prefix == REPNE_PREFIX) &&
>> +				((ctxt->eflags & EFLG_ZF) == EFLG_ZF))) {
>> +				ctxt->restart = false;
> Why not jump to string_done label here?

It does a 'goto done;' which skips a couple of things.

>> +			}
>> +		}
>>  		/*
>>  		 * Re-enter guest when pio read ahead buffer is empty or,
>>  		 * if it is not used, after each 1024 iteration.
>>  		 */
>>  		if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
>> -		    (rc->end != 0 && rc->end == rc->pos))
>> +		    (rc->end != 0 && rc->end == rc->pos)) {
>>  			ctxt->restart = false;
>> +			c->eip = ctxt->eip;
> We can get here when instruction is completed by above if, so same
> instruction will reexecute once again.

Not good.  Will redo (and write tests).


--
error compiling committee.c: too many arguments to function



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity

 On 08/17/2010 12:07 PM, Christoph Hellwig wrote:


In short it's completely worthless for any real filesystem.



The documentation should be updated then.  It suggests that it is usable 
for data integrity.


(or maybe, it should be fixed?)

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] KVM: x86 emulator: fix REPZ/REPNZ termination condition

2010-08-17 Thread Gleb Natapov
On Tue, Aug 17, 2010 at 12:20:34PM +0300, Avi Kivity wrote:
  On 08/17/2010 12:13 PM, Gleb Natapov wrote:
 On Tue, Aug 17, 2010 at 11:26:43AM +0300, Avi Kivity wrote:
 EFLAGS.ZF needs to be checked after each iteration, not before.
 
 Signed-off-by: Avi Kivitya...@redhat.com
 ---
   arch/x86/kvm/emulate.c |   42 +++---
   1 files changed, 23 insertions(+), 19 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 0c0ada9..a2edfb1 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2747,6 +2747,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 int rc = X86EMUL_CONTINUE;
 int saved_dst_type = c-dst.type;
 int irq; /* Used for int 3, int, and into */
 +   ulong old_eip;
 Is never used.
 
 
 Whoops.
 
 @@ -3250,13 +3234,33 @@ writeback:
 if (c-rep_prefix  (c-d  String)) {
 struct read_cache *rc =ctxt-decode.io_read;
 register_address_increment(c,c-regs[VCPU_REGS_RCX], -1);
 +   /* The second termination condition only applies for REPE
 +* and REPNE. Test if the repeat string operation prefix is
 +* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
 +* corresponding termination condition according to:
 +*  - if REPE/REPZ and ZF = 0 then done
 +*  - if REPNE/REPNZ and ZF = 1 then done
 +*/
 +   if ((c->b == 0xa6) || (c->b == 0xa7) ||
 +   (c->b == 0xae) || (c->b == 0xaf)) {
 +   trace_printk("c->eip %lx ctxt->eip %lx\n",
 +c->eip, ctxt->eip);
 +   if (((c->rep_prefix == REPE_PREFIX) &&
 +((ctxt->eflags & EFLG_ZF) == 0))
 +   || ((c->rep_prefix == REPNE_PREFIX) &&
 +   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))) {
 +   ctxt->restart = false;
 Why not jump to string_done label here?
 
 It does a 'goto done;' which skips a couple of things.
 
The only thing it skips is:
ctxt->decode.mem_read.end = 0;
as far as I can see. And this is ok if instruction is completed.

 +   }
 +   }
 /*
  * Re-enter guest when pio read ahead buffer is empty or,
  * if it is not used, after each 1024 iteration.
  */
 if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
 -   (rc->end != 0 && rc->end == rc->pos))
 +   (rc->end != 0 && rc->end == rc->pos)) {
 ctxt->restart = false;
 +   c->eip = ctxt->eip;
 We can get here when instruction is completed by above if, so same
 instruction will reexecute once again.
 
 Not good.  Will redo (and write tests).
 
 
 -- 
 error compiling committee.c: too many arguments to function

--
Gleb.
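
The termination rule discussed in this thread can be sketched in user-space
C. This is only a model, not the emulator code: the enum values and the
helpers rep_zf_done() and repe_cmpsb() are hypothetical names made up for
illustration. Only the opcodes (0xa6, 0xa7, 0xae, 0xaf) and the two ZF
conditions are taken from the patch comment; ZF being bit 6 of EFLAGS is
the architectural definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFLG_ZF (1UL << 6)   /* architectural ZF bit in EFLAGS */

/* Hypothetical prefix encoding, for this sketch only. */
enum rep_prefix { REP_NONE, REPE_PREFIX, REPNE_PREFIX };

/* CMPS (0xa6/0xa7) and SCAS (0xae/0xaf) are the string ops that test ZF. */
static bool is_cmps_or_scas(uint8_t opcode)
{
    return opcode == 0xa6 || opcode == 0xa7 ||
           opcode == 0xae || opcode == 0xaf;
}

/* Second termination condition: REPE stops on ZF = 0, REPNE on ZF = 1. */
static bool rep_zf_done(uint8_t opcode, enum rep_prefix prefix,
                        unsigned long eflags)
{
    if (!is_cmps_or_scas(opcode))
        return false;
    if (prefix == REPE_PREFIX)
        return (eflags & EFLG_ZF) == 0;
    if (prefix == REPNE_PREFIX)
        return (eflags & EFLG_ZF) != 0;
    return false;
}

/*
 * Model of REPE CMPSB. ZF is tested after each iteration, so a stale
 * ZF = 0 on entry does not end the loop before the first compare.
 * Returns the number of iterations executed.
 */
static size_t repe_cmpsb(const uint8_t *a, const uint8_t *b, size_t rcx,
                         unsigned long *eflags)
{
    size_t done = 0;

    assert(eflags != NULL);
    while (rcx != 0) {
        if (a[done] == b[done])
            *eflags |= EFLG_ZF;
        else
            *eflags &= ~EFLG_ZF;
        done++;
        rcx--;
        if (rep_zf_done(0xa6, REPE_PREFIX, *eflags))
            break;
    }
    return done;
}
```

With ZF clear on entry, a check placed before the iteration would stop the
loop immediately; checking afterwards lets the compare run up to and
including the first mismatch.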


Re: [PATCH 0 of 3] Fix KVM on PowerPC 440GP

2010-08-17 Thread Avi Kivity

 On 08/07/2010 08:33 PM, hollis_blanch...@mentor.com wrote:

Hi Avi, these patches make KVM run on 440GP (the only 440-based SoC that wasn't
passing the compatibility check) and fix or enhance a couple very minor issues
in related code. Please apply.


Patches don't apply (at least the first), please rebase.

--
error compiling committee.c: too many arguments to function



Re: [qemu-kvm] build fail on i386 RHEL5u4

2010-08-17 Thread Avi Kivity

 On 08/16/2010 11:46 AM, Avi Kivity wrote:

 On 08/16/2010 04:27 AM, Hao, Xudong wrote:



Appears to be a gcc bug.  I opened
https://bugzilla.redhat.com/show_bug.cgi?id=624279 to track this.

Meanwhile, installing the gcc44 package and building with it
(./configure --cc=gcc44) appears to work.

Avi,
Gcc44 works for me.
I saw Jakub closed this bug, saying that only i486 and later support that, but
RHEL5 uses -march=i386, so do we have an ongoing fix for qemu-kvm?


Should be easy to add a ./configure test for this.




Or, just use --extra-cflags=-march=i686 or similar.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] test: Add test for xadd instruction

2010-08-17 Thread Avi Kivity
 On 08/12/2010 04:44 PM, Wei Yongjun wrote:

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0 of 3] Fix KVM on PowerPC 440GP

2010-08-17 Thread Alexander Graf

On 17.08.2010, at 11:27, Avi Kivity wrote:

 On 08/07/2010 08:33 PM, hollis_blanch...@mentor.com wrote:
 Hi Avi, these patches make KVM run on 440GP (the only 440-based SoC that 
 wasn't
 passing the compatibility check) and fix or enhance a couple very minor 
 issues
 in related code. Please apply.
 
 Patches don't apply (at least the first), please rebase.

I have a queue lying around locally anyways that I want to push today, so I'll 
just add them to it.

Alex



Re: guest MAC-address isolation

2010-08-17 Thread Avi Kivity

 On 08/06/2010 08:09 PM, Robert Rebstock wrote:

Hello all,

can anyone recommend a better way to achieve (guest agnostic) MAC-address
isolation in qemu/kvm than with user-mode networking?

I have multiple guests requiring the same MAC-address, and user-mode/slirp
networking is quite slow.



You can put the different guests on different bridges, and use IP 
routing to connect the two bridges; or you can use ebtables to mangle 
the MAC addresses.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: x86 emulator: add setcc instruction emulation

2010-08-17 Thread Avi Kivity
 On 08/06/2010 12:10 PM, Wei Yongjun wrote:
 Add setcc instruction emulation (opcode 0x0f 0x90~0x9f)


Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: x86 emulator: remove useless label from x86_emulate_insn()

2010-08-17 Thread Avi Kivity
 On 08/06/2010 10:36 AM, Wei Yongjun wrote:
 Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: Relationship between libkvm and qemu-kvm.c

2010-08-17 Thread Avi Kivity

 On 08/16/2010 12:31 AM, SHEN Hao wrote:

Hello, everyone,

I am a little bit confused by the qemu-kvm project, in which I found
some similar code in both libkvm and qemu-kvm.c. Is the libkvm really
used by qemu? What's the relationship
between them?


libkvm is no longer used by qemu.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/3] KVM: x86 emulator: introduce DstImmUByte for dst operand decode

2010-08-17 Thread Avi Kivity
 On 08/06/2010 06:36 AM, Wei Yongjun wrote:
 Introduce DstImmUByte for dst operand decode, which
 will be used for out instruction.

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/3] device-assignment: PCI option ROM fixes

2010-08-17 Thread Avi Kivity

 On 07/30/2010 10:40 PM, Alex Williamson wrote:

Changeset b4f8c249 in kvm.git makes the mprotects in device assignment
produce a Bad address hang when a device with an option ROM is
assigned.  We can avoid this by just using the slow mapping path since
ROM access doesn't need to be fast.  Apparently nobody has ever mapped
a ROM via this path, because passing NULL to cpu_register_io_memory()
doesn't work.  I also found we're overly restrictive in copying the
ROM from the host, I must have been lucky and had a ROM that matched
the BAR size when I added this.


Applied, thanks.

--
error compiling committee.c: too many arguments to function



kvm or bios bug?

2010-08-17 Thread Zhou Peng
Hi all,

I have enabled Virtualization in BIOS. However, when I  modprobe
kvm-intel, I get $ sudo modprobe kvm-intel
FATAL: Error inserting kvm_intel
(/lib/modules/2.6.35+/kernel/arch/x86/kvm/kvm-intel.ko): Operation not
supported

And $ dmesg | grep kvm
[  220.670287] kvm: disabled by bios

egrep '(vmx|svm)' /proc/cpuinfo
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida
tpr_shadow vnmi flexpriority
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida
tpr_shadow vnmi flexpriority


$ sudo biosdecode
# biosdecode 2.9
PNP BIOS 1.0 present.
Event Notification: Not Supported
Real Mode 16-bit Code Address: F000:57F7
Real Mode 16-bit Data Address: F000:
16-bit Protected Mode Code Address: 0x000F57DE
16-bit Protected Mode Data Address: 0x000F
OEM Device Identifier: @d...@3c00
SMBIOS 2.4 present.
Structure Table Length: 970 bytes
Structure Table Address: 0x7FCC3000
Number Of Structures: 20
Maximum Structure Size: 159 bytes
ACPI 2.0 present.
OEM Identifier: HPQOEM
RSD Table 32-bit Address: 0x7FDFE0AC
XSD Table 64-bit Address: 0x7FDFE120


The laptop is hp 4411s with intel T6570 cpu.


How can I figure out this problem, please?

Thanks,
-- 
Zhou Peng


Re: kvm or bios bug?

2010-08-17 Thread Avi Kivity

 On 08/17/2010 01:58 PM, Zhou Peng wrote:

Hi all,

I have enabled Virtualization in BIOS. However, when I  modprobe
kvm-intel, I get $ sudo modprobe kvm-intel
FATAL: Error inserting kvm_intel
(/lib/modules/2.6.35+/kernel/arch/x86/kvm/kvm-intel.ko): Operation not
supported

And $ dmesg | grep kvm
[  220.670287] kvm: disabled by bios


Try to power cycle your machine.

--
error compiling committee.c: too many arguments to function
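
For context, the "disabled by bios" message is tied to the
IA32_FEATURE_CONTROL MSR (0x3a), which firmware programs and locks at boot.
The check can be sketched as a pure function over the MSR value. This is a
sketch only: the bit positions (lock in bit 0, VMX outside SMX in bit 2)
follow the Intel SDM, while the function and macro names and the self-check
values are illustrative assumptions, not a copy of the kvm-intel source.
Reading the real MSR needs ring 0 or the msr driver, so that part is left
out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL bits, per the Intel SDM. */
#define FEATURE_CONTROL_LOCKED        (1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED (1ULL << 2)   /* VMX outside SMX */

/*
 * VMX counts as disabled by firmware when the MSR is locked but the
 * VMXON enable bit was left clear. An unlocked MSR can still be set up
 * by the OS, so it is not reported as disabled.
 */
static bool vmx_disabled_by_bios(uint64_t feature_control)
{
    uint64_t mask = FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED;

    return (feature_control & mask) == FEATURE_CONTROL_LOCKED;
}

/* Illustrative values only. */
static void vmx_self_check(void)
{
    assert(vmx_disabled_by_bios(0x1));    /* locked, VMX off      */
    assert(!vmx_disabled_by_bios(0x5));   /* locked, VMX on       */
    assert(!vmx_disabled_by_bios(0x0));   /* not locked yet       */
}
```

A locked value typically stays in place until the firmware runs its full
power-on path again, which would be consistent with a power cycle being
needed after changing the BIOS setting.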



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 12:23:01PM +0300, Avi Kivity wrote:
  On 08/17/2010 12:07 PM, Christoph Hellwig wrote:
 
 In short it's completely worthless for any real filesystem.
 
 
 The documentation should be updated then.  It suggests that it is
 usable for data integrity.

The manpage has a warning section documenting what I said above since
I added it in January.


Re: [PATCH v2 1/3] test: Add imul real mode test

2010-08-17 Thread Avi Kivity

 On 08/08/2010 09:13 PM, Mohammed Gamal wrote:

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com



Applied, thanks.  Note it is better to use emulator.flat for 
instructions that access memory, since it is a lot easier to use that 
framework.


Also, you don't test IDIV #DE exceptions (and we currently don't emulate that 
condition correctly).


--
error compiling committee.c: too many arguments to function



Re: [qemu-kvm PATCH 0/3] small qemu-kvm cleanups

2010-08-17 Thread Avi Kivity

 On 08/12/2010 06:29 PM, Paolo Bonzini wrote:

Nothing earth shattering. :)

Paolo Bonzini (3):
   move kvm_set_irqfd to kvm-stub.c


This touches kvm-all.c, so should be against uq/master.


   remove unused function
   make kvm_mutex_*lock static


Those two applied.  Thanks.

--
error compiling committee.c: too many arguments to function



Re: KVM call agenda for August 17

2010-08-17 Thread Anthony Liguori

On 08/16/2010 05:08 PM, Chris Wright wrote:

Please send in any agenda items you are interested in covering.
   


I would be able to attend this week.  I imagine most people are sick of 
hearing from me anyway post KVM Forum ;-)


Regards,

Anthony Liguori


thanks,
-chris
   




Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori

On 08/17/2010 04:07 AM, Christoph Hellwig wrote:

On Mon, Aug 16, 2010 at 03:34:12PM -0500, Anthony Liguori wrote:
   

On 08/16/2010 01:42 PM, Christoph Hellwig wrote:
 

On Mon, Aug 16, 2010 at 09:43:09AM -0500, Anthony Liguori wrote:
   

Also, ext4 is _very_ slow on O_SYNC writes (which is
used in kvm with default cache).
   

Yeah, we probably need to switch to sync_file_range() to avoid the
journal commit on every write.

 

No, we don't.  sync_file_range does not actually provide any data
integrity.
   

What do you mean by data integrity?
 

sync_file_range only does pagecache-level writeout of the file data.
It never calls into the actual filesystem, that means any block
allocations (for filling holes / converting preallocated space in normal
filesystems, or every write in COW-based filesystems like qcow2) never
get flushed to disk,


But assuming that you had a preallocated disk image, it would 
effectively flush the page cache so it sounds like the only real issue 
is sparse and growable files.



  and even more importantly the disk write cache is
never flushed.
   


The point is that we don't want to flush the disk write cache.  The 
intention of writethrough is not to make the disk cache writethrough but 
to treat the host's cache as writethrough.


Regards,

Anthony Liguori


In short it's completely worthless for any real filesystem.

   




Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 07:56:04AM -0500, Anthony Liguori wrote:
 But assuming that you had a preallocated disk image, it would
 effectively flush the page cache so it sounds like the only real
 issue is sparse and growable files.

For preallocated as in using fallocate() we are still converting unwritten
to regular extents and do have metadata updates.  For preallocated as
in writing zeroes into the whole image earlier we do indeed only
care about the data, and will not have metadata updates for most filesystems.
That still leaves COW based filesystems that need to allocate new blocks
on every write, and from my reading NFS also needs the ->fsync callout
to actually commit unstable data to disk.

   and even more importantly the disk write cache is
 never flushed.
 
 The point is that we don't want to flush the disk write cache.  The
 intention of writethrough is not to make the disk cache writethrough
 but to treat the host's cache as writethrough.

We need to make sure data is not in the disk write cache if we want to
provide data integrity.  It has nothing to do with the qemu caching
mode - for cache=writeback or none it's committed as part of the fdatasync
call, and for cache=writethrough it's committed as part of the O_SYNC
write.  Note that both these paths end up calling the filesystem's ->fsync
method, which is what's required to make writes stable.  That's exactly
what is missing in sync_file_range, and that's why that API is not
useful at all for data integrity operations.  It's also what makes
fsync slow on extN - but the fix to that is not to not provide data
integrity but rather to make fsync fast.  There are various other
filesystems that can already do it, and if you insist on using those
that are slow for this operation you'll have to suffer until that
issue is fixed for them.
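
The write-then-fdatasync path described above can be sketched from user
space as follows. This is a minimal sketch under stated assumptions: the
helper names write_stable() and demo_write_stable() and the /tmp scratch
file are made up for illustration and this is not how qemu structures its
block layer. It only uses POSIX calls (pwrite, fdatasync, mkstemp).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write len bytes at offset off, then ask the kernel to make the data
 * (and the metadata needed to read it back) stable via fdatasync().
 * Unlike sync_file_range(), fdatasync() goes through the filesystem's
 * fsync path. Returns 0 on success, -1 on error.
 */
static int write_stable(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted, retry the same chunk */
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return fdatasync(fd);
}

/* Round-trip check against a scratch file; the path is illustrative. */
static int demo_write_stable(void)
{
    char path[] = "/tmp/write_stable_XXXXXX";
    char back[5];
    int fd = mkstemp(path);
    int rc;

    if (fd < 0)
        return -1;
    rc = write_stable(fd, "hello", 5, 0);
    if (rc == 0 && pread(fd, back, 5, 0) != 5)
        rc = -1;
    if (rc == 0 && memcmp(back, "hello", 5) != 0)
        rc = -1;
    close(fd);
    unlink(path);
    return rc;
}
```

Opening the file with O_SYNC (or O_DSYNC) instead moves the same cost into
each write call, which is the writethrough case mentioned in the message.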



Re: kvm or bios bug?

2010-08-17 Thread Zhou Peng
Hi Avi,

Power cycle really resolves the problem.
Amazing!
Thanks,

2010/8/17 Avi Kivity a...@redhat.com:
  On 08/17/2010 01:58 PM, Zhou Peng wrote:

 Hi all,

 I have enabled Virtualization in BIOS. However, when I  modprobe
 kvm-intel, I get $ sudo modprobe kvm-intel
 FATAL: Error inserting kvm_intel
 (/lib/modules/2.6.35+/kernel/arch/x86/kvm/kvm-intel.ko): Operation not
 supported

 And $ dmesg | grep kvm
 [  220.670287] kvm: disabled by bios

 Try to power cycle your machine.

 --
 error compiling committee.c: too many arguments to function





-- 
Zhou Peng


KVM call cancelled [was: KVM call agenda for August 17]

2010-08-17 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

Today's call is cancelled.


[PATCH 12/26] KVM: PPC: Remove unused define

2010-08-17 Thread Alexander Graf
The define VSID_ALL is unused. Let's remove it.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index e7c4d00..4040c8d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -30,7 +30,6 @@
 #include "trace.h"
 
 #define PTE_SIZE 12
-#define VSID_ALL 0
 
 void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
-- 
1.6.0.2



[PATCH 07/26] KVM: PPC: Preload magic page when in kernel mode

2010-08-17 Thread Alexander Graf
When the guest jumps into kernel mode and has the magic page mapped, there's a
very high chance that it will also use it. So let's detect that scenario and
map the segment accordingly.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index f8b9aab..b3c1dde 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -145,6 +145,16 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
    (old_msr & (MSR_PR|MSR_IR|MSR_DR))) {
kvmppc_mmu_flush_segments(vcpu);
kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
+
+   /* Preload magic page segment when in kernel mode */
+   if (!(msr & MSR_PR) && vcpu->arch.magic_page_pa) {
+       struct kvm_vcpu_arch *a = &vcpu->arch;
+
+       if (msr & MSR_DR)
+           kvmppc_mmu_map_segment(vcpu, a->magic_page_ea);
+       else
+           kvmppc_mmu_map_segment(vcpu, a->magic_page_pa);
+   }
}
 
/* Preload FPU if it's enabled */
-- 
1.6.0.2



[PATCH 00/26] KVM: PPC: Mid-August patch queue

2010-08-17 Thread Alexander Graf
Howdy,

This is my local patch queue with stuff that has accumulated over the last
weeks on KVM for PPC with some last minute fixes, speedups and debugging help
that I needed for the KVM Forum ;-).

The highlights of this set are:

  - Converted most important debug points to tracepoints
  - Flush less PTEs (speedup)
  - Go back to our own hash (less duplicates)
  - Make SRs guest settable (speedup for 32 bit guests)
  - Remove r30/r31 restrictions from PV hooks (speedup!)
  - Fix random breakages
  - Fix random guest stalls
  - 440GP host support (Thanks Hollis!)

Keep in mind that this is the first version that is stable on PPC32 hosts.
All versions prior to this could occupy otherwise used segment entries and
thus crash your machine :-).

After finally meeting Avi again, we also agreed to give pulls a try. So
here we go - this is my tree online:

git://github.com/agraf/linux-2.6.git kvm-ppc-next


Have fun with more accurate, faster and less buggy KVM on PowerPC!


Alexander Graf (23):
  KVM: PPC: Move EXIT_DEBUG partially to tracepoints
  KVM: PPC: Move book3s_64 mmu map debug print to trace point
  KVM: PPC: Add tracepoint for generic mmu map
  KVM: PPC: Move pte invalidate debug code to tracepoint
  KVM: PPC: Fix sid map search after flush
  KVM: PPC: Add tracepoints for generic spte flushes
  KVM: PPC: Preload magic page when in kernel mode
  KVM: PPC: Don't flush PTEs on NX/RO hit
  KVM: PPC: Make invalidation code more reliable
  KVM: PPC: Move slb debugging to tracepoints
  KVM: PPC: Revert KVM: PPC: Use kernel hash function
  KVM: PPC: Remove unused define
  KVM: PPC: Add feature bitmap for magic page
  KVM: PPC: Move BAT handling code into spr handler
  KVM: PPC: Interpret SR registers on demand
  KVM: PPC: Put segment registers in shared page
  KVM: PPC: Add mtsrin PV code
  KVM: PPC: Make PV mtmsr work with r30 and r31
  KVM: PPC: Update int_pending also on dequeue
  KVM: PPC: Make PV mtmsrd L=1 work with r30 and r31
  KVM: PPC: Force enable nap on KVM
  KVM: PPC: Implement correct SID mapping on Book3s_32
  KVM: PPC: Don't put MSR_POW in MSR

Hollis Blanchard (3):
  KVM: PPC: initialize IVORs in addition to IVPR
  KVM: PPC: fix compilation of dump tlbs debug function
  KVM: PPC: allow ppc440gp to pass the compatibility check

 arch/powerpc/include/asm/kvm_book3s.h |   25 ++--
 arch/powerpc/include/asm/kvm_para.h   |3 +
 arch/powerpc/kernel/asm-offsets.c |1 +
 arch/powerpc/kernel/kvm.c |  144 ++---
 arch/powerpc/kernel/kvm_emul.S|   75 +--
 arch/powerpc/kvm/44x.c|3 +-
 arch/powerpc/kvm/44x_tlb.c|1 +
 arch/powerpc/kvm/book3s.c |   54 
 arch/powerpc/kvm/book3s_32_mmu.c  |   83 +++--
 arch/powerpc/kvm/book3s_32_mmu_host.c |   67 ++
 arch/powerpc/kvm/book3s_64_mmu_host.c |   59 +++--
 arch/powerpc/kvm/book3s_emulate.c |   48 +++-
 arch/powerpc/kvm/book3s_mmu_hpte.c|   38 ++
 arch/powerpc/kvm/booke.c  |8 +-
 arch/powerpc/kvm/powerpc.c|5 +-
 arch/powerpc/kvm/trace.h  |  230 +
 16 files changed, 614 insertions(+), 230 deletions(-)



[PATCH 14/26] KVM: PPC: Move BAT handling code into spr handler

2010-08-17 Thread Alexander Graf
The current approach duplicates the spr->bat finding logic and makes it harder
to reuse the actually used variables. So let's move everything down to the spr
handler.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_emulate.c |   48 
 1 files changed, 16 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index f333cb4..4668465 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -264,7 +264,7 @@ void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct 
kvmppc_bat *bat, bool upper,
}
 }
 
-static u32 kvmppc_read_bat(struct kvm_vcpu *vcpu, int sprn)
+static struct kvmppc_bat *kvmppc_find_bat(struct kvm_vcpu *vcpu, int sprn)
 {
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_bat *bat;
@@ -286,35 +286,7 @@ static u32 kvmppc_read_bat(struct kvm_vcpu *vcpu, int sprn)
BUG();
}
 
-   if (sprn % 2)
-       return bat->raw >> 32;
-   else
-       return bat->raw;
-}
-
-static void kvmppc_write_bat(struct kvm_vcpu *vcpu, int sprn, u32 val)
-{
-   struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
-   struct kvmppc_bat *bat;
-
-   switch (sprn) {
-   case SPRN_IBAT0U ... SPRN_IBAT3L:
-   bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
-   break;
-   case SPRN_IBAT4U ... SPRN_IBAT7L:
-   bat = &vcpu_book3s->ibat[4 + ((sprn - SPRN_IBAT4U) / 2)];
-   break;
-   case SPRN_DBAT0U ... SPRN_DBAT3L:
-   bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
-   break;
-   case SPRN_DBAT4U ... SPRN_DBAT7L:
-   bat = &vcpu_book3s->dbat[4 + ((sprn - SPRN_DBAT4U) / 2)];
-   break;
-   default:
-   BUG();
-   }
-
-   kvmppc_set_bat(vcpu, bat, !(sprn % 2), val);
+   return bat;
 }
 
 int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
@@ -339,12 +311,16 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, int rs)
case SPRN_IBAT4U ... SPRN_IBAT7L:
case SPRN_DBAT0U ... SPRN_DBAT3L:
case SPRN_DBAT4U ... SPRN_DBAT7L:
-   kvmppc_write_bat(vcpu, sprn, (u32)spr_val);
+   {
+   struct kvmppc_bat *bat = kvmppc_find_bat(vcpu, sprn);
+
+   kvmppc_set_bat(vcpu, bat, !(sprn % 2), (u32)spr_val);
/* BAT writes happen so rarely that we're ok to flush
 * everything here */
kvmppc_mmu_pte_flush(vcpu, 0, 0);
kvmppc_mmu_flush_segments(vcpu);
break;
+   }
case SPRN_HID0:
 to_book3s(vcpu)->hid[0] = spr_val;
break;
@@ -434,8 +410,16 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int 
sprn, int rt)
case SPRN_IBAT4U ... SPRN_IBAT7L:
case SPRN_DBAT0U ... SPRN_DBAT3L:
case SPRN_DBAT4U ... SPRN_DBAT7L:
-   kvmppc_set_gpr(vcpu, rt, kvmppc_read_bat(vcpu, sprn));
+   {
+   struct kvmppc_bat *bat = kvmppc_find_bat(vcpu, sprn);
+
+   if (sprn % 2)
+       kvmppc_set_gpr(vcpu, rt, bat->raw >> 32);
+   else
+       kvmppc_set_gpr(vcpu, rt, bat->raw);
+
break;
+   }
case SPRN_SDR1:
 kvmppc_set_gpr(vcpu, rt, to_book3s(vcpu)->sdr1);
break;
-- 
1.6.0.2
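
The lookup that this patch centralizes can be sketched outside the kernel
as below. This is a sketch under assumptions: the SPR numbers are the
commonly documented values from the powerpc headers and should be checked
against reg.h, and struct bats, find_bat() and read_bat_half() are
hypothetical simplifications of the book3s structures. Plain range checks
replace the GNU case-range syntax used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SPR numbers; each BAT has an upper (even) and lower (odd) SPR. */
#define SPRN_IBAT0U 0x210
#define SPRN_IBAT3L 0x217
#define SPRN_DBAT0U 0x218
#define SPRN_DBAT3L 0x21f
#define SPRN_IBAT4U 0x230
#define SPRN_IBAT7L 0x237
#define SPRN_DBAT4U 0x238
#define SPRN_DBAT7L 0x23f

struct bat { uint64_t raw; };
struct bats { struct bat ibat[8]; struct bat dbat[8]; };

/* Map an SPR number to its BAT slot; NULL where the patch would BUG(). */
static struct bat *find_bat(struct bats *b, int sprn)
{
    assert(b != NULL);
    if (sprn >= SPRN_IBAT0U && sprn <= SPRN_IBAT3L)
        return &b->ibat[(sprn - SPRN_IBAT0U) / 2];
    if (sprn >= SPRN_IBAT4U && sprn <= SPRN_IBAT7L)
        return &b->ibat[4 + (sprn - SPRN_IBAT4U) / 2];
    if (sprn >= SPRN_DBAT0U && sprn <= SPRN_DBAT3L)
        return &b->dbat[(sprn - SPRN_DBAT0U) / 2];
    if (sprn >= SPRN_DBAT4U && sprn <= SPRN_DBAT7L)
        return &b->dbat[4 + (sprn - SPRN_DBAT4U) / 2];
    return NULL;
}

/* Same split as the mfspr hunk: odd SPRs read the high 32 bits of raw. */
static uint32_t read_bat_half(const struct bat *bat, int sprn)
{
    if (sprn % 2)
        return (uint32_t)(bat->raw >> 32);
    return (uint32_t)bat->raw;
}
```

Returning the slot instead of a value is what lets both the mtspr and
mfspr handlers share one lookup, which is the point of the patch.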



[PATCH 20/26] KVM: PPC: Make PV mtmsrd L=1 work with r30 and r31

2010-08-17 Thread Alexander Graf
We had an arbitrary limitation in mtmsrd L=1 that kept us from using r30 and
r31 as input registers. Let's get rid of that and get more potential speedups!

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm.c  |   21 +
 arch/powerpc/kernel/kvm_emul.S |8 +++-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 517967d..517da39 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -159,6 +159,7 @@ static u32 *kvm_alloc(int len)
 
 extern u32 kvm_emulate_mtmsrd_branch_offs;
 extern u32 kvm_emulate_mtmsrd_reg_offs;
+extern u32 kvm_emulate_mtmsrd_orig_ins_offs;
 extern u32 kvm_emulate_mtmsrd_len;
 extern u32 kvm_emulate_mtmsrd[];
 
@@ -187,7 +188,21 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
/* Modify the chunk to fit the invocation */
memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4);
 p[kvm_emulate_mtmsrd_branch_offs] |= distance_end & KVM_INST_B_MASK;
-   p[kvm_emulate_mtmsrd_reg_offs] |= rt;
+   switch (get_rt(rt)) {
+   case 30:
+   kvm_patch_ins_ll(&p[kvm_emulate_mtmsrd_reg_offs],
+magic_var(scratch2), KVM_RT_30);
+   break;
+   case 31:
+   kvm_patch_ins_ll(&p[kvm_emulate_mtmsrd_reg_offs],
+magic_var(scratch1), KVM_RT_30);
+   break;
+   default:
+   p[kvm_emulate_mtmsrd_reg_offs] |= rt;
+   break;
+   }
+
+   p[kvm_emulate_mtmsrd_orig_ins_offs] = *inst;
flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4);
 
/* Patch the invocation */
@@ -424,9 +439,7 @@ static void kvm_check_ins(u32 *inst, u32 features)
 
/* Rewrites */
case KVM_INST_MTMSRD_L1:
-   /* We use r30 and r31 during the hook */
-   if (get_rt(inst_rt) < 30)
-   kvm_patch_ins_mtmsrd(inst, inst_rt);
+   kvm_patch_ins_mtmsrd(inst, inst_rt);
break;
case KVM_INST_MTMSR:
case KVM_INST_MTMSRD_L0:
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index 6530532..f2b1b25 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -78,7 +78,8 @@ kvm_emulate_mtmsrd:
 
/* OR the register's (MSR_EE|MSR_RI) on MSR */
 kvm_emulate_mtmsrd_reg:
-   andi.   r30, r0, (MSR_EE|MSR_RI)
+   ori r30, r0, 0
+   andi.   r30, r30, (MSR_EE|MSR_RI)
or  r31, r31, r30
 
/* Put MSR back into magic page */
@@ -96,6 +97,7 @@ kvm_emulate_mtmsrd_reg:
SCRATCH_RESTORE
 
/* Nag hypervisor */
+kvm_emulate_mtmsrd_orig_ins:
tlbsync
 
b   kvm_emulate_mtmsrd_branch
@@ -117,6 +119,10 @@ kvm_emulate_mtmsrd_branch_offs:
 kvm_emulate_mtmsrd_reg_offs:
.long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4
 
+.global kvm_emulate_mtmsrd_orig_ins_offs
+kvm_emulate_mtmsrd_orig_ins_offs:
+   .long (kvm_emulate_mtmsrd_orig_ins - kvm_emulate_mtmsrd) / 4
+
 .global kvm_emulate_mtmsrd_len
 kvm_emulate_mtmsrd_len:
.long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4
-- 
1.6.0.2



[PATCH 22/26] KVM: PPC: Implement correct SID mapping on Book3s_32

2010-08-17 Thread Alexander Graf
Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64
we were using a trick where we know that a single mmu_context gives us 16 bits
of context ids.

The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs
across the available range, so a context id really only gives us 16 available
VSIDs.

To keep at least a few guest processes in the SID shadow, let's map a number of
contexts that we can use as VSID pool. This makes the code be actually correct
and shouldn't hurt performance too much.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |   15 +++-
 arch/powerpc/kvm/book3s_32_mmu_host.c |   57 ++---
 arch/powerpc/kvm/book3s_64_mmu_host.c |8 ++--
 3 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index be8aac2..d62e703 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -60,6 +60,13 @@ struct kvmppc_sid_map {
 #define SID_MAP_NUM     (1 << SID_MAP_BITS)
 #define SID_MAP_MASK    (SID_MAP_NUM - 1)
 
+#ifdef CONFIG_PPC_BOOK3S_64
+#define SID_CONTEXTS   1
+#else
+#define SID_CONTEXTS   128
+#define VSID_POOL_SIZE (SID_CONTEXTS * 16)
+#endif
+
 struct kvmppc_vcpu_book3s {
struct kvm_vcpu vcpu;
struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
@@ -78,10 +85,14 @@ struct kvmppc_vcpu_book3s {
u64 sdr1;
u64 hior;
u64 msr_mask;
-   u64 vsid_first;
u64 vsid_next;
+#ifdef CONFIG_PPC_BOOK3S_32
+   u32 vsid_pool[VSID_POOL_SIZE];
+#else
+   u64 vsid_first;
u64 vsid_max;
-   int context_id;
+#endif
+   int context_id[SID_CONTEXTS];
ulong prog_flags; /* flags to inject when giving a 700 trap */
 };
 
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 57dddeb..9fecbfb 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -275,18 +275,15 @@ static struct kvmppc_sid_map *create_sid_map(struct 
kvm_vcpu *vcpu, u64 gvsid)
backwards_map = !backwards_map;
 
/* Uh-oh ... out of mappings. Let's flush! */
-   if (vcpu_book3s->vsid_next >= vcpu_book3s->vsid_max) {
-       vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+   if (vcpu_book3s->vsid_next >= VSID_POOL_SIZE) {
+       vcpu_book3s->vsid_next = 0;
 memset(vcpu_book3s->sid_map, 0,
    sizeof(struct kvmppc_sid_map) * SID_MAP_NUM);
 kvmppc_mmu_pte_flush(vcpu, 0, 0);
 kvmppc_mmu_flush_segments(vcpu);
 }
-   map->host_vsid = vcpu_book3s->vsid_next;
-
-   /* Would have to be 111 to be completely aligned with the rest of
-  Linux, but that is just way too little space! */
-   vcpu_book3s->vsid_next+=1;
+   map->host_vsid = vcpu_book3s->vsid_pool[vcpu_book3s->vsid_next];
+   vcpu_book3s->vsid_next++;
 
 map->guest_vsid = gvsid;
 map->valid = true;
@@ -333,40 +330,38 @@ void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
 
 void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
 {
+   int i;
+
    kvmppc_mmu_hpte_destroy(vcpu);
    preempt_disable();
-   __destroy_context(to_book3s(vcpu)->context_id);
+   for (i = 0; i < SID_CONTEXTS; i++)
+       __destroy_context(to_book3s(vcpu)->context_id[i]);
    preempt_enable();
 }
 
 /* From mm/mmu_context_hash32.c */
-#define CTX_TO_VSID(ctx) (((ctx) * (897 * 16)) & 0xffffff)
+#define CTX_TO_VSID(c, id) ((((c) * (897 * 16)) + (id * 0x111)) & 0xffffff)
 
 int kvmppc_mmu_init(struct kvm_vcpu *vcpu)
 {
    struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
    int err;
    ulong sdr1;
+   int i;
+   int j;
 
-   err = __init_new_context();
-   if (err < 0)
-       return -1;
-   vcpu3s->context_id = err;
-
-   vcpu3s->vsid_max = CTX_TO_VSID(vcpu3s->context_id + 1) - 1;
-   vcpu3s->vsid_first = CTX_TO_VSID(vcpu3s->context_id);
-
-#if 0 /* XXX still doesn't guarantee uniqueness */
-   /* We could collide with the Linux vsid space because the vsid
-    * wraps around at 24 bits. We're safe if we do our own space
-    * though, so let's always set the highest bit. */
+   for (i = 0; i < SID_CONTEXTS; i++) {
+       err = __init_new_context();
+       if (err < 0)
+           goto init_fail;
+       vcpu3s->context_id[i] = err;
 
-   vcpu3s->vsid_max |= 0x00800000;
-   vcpu3s->vsid_first |= 0x00800000;
-#endif
-   BUG_ON(vcpu3s->vsid_max < vcpu3s->vsid_first);
+       /* Remember context id for this combination */
+       for (j = 0; j < 16; j++)
+           vcpu3s->vsid_pool[(i * 16) + j] = CTX_TO_VSID(err, j);
+   }
 
-   vcpu3s->vsid_next = vcpu3s->vsid_first;
+   vcpu3s->vsid_next = 0;
 
/* Remember where the HTAB is */
  

[PATCH 25/26] KVM: PPC: fix compilation of dump tlbs debug function

2010-08-17 Thread Alexander Graf
From: Hollis Blanchard hollis_blanch...@mentor.com

Missing local variable.

Signed-off-by: Hollis Blanchard hollis_blanch...@mentor.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/44x_tlb.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index 9f71b8d..5f3cff8 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -47,6 +47,7 @@
 #ifdef DEBUG
 void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
 {
+   struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
struct kvmppc_44x_tlbe *tlbe;
int i;
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 24/26] KVM: PPC: initialize IVORs in addition to IVPR

2010-08-17 Thread Alexander Graf
From: Hollis Blanchard hollis_blanch...@mentor.com

Developers can now tell at a glance the exact type of the premature interrupt,
instead of just knowing that there was some premature interrupt.
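
As a rough illustration of how such eye-catching values can be read back from
a stray guest PC, here is a minimal sketch. It is not part of the patch: the
names are hypothetical, the IVPR value 0x55550000 is an assumption, and it
assumes the vector address is simply IVPR | IVOR with a priority below 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; the IVPR value is an assumption for this sketch. */
#define EYECATCH_IVPR      0x55550000u
#define EYECATCH_IVOR_BASE 0x7700u

/* Same expression as in the patch: 0x7700 | i * 4 */
static uint32_t eyecatch_ivor(unsigned int prio)
{
    return EYECATCH_IVOR_BASE | prio * 4;
}

/* Returns the interrupt priority encoded in pc, or -1 if pc does not
 * look like one of the eye-catching vectors (assumes prio < 64). */
static int eyecatch_decode(uint32_t pc)
{
    if ((pc & 0xffff0000u) != EYECATCH_IVPR)
        return -1;
    if ((pc & 0xff00u) != EYECATCH_IVOR_BASE)
        return -1;
    return (int)((pc & 0xffu) / 4);
}
```

With these assumptions, a guest that ends up at 0x55557714 took interrupt
priority 5 before programming its own IVPR/IVORs.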

Signed-off-by: Hollis Blanchard hollis_blanch...@mentor.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/booke.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c604277..835f6d0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -497,15 +497,19 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 /* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
+   int i;
+
    vcpu->arch.pc = 0;
    vcpu->arch.shared->msr = 0;
    kvmppc_set_gpr(vcpu, 1, (16<<20) - 8); /* -8 for the callee-save LR slot */
 
    vcpu->arch.shadow_pid = 1;
 
-   /* Eye-catching number so we know if the guest takes an interrupt
-    * before it's programmed its own IVPR. */
+   /* Eye-catching numbers so we know if the guest takes an interrupt
+    * before it's programmed its own IVPR/IVORs. */
    vcpu->arch.ivpr = 0x55550000;
+   for (i = 0; i < BOOKE_IRQPRIO_MAX; i++)
+       vcpu->arch.ivor[i] = 0x7700 | i * 4;
 
kvmppc_init_timing_stats(vcpu);
 
-- 
1.6.0.2



[PATCH 26/26] KVM: PPC: allow ppc440gp to pass the compatibility check

2010-08-17 Thread Alexander Graf
From: Hollis Blanchard hollis_blanch...@mentor.com

Match only the first part of cur_cpu_spec->platform.

440GP (the first 440 processor) is identified by the string "ppc440gp", while
all later 440 processors use simply "ppc440".
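
The difference between the exact and the prefix comparison can be sketched as
follows. This is only an illustration: is_440_platform() is a hypothetical
helper, not a function from the kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: accept any platform string that starts with
 * "ppc440", so both "ppc440" and "ppc440gp" pass the check. */
static int is_440_platform(const char *platform)
{
    return strncmp(platform, "ppc440", 6) == 0;
}
```

An exact strcmp() against "ppc440" rejects "ppc440gp", which is why the 440GP
failed the compatibility check before this change.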

Signed-off-by: Hollis Blanchard hollis_blanch...@mentor.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/44x.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index e7b1f3f..74d0e74 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -43,7 +43,7 @@ int kvmppc_core_check_processor_compat(void)
 {
    int r;
 
-   if (strcmp(cur_cpu_spec->platform, "ppc440") == 0)
+   if (strncmp(cur_cpu_spec->platform, "ppc440", 6) == 0)
        r = 0;
    else
        r = -ENOTSUPP;
@@ -72,6 +72,7 @@ int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu)
    /* Since the guest can directly access the timebase, it must know the
     * real timebase frequency. Accordingly, it must see the state of
     * CCR1[TCS]. */
+   /* XXX CCR1 doesn't exist on all 440 SoCs. */
    vcpu->arch.ccr1 = mfspr(SPRN_CCR1);
 
    for (i = 0; i < ARRAY_SIZE(vcpu_44x->shadow_refs); i++)
-- 
1.6.0.2



[PATCH 21/26] KVM: PPC: Force enable nap on KVM

2010-08-17 Thread Alexander Graf
There are some heuristics in the PPC power management code that try to find
out if the particular hardware we're running on supports proper power management
or just hangs the machine when going into nap mode.

Since we know that KVM is safe with nap, let's force enable it in the PV code
once we're certain that we are on a KVM VM.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 517da39..95aed6b 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -583,6 +583,9 @@ static int __init kvm_guest_init(void)
if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
kvm_use_magic_page();
 
+   /* Enable napping */
+   powersave_nap = 1;
+
 free_tmp:
kvm_free_tmp();
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/26] KVM: PPC: Fix sid map search after flush

2010-08-17 Thread Alexander Graf
After a flush, the sid map contains lots of entries whose gvsid and hvsid
values are both 0. Unfortunately, 0 can also be a real gvsid that the guest
looks up, so the search would incorrectly return the zeroed entry with hvsid
0, which doesn't belong to our sid space.

So let's also check the valid bit, which indicates that the sid map entry
we're looking at actually contains useful data.
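
The effect of the valid check can be sketched in isolation. This is a minimal
sketch with hypothetical demo_* names and a made-up table size; it only
mirrors the idea of the lookup, not the real two-slot hash search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAP_NUM 8 /* made-up size, for illustration only */

struct demo_sid_map {
    uint64_t guest_vsid;
    uint64_t host_vsid;
    bool valid;
};

/* Only return an entry that is marked valid AND matches gvsid. A flushed
 * (all-zero) entry no longer matches a lookup for gvsid 0. */
static struct demo_sid_map *demo_find(struct demo_sid_map *maps, size_t slot,
                                      uint64_t gvsid)
{
    struct demo_sid_map *map = &maps[slot];

    if (map->valid && map->guest_vsid == gvsid)
        return map;
    return NULL;
}
```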

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index aa516ad..ebb1b5d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -65,14 +65,14 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu 
*vcpu, u64 gvsid)
 
    sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
    map = &to_book3s(vcpu)->sid_map[sid_map_mask];
-   if (map->guest_vsid == gvsid) {
+   if (map->valid && (map->guest_vsid == gvsid)) {
        dprintk_slb("SLB: Searching: 0x%llx -> 0x%llx\n",
                gvsid, map->host_vsid);
        return map;
    }
 
    map = &to_book3s(vcpu)->sid_map[SID_MAP_MASK - sid_map_mask];
-   if (map->guest_vsid == gvsid) {
+   if (map->valid && (map->guest_vsid == gvsid)) {
        dprintk_slb("SLB: Searching 0x%llx -> 0x%llx\n",
                gvsid, map->host_vsid);
        return map;
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 16/26] KVM: PPC: Put segment registers in shared page

2010-08-17 Thread Alexander Graf
Now that the actual mtsr doesn't do anything anymore, we can move the sr
contents over to the shared page, so a guest can directly read and write
its sr contents from guest context.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |1 -
 arch/powerpc/include/asm/kvm_para.h   |1 +
 arch/powerpc/kvm/book3s.c |7 +++
 arch/powerpc/kvm/book3s_32_mmu.c  |   12 ++--
 arch/powerpc/kvm/powerpc.c|2 +-
 5 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 0884652..be8aac2 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -70,7 +70,6 @@ struct kvmppc_vcpu_book3s {
u64 vsid;
} slb_shadow[64];
u8 slb_shadow_max;
-   u32 sr[16];
struct kvmppc_bat ibat[8];
struct kvmppc_bat dbat[8];
u64 hid[6];
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 43c1b22..d79fd09 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -38,6 +38,7 @@ struct kvm_vcpu_arch_shared {
__u64 msr;
__u32 dsisr;
__u32 int_pending;  /* Tells the guest if we have an interrupt */
+   __u32 sr[16];
 };
 
 #define KVM_SC_MAGIC_R0        0x4b564d21 /* KVM! */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 082ec62..5fbe949 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1159,10 +1159,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
            sregs->u.s.ppc64.slb[i].slbv = vcpu3s->slb[i].origv;
        }
    } else {
-       for (i = 0; i < 16; i++) {
-           sregs->u.s.ppc32.sr[i] = vcpu3s->sr[i];
-           sregs->u.s.ppc32.sr[i] = vcpu3s->sr[i];
-       }
+       for (i = 0; i < 16; i++)
+           sregs->u.s.ppc32.sr[i] = vcpu->arch.shared->sr[i];
+
        for (i = 0; i < 8; i++) {
            sregs->u.s.ppc32.ibat[i] = vcpu3s->ibat[i].raw;
            sregs->u.s.ppc32.dbat[i] = vcpu3s->dbat[i].raw;
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index d4ff76f..c8cefdd 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -88,9 +88,9 @@ static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu 
*vcpu, gva_t eaddr,
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
 u64 *vsid);
 
-static u32 find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
+static u32 find_sr(struct kvm_vcpu *vcpu, gva_t eaddr)
 {
-   return vcpu_book3s->sr[(eaddr >> 28) & 0xf];
+   return vcpu->arch.shared->sr[(eaddr >> 28) & 0xf];
 }
 
 static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
@@ -211,7 +211,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu 
*vcpu, gva_t eaddr,
int i;
    int found = 0;
 
-   sre = find_sr(vcpu_book3s, eaddr);
+   sre = find_sr(vcpu, eaddr);
 
    dprintk_pte("SR 0x%lx: vsid=0x%x, raw=0x%x\n", eaddr >> 28,
            sr_vsid(sre), sre);
@@ -335,13 +335,13 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
 
 static u32 kvmppc_mmu_book3s_32_mfsrin(struct kvm_vcpu *vcpu, u32 srnum)
 {
-   return to_book3s(vcpu)->sr[srnum];
+   return vcpu->arch.shared->sr[srnum];
 }
 
 static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
                    ulong value)
 {
-   to_book3s(vcpu)->sr[srnum] = value;
+   vcpu->arch.shared->sr[srnum] = value;
    kvmppc_mmu_map_segment(vcpu, srnum << SID_SHIFT);
 }
 
@@ -358,7 +358,7 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct 
kvm_vcpu *vcpu, ulong esid,
    u64 gvsid = esid;
 
    if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
-       sr = find_sr(to_book3s(vcpu), ea);
+       sr = find_sr(vcpu, ea);
        if (sr_valid(sr))
            gvsid = sr_vsid(sr);
    }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 496d7a5..028891c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -66,7 +66,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
        vcpu->arch.magic_page_pa = param1;
        vcpu->arch.magic_page_ea = param2;
 
-   r2 = 0;
+   r2 = KVM_MAGIC_FEAT_SR;
 
r = HC_EV_SUCCESS;
break;
-- 
1.6.0.2



[PATCH 15/26] KVM: PPC: Interpret SR registers on demand

2010-08-17 Thread Alexander Graf
Right now we examine the contents of Book3s_32's segment registers when a
register is written and put the interpreted contents into a struct.

There are two reasons this is bad. For starters, the struct has worse real-time
performance, as it occupies more RAM. More importantly, if segment registers
are interpreted from their raw values on demand, we can put them in the shared
page, allowing guests to mess with them directly.

This patch makes the internal representation of SRs be u32s.
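
To illustrate what "interpreting on demand" means, here is a minimal
standalone sketch that decodes a raw u32 SR value only when a field is needed.
The demo_* names are hypothetical and the bit masks are assumed to match the
helpers this patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of a raw 32-bit segment register value. */
static inline uint32_t demo_sr_vsid(uint32_t sr_raw)
{
    return sr_raw & 0x0fffffffu;
}

static inline bool demo_sr_valid(uint32_t sr_raw)
{
    return (sr_raw & 0x80000000u) ? false : true;
}

static inline bool demo_sr_ks(uint32_t sr_raw)
{
    return (sr_raw & 0x40000000u) ? true : false;
}

static inline bool demo_sr_kp(uint32_t sr_raw)
{
    return (sr_raw & 0x20000000u) ? true : false;
}

static inline bool demo_sr_nx(uint32_t sr_raw)
{
    return (sr_raw & 0x10000000u) ? true : false;
}
```

Only the raw value has to be stored, which is what makes it possible to keep
the array in a page the guest can write.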

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |   11 +
 arch/powerpc/kvm/book3s.c |4 +-
 arch/powerpc/kvm/book3s_32_mmu.c  |   79 ++---
 3 files changed, 46 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index f04f516..0884652 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -38,15 +38,6 @@ struct kvmppc_slb {
bool class  : 1;
 };
 
-struct kvmppc_sr {
-   u32 raw;
-   u32 vsid;
-   bool Ks : 1;
-   bool Kp : 1;
-   bool nx : 1;
-   bool valid  : 1;
-};
-
 struct kvmppc_bat {
u64 raw;
u32 bepi;
@@ -79,7 +70,7 @@ struct kvmppc_vcpu_book3s {
u64 vsid;
} slb_shadow[64];
u8 slb_shadow_max;
-   struct kvmppc_sr sr[16];
+   u32 sr[16];
struct kvmppc_bat ibat[8];
struct kvmppc_bat dbat[8];
u64 hid[6];
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 3e017da..082ec62 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1160,8 +1160,8 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
}
} else {
        for (i = 0; i < 16; i++) {
-           sregs->u.s.ppc32.sr[i] = vcpu3s->sr[i].raw;
-           sregs->u.s.ppc32.sr[i] = vcpu3s->sr[i].raw;
+           sregs->u.s.ppc32.sr[i] = vcpu3s->sr[i];
+           sregs->u.s.ppc32.sr[i] = vcpu3s->sr[i];
        }
        for (i = 0; i < 8; i++) {
            sregs->u.s.ppc32.ibat[i] = vcpu3s->ibat[i].raw;
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 5bf4bf8..d4ff76f 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -58,14 +58,39 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static inline u32 sr_vsid(u32 sr_raw)
+{
+   return sr_raw & 0x0fffffff;
+}
+
+static inline bool sr_valid(u32 sr_raw)
+{
+   return (sr_raw & 0x80000000) ? false : true;
+}
+
+static inline bool sr_ks(u32 sr_raw)
+{
+   return (sr_raw & 0x40000000) ? true: false;
+}
+
+static inline bool sr_kp(u32 sr_raw)
+{
+   return (sr_raw & 0x20000000) ? true: false;
+}
+
+static inline bool sr_nx(u32 sr_raw)
+{
+   return (sr_raw & 0x10000000) ? true: false;
+}
+
 static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
  struct kvmppc_pte *pte, bool data);
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
 u64 *vsid);
 
-static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
+static u32 find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
 {
-   return &vcpu_book3s->sr[(eaddr >> 28) & 0xf];
+   return vcpu_book3s->sr[(eaddr >> 28) & 0xf];
 }
 
 static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
@@ -87,7 +112,7 @@ static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu 
*vcpu)
 }
 
 static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s 
*vcpu_book3s,
- struct kvmppc_sr *sre, gva_t eaddr,
+ u32 sre, gva_t eaddr,
  bool primary)
 {
u32 page, hash, pteg, htabmask;
@@ -96,7 +121,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct 
kvmppc_vcpu_book3s *vcpu_book3
    page = (eaddr & 0x0FFFFFFF) >> 12;
    htabmask = ((vcpu_book3s->sdr1 & 0x1FF) << 16) | 0xFFC0;
 
-   hash = ((sre->vsid ^ page) << 6);
+   hash = ((sr_vsid(sre) ^ page) << 6);
    if (!primary)
        hash = ~hash;
    hash &= htabmask;
@@ -105,7 +130,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct 
kvmppc_vcpu_book3s *vcpu_book3
 
    dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
        kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, pteg,
-       sre->vsid);
+       sr_vsid(sre));
 
    r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
if (kvm_is_error_hva(r))
@@ -113,10 +138,9 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct 
kvmppc_vcpu_book3s *vcpu_book3
return r | (pteg  

[PATCH 11/26] KVM: PPC: Revert "KVM: PPC: Use kernel hash function"

2010-08-17 Thread Alexander Graf
It turns out the in-kernel hash function is sub-optimal for our subtle
hash inputs where every bit is significant. So let's revert to the original
hash functions.
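
For readers who want to see what the restored hash does, here is a minimal
sketch of the same XOR folding written as a loop. The demo_* names are
hypothetical and SID_MAP_BITS is assumed to be 9 here, so that the largest
shift (9 * 7 = 63) stays within a 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SID_MAP_BITS 9 /* assumption for this sketch */
#define DEMO_SID_MAP_NUM  (1u << DEMO_SID_MAP_BITS)
#define DEMO_SID_MAP_MASK (DEMO_SID_MAP_NUM - 1)

/* Fold the 64-bit gvsid into DEMO_SID_MAP_BITS bits by XORing eight
 * consecutive bit slices, so every input bit affects the result. */
static uint16_t demo_sid_hash(uint64_t gvsid)
{
    uint64_t h = 0;
    unsigned int i;

    for (i = 0; i < 8; i++)
        h ^= (gvsid >> (DEMO_SID_MAP_BITS * i)) & DEMO_SID_MAP_MASK;

    return (uint16_t)h;
}
```

Because each slice is XORed in directly, flipping any single bit of the input
always changes the slot, which is the property the commit message refers to
with "every bit is significant".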

This reverts commit 05340ab4f9a6626f7a2e8f9fe5397c61d494f445.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_32_mmu_host.c |   10 --
 arch/powerpc/kvm/book3s_64_mmu_host.c |   11 +--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 343452c..57dddeb 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -19,7 +19,6 @@
  */
 
 #include <linux/kvm_host.h>
-#include <linux/hash.h>
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
@@ -77,7 +76,14 @@ void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct 
hpte_cache *pte)
  * a hash, so we don't waste cycles on looping */
 static u16 kvmppc_sid_hash(struct kvm_vcpu *vcpu, u64 gvsid)
 {
-   return hash_64(gvsid, SID_MAP_BITS);
+   return (u16)(((gvsid >> (SID_MAP_BITS * 7)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 6)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 5)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 4)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 3)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 2)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 1)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 0)) & SID_MAP_MASK));
 }
 
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 321c931..e7c4d00 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -20,7 +20,6 @@
  */
 
 #include <linux/kvm_host.h>
-#include <linux/hash.h>
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
@@ -44,9 +43,17 @@ void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct 
hpte_cache *pte)
  * a hash, so we don't waste cycles on looping */
 static u16 kvmppc_sid_hash(struct kvm_vcpu *vcpu, u64 gvsid)
 {
-   return hash_64(gvsid, SID_MAP_BITS);
+   return (u16)(((gvsid >> (SID_MAP_BITS * 7)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 6)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 5)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 4)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 3)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 2)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 1)) & SID_MAP_MASK) ^
+            ((gvsid >> (SID_MAP_BITS * 0)) & SID_MAP_MASK));
 }
 
+
 static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
 {
struct kvmppc_sid_map *map;
-- 
1.6.0.2



[PATCH 23/26] KVM: PPC: Don't put MSR_POW in MSR

2010-08-17 Thread Alexander Graf
On Book3S, an mtmsr with the MSR_POW bit set indicates that the OS is idle
and only needs to be woken up on the next interrupt.

Unfortunately, we let that bit slip into the stored MSR value, which is not
what the real CPU does, so we ended up executing code like this:

    r = mfmsr();
    /* r contains MSR_POW */
    mtmsr(r | MSR_EE);

This obviously breaks, as we're going into idle mode in code sections that
don't expect to be idling.

This patch masks MSR_POW out of the stored MSR value on wakeup, making
guests happy again.
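
The intended behaviour can be sketched outside the kernel as follows. This is
a simplified model, not the real kvmppc_set_msr(): the demo_* names are
hypothetical, demo_block() merely stands in for kvm_vcpu_block(), and the bit
positions of MSR_POW and MSR_EE are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MSR_POW (1ull << 18) /* assumed bit position */
#define DEMO_MSR_EE  (1ull << 15) /* assumed bit position */

struct demo_vcpu {
    uint64_t msr;                    /* stored (guest-visible) MSR */
    unsigned long pending_exceptions;
    unsigned int halt_wakeups;
};

/* Stand-in for kvm_vcpu_block(): pretend we slept and were woken up. */
static void demo_block(struct demo_vcpu *v)
{
    v->halt_wakeups++;
}

static void demo_set_msr(struct demo_vcpu *v, uint64_t msr)
{
    v->msr = msr;

    if (msr & DEMO_MSR_POW) {
        if (!v->pending_exceptions) {
            demo_block(v);

            /* Unset POW after wakeup so a later mfmsr() does not
             * return it and send the guest back to idle. */
            msr &= ~DEMO_MSR_POW;
            v->msr = msr;
        }
    }
}
```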

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 8138d31..35f9199 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -134,10 +134,14 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
    vcpu->arch.shared->msr = msr;
    kvmppc_recalc_shadow_msr(vcpu);
 
-   if (msr & (MSR_WE|MSR_POW)) {
+   if (msr & MSR_POW) {
        if (!vcpu->arch.pending_exceptions) {
            kvm_vcpu_block(vcpu);
            vcpu->stat.halt_wakeup++;
+
+           /* Unset POW bit after we woke up */
+           msr &= ~MSR_POW;
+           vcpu->arch.shared->msr = msr;
        }
    }
 
-- 
1.6.0.2



[PATCH 09/26] KVM: PPC: Make invalidation code more reliable

2010-08-17 Thread Alexander Graf
There is a race condition in the pte invalidation code path where we can't
be sure if a pte was invalidated already. So let's move the spin lock around
to get rid of the race.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_mmu_hpte.c |   14 --
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c 
b/arch/powerpc/kvm/book3s_mmu_hpte.c
index bd6a767..79751d8 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -92,10 +92,6 @@ static void free_pte_rcu(struct rcu_head *head)
 
 static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
-   /* pte already invalidated? */
-   if (hlist_unhashed(&pte->list_pte))
-       return;
-
trace_kvm_book3s_mmu_invalidate(pte);
 
/* Different for 32 and 64 bit */
@@ -103,18 +99,24 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct 
hpte_cache *pte)
 
    spin_lock(&vcpu->arch.mmu_lock);
 
+   /* pte already invalidated in between? */
+   if (hlist_unhashed(&pte->list_pte)) {
+       spin_unlock(&vcpu->arch.mmu_lock);
+       return;
+   }
+
    hlist_del_init_rcu(&pte->list_pte);
    hlist_del_init_rcu(&pte->list_pte_long);
    hlist_del_init_rcu(&pte->list_vpte);
    hlist_del_init_rcu(&pte->list_vpte_long);
 
-   spin_unlock(&vcpu->arch.mmu_lock);
-
    if (pte->pte.may_write)
        kvm_release_pfn_dirty(pte->pfn);
    else
        kvm_release_pfn_clean(pte->pfn);
 
+   spin_unlock(&vcpu->arch.mmu_lock);
+
    vcpu->arch.hpte_cache_count--;
    call_rcu(&pte->rcu_head, free_pte_rcu);
 }
-- 
1.6.0.2



[PATCH 10/26] KVM: PPC: Move slb debugging to tracepoints

2010-08-17 Thread Alexander Graf
This patch moves debugging printks for shadow SLB debugging over to tracepoints.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   22 ++
 arch/powerpc/kvm/trace.h  |   73 +
 2 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index ebb1b5d..321c931 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -33,14 +33,6 @@
 #define PTE_SIZE 12
 #define VSID_ALL 0
 
-/* #define DEBUG_SLB */
-
-#ifdef DEBUG_SLB
-#define dprintk_slb(a, ...) printk(KERN_INFO a, __VA_ARGS__)
-#else
-#define dprintk_slb(a, ...) do { } while(0)
-#endif
-
 void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
    ppc_md.hpte_invalidate(pte->slot, pte->host_va,
@@ -66,20 +58,17 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu 
*vcpu, u64 gvsid)
    sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
    map = &to_book3s(vcpu)->sid_map[sid_map_mask];
    if (map->valid && (map->guest_vsid == gvsid)) {
-       dprintk_slb("SLB: Searching: 0x%llx -> 0x%llx\n",
-               gvsid, map->host_vsid);
+       trace_kvm_book3s_slb_found(gvsid, map->host_vsid);
        return map;
    }
 
    map = &to_book3s(vcpu)->sid_map[SID_MAP_MASK - sid_map_mask];
    if (map->valid && (map->guest_vsid == gvsid)) {
-       dprintk_slb("SLB: Searching 0x%llx -> 0x%llx\n",
-               gvsid, map->host_vsid);
+       trace_kvm_book3s_slb_found(gvsid, map->host_vsid);
        return map;
    }
 
-   dprintk_slb("SLB: Searching %d/%d: 0x%llx -> not found\n",
-           sid_map_mask, SID_MAP_MASK - sid_map_mask, gvsid);
+   trace_kvm_book3s_slb_fail(sid_map_mask, gvsid);
return NULL;
 }
 
@@ -205,8 +194,7 @@ static struct kvmppc_sid_map *create_sid_map(struct 
kvm_vcpu *vcpu, u64 gvsid)
    map->guest_vsid = gvsid;
    map->valid = true;
 
-   dprintk_slb("SLB: New mapping at %d: 0x%llx -> 0x%llx\n",
-           sid_map_mask, gvsid, map->host_vsid);
+   trace_kvm_book3s_slb_map(sid_map_mask, gvsid, map->host_vsid);
 
return map;
 }
@@ -278,7 +266,7 @@ int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong 
eaddr)
    to_svcpu(vcpu)->slb[slb_index].esid = slb_esid;
    to_svcpu(vcpu)->slb[slb_index].vsid = slb_vsid;
 
-   dprintk_slb("slbmte %#llx, %#llx\n", slb_vsid, slb_esid);
+   trace_kvm_book3s_slbmte(slb_vsid, slb_esid);
 
return 0;
 }
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index df15d02..705c63d 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -255,6 +255,79 @@ TRACE_EVENT(kvm_book3s_mmu_flush,
          __entry->count, __entry->type, __entry->p1, __entry->p2)
 );
 
+TRACE_EVENT(kvm_book3s_slb_found,
+   TP_PROTO(unsigned long long gvsid, unsigned long long hvsid),
+   TP_ARGS(gvsid, hvsid),
+
+   TP_STRUCT__entry(
+       __field(unsigned long long, gvsid   )
+       __field(unsigned long long, hvsid   )
+   ),
+
+   TP_fast_assign(
+       __entry->gvsid  = gvsid;
+       __entry->hvsid  = hvsid;
+   ),
+
+   TP_printk("%llx -> %llx", __entry->gvsid, __entry->hvsid)
+);
+
+TRACE_EVENT(kvm_book3s_slb_fail,
+   TP_PROTO(u16 sid_map_mask, unsigned long long gvsid),
+   TP_ARGS(sid_map_mask, gvsid),
+
+   TP_STRUCT__entry(
+       __field(unsigned short, sid_map_mask)
+       __field(unsigned long long, gvsid   )
+   ),
+
+   TP_fast_assign(
+       __entry->sid_map_mask   = sid_map_mask;
+       __entry->gvsid          = gvsid;
+   ),
+
+   TP_printk("%x/%x: %llx", __entry->sid_map_mask,
+         SID_MAP_MASK - __entry->sid_map_mask, __entry->gvsid)
+);
+
+TRACE_EVENT(kvm_book3s_slb_map,
+   TP_PROTO(u16 sid_map_mask, unsigned long long gvsid,
+        unsigned long long hvsid),
+   TP_ARGS(sid_map_mask, gvsid, hvsid),
+
+   TP_STRUCT__entry(
+       __field(unsigned short, sid_map_mask)
+       __field(unsigned long long, guest_vsid  )
+       __field(unsigned long long, host_vsid   )
+   ),
+
+   TP_fast_assign(
+       __entry->sid_map_mask   = sid_map_mask;
+       __entry->guest_vsid     = gvsid;
+       __entry->host_vsid      = hvsid;
+   ),
+
+   TP_printk("%x: %llx -> %llx", __entry->sid_map_mask,
+         __entry->guest_vsid, __entry->host_vsid)
+);
+
+TRACE_EVENT(kvm_book3s_slbmte,
+   TP_PROTO(u64 slb_vsid, u64 slb_esid),
+   TP_ARGS(slb_vsid, slb_esid),
+
+   TP_STRUCT__entry(
+   __field(u64,slb_vsid   

[PATCH 13/26] KVM: PPC: Add feature bitmap for magic page

2010-08-17 Thread Alexander Graf
We will soon add SR PV support to the shared page, so we need some
infrastructure that allows the guest to query for features KVM exports.

This patch adds a second return value to the magic mapping that
indicates to the guest which features are available.
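
On the guest side, such a bitmap would typically be tested bit by bit. A
minimal sketch, assuming the SR feature occupies bit 0 as in this series;
demo_has_feature() and the DEMO_* constant are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAGIC_FEAT_SR (1u << 0) /* assumed to mirror KVM_MAGIC_FEAT_SR */

/* Returns non-zero if the host advertised the given feature bit in the
 * second return value of the magic page mapping hypercall. */
static int demo_has_feature(uint32_t features, uint32_t feature_bit)
{
    return (features & feature_bit) != 0;
}
```

A host that returns 0 advertises no optional features, so an older guest and
a newer host (or the reverse) can still agree on the common subset.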

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_para.h |2 ++
 arch/powerpc/kernel/kvm.c   |   21 +++--
 arch/powerpc/kvm/powerpc.c  |5 -
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index 7438ab3..43c1b22 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -47,6 +47,8 @@ struct kvm_vcpu_arch_shared {
 
 #define KVM_FEATURE_MAGIC_PAGE 1
 
+#define KVM_MAGIC_FEAT_SR  (1 << 0)
+
 #ifdef __KERNEL__
 
 #ifdef CONFIG_KVM_GUEST
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index e936817..f48144f 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -267,12 +267,20 @@ static void kvm_patch_ins_wrteei(u32 *inst)
 
 static void kvm_map_magic_page(void *data)
 {
-   kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
-  KVM_MAGIC_PAGE,  /* Physical Address */
-  KVM_MAGIC_PAGE); /* Effective Address */
+   u32 *features = data;
+
+   ulong in[8];
+   ulong out[8];
+
+   in[0] = KVM_MAGIC_PAGE;
+   in[1] = KVM_MAGIC_PAGE;
+
+   kvm_hypercall(in, out, HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE);
+
+   *features = out[0];
 }
 
-static void kvm_check_ins(u32 *inst)
+static void kvm_check_ins(u32 *inst, u32 features)
 {
u32 _inst = *inst;
    u32 inst_no_rt = _inst & ~KVM_MASK_RT;
@@ -368,9 +376,10 @@ static void kvm_use_magic_page(void)
u32 *p;
u32 *start, *end;
u32 tmp;
+   u32 features;
 
    /* Tell the host to map the magic page to -4096 on all CPUs */
-   on_each_cpu(kvm_map_magic_page, NULL, 1);
+   on_each_cpu(kvm_map_magic_page, &features, 1);
 
    /* Quick self-test to see if the mapping works */
    if (__get_user(tmp, (u32*)KVM_MAGIC_PAGE)) {
@@ -383,7 +392,7 @@ static void kvm_use_magic_page(void)
    end = (void*)_etext;
 
    for (p = start; p < end; p++)
-       kvm_check_ins(p);
+       kvm_check_ins(p, features);
 
    printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
             kvm_patching_worked ? "worked" : "failed");
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6a53a3f..496d7a5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -66,6 +66,8 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
        vcpu->arch.magic_page_pa = param1;
        vcpu->arch.magic_page_ea = param2;
 
+   r2 = 0;
+
r = HC_EV_SUCCESS;
break;
}
@@ -76,13 +78,14 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 #endif
 
/* Second return value is in r4 */
-   kvmppc_set_gpr(vcpu, 4, r2);
break;
default:
r = HC_EV_UNIMPLEMENTED;
break;
}
 
+   kvmppc_set_gpr(vcpu, 4, r2);
+
return r;
 }
 
-- 
1.6.0.2



[PATCH 19/26] KVM: PPC: Update int_pending also on dequeue

2010-08-17 Thread Alexander Graf
When a decrementer interrupt is pending, the dequeuing happens manually
through an mtdec instruction. This instruction simply calls dequeue on that
interrupt, so the int_pending hint doesn't get updated.

This patch enables updating the int_pending hint also on dequeue, thus
correctly enabling guests to stay in guest contexts more often.
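
The bookkeeping can be sketched with a plain bitmask. This is a simplified
model with hypothetical demo_* names; in particular, setting the hint directly
in demo_queue() is an assumption made to keep the example short, not
necessarily where the real code updates it.

```c
#include <assert.h>
#include <stdint.h>

struct demo_irq_state {
    unsigned long pending_exceptions; /* one bit per interrupt priority */
    uint32_t int_pending;             /* hint visible to the guest */
};

/* Simplification: mark the interrupt pending and raise the hint. */
static void demo_queue(struct demo_irq_state *s, unsigned int prio)
{
    s->pending_exceptions |= 1ul << prio;
    s->int_pending = 1;
}

/* Clear the interrupt and drop the hint once nothing is pending. */
static void demo_dequeue(struct demo_irq_state *s, unsigned int prio)
{
    s->pending_exceptions &= ~(1ul << prio);

    if (!s->pending_exceptions)
        s->int_pending = 0;
}
```

Without the final check, the hint would stay set after the last interrupt was
dequeued and the guest would exit to the host for no reason.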

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 5fbe949..8138d31 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -201,6 +201,9 @@ static void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu 
*vcpu,
 {
    clear_bit(kvmppc_book3s_vec2irqprio(vec),
          &vcpu->arch.pending_exceptions);
+
+   if (!vcpu->arch.pending_exceptions)
+       vcpu->arch.shared->int_pending = 0;
 }
 
 void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
-- 
1.6.0.2



[PATCH 17/26] KVM: PPC: Add mtsrin PV code

2010-08-17 Thread Alexander Graf
This is the guest side of the mtsr acceleration. Using this, a guest can now
call mtsrin with almost no overhead as long as it ensures that it only uses
it with (MSR_IR|MSR_DR) == 0. Linux does that, so we're good.
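
Before an instruction can be patched, it has to be recognised and its register
fields extracted. A minimal sketch of that step, using hypothetical DEMO_*
names; the opcode and field masks are assumptions modelled on the constants in
this patch, and the (MSR_IR|MSR_DR) == 0 requirement is not checked here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INST_MTSRIN 0x7c0001e4u /* mtsrin with both register fields zero */
#define DEMO_MASK_RT     0x03e00000u /* assumed source register field */
#define DEMO_MASK_RB     0x0000f800u /* assumed RB register field */

/* Returns 1 and fills *rt / *rb if inst is an mtsrin, 0 otherwise. */
static int demo_decode_mtsrin(uint32_t inst, uint32_t *rt, uint32_t *rb)
{
    /* Mask out both register fields before comparing the opcode. */
    if ((inst & ~(DEMO_MASK_RT | DEMO_MASK_RB)) != DEMO_INST_MTSRIN)
        return 0;

    *rt = (inst & DEMO_MASK_RT) >> 21;
    *rb = (inst & DEMO_MASK_RB) >> 11;
    return 1;
}
```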

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/asm-offsets.c |1 +
 arch/powerpc/kernel/kvm.c |   60 +
 arch/powerpc/kernel/kvm_emul.S|   50 ++
 3 files changed, 111 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index e3e740b..5e54d0f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -478,6 +478,7 @@ int main(void)
DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr));
DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared,
critical));
+   DEFINE(KVM_MAGIC_SR, offsetof(struct kvm_vcpu_arch_shared, sr));
 #endif
 
 #ifdef CONFIG_44x
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index f48144f..43ec78a 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -43,6 +43,7 @@
 #define KVM_INST_B_MAX 0x01ff
 
 #define KVM_MASK_RT0x03e0
+#define KVM_MASK_RB0xf800
 #define KVM_INST_MFMSR 0x7ca6
 #define KVM_INST_MFSPR_SPRG0   0x7c1042a6
 #define KVM_INST_MFSPR_SPRG1   0x7c1142a6
@@ -70,6 +71,8 @@
 #define KVM_INST_WRTEEI_0  0x7c000146
 #define KVM_INST_WRTEEI_1  0x7c008146
 
+#define KVM_INST_MTSRIN0x7c0001e4
+
 static bool kvm_patching_worked = true;
 static char kvm_tmp[1024 * 1024];
 static int kvm_tmp_index;
@@ -265,6 +268,51 @@ static void kvm_patch_ins_wrteei(u32 *inst)
 
 #endif
 
+#ifdef CONFIG_PPC_BOOK3S_32
+
+extern u32 kvm_emulate_mtsrin_branch_offs;
+extern u32 kvm_emulate_mtsrin_reg1_offs;
+extern u32 kvm_emulate_mtsrin_reg2_offs;
+extern u32 kvm_emulate_mtsrin_orig_ins_offs;
+extern u32 kvm_emulate_mtsrin_len;
+extern u32 kvm_emulate_mtsrin[];
+
+static void kvm_patch_ins_mtsrin(u32 *inst, u32 rt, u32 rb)
+{
+	u32 *p;
+	int distance_start;
+	int distance_end;
+	ulong next_inst;
+
+	p = kvm_alloc(kvm_emulate_mtsrin_len * 4);
+	if (!p)
+		return;
+
+	/* Find out where we are and put everything there */
+	distance_start = (ulong)p - (ulong)inst;
+	next_inst = ((ulong)inst + 4);
+	distance_end = next_inst - (ulong)&p[kvm_emulate_mtsrin_branch_offs];
+
+	/* Make sure we only write valid b instructions */
+	if (distance_start > KVM_INST_B_MAX) {
+		kvm_patching_worked = false;
+		return;
+	}
+
+	/* Modify the chunk to fit the invocation */
+	memcpy(p, kvm_emulate_mtsrin, kvm_emulate_mtsrin_len * 4);
+	p[kvm_emulate_mtsrin_branch_offs] |= distance_end & KVM_INST_B_MASK;
+	p[kvm_emulate_mtsrin_reg1_offs] |= (rb >> 10);
+	p[kvm_emulate_mtsrin_reg2_offs] |= rt;
+	p[kvm_emulate_mtsrin_orig_ins_offs] = *inst;
+	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtsrin_len * 4);
+
+	/* Patch the invocation */
+	kvm_patch_ins_b(inst, distance_start);
+}
+
+#endif
+
 static void kvm_map_magic_page(void *data)
 {
u32 *features = data;
@@ -361,6 +409,18 @@ static void kvm_check_ins(u32 *inst, u32 features)
break;
}
 
+	switch (inst_no_rt & ~KVM_MASK_RB) {
+#ifdef CONFIG_PPC_BOOK3S_32
+	case KVM_INST_MTSRIN:
+		if (features & KVM_MAGIC_FEAT_SR) {
+			u32 inst_rb = _inst & KVM_MASK_RB;
+			kvm_patch_ins_mtsrin(inst, inst_rt, inst_rb);
+		}
+		break;
+		break;
+#endif
+	}
+
switch (_inst) {
 #ifdef CONFIG_BOOKE
case KVM_INST_WRTEEI_0:
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index 3199f65..a6e97e7 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -245,3 +245,53 @@ kvm_emulate_wrteei_ee_offs:
 .global kvm_emulate_wrteei_len
 kvm_emulate_wrteei_len:
.long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4
+
+
+.global kvm_emulate_mtsrin
+kvm_emulate_mtsrin:
+
+   SCRATCH_SAVE
+
+   LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+   andi.   r31, r31, MSR_DR | MSR_IR
+   beq kvm_emulate_mtsrin_reg1
+
+   SCRATCH_RESTORE
+
+kvm_emulate_mtsrin_orig_ins:
+   nop
+   b   kvm_emulate_mtsrin_branch
+
+kvm_emulate_mtsrin_reg1:
+	/* rX >> 26 */
+   rlwinm  r30,r0,6,26,29
+
+kvm_emulate_mtsrin_reg2:
+   stw r0, (KVM_MAGIC_PAGE + KVM_MAGIC_SR)(r30)
+
+   SCRATCH_RESTORE
+
+   /* Go back to caller */
+kvm_emulate_mtsrin_branch:
+   b   .
+kvm_emulate_mtsrin_end:
+
+.global kvm_emulate_mtsrin_branch_offs
+kvm_emulate_mtsrin_branch_offs:
+   .long 
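
What the emulation chunk above does can be modelled in plain C. This is a
rough sketch under stated assumptions, not the patch itself:
`struct shared_model` and `emulate_mtsrin` are hypothetical names, and the
MSR_IR/MSR_DR bit values are assumed here for illustration. The index
computation follows the `rlwinm r30,r0,6,26,29` step, which turns the top
four bits of the register into a byte offset into the SR array.

```c
#include <stdint.h>

/* Assumed bit values, for illustration only. */
#define MSR_IR 0x20u /* instruction relocation */
#define MSR_DR 0x10u /* data relocation */

/* Hypothetical stand-in for the shared (magic) page. */
struct shared_model {
	uint32_t msr;
	uint32_t sr[16];
};

/*
 * Returns 1 if the store was handled via the shared page, 0 if the
 * original mtsrin instruction would have to run instead.
 */
static int emulate_mtsrin(struct shared_model *s, uint32_t rs, uint32_t rb)
{
	uint32_t byte_off;

	/* Fast path is only valid with translation switched off. */
	if (s->msr & (MSR_IR | MSR_DR))
		return 0;

	/* (rb >> 26) & 0x3c == (top 4 bits of rb) * 4 */
	byte_off = (rb >> 26) & 0x3c;
	s->sr[byte_off / 4] = rs;
	return 1;
}
```

For example, with rb = 0x30000000 and translation off, the value lands in
sr[3]; with MSR_DR set, the model reports that the real instruction must
be executed.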

[PATCH 18/26] KVM: PPC: Make PV mtmsr work with r30 and r31

2010-08-17 Thread Alexander Graf
So far we've been restricting ourselves to r0-r29 as registers an mtmsr
instruction could use. This was bad, as there are some code paths in
Linux actually using r30.

So let's instead handle all registers gracefully and get rid of that
stupid limitation.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm.c  |   39 ---
 arch/powerpc/kernel/kvm_emul.S |   17 -
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 43ec78a..517967d 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -43,6 +43,7 @@
 #define KVM_INST_B_MAX 0x01ff
 
 #define KVM_MASK_RT0x03e0
+#define KVM_RT_30  0x03c0
 #define KVM_MASK_RB0xf800
 #define KVM_INST_MFMSR 0x7ca6
 #define KVM_INST_MFSPR_SPRG0   0x7c1042a6
@@ -83,6 +84,15 @@ static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
flush_icache_range((ulong)inst, (ulong)inst + 4);
 }
 
+static void kvm_patch_ins_ll(u32 *inst, long addr, u32 rt)
+{
+#ifdef CONFIG_64BIT
+	kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0xfffc));
+#else
+	kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr & 0xfffc));
+#endif
+}
+
 static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
 {
 #ifdef CONFIG_64BIT
@@ -187,7 +197,6 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
 extern u32 kvm_emulate_mtmsr_branch_offs;
 extern u32 kvm_emulate_mtmsr_reg1_offs;
 extern u32 kvm_emulate_mtmsr_reg2_offs;
-extern u32 kvm_emulate_mtmsr_reg3_offs;
 extern u32 kvm_emulate_mtmsr_orig_ins_offs;
 extern u32 kvm_emulate_mtmsr_len;
 extern u32 kvm_emulate_mtmsr[];
@@ -217,9 +226,27 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
/* Modify the chunk to fit the invocation */
memcpy(p, kvm_emulate_mtmsr, kvm_emulate_mtmsr_len * 4);
p[kvm_emulate_mtmsr_branch_offs] |= distance_end  KVM_INST_B_MASK;
-   p[kvm_emulate_mtmsr_reg1_offs] |= rt;
-   p[kvm_emulate_mtmsr_reg2_offs] |= rt;
-   p[kvm_emulate_mtmsr_reg3_offs] |= rt;
+
+	/* Make clobbered registers work too */
+	switch (get_rt(rt)) {
+	case 30:
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg1_offs],
+				 magic_var(scratch2), KVM_RT_30);
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg2_offs],
+				 magic_var(scratch2), KVM_RT_30);
+		break;
+	case 31:
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg1_offs],
+				 magic_var(scratch1), KVM_RT_30);
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg2_offs],
+				 magic_var(scratch1), KVM_RT_30);
+		break;
+	default:
+		p[kvm_emulate_mtmsr_reg1_offs] |= rt;
+		p[kvm_emulate_mtmsr_reg2_offs] |= rt;
+		break;
+	}
+
p[kvm_emulate_mtmsr_orig_ins_offs] = *inst;
flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4);
 
@@ -403,9 +430,7 @@ static void kvm_check_ins(u32 *inst, u32 features)
break;
case KVM_INST_MTMSR:
case KVM_INST_MTMSRD_L0:
-		/* We use r30 and r31 during the hook */
-		if (get_rt(inst_rt) < 30)
-			kvm_patch_ins_mtmsr(inst, inst_rt);
+		kvm_patch_ins_mtmsr(inst, inst_rt);
break;
}
 
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index a6e97e7..6530532 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -135,7 +135,8 @@ kvm_emulate_mtmsr:
 
/* Find the changed bits between old and new MSR */
 kvm_emulate_mtmsr_reg1:
-   xor r31, r0, r31
+   ori r30, r0, 0
+   xor r31, r30, r31
 
/* Check if we need to really do mtmsr */
LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS)
@@ -156,14 +157,17 @@ kvm_emulate_mtmsr_orig_ins:
 
 maybe_stay_in_guest:
 
+   /* Get the target register in r30 */
+kvm_emulate_mtmsr_reg2:
+   ori r30, r0, 0
+
/* Check if we have to fetch an interrupt */
lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
cmpwi   r31, 0
 	beq+	no_mtmsr
 
/* Check if we may trigger an interrupt */
-kvm_emulate_mtmsr_reg2:
-   andi.   r31, r0, MSR_EE
+   andi.   r31, r30, MSR_EE
beq no_mtmsr
 
b   do_mtmsr
@@ -171,8 +175,7 @@ kvm_emulate_mtmsr_reg2:
 no_mtmsr:
 
/* Put MSR into magic page because we don't call mtmsr */
-kvm_emulate_mtmsr_reg3:
-   STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+   STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
 
SCRATCH_RESTORE
 
@@ -193,10 +196,6 @@ kvm_emulate_mtmsr_reg1_offs:
 kvm_emulate_mtmsr_reg2_offs:
.long (kvm_emulate_mtmsr_reg2 - kvm_emulate_mtmsr) / 4
 
-.global 
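
The register selection this patch introduces can be sketched in C. This is
a simplified model, assuming (as the switch in kvm_patch_ins_mtmsr
suggests) that the guest's original r30 sits in the scratch2 slot and its
original r31 in scratch1 while the hook runs. `struct regs_model`,
`struct magic_model` and `mtmsr_source` are hypothetical names.

```c
/* Hypothetical register file as seen inside the hook. */
struct regs_model {
	unsigned long gpr[32];
};

/* Hypothetical scratch slots in the magic page. */
struct magic_model {
	unsigned long scratch1; /* assumed: saved copy of guest r31 */
	unsigned long scratch2; /* assumed: saved copy of guest r30 */
};

/*
 * Pick the value the guest meant to write to the MSR. r30 and r31 are
 * clobbered by the hook, so their original contents come from the saved
 * copies instead of the live registers.
 */
static unsigned long mtmsr_source(const struct regs_model *r,
				  const struct magic_model *m,
				  unsigned int rt)
{
	switch (rt) {
	case 30:
		return m->scratch2;
	case 31:
		return m->scratch1;
	default:
		return r->gpr[rt];
	}
}
```

Any other register is read directly, which corresponds to the default
branch that ORs rt into the patched instructions.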

[PATCH 02/26] KVM: PPC: Move book3s_64 mmu map debug print to trace point

2010-08-17 Thread Alexander Graf
This patch moves Book3s MMU debugging over to tracepoints.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   13 +--
 arch/powerpc/kvm/trace.h  |   34 +
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 672b149..aa516ad 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -28,19 +28,13 @@
 #include <asm/machdep.h>
 #include <asm/mmu_context.h>
 #include <asm/hw_irq.h>
+#include "trace.h"
 
 #define PTE_SIZE 12
 #define VSID_ALL 0
 
-/* #define DEBUG_MMU */
 /* #define DEBUG_SLB */
 
-#ifdef DEBUG_MMU
-#define dprintk_mmu(a, ...) printk(KERN_INFO a, __VA_ARGS__)
-#else
-#define dprintk_mmu(a, ...) do { } while(0)
-#endif
-
 #ifdef DEBUG_SLB
 #define dprintk_slb(a, ...) printk(KERN_INFO a, __VA_ARGS__)
 #else
@@ -156,10 +150,7 @@ map_again:
} else {
struct hpte_cache *pte = kvmppc_mmu_hpte_cache_next(vcpu);
 
-		dprintk_mmu("KVM: %c%c Map 0x%lx: [%lx] 0x%lx (0x%llx) -> %lx\n",
-			    ((rflags & HPTE_R_PP) == 3) ? '-' : 'w',
-			    (rflags & HPTE_R_N) ? '-' : 'x',
-			    orig_pte->eaddr, hpteg, va, orig_pte->vpage, hpaddr);
+		trace_kvm_book3s_64_mmu_map(rflags, hpteg, va, hpaddr, orig_pte);
 
/* The ppc_md code may give us a secondary entry even though we
   asked for a primary. Fix up. */
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 56cd162..3b9169c 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -140,6 +140,40 @@ TRACE_EVENT(kvm_book3s_reenter,
TP_printk(reentry r=%d | pc=0x%lx, __entry-r, __entry-pc)
 );
 
+#ifdef CONFIG_PPC_BOOK3S_64
+
+TRACE_EVENT(kvm_book3s_64_mmu_map,
+	TP_PROTO(int rflags, ulong hpteg, ulong va, pfn_t hpaddr,
+		 struct kvmppc_pte *orig_pte),
+	TP_ARGS(rflags, hpteg, va, hpaddr, orig_pte),
+
+	TP_STRUCT__entry(
+		__field(	unsigned char,		flag_w		)
+		__field(	unsigned char,		flag_x		)
+		__field(	unsigned long,		eaddr		)
+		__field(	unsigned long,		hpteg		)
+		__field(	unsigned long,		va		)
+		__field(	unsigned long long,	vpage		)
+		__field(	unsigned long,		hpaddr		)
+	),
+
+	TP_fast_assign(
+		__entry->flag_w	= ((rflags & HPTE_R_PP) == 3) ? '-' : 'w';
+		__entry->flag_x	= (rflags & HPTE_R_N) ? '-' : 'x';
+		__entry->eaddr	= orig_pte->eaddr;
+		__entry->hpteg	= hpteg;
+		__entry->va	= va;
+		__entry->vpage	= orig_pte->vpage;
+		__entry->hpaddr	= hpaddr;
+	),
+
+	TP_printk("KVM: %c%c Map 0x%lx: [%lx] 0x%lx (0x%llx) -> %lx",
+		  __entry->flag_w, __entry->flag_x, __entry->eaddr,
+		  __entry->hpteg, __entry->va, __entry->vpage, __entry->hpaddr)
+);
+
+#endif
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2
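
The two flag characters the tracepoint records can be reproduced outside
the kernel. The following is only a sketch: the values of HPTE_R_PP and
HPTE_R_N are assumed here for illustration, and `mmu_map_flags` is a
hypothetical helper that mirrors the two TP_fast_assign expressions.

```c
/* Assumed bit layout, for illustration only. */
#define HPTE_R_PP 0x3UL /* page protection bits */
#define HPTE_R_N  0x4UL /* no-execute bit */

/*
 * Fill out[] with the "wx"-style flags used in the trace output:
 * 'w' unless the page is read-only (PP == 3), 'x' unless no-execute.
 */
static void mmu_map_flags(unsigned long rflags, char out[3])
{
	out[0] = ((rflags & HPTE_R_PP) == 3) ? '-' : 'w';
	out[1] = (rflags & HPTE_R_N) ? '-' : 'x';
	out[2] = '\0';
}
```

Under these assumptions, rflags = 0x3 yields "-x" (read-only, executable)
and rflags = 0x6 yields "w-" (writable, no-execute).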



[PATCH 04/26] KVM: PPC: Move pte invalidate debug code to tracepoint

2010-08-17 Thread Alexander Graf
This patch moves the SPTE flush debug printk over to tracepoints.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_mmu_hpte.c |3 +--
 arch/powerpc/kvm/trace.h   |   29 +
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
index ac94bd9..3397152 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -104,8 +104,7 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 	if (hlist_unhashed(&pte->list_pte))
 		return;
 
-	dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) -> 0x%llx\n",
-		    pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+	trace_kvm_book3s_mmu_invalidate(pte);
 
/* Different for 32 and 64 bit */
kvmppc_mmu_invalidate_pte(vcpu, pte);
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index ee6ac88..4ab1c72 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -203,6 +203,35 @@ TRACE_EVENT(kvm_book3s_mmu_map,
  __entry-vpage, __entry-raddr, __entry-flags)
 );
 
+TRACE_EVENT(kvm_book3s_mmu_invalidate,
+	TP_PROTO(struct hpte_cache *pte),
+	TP_ARGS(pte),
+
+	TP_STRUCT__entry(
+		__field(	u64,		host_va		)
+		__field(	u64,		pfn		)
+		__field(	ulong,		eaddr		)
+		__field(	u64,		vpage		)
+		__field(	ulong,		raddr		)
+		__field(	int,		flags		)
+	),
+
+	TP_fast_assign(
+		__entry->host_va	= pte->host_va;
+		__entry->pfn		= pte->pfn;
+		__entry->eaddr		= pte->pte.eaddr;
+		__entry->vpage		= pte->pte.vpage;
+		__entry->raddr		= pte->pte.raddr;
+		__entry->flags		= (pte->pte.may_read ? 0x4 : 0) |
+					  (pte->pte.may_write ? 0x2 : 0) |
+					  (pte->pte.may_execute ? 0x1 : 0);
+	),
+
+	TP_printk("Flush: hva=%llx pfn=%llx ea=%lx vp=%llx ra=%lx [%x]",
+		  __entry->host_va, __entry->pfn, __entry->eaddr,
+		  __entry->vpage, __entry->raddr, __entry->flags)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2



[PATCH 08/26] KVM: PPC: Don't flush PTEs on NX/RO hit

2010-08-17 Thread Alexander Graf
When hitting a no-execute or read-only data/inst storage interrupt we were
flushing the respective PTE so we're sure it gets properly overwritten next.

According to the spec, this is unnecessary though. The guest issues a tlbie
anyways, so we're safe to just keep the PTE around and have it manually removed
from the guest, saving us a flush.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index b3c1dde..3e017da 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -885,7 +885,6 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			vcpu->arch.shared->msr |=
 				to_svcpu(vcpu)->shadow_srr1 & 0x5800;
 			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-			kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFUL);
r = RESUME_GUEST;
}
break;
@@ -911,7 +910,6 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			vcpu->arch.shared->dar = dar;
 			vcpu->arch.shared->dsisr = to_svcpu(vcpu)->fault_dsisr;
 			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-			kvmppc_mmu_pte_flush(vcpu, dar, ~0xFFFUL);
r = RESUME_GUEST;
}
break;
-- 
1.6.0.2



[PATCH 03/26] KVM: PPC: Add tracepoint for generic mmu map

2010-08-17 Thread Alexander Graf
This patch moves the generic mmu map debugging over to tracepoints.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_mmu_hpte.c |3 +++
 arch/powerpc/kvm/trace.h   |   29 +
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
index 02c64ab..ac94bd9 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -21,6 +21,7 @@
 #include <linux/kvm_host.h>
 #include <linux/hash.h>
 #include <linux/slab.h>
+#include "trace.h"
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
@@ -66,6 +67,8 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
 	u64 index;
 
+	trace_kvm_book3s_mmu_map(pte);
+
 	spin_lock(&vcpu->arch.mmu_lock);
 
/* Add to ePTE list */
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 3b9169c..ee6ac88 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -174,6 +174,35 @@ TRACE_EVENT(kvm_book3s_64_mmu_map,
 
 #endif
 
+TRACE_EVENT(kvm_book3s_mmu_map,
+	TP_PROTO(struct hpte_cache *pte),
+	TP_ARGS(pte),
+
+	TP_STRUCT__entry(
+		__field(	u64,		host_va		)
+		__field(	u64,		pfn		)
+		__field(	ulong,		eaddr		)
+		__field(	u64,		vpage		)
+		__field(	ulong,		raddr		)
+		__field(	int,		flags		)
+	),
+
+	TP_fast_assign(
+		__entry->host_va	= pte->host_va;
+		__entry->pfn		= pte->pfn;
+		__entry->eaddr		= pte->pte.eaddr;
+		__entry->vpage		= pte->pte.vpage;
+		__entry->raddr		= pte->pte.raddr;
+		__entry->flags		= (pte->pte.may_read ? 0x4 : 0) |
+					  (pte->pte.may_write ? 0x2 : 0) |
+					  (pte->pte.may_execute ? 0x1 : 0);
+	),
+
+	TP_printk("Map: hva=%llx pfn=%llx ea=%lx vp=%llx ra=%lx [%x]",
+		  __entry->host_va, __entry->pfn, __entry->eaddr,
+		  __entry->vpage, __entry->raddr, __entry->flags)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2
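
The `flags` field in this tracepoint packs the three permission booleans
into one rwx-style bitmask. A minimal standalone sketch of that encoding
follows; `struct pte_perm` and `pte_flags` are hypothetical names used
only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical subset of the PTE permission state. */
struct pte_perm {
	bool may_read;
	bool may_write;
	bool may_execute;
};

/* Same encoding as the tracepoint: read = 0x4, write = 0x2, execute = 0x1. */
static int pte_flags(const struct pte_perm *p)
{
	return (p->may_read ? 0x4 : 0) |
	       (p->may_write ? 0x2 : 0) |
	       (p->may_execute ? 0x1 : 0);
}
```

A readable and executable but read-only mapping therefore shows up as [5]
in the trace line, and a fully permissive one as [7].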



[PATCH 06/26] KVM: PPC: Add tracepoints for generic spte flushes

2010-08-17 Thread Alexander Graf
The different ways of flushing shadow ptes have their own debug prints, which
use stupid old printk.

Let's move them to tracepoints, making them more easily available, faster, and
possible to activate on demand.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_mmu_hpte.c |   18 +++---
 arch/powerpc/kvm/trace.h   |   23 +++
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
index 3397152..bd6a767 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -31,14 +31,6 @@
 
 #define PTE_SIZE   12
 
-/* #define DEBUG_MMU */
-
-#ifdef DEBUG_MMU
-#define dprintk_mmu(a, ...) printk(KERN_INFO a, __VA_ARGS__)
-#else
-#define dprintk_mmu(a, ...) do { } while(0)
-#endif
-
 static struct kmem_cache *hpte_cache;
 
 static inline u64 kvmppc_mmu_hash_pte(u64 eaddr)
@@ -186,9 +178,7 @@ static void kvmppc_mmu_pte_flush_long(struct kvm_vcpu *vcpu, ulong guest_ea)
 
 void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask)
 {
-	dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%lx & 0x%lx\n",
-		    vcpu->arch.hpte_cache_count, guest_ea, ea_mask);
-
+	trace_kvm_book3s_mmu_flush("", vcpu, guest_ea, ea_mask);
 	guest_ea &= ea_mask;
 
 	switch (ea_mask) {
@@ -251,8 +241,7 @@ static void kvmppc_mmu_pte_vflush_long(struct kvm_vcpu *vcpu, u64 guest_vp)
 
 void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
 {
-	dprintk_mmu("KVM: Flushing %d Shadow vPTEs: 0x%llx & 0x%llx\n",
-		    vcpu->arch.hpte_cache_count, guest_vp, vp_mask);
+	trace_kvm_book3s_mmu_flush("v", vcpu, guest_vp, vp_mask);
 	guest_vp &= vp_mask;
 
 	switch(vp_mask) {
@@ -274,8 +263,7 @@ void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
 	struct hpte_cache *pte;
 	int i;
 
-	dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%lx - 0x%lx\n",
-		    vcpu->arch.hpte_cache_count, pa_start, pa_end);
+	trace_kvm_book3s_mmu_flush("p", vcpu, pa_start, pa_end);
 
rcu_read_lock();
 
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 4ab1c72..df15d02 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -232,6 +232,29 @@ TRACE_EVENT(kvm_book3s_mmu_invalidate,
  __entry-vpage, __entry-raddr, __entry-flags)
 );
 
+TRACE_EVENT(kvm_book3s_mmu_flush,
+	TP_PROTO(const char *type, struct kvm_vcpu *vcpu, unsigned long long p1,
+		 unsigned long long p2),
+	TP_ARGS(type, vcpu, p1, p2),
+
+	TP_STRUCT__entry(
+		__field(	int,			count		)
+		__field(	unsigned long long,	p1		)
+		__field(	unsigned long long,	p2		)
+		__field(	const char *,		type		)
+	),
+
+	TP_fast_assign(
+		__entry->count		= vcpu->arch.hpte_cache_count;
+		__entry->p1		= p1;
+		__entry->p2		= p2;
+		__entry->type		= type;
+	),
+
+	TP_printk("Flush %d %sPTEs: %llx - %llx",
+		  __entry->count, __entry->type, __entry->p1, __entry->p2)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2



[PATCH 01/26] KVM: PPC: Move EXIT_DEBUG partially to tracepoints

2010-08-17 Thread Alexander Graf
We have a debug printk on every exit that is usually #ifdef'ed out. Using
tracepoints makes a lot more sense here though, as they can be dynamically
enabled.

This patch converts the most commonly used debug printks of EXIT_DEBUG to
tracepoints.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |   26 --
 arch/powerpc/kvm/trace.h  |   42 ++
 2 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index eee97b5..f8b9aab 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -17,6 +17,7 @@
 #include <linux/kvm_host.h>
 #include <linux/err.h>
 #include <linux/slab.h>
+#include "trace.h"
 
 #include <asm/reg.h>
 #include <asm/cputable.h>
@@ -35,7 +36,6 @@
 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
 
 /* #define EXIT_DEBUG */
-/* #define EXIT_DEBUG_SIMPLE */
 /* #define DEBUG_EXT */
 
 static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
@@ -105,14 +105,6 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
kvmppc_giveup_ext(vcpu, MSR_VSX);
 }
 
-#if defined(EXIT_DEBUG)
-static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
-{
-   u64 jd = mftb() - vcpu-arch.dec_jiffies;
-   return vcpu-arch.dec - jd;
-}
-#endif
-
 static void kvmppc_recalc_shadow_msr(struct kvm_vcpu *vcpu)
 {
ulong smsr = vcpu-arch.shared-msr;
@@ -848,16 +840,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
-#ifdef EXIT_DEBUG
-	printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | dec=0x%x | msr=0x%lx\n",
-		exit_nr, kvmppc_get_pc(vcpu), kvmppc_get_fault_dar(vcpu),
-		kvmppc_get_dec(vcpu), to_svcpu(vcpu)->shadow_srr1);
-#elif defined (EXIT_DEBUG_SIMPLE)
-	if ((exit_nr != 0x900) && (exit_nr != 0x500))
-		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | msr=0x%lx\n",
-			exit_nr, kvmppc_get_pc(vcpu), kvmppc_get_fault_dar(vcpu),
-			vcpu->arch.shared->msr);
-#endif
+
+	trace_kvm_book3s_exit(exit_nr, vcpu);
kvm_resched(vcpu);
switch (exit_nr) {
case BOOK3S_INTERRUPT_INST_STORAGE:
@@ -1089,9 +1073,7 @@ program_interrupt:
}
}
 
-#ifdef EXIT_DEBUG
-	printk(KERN_EMERG "KVM exit: vcpu=0x%p pc=0x%lx r=0x%x\n", vcpu, kvmppc_get_pc(vcpu), r);
-#endif
+	trace_kvm_book3s_reenter(r, vcpu);
 
return r;
 }
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index a8e8400..56cd162 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -98,6 +98,48 @@ TRACE_EVENT(kvm_gtlb_write,
__entry-word1, __entry-word2)
 );
 
+TRACE_EVENT(kvm_book3s_exit,
+	TP_PROTO(unsigned int exit_nr, struct kvm_vcpu *vcpu),
+	TP_ARGS(exit_nr, vcpu),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	exit_nr		)
+		__field(	unsigned long,	pc		)
+		__field(	unsigned long,	msr		)
+		__field(	unsigned long,	dar		)
+		__field(	unsigned long,	srr1		)
+	),
+
+	TP_fast_assign(
+		__entry->exit_nr	= exit_nr;
+		__entry->pc		= kvmppc_get_pc(vcpu);
+		__entry->dar		= kvmppc_get_fault_dar(vcpu);
+		__entry->msr		= vcpu->arch.shared->msr;
+		__entry->srr1		= to_svcpu(vcpu)->shadow_srr1;
+	),
+
+	TP_printk("exit=0x%x | pc=0x%lx | msr=0x%lx | dar=0x%lx | srr1=0x%lx",
+		  __entry->exit_nr, __entry->pc, __entry->msr, __entry->dar,
+		  __entry->srr1)
+);
+
+TRACE_EVENT(kvm_book3s_reenter,
+	TP_PROTO(int r, struct kvm_vcpu *vcpu),
+	TP_ARGS(r, vcpu),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	r		)
+		__field(	unsigned long,	pc		)
+	),
+
+	TP_fast_assign(
+		__entry->r		= r;
+		__entry->pc		= kvmppc_get_pc(vcpu);
+	),
+
+	TP_printk("reentry r=%d | pc=0x%lx", __entry->r, __entry->pc)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2
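
The difference between the old #ifdef'ed printk and a tracepoint can be
illustrated with a small userspace analogue. This is not the kernel
tracing API, just a sketch of the idea: `trace_enabled`, `trace_hits` and
`trace_event` are hypothetical names, and the switch is a plain variable
that can be flipped at run time instead of a compile-time macro.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Run-time switch; a compile-time #ifdef would need a rebuild instead. */
static bool trace_enabled;

/* Number of events actually emitted, kept only for demonstration. */
static unsigned long trace_hits;

/* Emit a formatted event to stderr, but only while tracing is enabled. */
static void trace_event(const char *fmt, ...)
{
	va_list ap;

	if (!trace_enabled)
		return; /* cheap early exit when tracing is off */

	trace_hits++;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}
```

Calling `trace_event("exit=0x%x\n", 0x900u)` does nothing until
`trace_enabled` is set, after which the same call site starts producing
output without recompiling.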



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori

On 08/17/2010 08:07 AM, Christoph Hellwig wrote:
>> The point is that we don't want to flush the disk write cache.  The
>> intention of writethrough is not to make the disk cache writethrough
>> but to treat the host's cache as writethrough.
>
> We need to make sure data is not in the disk write cache if we want to
> provide data integrity.

When the guest explicitly flushes the emulated disk's write cache.  Not
on every single write completion.

> It has nothing to do with the qemu caching
> mode - for cache=writeback or none it's committed as part of the fdatasync
> call, and for cache=writethrough it's committed as part of the O_SYNC
> write.  Note that both of these paths end up calling the filesystem's
> ->fsync method, which is what's required to make writes stable.  That's
> exactly what is missing in sync_file_range, and that's why that API is
> not useful at all for data integrity operations.

For normal writes from a guest, we don't need to follow the write with
an fsync().  We should only need to issue an fsync() given an explicit
flush from the guest.

> It's also what makes
> fsync slow on extN - but the fix to that is not to not provide data
> integrity but rather to make fsync fast.  There's various other
> filesystems that can already do it, and if you insist on using those
> that are slow for this operation you'll have to suffer until that
> issue is fixed for them.

fsync() being slow is orthogonal to my point.  I don't see why we need
to do an fsync() on *every* write.  It should only be necessary when a
guest injects an actual barrier.


Regards,

Anthony Liguori




Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote:
> On 08/17/2010 08:07 AM, Christoph Hellwig wrote:
>>> The point is that we don't want to flush the disk write cache.  The
>>> intention of writethrough is not to make the disk cache writethrough
>>> but to treat the host's cache as writethrough.
>>
>> We need to make sure data is not in the disk write cache if we want to
>> provide data integrity.
>
> When the guest explicitly flushes the emulated disk's write cache.
> Not on every single write completion.

That depends on the cache= mode.  For cache=none and cache=writeback
we present a write-back cache to the guest, and the guest does explicit
cache flushes.  For cache=writethrough we present a writethrough cache
to the guest, and we need to make sure data actually has hit the disk
before returning I/O completion to the guest.

>> It has nothing to do with the qemu caching
>> mode - for cache=writeback or none it's committed as part of the fdatasync
>> call, and for cache=writethrough it's committed as part of the O_SYNC
>> write.  Note that both of these paths end up calling the filesystem's
>> ->fsync method, which is what's required to make writes stable.  That's
>> exactly what is missing in sync_file_range, and that's why that API is
>> not useful at all for data integrity operations.
>
> For normal writes from a guest, we don't need to follow the write
> with an fsync().  We should only need to issue an fsync() given an
> explicit flush from the guest.

Define normal writes.  For cache=none and cache=writeback we don't
have to, and instead make explicit calls to fsync()/fdatasync()
when we get a cache flush from the guest.  For cache=writethrough we
guarantee data has made it to disk, and we implement this using
O_DSYNC/O_SYNC when opening the file.  That tells the operating system
not to return until data has hit the disk.  For Linux this is
internally implemented using a range-fsync/fdatasync after the actual
write.

> fsync() being slow is orthogonal to my point.  I don't see why we
> need to do an fsync() on *every* write.  It should only be necessary
> when a guest injects an actual barrier.

See above.
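
The two strategies described in this message can be sketched with plain
POSIX calls. This is a minimal illustration under stated assumptions, not
qemu's block layer: `open_writethrough`, `open_writeback` and
`guest_flush` are hypothetical helper names, and the image path passed in
is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * cache=writethrough style: O_DSYNC makes every write() return only
 * after the data has been handed to stable storage.
 */
static int open_writethrough(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
}

/*
 * cache=writeback style: writes may complete from the host page cache,
 * so integrity depends on an explicit flush.
 */
static int open_writeback(const char *path)
{
	return open(path, O_WRONLY | O_CREAT, 0600);
}

/* Called when the guest issues a cache flush on a writeback-style fd. */
static int guest_flush(int fd)
{
	return fdatasync(fd);
}
```

In the first case the cost of reaching the disk is paid on every write;
in the second it is paid only when the guest asks for a flush, which is
the behaviour the disagreement in this thread is about.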



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori

On 08/17/2010 09:28 AM, Christoph Hellwig wrote:
> On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote:
>> On 08/17/2010 08:07 AM, Christoph Hellwig wrote:
>>>> The point is that we don't want to flush the disk write cache.  The
>>>> intention of writethrough is not to make the disk cache writethrough
>>>> but to treat the host's cache as writethrough.
>>>
>>> We need to make sure data is not in the disk write cache if we want to
>>> provide data integrity.
>>
>> When the guest explicitly flushes the emulated disk's write cache.
>> Not on every single write completion.
>
> That depends on the cache= mode.  For cache=none and cache=writeback
> we present a write-back cache to the guest, and the guest does explicit
> cache flushes.  For cache=writethrough we present a writethrough cache
> to the guest, and we need to make sure data actually has hit the disk
> before returning I/O completion to the guest.

Why?

The type of cache we present to the guest should relate only to how the
hypervisor caches the storage.  It should be independent of how data is
cached by the disk.

There can be many levels of caching in a storage hierarchy, and each
level is cached independently of the next.

If the user has a disk with a writeback cache and we expose a
writethrough cache to the guest, it's not our responsibility to make
sure that we break through the writeback cache on the disk.

Regards,

Anthony Liguori



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Michael Tokarev
17.08.2010 18:28, Christoph Hellwig wrote:
 On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote:
[]
 For normal writes from a guest, we don't need to follow the write
 with an fsync().  We should only need to issue an fsync() given an
 explicit flush from the guest.
 
 Define normal writes.  For cache=none and cache=writeback we don't
 have to, and instead do explicit calls to fsync()/fdatasync() calls
 when a we a cache flush from the guest.  For data=writethrough we
 guarantee data has made it to disk, and we implement this using
 O_DSYNC/O_SYNC when opening the file.  That tells the operating system
 to not return until data has hit the disk.   For Linux this is
 internally implement using a range-fsync/fdatasync after the actual
 write.

And this is actually what I mentioned in the very beginning,
in a hopefully-single-thread-email I've sent.  Mentioned
that ext4 is very slow when using with O_SYNC (without O_DIRECT).

I still had no opportunity to collect more info on this, and
yes, I've seen your (Christoph's) speed tests of a few FSes
in the famous BTRFS: Unbelievably slow with kvm/qemu thread.
A few users reported _insane_ write speeds of qcow2 files
with default cache mode on ext4.

And this is what prompted all this discussion (which actually
has nothing to do with the $subject line ;), -- an attempt
to think about replacing O_SYNC/fsync() with something
lighter...

 fsync() being slow is orthogonal to my point.  I don't see why we
 need to do an fsync() on *every* write.  It should only be necessary
 when a guest injects an actual barrier.

We don't do a sync on every write, but O_SYNC implies that.
And apparently that is what is happening behind the scenes in
the ext4 O_SYNC case.

But ok

/mjt


Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori

On 08/17/2010 09:40 AM, Michael Tokarev wrote:



fsync() being slow is orthogonal to my point.  I don't see why we
need to do an fsync() on *every* write.  It should only be necessary
when a guest injects an actual barrier.
   

We don't do a sync on every write, but O_SYNC implies that.
And apparently that is what is happening behind the scenes in
the ext4 O_SYNC case.
   


I think the real issue is we're mixing host configuration with guest 
visible state.


With O_SYNC, we're causing cache=writethrough to do writethrough through 
two layers of the storage hierarchy.  I don't think that's necessary or 
desirable though.


Regards,

Anthony Liguori


But ok

/mjt
   




Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:39:15AM -0500, Anthony Liguori wrote:
 The type of cache we present to the guest only should relate to how
 the hypervisor caches the storage.  It should be independent of how
 data is cached by the disk.

It is.

 There can be many levels of caching in a storage hierarchy, and each
 level is cached independently of the next.
 
 If the user has a disk with a writeback cache and we expose a
 writethrough cache to the guest, it's not our responsibility to make
 sure that we break through the writeback cache on the disk.

The user doesn't know or have to care about the caching.  The
user uses O_SYNC/fsync to say that it wants data on disk, and it's the
operating system's job to make that happen.  The situation with qemu
is the same - if we tell the guest that we do not have a volatile write
cache that needs explicit management, the guest can rely on the fact
that it does not have to do manual cache management.



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote:
 I think the real issue is we're mixing host configuration with guest
 visible state.

The last time I proposed to decouple the two, you and Avi were heavily
opposed to it.

 With O_SYNC, we're causing cache=writethrough to do writethrough
 through two layers of the storage hierarchy.  I don't think that's
 necessary or desirable though.

It's absolutely necessary if we tell the guest that we do not have
a volatile write cache.  Which is the only good reason to use
cache=writethrough anyway - except for dealing with old guests that
can't handle a volatile write cache, it's an absolutely stupid mode of
operation.



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity

 On 08/17/2010 05:45 PM, Christoph Hellwig wrote:


The user doesn't know or have to care about the caching.  The
user uses O_SYNC/fsync to say that it wants data on disk, and it's the
operating system's job to make that happen.  The situation with qemu
is the same - if we tell the guest that we do not have a volatile write
cache that needs explicit management, the guest can rely on the fact
that it does not have to do manual cache management.



In the general case this is correct; however, sometimes we want to 
explicitly lie (cache=unsafe, or say that we have a write-back cache 
when we don't, to preserve the guest's view of things after a migration).


--
error compiling committee.c: too many arguments to function



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori

On 08/17/2010 09:45 AM, Christoph Hellwig wrote:

On Tue, Aug 17, 2010 at 09:39:15AM -0500, Anthony Liguori wrote:
   

The type of cache we present to the guest only should relate to how
the hypervisor caches the storage.  It should be independent of how
data is cached by the disk.
 

It is.

   

There can be many levels of caching in a storage hierarchy, and each
level is cached independently of the next.

If the user has a disk with a writeback cache and we expose a
writethrough cache to the guest, it's not our responsibility to make
sure that we break through the writeback cache on the disk.
 

The user doesn't know or have to care about the caching.  The
user uses O_SYNC/fsync to say that it wants data on disk, and it's the
operating system's job to make that happen.  The situation with qemu
is the same - if we tell the guest that we do not have a volatile write
cache that needs explicit management, the guest can rely on the fact
that it does not have to do manual cache management.
   


This is simply unrealistic.  O_SYNC might force data to be on a platter 
when using a directly attached disk, but many NASes actually do writeback 
caching and rely on having a UPS to preserve data integrity.  
There's really no way in the general case to ensure that data is 
actually on a platter once you've involved a complex storage setup or 
you assume FUA.


Let me put it another way.  If an admin knows the disks on a machine 
have battery-backed cache, he's likely to leave writeback caching enabled.


We are currently giving the admin two choices with QEMU: either ignore 
the fact that the disk is battery backed and do writethrough caching of 
the disk, or do writeback caching in the host, which expands the disk 
cache from something very small and non-volatile (the on-disk cache) to 
something very large and volatile (the page cache).  To make the page 
cache non-volatile, you would need to have a UPS for the hypervisor 
with enough power to flush the page cache.


So basically, we're not presenting a model that makes sensible use of 
reliable disks.


cache=none does the right thing here but doesn't benefit from the host's 
page cache for reads.  This is really the missing behavior.


Regards,

Anthony Liguori




Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Anthony Liguori

On 08/17/2010 09:46 AM, Christoph Hellwig wrote:

On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote:
   

I think the real issue is we're mixing host configuration with guest
visible state.
 

The last time I proposed to decouple the two you and Avi were heavily
opposed to it..

   

With O_SYNC, we're causing cache=writethrough to do writethrough
through two layers of the storage hierarchy.  I don't think that's
necessary or desirable though.
 

It's absolutely necessary if we tell the guest that we do not have
a volatile write cache.  Which is the only good reason to use
cache=writethrough anyway - except for dealing with old guests that
can't handle a volatile write cache, it's an absolutely stupid mode of
operation.
   


You can lose an awful lot of data with cache=writeback because the host 
page cache is volatile.  In a perfect world, this would only be 
non-critical data because everyone would be using fsync() properly, but 
1) even non-critical data is important when there's a lot of it and 2) we 
don't live in a perfect world.  The fact of the matter is, there are a 
huge number of crappy filesystems and applications today that don't 
submit barriers appropriately.


We make the situation much worse with virtualization because of the 
sheer size of the cache we introduce.


Regards,

Anthony Liguori



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity

 On 08/17/2010 05:46 PM, Christoph Hellwig wrote:

On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote:

I think the real issue is we're mixing host configuration with guest
visible state.

The last time I proposed to decouple the two you and Avi were heavily
opposed to it..


I wasn't that I can recall.


With O_SYNC, we're causing cache=writethrough to do writethrough
through two layers of the storage hierarchy.  I don't think that's
necessary or desirable though.

It's absolutely necessary if we tell the guest that we do not have
a volatile write cache.  Which is the only good reason to use
cache=writethrough anyway - except for dealing with old guests that
can't handle a volatile write cache, it's an absolutely stupid mode of
operation.


I agree, but there's another case: tell the guest that we have a write 
cache, use O_DSYNC, but only flush the disk cache on guest flushes.


The reason for this is that if we don't use O_DSYNC the page cache can 
grow to huge proportions.  While this is allowed by the contract between 
virtual drive and guest, guest software and users won't expect a huge 
data loss on power fail, only a minor data loss from the last fraction 
of a second before the failure.


I believe this can be approximated by mounting the host filesystem with 
barrier=0?
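
The trade-off described above can be put in numbers with a toy model. This is a rough sketch, not a measurement: the policy names and the function `at_risk_bytes` are made up for illustration, and the model ignores the host kernel's periodic writeback of dirty pages, so it shows only the worst case between two guest flushes.

```python
def at_risk_bytes(policy, writes, disk_cache_size):
    """Bytes a power failure could lose before the next guest flush.

    policy: "writeback" keeps dirty data in the host page cache until a
            guest flush; "dsync-no-disk-flush" pushes every write to the
            disk, leaving at most the disk's own cache volatile.
    writes: sizes in bytes of guest writes since the last guest flush.
    disk_cache_size: size in bytes of the disk's volatile write cache.
    """
    dirty = sum(writes)
    if policy == "writeback":
        return dirty  # all of it still sits in host RAM
    if policy == "dsync-no-disk-flush":
        return min(dirty, disk_cache_size)  # bounded by the disk cache
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    MiB = 1024 * 1024
    writes = [MiB] * 4096  # 4 GiB written with no guest flush in between
    for policy in ("writeback", "dsync-no-disk-flush"):
        lost = at_risk_bytes(policy, writes, disk_cache_size=32 * MiB)
        print(policy, lost // MiB, "MiB at risk")
```

Under these assumptions the exposure in the second policy is capped by the size of the disk cache, however much the guest has written.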


--
error compiling committee.c: too many arguments to function



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Avi Kivity

 On 08/17/2010 05:54 PM, Anthony Liguori wrote:


This is simply unrealistic.  O_SYNC might force data to be on a 
platter when using a directly attached disk, but many NASes actually do 
writeback caching and rely on having a UPS to preserve data 
integrity.  There's really no way in the general case to ensure that 
data is actually on a platter once you've involved a complex storage 
setup or you assume FUA.


That's fine.  Memory backed up by a UPS is a disk platter as far as the 
user is concerned, if the NAS is reliable.




Let me put it another way.  If an admin knows the disks on a machine 
have battery-backed cache, he's likely to leave writeback caching 
enabled.


In this case, as far as the host is concerned, there is no cache.  Data 
written is guaranteed to reach the disk eventually even without a 
flush.  Hopefully the disk advertises itself as not having a volatile cache.


--
error compiling committee.c: too many arguments to function



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 09:54:07AM -0500, Anthony Liguori wrote:
 This is simply unrealistic.  O_SYNC might force data to be on a
 platter when using a directly attached disk, but many NASes actually
 do writeback caching and rely on having a UPS to preserve data
 integrity.  There's really no way in the general case to ensure that
 data is actually on a platter once you've involved a complex storage
 setup or you assume FUA.

Yes, there is.  If you have an array that has battery backup, it handles
this internally.  The normal case is to not set the WCE bit in the
mode page, which tells the operating system to never send
SYNCHRONIZE_CACHE commands.  I have one array that sets the WCE bit
nevertheless, but it also doesn't flush its non-volatile cache on
SYNCHRONIZE_CACHE, but rather implements it as an effective no-op.

 Let me put it another way.  If an admin knows the disks on a machine
 have battery-backed cache, he's likely to leave writeback caching
 enabled.
 
 We are currently giving the admin two choices with QEMU, either
 ignore the fact that the disk is battery backed and do write through
 caching of the disk or do writeback caching in the host which

Again, this is not qemu's business at all.  Qemu is no different from
any other application requiring data integrity.  If that admin really
thinks he needs to override the storage-provided settings, he can
mount the filesystem using -o nobarrier and we will not send cache
flushes.  I would in general recommend against this, as an external
UPS still has lots of failure modes that this doesn't account for.
Arrays with internal non-volatile memory already do the right thing
anyway.
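
The decision described in this message, whether a cache flush ever reaches the device, can be summarised as a small predicate. A rough sketch with hypothetical names (`Device`, `sends_cache_flush`): it models only the two inputs mentioned above, the WCE bit reported by the device and the nobarrier mount option, and is not how any kernel actually structures this logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    wce: bool  # Write Cache Enable bit as reported in the mode page


def sends_cache_flush(dev, nobarrier=False):
    """True if the host would send SYNCHRONIZE_CACHE for fsync()/O_DSYNC."""
    if nobarrier:
        return False  # admin override: the disk cache is never flushed
    return dev.wce  # no volatile write cache advertised, nothing to flush


if __name__ == "__main__":
    devices = {
        "plain disk (WCE set)": Device(wce=True),
        "battery-backed array (WCE clear)": Device(wce=False),
    }
    for name, dev in devices.items():
        print(name,
              "| default:", sends_cache_flush(dev),
              "| nobarrier:", sends_cache_flush(dev, nobarrier=True))
```

The array that sets WCE but treats SYNCHRONIZE_CACHE as a no-op falls into the first row: the host still sends the command, and the device is free to complete it immediately.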



Re: JFYI: ext4 bug triggerable by kvm

2010-08-17 Thread Christoph Hellwig
On Tue, Aug 17, 2010 at 05:59:07PM +0300, Avi Kivity wrote:
 I agree, but there's another case: tell the guest that we have a
 write cache, use O_DSYNC, but only flush the disk cache on guest
 flushes.

O_DSYNC flushes the disk write cache and any filesystem that supports
non-volatile cache.   The disk cache is not an abstraction
exposed to applications.

 I believe this can be approximated by mounting the host filesystem
 with barrier=0?

Mounting the host filesystem with nobarrier means we will never explicitly
flush the volatile write cache on the disk.

