Re: cache write back barriers

2013-06-13 Thread Stefan Hajnoczi
On Wed, Jun 12, 2013 at 10:03:10AM +0200, folkert wrote:
 In virt-manager I saw that there's the option for cache writeback for
 storage devices.
 I'm wondering: does this also make kvm ignore write barriers invoked
 by the virtual machine?

No, that would be unsafe.  When the guest issues a flush then QEMU will
ensure that data reaches the disk with -drive cache=writeback.
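From inside the guest, a flush is usually the result of an application asking for durability, for example with fdatasync(). The guest kernel then sends a cache flush to the virtual disk, and with cache=writeback QEMU makes sure the data reaches the host disk before completing it. Below is a minimal POSIX sketch of that guest-side half, assuming a Linux guest. write_durably() is a hypothetical helper and is not part of QEMU or libvirt:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and make them durable.
 * Returns 0 on success, -1 on error with errno set. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Without this call the data may only sit in volatile caches (the
     * guest page cache, then the virtual disk's write cache). */
    if (fdatasync(fd) < 0)
        goto fail;
    return close(fd);

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
}
```

Usage: write_durably("/path/to/state", buf, len) returns 0 only after the flush has completed.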

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cache write back barriers

2013-06-13 Thread folkert
Hi,

  In virt-manager I saw that there's the option for cache writeback for
  storage devices.
  I'm wondering: does this also make kvm ignore write barriers invoked
  by the virtual machine?
 
 No, that would be unsafe.  When the guest issues a flush then QEMU will
 ensure that data reaches the disk with -drive cache=writeback.

Aha, so writeback behaves like consumer hard disks with their write cache
enabled.
In that case maybe an extra note could be added to virt-manager
(excellent software, by the way!) saying that if the guest VM supports
barriers, write-back is safe. Agree?


Folkert van Heusden

-- 
Ever wonder what is out there? Any alien races? Then please support
the seti@home project: setiathome.ssl.berkeley.edu
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com


Re: Commit f9afbd45b0d0 broke mips r4k.

2013-06-13 Thread Ralf Baechle
On Wed, Jun 12, 2013 at 09:35:16PM -0500, Rob Landley wrote:

 My aboriginal linux project builds tiny linux systems to run under
 qemu, producing as close to the same system as possible across a
 bunch of different architectures. The above change broke the mips
 r4k build I've been running under qemu.
 
 Here's a toolchain and reproduction sequence:
 
   wget http://landley.net/aboriginal/bin/cross-compiler-mips.tar.bz2
   tar xvjf cross-compiler-mips.tar.bz2
   export PATH=$PWD/cross-compiler-mips/bin:$PATH
   make ARCH=mips allnoconfig KCONFIG_ALLCONFIG=miniconfig.mips
   make CROSS_COMPILE=mips- ARCH=mips
 
 (The file miniconfig.mips is attached.)
 
 It ends:
 
   CC  init/version.o
   LD  init/built-in.o
 arch/mips/built-in.o: In function `local_r4k_flush_cache_page':
 c-r4k.c:(.text+0xe278): undefined reference to `kvm_local_flush_tlb_all'
 c-r4k.c:(.text+0xe278): relocation truncated to fit: R_MIPS_26
 against `kvm_local_flush_tlb_all'
 arch/mips/built-in.o: In function `local_flush_tlb_range':
 (.text+0xe938): undefined reference to `kvm_local_flush_tlb_all'
 arch/mips/built-in.o: In function `local_flush_tlb_range':
 (.text+0xe938): relocation truncated to fit: R_MIPS_26 against
 `kvm_local_flush_tlb_all'
 arch/mips/built-in.o: In function `local_flush_tlb_mm':
 (.text+0xed38): undefined reference to `kvm_local_flush_tlb_all'
 arch/mips/built-in.o: In function `local_flush_tlb_mm':
 (.text+0xed38): relocation truncated to fit: R_MIPS_26 against
 `kvm_local_flush_tlb_all'
 kernel/built-in.o: In function `__schedule':
 core.c:(.sched.text+0x16a0): undefined reference to
 `kvm_local_flush_tlb_all'
 core.c:(.sched.text+0x16a0): relocation truncated to fit: R_MIPS_26
 against `kvm_local_flush_tlb_all'
 mm/built-in.o: In function `use_mm':
 (.text+0x182c8): undefined reference to `kvm_local_flush_tlb_all'
 mm/built-in.o: In function `use_mm':
 (.text+0x182c8): relocation truncated to fit: R_MIPS_26 against
 `kvm_local_flush_tlb_all'
 fs/built-in.o:(.text+0x7b50): more undefined references to
 `kvm_local_flush_tlb_all' follow
 fs/built-in.o: In function `flush_old_exec':
 (.text+0x7b50): relocation truncated to fit: R_MIPS_26 against
 `kvm_local_flush_tlb_all'
 
 Revert the above commit and it builds to the end.

Commit d414976d1ca721456f7b7c603a8699d117c2ec07 [MIPS: include:
mmu_context.h: Replace VIRTUALIZATION with KVM] fixes the issue and
was pulled by Linus only yesterday.  I cannot reproduce the error
following your recipe using the latest Linux/MIPS tree.

  Ralf
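The linker errors above are the typical symptom of a call that is compiled under one Kconfig symbol while its callee is only built under another, and the fix Ralf cites replaces a VIRTUALIZATION guard with a KVM guard. Here is a generic sketch, assuming (as the errors suggest) that kvm_local_flush_tlb_all() only exists when CONFIG_KVM is set. The function names below are hypothetical, and this is not the actual mmu_context.h code:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two flush routines. In a real kernel the
 * KVM variant would live in a file that is only built when CONFIG_KVM=y,
 * so referencing it without CONFIG_KVM fails at link time. */
#ifdef CONFIG_KVM
static const char *kvm_flush_all(void) { return "kvm"; }
#endif
static const char *plain_flush_all(void) { return "plain"; }

/* Start a new ASID cycle. The #ifdef must test the same symbol that
 * decides whether kvm_flush_all() exists; testing a different symbol
 * (one that can be enabled while CONFIG_KVM is not) yields
 * "undefined reference" errors like the ones reported above. */
static const char *flush_for_new_asid_cycle(void)
{
#ifdef CONFIG_KVM
    return kvm_flush_all();
#else
    return plain_flush_all();
#endif
}
```

Compiling with -DCONFIG_KVM selects the KVM path. Without it, only the plain path is referenced, so nothing points at a missing symbol.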


Re: [PATCH 1/2] kvm-unit-tests: Add a func to run instruction in emulator

2013-06-13 Thread 李春奇
Hi Gleb,
I have been trying to solve these problems over the past few days and
have run into many difficulties. You want all the general registers to be
saved when calling insn_page, so they should be saved to (save) inside
insn_page. But all the instructions are generated outside and copied into
insn_page, and the instructions generated outside use RIP-relative
addressing, so inside insn_page the references to (save) point to the
wrong place.
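The problem described above follows from how x86-64 resolves RIP-relative operands. The effective address is the address of the next instruction plus a signed 32-bit displacement fixed at link time. The worked sketch below is plain arithmetic with hypothetical addresses, and it assumes a 7-byte instruction such as mov save(%rip), %rax (48 8b 05 plus disp32). It shows where a copy of that instruction ends up pointing:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a RIP-relative memory operand: RIP during
 * execution is the address of the NEXT instruction, so the target is
 * insn_addr + insn_len + disp32. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

/* The displacement the assembler/linker would encode so that the
 * instruction at its ORIGINAL address reaches target. */
static int32_t disp_for(uint64_t insn_addr, unsigned insn_len,
                        uint64_t target)
{
    return (int32_t)((int64_t)target - (int64_t)(insn_addr + insn_len));
}
```

With the original instruction at 0x401000 and (save) at 0x402000, the encoded displacement is 0xff9. The same bytes copied to 0x600000 resolve to 0x601000, which is the intended target shifted by exactly the copy distance (0x1ff000).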

I have tried to move (save) into insn_page. But when insn_page is
called, data in it can only be read, and an instruction like xchg
%%rax, 0+%[save] may cause an error, because at that point reads come
from the TLB but writes cause an inconsistency.

Another way is to disable RIP-relative code, but I failed: when using
-mcmodel=large -fno-pic, the binary still uses RIP-relative addressing.
Is there any way to disable RIP-relative code entirely? Besides, relying
on this feature may tie us to some newer C compilers, so this may not be
a good solution.

If we don't set %rsp and %rbp when executing emulator code, we can
just use "push/pop" to save the other general registers.

If you have any better solutions, please let me know.

Thanks,
Arthur

On Thu, Jun 13, 2013 at 12:50 PM, 李春奇 Arthur Chunqi Li
yzt...@gmail.com wrote:
 On Thu, Jun 13, 2013 at 4:50 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 06/06/2013 11:24, Arthur Chunqi Li wrote:
 Add a function trap_emulator to run an instruction in the emulator.
 Set inregs first (%rax is invalid because it is used as the return
 address), put the instruction encoding in alt_insn and call the function
 with alt_insn_length. Get the results from outregs.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  x86/emulator.c |   81 
 
  1 file changed, 81 insertions(+)

 diff --git a/x86/emulator.c b/x86/emulator.c
 index 96576e5..8ab9904 100644
 --- a/x86/emulator.c
 +++ b/x86/emulator.c
 @@ -11,6 +11,14 @@ int fails, tests;

  static int exceptions;

 +struct regs {
 + u64 rax, rbx, rcx, rdx;
 + u64 rsi, rdi, rsp, rbp;
 + u64 rip, rflags;
 +};
 +
 +static struct regs inregs, outregs;
 +
  void report(const char *name, int result)
  {
   ++tests;
 @@ -685,6 +693,79 @@ static void test_shld_shrd(u32 *mem)
  report("shrd (cl)", *mem == ((0x12345678 >> 3) | (5u << 29)));
  }

 +static void trap_emulator(uint64_t *mem, uint8_t *insn_page,
 +  uint8_t *alt_insn_page, void *insn_ram,
 +  uint8_t *alt_insn, int alt_insn_length)
 +{
 + ulong *cr3 = (ulong *)read_cr3();
 + int i;
 +
 + // Pad with RET instructions
 + memset(insn_page, 0xc3, 4096);
 + memset(alt_insn_page, 0xc3, 4096);
 +
 + // Place a trapping instruction in the page to trigger a VMEXIT
 + insn_page[0] = 0x89; // mov %eax, (%rax)
 + insn_page[1] = 0x00;
 + insn_page[2] = 0x90; // nop
 + insn_page[3] = 0xc3; // ret
 +
 + // Place the instruction we want the hypervisor to see in the alternate page
 + for (i=0; i<alt_insn_length; i++)
 + alt_insn_page[i] = alt_insn[i];
 +
 + // Save general registers
 + asm volatile(
 + "push %rax\n\r"
 + "push %rbx\n\r"
 + "push %rcx\n\r"
 + "push %rdx\n\r"
 + "push %rsi\n\r"
 + "push %rdi\n\r"
 + );

 This will not work if GCC is using rsp-relative addresses to access
 local variables.  You need to use mov instructions to load from inregs,
 and put the push/pop sequences inside the main asm that does the call
 *%1.
 Is there any way to make gcc use absolute addresses to access variables?
 I moved the variable save to global scope and used xchg %%rax, 0+%[save],
 and it seems that the addressing for save is wrong.

 Arthur

 Paolo

 + // Load the code TLB with insn_page, but point the page tables at
 + // alt_insn_page (and keep the data TLB clear, for AMD decode assist).
 + // This will make the CPU trap on the insn_page instruction but the
 + // hypervisor will see alt_insn_page.
 + install_page(cr3, virt_to_phys(insn_page), insn_ram);
 + invlpg(insn_ram);
 + // Load code TLB
 + asm volatile("call *%0" : : "r"(insn_ram + 3));
 + install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
 + // Trap, let hypervisor emulate at alt_insn_page
 + asm volatile(
 + "call *%1\n\r"
 +
 + "mov %%rax, 0+%[outregs] \n\t"
 + "mov %%rbx, 8+%[outregs] \n\t"
 + "mov %%rcx, 16+%[outregs] \n\t"
 + "mov %%rdx, 24+%[outregs] \n\t"
 + "mov %%rsi, 32+%[outregs] \n\t"
 + "mov %%rdi, 40+%[outregs] \n\t"
 + "mov %%rsp, 48+%[outregs] \n\t"
 + "mov %%rbp, 56+%[outregs] \n\t"
 +
 + /* Save RFLAGS in outregs*/
 + "pushf \n\t"
 + "popq 72+%[outregs] \n\t"
 + : [outregs]"+m"(outregs)
 + : "r"(insn_ram),
 + "a"(mem), "b"(inregs.rbx),
 + "c"(inregs.rcx), "d"(inregs.rdx),
 + 

Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-13 Thread Stefan Pietsch
On 09.06.2013 11:43, Gleb Natapov wrote:
 On Thu, Jun 06, 2013 at 02:10:39PM +0200, Stefan Pietsch wrote:
 On 06.06.2013 13:40, Gleb Natapov wrote:
 On Thu, Jun 06, 2013 at 01:35:13PM +0200, Stefan Pietsch wrote:

 I had no success with the Debian kernel 3.10~rc4-1~exp1 (3.10-rc4-686-pae).

 The machine hangs after Enabling APIC mode:  Flat.  Using 1 I/O APICs.
 OK, since it looks like it hangs during timer initialization can you try
 to disable kvmclock? Add -cpu qemu64,-kvmclock to your command line.
 Also can you provide the output of cat /proc/cpuinfo on your host? And
 complete serial output before hang.


 command line:
 qemu-system-i386 -machine accel=kvm -m 512 -cpu qemu64,-kvmclock -cdrom
 grml32-full_2013.02.iso -serial file:ttyS0.log


 ttyS0.log:
 ##

 
 Nothing out of the ordinary here. Since you can reproduce the hang and I
 cannot, can you try to bisect it? Also, can you trace kvm during the hang
 (http://www.linux-kvm.org/page/Tracing)? Start the trace as close to the
 hang as possible and stop it as soon after it as possible, to keep the
 trace file small.


git bisect tells me:
79fd50c67f91136add9726fb7719b57a66c6f763 is the first bad commit


This is my bisect log:

git bisect start
git bisect bad 9626357371b519f2b955fef399647181034a77fe
git bisect good ef4e359d9b9e2dc022f79840fd207796b524a893
git bisect good b5c78e04dd061b776978dad61dd85357081147b0
git bisect good 9e2d59ad580d590134285f361a0e80f0e98c0207
git bisect bad 69086a78bdc973ec0b722be790b146e84ba8a8c4
git bisect good 9ecf9b085a0926e07c78c08a07296bbfd1c37d07
git bisect bad 21fbd5809ad126b949206d78e0a0e07ec872ea11
git bisect bad 79fd50c67f91136add9726fb7719b57a66c6f763
git bisect good 66cdd0ceaf65a18996f561b770eedde1d123b019



Re: [PATCH 1/2] kvm-unit-tests: Add a func to run instruction in emulator

2013-06-13 Thread Paolo Bonzini
On 13/06/2013 05:30, 李春奇 Arthur Chunqi Li wrote:
 Hi Gleb,
 I have been trying to solve these problems over the past few days and
 have run into many difficulties. You want all the general registers to be
 saved when calling insn_page, so they should be saved to (save) inside
 insn_page. But all the instructions are generated outside and copied into
 insn_page, and the instructions generated outside use RIP-relative
 addressing, so inside insn_page the references to (save) point to the
 wrong place.
 
 I have tried to move (save) into insn_page. But when insn_page is
 called, data in it can only be read, and an instruction like xchg
 %%rax, 0+%[save] may cause an error, because at that point reads come
 from the TLB but writes cause an inconsistency.
 
 Another way is to disable RIP-relative code, but I failed: when using
 -mcmodel=large -fno-pic, the binary still uses RIP-relative addressing.
 Is there any way to disable RIP-relative code entirely? Besides, relying
 on this feature may tie us to some newer C compilers, so this may not be
 a good solution.
 
 If we don't set %rsp and %rbp when executing emulator code, we can
 just use "push/pop" to save the other general registers.

%rbp should not be a problem; on the other hand, it's okay not to include
%rsp in the registers struct (and to assume that insn_page/alt_insn_page do not
touch it).  Interestingly, both VMX and SVM put the guest RSP in the VM
control information so that the switch occurs atomically with the start
of the guest.

Paolo


Re: KVM MMU: why write-protect for the pages containing PML4/PDPT/PDT (page directory) of the guest?

2013-06-13 Thread Paolo Bonzini
On 12/06/2013 23:28, yongcheng...@i-soft.com.cn wrote:
 I have a question about shadow page tables: why are the pages containing
 the guest's PML4/PDPT/PDT (page directories) write-protected? In other
 words, is it necessary to synchronize changes to the guest's page
 directories?

Shadow page tables are the combination of both the host and guest page
tables into a single translation.  So they need to be updated every time
the host or the guest change the page tables.  Updates for the host page
tables are tracked with MMU notifiers; updates for the guest page tables
are tracked with write protection.
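The guest-side half of that answer can be illustrated with a toy model (this is not KVM code). A shadow entry caches the combined virtual-to-host translation, so a guest store to its own page table must invalidate the cached entry. Write-protecting the page-table page is what gives the hypervisor that chance. Host-side changes (the MMU notifier path) are not modelled, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4
#define INVALID (-1)

/* gpt: guest page table (vpn -> gfn); host_map: host mapping
 * (gfn -> hfn); spt: shadow table (vpn -> hfn) used by the hardware. */
struct toy_vm {
    int gpt[NPAGES];
    int host_map[NPAGES];
    int spt[NPAGES];
    bool gpt_write_protected;
};

/* Fill a shadow entry on demand (a shadow page fault). */
static int toy_translate(struct toy_vm *vm, int vpn)
{
    if (vm->spt[vpn] == INVALID)
        vm->spt[vpn] = vm->host_map[vm->gpt[vpn]];
    return vm->spt[vpn];
}

/* A guest store into its own page table. With write protection the store
 * traps, and the hypervisor drops the stale shadow entry before letting
 * the write complete; without it, the shadow entry silently goes stale. */
static void toy_guest_write_pte(struct toy_vm *vm, int vpn, int gfn)
{
    if (vm->gpt_write_protected)
        vm->spt[vpn] = INVALID;
    vm->gpt[vpn] = gfn;
}
```

In the model, disabling write protection leaves the old translation in use after the guest has changed its page table, which is exactly the inconsistency write protection prevents.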

Paolo


Bottleneck in KVM

2013-06-13 Thread ankit
Hello All

I am relatively new to KVM.
I have installed a web server in a KVM virtual machine and am pushing
different request rates to it with httperf. On the bare host I can reach
6000 requests per second, but in the VM performance does not increase
beyond 3500 requests per second. I have checked the CPU usage of the
different modules and none of them is CPU-bound; plenty of CPU in the VM
remains idle in this case.
I suspect I may be exhausting some buffer. Could you please explain what
the problem and the bottleneck could be in this case?

Thanks and Regards
Ankit Anand




Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-13 Thread Paolo Bonzini
On 13/06/2013 07:57, Stefan Pietsch wrote:
 git bisect tells me:
 79fd50c67f91136add9726fb7719b57a66c6f763 is the first bad commit

This is an s390 commit, so the bisect somehow went wrong.  Can you
confirm that 3.7 works and 3.8 doesn't?

Please check these pairs:

9e2d59a and 89f883372fa60f604d136924baf3e89ff1870e9e
39ab967 and 875b7679abbb232b584f2eec59fa6e45690dd6c4
10b3866 and ea4a0ce11160200410abbabd44ec9e75e93a95be
4ffd4eb and ccae663cd4f62890d862c660e5ed762eb9821c14
896ea17 and 66cdd0ceaf65a18996f561b770eedde1d123b019

Please tell us which pair introduced the failure.  Then:

- if you get a bad and bad pair, tell us and we'll figure out what's
next :)

- if you get a good and bad pair, do a git bisect between the two
commits in that pair.
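Checking these pairs amounts to re-testing the endpoints of a smaller range before bisecting inside it. That matters because bisection trusts every verdict. The toy model below covers a linear range only (real git bisect works on a DAG), and all names are hypothetical. It shows that a single mis-marked commit sends the search to an innocent commit:

```c
#include <assert.h>
#include <stdbool.h>

/* Verdict for commit idx: true means "bad". */
typedef bool (*verdict_fn)(int idx, void *ctx);

/* commits[0] is known good, commits[n-1] is known bad; returns the index
 * of the first commit reported bad. Invariant: good < bad. */
static int bisect(int n, verdict_fn is_bad, void *ctx)
{
    int good = 0, bad = n - 1;
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* A tester who knows the true first bad commit but misreports one. */
struct oracle {
    int first_bad;
    int lie_at;      /* index reported wrongly, or -1 */
};

static bool oracle_verdict(int idx, void *ctx)
{
    const struct oracle *o = ctx;
    bool truth = idx >= o->first_bad;
    return idx == o->lie_at ? !truth : truth;
}
```

With 100 commits and a truthful tester, the search ends at commit 37. If the single verdict for commit 24 is wrong, the search ends at 24 instead, much like the s390 result above.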

Thanks!

Paolo



Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-13 Thread Paolo Bonzini
On 13/06/2013 09:42, Paolo Bonzini wrote:
 On 13/06/2013 07:57, Stefan Pietsch wrote:
 git bisect tells me:
 79fd50c67f91136add9726fb7719b57a66c6f763 is the first bad commit
 
 This is an s390 commit, so the bisect somehow went wrong.  Can you
 confirm that 3.7 works and 3.8 doesn't?

Sorry, I meant: can you confirm that 3.8 works and 3.9 doesn't?
(66cdd0ceaf65a18996f561b770eedde1d123b019 was the 3.8 merge window
update, and your bisect shows it as good.)

Can you double-check this both with a plain "modprobe kvm_intel" and
with "modprobe kvm_intel emulate_invalid_guest_state=0"?

Paolo
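To be sure which setting each test run actually used, the loaded value can be read back from sysfs. This small sketch assumes that kvm_intel exports emulate_invalid_guest_state as a readable bool under /sys/module/kvm_intel/parameters/. Module parameters with non-zero permissions appear there, and bools read back as "Y" or "N". The helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the sysfs text of a bool module parameter ("Y\n" / "N\n").
 * Returns 1, 0, or -1 if the text is not recognised. */
static int parse_bool_param(const char *text)
{
    if (text[0] == 'Y' || text[0] == 'y' || text[0] == '1')
        return 1;
    if (text[0] == 'N' || text[0] == 'n' || text[0] == '0')
        return 0;
    return -1;
}

/* Read /sys/module/<module>/parameters/<param>. Returns 1 or 0, or -1 if
 * the module is not loaded, the parameter is not exported, or unreadable. */
static int read_bool_param(const char *module, const char *param)
{
    char path[256], buf[16];

    snprintf(path, sizeof path, "/sys/module/%s/parameters/%s",
             module, param);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? parse_bool_param(buf) : -1;
}
```

Usage: read_bool_param("kvm_intel", "emulate_invalid_guest_state") after each modprobe.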



Re: [PATCH v2 0/7] target-arm: cpregs list for migration, kvm reset

2013-06-13 Thread Peter Maydell
On 3 June 2013 14:47, Peter Maydell peter.mayd...@linaro.org wrote:
 This patch series overhauls how we handle ARM coprocessor registers,
 so that we use a consistent approach for migration, reset and
 QEMU-KVM synchronisation, driven by the kernel's list of supported
 registers.

Applied to target-arm.next. (If these were on somebody's to-review
list, yell and I'll unapply them; but these plus v1 have been on
list for a fair while without attracting any particular interest.)

thanks
-- PMM


Re: cache write back barriers

2013-06-13 Thread Alexandre DERUMIER
I'm wondering: does this also make kvm ignore write barriers invoked
by the virtual machine?

No, cache=writeback is OK; write barriers work correctly.

Only with cache=unsafe are write flushes ignored.
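The difference between the two modes can be pictured with a toy model (this is not QEMU code). Writes complete as soon as they are cached, and a guest flush drains the cache to stable media under cache=writeback but is acknowledged and ignored under cache=unsafe. In the model, anything not flushed is simply lost on power failure. On a real host it may or may not have reached the disk, which is why the mode is unsafe. The names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum cache_mode { CACHE_WRITEBACK, CACHE_UNSAFE };

#define NBLOCKS 8

struct toy_disk {
    enum cache_mode mode;
    int cache[NBLOCKS];    /* volatile write cache */
    bool dirty[NBLOCKS];
    int media[NBLOCKS];    /* stable storage */
};

/* A write completes as soon as it is cached. */
static void toy_write(struct toy_disk *d, int blk, int val)
{
    d->cache[blk] = val;
    d->dirty[blk] = true;
}

/* Guest flush request: honoured under writeback, ignored under unsafe. */
static void toy_flush(struct toy_disk *d)
{
    if (d->mode == CACHE_UNSAFE)
        return;
    for (int i = 0; i < NBLOCKS; i++) {
        if (d->dirty[i]) {
            d->media[i] = d->cache[i];
            d->dirty[i] = false;
        }
    }
}

/* Power loss: only what reached the media survives. */
static void toy_power_loss(struct toy_disk *d)
{
    memset(d->cache, 0, sizeof d->cache);
    memset(d->dirty, 0, sizeof d->dirty);
}
```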


- Original Message -

From: folkert <folk...@vanheusden.com>
To: kvm@vger.kernel.org
Sent: Wednesday, 12 June 2013 10:03:10
Subject: cache write back barriers

Hi,

In virt-manager I saw that there's the option for cache writeback for
storage devices.
I'm wondering: does this also make kvm ignore write barriers invoked
by the virtual machine?


regards, 

Folkert van Heusden 

-- 
Always wondered what the latency of your webserver is? Or how much more 
latency you get when you go through a proxy server/tor? The numbers 
tell the tale and with HTTPing you know them! 
http://www.vanheusden.com/httping/ 
--- 
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com 


Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-13 Thread Stefan Pietsch
On 13.06.2013 15:42, Paolo Bonzini wrote:
 On 13/06/2013 07:57, Stefan Pietsch wrote:
 git bisect tells me:
 79fd50c67f91136add9726fb7719b57a66c6f763 is the first bad commit
 
 This is an s390 commit, so the bisect somehow went wrong.  Can you
 confirm that 3.7 works and 3.8 doesn't?

Confirmed. Something went wrong.
I replayed the bisect log and now I have

git bisect bad 9626357371b519f2b955fef399647181034a77fe
git bisect good ef4e359d9b9e2dc022f79840fd207796b524a893
git bisect good b5c78e04dd061b776978dad61dd85357081147b0
git bisect good 9e2d59ad580d590134285f361a0e80f0e98c0207
git bisect bad 69086a78bdc973ec0b722be790b146e84ba8a8c4
git bisect good 9ecf9b085a0926e07c78c08a07296bbfd1c37d07
git bisect bad 21fbd5809ad126b949206d78e0a0e07ec872ea11
git bisect bad 79fd50c67f91136add9726fb7719b57a66c6f763
git bisect bad aa11e3a8a6d9f92c3fe4b91a9aca5d8c23d55d4d
git bisect good 66cdd0ceaf65a18996f561b770eedde1d123b019
git bisect bad d99e415275dd3f757b75981adad8645cdc26da45

So please wait for my results.


[PATCH 1/2] kvm-unit-tests: Add a func to run instruction in emulator

2013-06-13 Thread Arthur Chunqi Li
Add a function trap_emulator to run an instruction in the emulator.
Set inregs first (%rax is invalid because it is used as the return
address), put the instruction encoding in alt_insn and call the function
with alt_insn_length. Get the results from outregs.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c |  132 
 1 file changed, 132 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 96576e5..4981bfb 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -11,6 +11,16 @@ int fails, tests;
 
 static int exceptions;
 
+struct regs {
+   u64 rax, rbx, rcx, rdx;
+   u64 rsi, rdi, rsp, rbp;
+   u64 r8, r9, r10, r11;
+   u64 r12, r13, r14, r15;
+   u64 rip, rflags;
+};
+static struct regs inregs, outregs;
+extern struct regs save;
+
 void report(const char *name, int result)
 {
++tests;
@@ -685,6 +695,128 @@ static void test_shld_shrd(u32 *mem)
 report(shrd (cl), *mem == ((0x12345678  3) | (5u  29)));
 }
 
+extern u8 insn_start[], insn_end[];
+extern u8 insn_emulate_start[], insn_emulate_end[];
+
+static void mk_insn_page(uint8_t *insn_page, uint8_t *alt_insn_page,
+   uint8_t *alt_insn, int alt_insn_length)
+{
+   int i, emul_offset;
+   for (i=1; i<insn_emulate_end - insn_emulate_start; i++)
+   insn_emulate_start[i] = 0x90; // nop
+   for (i=0; i<insn_end - insn_start; i++)
+   insn_page[i] = insn_start[i];
+   emul_offset = insn_emulate_start - insn_start;
+   for (i=0; i<alt_insn_length; i++)
+   alt_insn_page[i+emul_offset] = alt_insn[i];
+
+   asm volatile(
+   ".pushsection .text.insn, \"ax\" \n\t"
+   "insn_start:\n\t"
+   "ret\n\t"
+
+   "push %%rax; push %%rbx\n\t"
+   "push %%rcx; push %%rdx\n\t"
+   "push %%rsi; push %%rdi\n\t"
+   "push %%rbp\n\t"
+   "push %%r8; push %%r9\n\t"
+   "push %%r10; push %%r11\n\t"
+   "push %%r12; push %%r13\n\t"
+   "push %%r14; push %%r15\n\t"
+   "pushf\n\t"
+
+   "push 136+%[save] \n\t"
+   "popf \n\t"
+   "mov 0+%[save], %%rax \n\t"
+   "mov 8+%[save], %%rbx \n\t"
+   "mov 16+%[save], %%rcx \n\t"
+   "mov 24+%[save], %%rdx \n\t"
+   "mov 32+%[save], %%rsi \n\t"
+   "mov 40+%[save], %%rdi \n\t"
+   "mov 56+%[save], %%rbp \n\t"
+   "mov 64+%[save], %%r8 \n\t"
+   "mov 72+%[save], %%r9 \n\t"
+   "mov 80+%[save], %%r10 \n\t"
+   "mov 88+%[save], %%r11 \n\t"
+   "mov 96+%[save], %%r12 \n\t"
+   "mov 104+%[save], %%r13 \n\t"
+   "mov 112+%[save], %%r14 \n\t"
+   "mov 120+%[save], %%r15 \n\t"
+
+   "insn_emulate_start:\n\t"
+   "in  (%%dx),%%al\n\t"
+   ". = . + 31\n\t"
+   "insn_emulate_end:\n\t"
+
+   "pushf \n\t"
+   "pop 136+%[save] \n\t"
+   "mov %%rax, 0+%[save] \n\t"
+   "mov %%rbx, 8+%[save] \n\t"
+   "mov %%rcx, 16+%[save] \n\t"
+   "mov %%rdx, 24+%[save] \n\t"
+   "mov %%rsi, 32+%[save] \n\t"
+   "mov %%rdi, 40+%[save] \n\t"
+   "mov %%rbp, 56+%[save] \n\t"
+   "mov %%r8, 64+%[save]\n\t"
+   "mov %%r9, 72+%[save]\n\t"
+   "mov %%r10, 80+%[save]\n\t"
+   "mov %%r11, 88+%[save]\n\t"
+   "mov %%r12, 96+%[save]\n\t"
+   "mov %%r13, 104+%[save]\n\t"
+   "mov %%r14, 112+%[save]\n\t"
+   "mov %%r15, 120+%[save]\n\t"
+
+   "popf\n\t"
+   "pop %%r15; pop %%r14 \n\t"
+   "pop %%r13; pop %%r12 \n\t"
+   "pop %%r11; pop %%r10 \n\t"
+   "pop %%r9; pop %%r8 \n\t"
+   "pop %%rbp \n\t"
+   "pop %%rdi; pop %%rsi \n\t"
+   "pop %%rdx; pop %%rcx \n\t"
+   "pop %%rbx; pop %%rax \n\t"
+
+   "ret\n\t"
+
+   "save:\n\t"
+   ". = . + 256\n\t"
+   "insn_end:\n\t"
+   ".popsection\n\t"
+   : [save]"=m"(save)
+   : : "memory", "cc"
+   );
+}
+
+static void trap_emulator(uint64_t *mem, uint8_t *insn_page,
+uint8_t *alt_insn_page, void *insn_ram,
+uint8_t* alt_insn, int alt_insn_length, int reserve_stack)
+{
+   ulong *cr3 = (ulong *)read_cr3();
+   extern u8 insn_start[];
+   int save_offset = (u8 *)(&save) - insn_start;
+   
+   memset(insn_page, 0x90, 4096);
+   memset(alt_insn_page, 0x90, 4096);
+   
+   save = inregs;
+   mk_insn_page(insn_page, alt_insn_page,
+   alt_insn, alt_insn_length);
+   
+   // Load the code TLB with insn_page, but point the page tables at
+   // alt_insn_page (and keep the data TLB clear, for AMD 
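
The page layout that mk_insn_page() builds can be sketched without any assembly. It copies a code template (delimited by insn_start/insn_end) into a page and drops the instruction to be emulated into a NOP-padded slot at insn_emulate_start. In this sketch the template bytes are placeholders, the function name is hypothetical, and the 32-byte slot size is an assumption chosen to match the one-byte placeholder plus ". = . + 31" in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP 0x90
#define SLOT_LEN 32   /* assumed size of the emulated-instruction slot */

/* Copy tmpl into page, then overwrite the slot at slot_off with insn,
 * padding the rest of the slot with NOPs so execution falls through to
 * the code after the slot. Returns 0, or -1 if anything does not fit. */
static int build_insn_page(uint8_t *page, size_t page_len,
                           const uint8_t *tmpl, size_t tmpl_len,
                           size_t slot_off,
                           const uint8_t *insn, size_t insn_len)
{
    if (tmpl_len > page_len || slot_off + SLOT_LEN > tmpl_len ||
        insn_len > SLOT_LEN)
        return -1;
    memset(page, NOP, page_len);
    memcpy(page, tmpl, tmpl_len);
    memset(page + slot_off, NOP, SLOT_LEN);
    memcpy(page + slot_off, insn, insn_len);
    return 0;
}
```

In the patch, insn_page keeps the trapping placeholder in the slot, while alt_insn_page receives the real instruction. In this sketch that corresponds to calling the helper once per page with different insn bytes.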

[PATCH 2/2] kvm-unit-tests: Change two cases to use trap_emulator

2013-06-13 Thread Arthur Chunqi Li
Change two functions (test_mmx_movq_mf and test_movabs) to use the
unified trap_emulator.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c |   85 +++-
 1 file changed, 23 insertions(+), 62 deletions(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index 4981bfb..7698f56 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -826,73 +826,34 @@ static void advance_rip_by_3_and_note_exception(struct ex_regs *regs)
 static void test_mmx_movq_mf(uint64_t *mem, uint8_t *insn_page,
 uint8_t *alt_insn_page, void *insn_ram)
 {
-uint16_t fcw = 0;  // all exceptions unmasked
-ulong *cr3 = (ulong *)read_cr3();
-
-write_cr0(read_cr0() & ~6);  // TS, EM
-// Place a trapping instruction in the page to trigger a VMEXIT
-insn_page[0] = 0x89; // mov %eax, (%rax)
-insn_page[1] = 0x00;
-insn_page[2] = 0x90; // nop
-insn_page[3] = 0xc3; // ret
-// Place the instruction we want the hypervisor to see in the alternate page
-alt_insn_page[0] = 0x0f; // movq %mm0, (%rax)
-alt_insn_page[1] = 0x7f;
-alt_insn_page[2] = 0x00;
-alt_insn_page[3] = 0xc3; // ret
-
-exceptions = 0;
-handle_exception(MF_VECTOR, advance_rip_by_3_and_note_exception);
-
-// Load the code TLB with insn_page, but point the page tables at
-// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
-// This will make the CPU trap on the insn_page instruction but the
-// hypervisor will see alt_insn_page.
-install_page(cr3, virt_to_phys(insn_page), insn_ram);
-asm volatile("fninit; fldcw %0" : : "m"(fcw));
-asm volatile("fldz; fldz; fdivp"); // generate exception
-invlpg(insn_ram);
-// Load code TLB
-asm volatile("call *%0" : : "r"(insn_ram + 3));
-install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
-// Trap, let hypervisor emulate at alt_insn_page
-asm volatile("call *%0" : : "r"(insn_ram), "a"(mem));
-// exit MMX mode
-asm volatile("fnclex; emms");
-report("movq mmx generates #MF", exceptions == 1);
-handle_exception(MF_VECTOR, 0);
+   uint16_t fcw = 0;  // all exceptions unmasked
+   uint8_t alt_insn[] = {0x0f, 0x7f, 0x00}; // movq %mm0, (%rax)
+
+   write_cr0(read_cr0() & ~6);  // TS, EM
+   exceptions = 0;
+   handle_exception(MF_VECTOR, advance_rip_by_3_and_note_exception);
+   asm volatile("fninit; fldcw %0" : : "m"(fcw));
+   asm volatile("fldz; fldz; fdivp"); // generate exception
+
+   inregs = (struct regs){ 0 };
+   trap_emulator(mem, insn_page, alt_insn_page, insn_ram,
+   alt_insn, 3, 1);
+   // exit MMX mode
+   asm volatile("fnclex; emms");
+   report("movq mmx generates #MF", exceptions == 1);
+   handle_exception(MF_VECTOR, 0);
 }
 
 static void test_movabs(uint64_t *mem, uint8_t *insn_page,
   uint8_t *alt_insn_page, void *insn_ram)
 {
-uint64_t val = 0;
-ulong *cr3 = (ulong *)read_cr3();
-
-// Pad with RET instructions
-memset(insn_page, 0xc3, 4096);
-memset(alt_insn_page, 0xc3, 4096);
-// Place a trapping instruction in the page to trigger a VMEXIT
-insn_page[0] = 0x89; // mov %eax, (%rax)
-insn_page[1] = 0x00;
-// Place the instruction we want the hypervisor to see in the alternate
-// page. A buggy hypervisor will fetch a 32-bit immediate and return
-// 0xc3c3c3c3.
-alt_insn_page[0] = 0x48; // mov $0xc3c3c3c3c3c3c3c3, %rcx
-alt_insn_page[1] = 0xb9;
-
-// Load the code TLB with insn_page, but point the page tables at
-// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
-// This will make the CPU trap on the insn_page instruction but the
-// hypervisor will see alt_insn_page.
-install_page(cr3, virt_to_phys(insn_page), insn_ram);
-// Load code TLB
-invlpg(insn_ram);
-asm volatile(call *%0 : : r(insn_ram + 3));
-// Trap, let hypervisor emulate at alt_insn_page
-install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
-asm volatile("call *%1" : "=c"(val) : "r"(insn_ram), "a"(mem), "c"(0));
-report("64-bit mov imm", val == 0xc3c3c3c3c3c3c3c3);
+   // mov $0xc3c3c3c3c3c3c3c3, %rcx
+   uint8_t alt_insn[] = {0x48, 0xb9, 0xc3, 0xc3, 0xc3,
+   0xc3, 0xc3, 0xc3, 0xc3, 0xc3};
+   inregs = (struct regs){ .rbx = 0x5678, .rcx = 0x1234 };
+   trap_emulator(mem, insn_page, alt_insn_page, insn_ram,
+   alt_insn, 10, 1);
+   report("64-bit mov imm2", outregs.rcx == 0xc3c3c3c3c3c3c3c3);
 }
 
 static void test_crosspage_mmio(volatile uint8_t *mem)
-- 
1.7.9.5
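The 10-byte alt_insn that test_movabs() passes to trap_emulator() is a REX.W prefix (0x48), opcode 0xb9 (mov immediate to %rcx) and an 8-byte immediate. As the removed comment notes, a buggy emulator that fetches only a 32-bit immediate ends up with 0xc3c3c3c3, which is why the test fills the immediate with 0xc3. A minimal sketch decoding just this B8+r form, for illustration only: decode_mov_imm() is a hypothetical helper, not part of kvm-unit-tests or of KVM's x86 emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode "mov $imm, %reg" in the B8+r form with an
 * optional REX prefix. With REX.W set the immediate is 8 bytes (movabs);
 * without it the immediate is 4 bytes. Returns the number of bytes
 * consumed, or 0 if the bytes are not this encoding or are truncated.
 */
static size_t decode_mov_imm(const uint8_t *insn, size_t len,
			     int *reg, uint64_t *imm)
{
	size_t pos = 0;
	int rex_w = 0, rex_b = 0;

	if (pos < len && (insn[pos] & 0xf0) == 0x40) {	/* REX prefix */
		rex_w = (insn[pos] >> 3) & 1;
		rex_b = insn[pos] & 1;
		pos++;
	}
	if (pos >= len || (insn[pos] & 0xf8) != 0xb8)	/* opcode B8+r */
		return 0;
	*reg = (insn[pos] & 7) | (rex_b << 3);		/* 1 == %rcx */
	pos++;

	size_t imm_bytes = rex_w ? 8 : 4;
	if (len - pos < imm_bytes)
		return 0;
	/* x86 immediates are stored little-endian */
	*imm = 0;
	for (size_t i = 0; i < imm_bytes; i++)
		*imm |= (uint64_t)insn[pos + i] << (8 * i);
	return pos + imm_bytes;
}
```

Dropping the 0x48 prefix turns the same opcode into a 5-byte instruction with a 4-byte immediate, which is the form a buggy emulator effectively decodes.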



[PATCH 2/2] armv7 initial device passthrough support

2013-06-13 Thread Mario Smarduch
Updated Device Passthrough Patch.
- optimized IRQ-CPU-vCPU binding, irq is installed once
- added dynamic IRQ affinity on schedule in
- added documentation and a few other coding recommendations.

Per the earlier discussion, VFIO is our target, but we would like
something to work with earlier to tackle the performance and
latency issues (some ARM related) for device passthrough
while we migrate towards VFIO.

- Mario


Signed-off-by: Mario Smarduch <mario.smard...@huawei.com>
---
 arch/arm/include/asm/kvm_host.h |   31 +
 arch/arm/include/asm/kvm_vgic.h |   10 ++
 arch/arm/kvm/Makefile           |    1 +
 arch/arm/kvm/arm.c              |   80 +
 arch/arm/kvm/assign-dev.c       |  248 +++
 arch/arm/kvm/vgic.c             |  134 +
 include/linux/irqchip/arm-gic.h |    1 +
 include/uapi/linux/kvm.h        |   33 ++
 8 files changed, 538 insertions(+)
 create mode 100644 arch/arm/kvm/assign-dev.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 57cb786..c85c3a0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -67,6 +67,10 @@ struct kvm_arch {
 
/* Interrupt controller */
struct vgic_dist        vgic;
+
+   /* Device Passthrough Fields */
+   struct list_head        assigned_dev_head;
+   struct mutex            dev_passthrough_lock;
 };
 
 #define KVM_NR_MEM_OBJS 40
@@ -146,6 +150,13 @@ struct kvm_vcpu_stat {
u32 halt_wakeup;
 };
 
+struct kvm_arm_assigned_dev_kernel {
+   struct list_head list;
+   struct kvm_arm_assigned_device dev;
+   irqreturn_t (*irq_handler)(int, void *);
+   unsigned long vcpuid_irq_arg;
+};
+
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
const struct kvm_vcpu_init *init);
@@ -157,6 +168,26 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 u64 kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 
+#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
+int kvm_arm_get_device_resources(struct kvm *,
+   struct kvm_arm_get_device_resources *);
+int kvm_arm_assign_device(struct kvm *, struct kvm_arm_assigned_device *);
+void kvm_arm_setdev_irq_affinity(struct kvm_vcpu *vcpu, int cpu);
+#else
+static inline int kvm_arm_get_device_resources(struct kvm *k, struct kvm_arm_get_device_resources *r)
+{
+   return -1;
+}
+static inline int kvm_arm_assign_device(struct kvm *k, struct kvm_arm_assigned_device *d)
+{
+   return -1;
+}
+
+static inline void kvm_arm_setdev_irq_affinity(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+#endif
+
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 struct kvm;
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 343744e..fb6afd2 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -107,6 +107,16 @@ struct vgic_dist {
 
/* Bitmap indicating which CPU has something pending */
unsigned long   irq_pending_on_cpu;
+
+   /* Device passthrough fields */
+   /* Host irq to guest irq mapping */
+   u8  guest_irq[VGIC_NR_SHARED_IRQS];
+
+   /* Pending passthrough irq */
+   struct vgic_bitmap  passthrough_spi_pending;
+
+   /* At least one passthrough IRQ pending for some vCPU */
+   u32 passthrough_pending;
 #endif
 };
 
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 53c5ed8..823fc38 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -21,3 +21,4 @@ obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
 obj-y += coproc.o coproc_a15.o mmio.o psci.o perf.o
 obj-$(CONFIG_KVM_ARM_VGIC) += vgic.o
 obj-$(CONFIG_KVM_ARM_TIMER) += arch_timer.o
+obj-$(CONFIG_KVM_ARM_INT_PRIO_DROP) += assign-dev.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 37d216d..ba54c64 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -26,6 +26,8 @@
 #include <linux/mman.h>
 #include <linux/sched.h>
 #include <linux/kvm.h>
+#include <linux/interrupt.h>
+#include <linux/ioport.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -43,6 +45,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
 #include <asm/kvm_psci.h>
+#include <asm/kvm_host.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension   virt");
@@ -139,6 +142,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
/* Mark the initial VMID generation invalid */
kvm->arch.vmid_gen = 0;
+   /*
+    * Initialize Dev Passthrough Fields
+    */
+   INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
+   mutex_init(&kvm->arch.dev_passthrough_lock);
 
return ret;
 out_free_stage2_pgd:
@@ -169,6 +177,40 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
int i;
+   struct list_head 
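The vgic_dist fields this patch adds amount to a host-to-guest SPI table (guest_irq[]), a bitmap of pending passthrough SPIs and a flag saying at least one is pending. A minimal single-threaded user-space sketch of that bookkeeping, assuming a hypothetical NR_SPIS of 32 and a plain uint32_t in place of the patch's struct vgic_bitmap; struct pt_state, pt_host_irq() and pt_next_pending() are made-up names, and real code would also need locking and the actual vGIC injection path.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SPIS 32	/* hypothetical; the patch sizes its table by VGIC_NR_SHARED_IRQS */

struct pt_state {
	uint8_t  guest_irq[NR_SPIS];	/* host SPI index -> guest SPI index */
	uint32_t spi_pending;		/* bit n set: guest SPI n is pending */
	uint32_t any_pending;		/* nonzero while any bit is set */
};

/* Host interrupt side: translate the host SPI and mark the guest SPI pending. */
static void pt_host_irq(struct pt_state *s, unsigned int host_spi)
{
	assert(host_spi < NR_SPIS);
	unsigned int g = s->guest_irq[host_spi];
	assert(g < NR_SPIS);
	s->spi_pending |= UINT32_C(1) << g;
	s->any_pending = 1;
}

/*
 * Injection side: take the lowest-numbered pending guest SPI, or return -1.
 * The summary flag lets the caller skip the bitmap scan in the common case.
 */
static int pt_next_pending(struct pt_state *s)
{
	if (!s->any_pending)
		return -1;
	for (unsigned int g = 0; g < NR_SPIS; g++) {
		if (s->spi_pending & (UINT32_C(1) << g)) {
			s->spi_pending &= ~(UINT32_C(1) << g);
			s->any_pending = (s->spi_pending != 0);
			return (int)g;
		}
	}
	s->any_pending = 0;
	return -1;
}
```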

RE: Bottleneck in KVM

2013-06-13 Thread Ren, Yongjie
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
 On Behalf Of an...@ssl.serc.iisc.in
 Sent: Thursday, June 13, 2013 9:18 PM
 To: kvm@vger.kernel.org
 Subject: Bottleneck in KVM
 
 Hello All
 
 I am relatively new to KVM.
 I have installed a web server in a KVM machine and am pushing different
 request rates at it through httperf. While on a bare host I can go up to
 6000 requests per second, the performance in KVM does not increase beyond
 3500 requests per second. I have checked the CPU usage of the different
 modules and none of them is running out of CPU; enough CPU in the VM
 remains idle in this case.
 I wonder whether I am exhausting some buffer. Please provide details on
 what could be the problem and the bottleneck in this case.
 
If your application is CPU intensive, I think it's easy to achieve more than
90% of native performance in a KVM guest.
1. What's your qemu command line to start the guest?
2. How about your I/O? Is your service I/O intensive?
3. What's your hardware?


 Thanks and Regards
 Ankit Anand
 
 


Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-13 Thread Marcelo Tosatti
On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
 Changelog:
 V3:
   All of these changes are from Gleb's review:
   1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
   2) smartly adjust kvm generation number in kvm_current_mmio_generation()
  to avoid kvm_memslots->generation overflow.
 
 V2:
   - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
   - use kvm->memslots->generation as kvm global generation-number
   - fix comment and codestyle
   - init kvm generation close to mmio wrap-around value
   - keep kvm_mmu_zap_mmio_sptes
 
 The current way is holding the hot mmu-lock and walking all shadow pages,
 which does not scale. This patchset tries to introduce a very simple and
 scalable way to quickly invalidate all mmio sptes: it does not need to walk
 any shadow pages or hold any locks.

Hi Xiao,

- Where is the generation number increased?
- Should use spinlock breakable code in kvm_mmu_zap_mmio_sptes()
(picture guest with 512GB of RAM, even walking all those pages is
expensive) (ah, patch to remove kvm_mmu_zap_mmio_sptes does that).
- Is -13 enough to test wraparound? It's highly likely the guest has
not begun executing by the time 13 kvm_set_memory_calls are made
(so no sptes around). Perhaps -2000 is more sensible (should confirm
though).
- Why remove "if (change == KVM_MR_CREATE) || (change
== KVM_MR_MOVE)" from kvm_arch_commit_memory_region?
It's instructive.

Otherwise looks good.
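The scheme in the cover letter tags each cached MMIO spte with the generation number current when it was created, so bumping one global counter invalidates every such spte without walking shadow pages or taking mmu-lock. The wraparound question arises because only a truncated generation fits in the spte. A minimal sketch, assuming a hypothetical 8-bit field and falling back to a zap-everything slow path when the truncated value wraps; all names are illustrative and this is not the code in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GEN_BITS 8	/* hypothetical width of the generation stored in an spte */
#define GEN_MASK ((1u << GEN_BITS) - 1)

struct mmio_spte {
	bool     valid;
	uint32_t gen;		/* truncated generation recorded at creation */
};

struct vm {
	uint64_t generation;	/* full-width global counter */
	unsigned zap_all_count;	/* how often the slow path ran */
};

static uint32_t current_gen(const struct vm *vm)
{
	return (uint32_t)(vm->generation & GEN_MASK);
}

static void cache_mmio(const struct vm *vm, struct mmio_spte *s)
{
	s->valid = true;
	s->gen = current_gen(vm);
}

/* Fast path: a cached spte is usable only if its generation is current. */
static bool mmio_spte_usable(const struct vm *vm, const struct mmio_spte *s)
{
	return s->valid && s->gen == current_gen(vm);
}

/*
 * Called on every memslot change. Normally this only bumps the counter;
 * when the truncated value wraps to 0, stale sptes could match again, so
 * zap them all (modelled here by invalidating the given array).
 */
static void memslots_changed(struct vm *vm, struct mmio_spte *all, int n)
{
	vm->generation++;
	if (current_gen(vm) == 0) {
		for (int i = 0; i < n; i++)
			all[i].valid = false;
		vm->zap_all_count++;
	}
}
```

Starting the counter just below the wrap value, as the v2 changelog does, makes the slow path reachable after a couple of memslot updates rather than 2^GEN_BITS of them, which is what the -13 versus -2000 question is about.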



Re: Re: KVM MMU: why write-protect for the pages containing PML4/PDPT/PDT (page directory) of the guest?

2013-06-13 Thread yongcheng . wu
I got it, thank you for your help.



2013/6/13 Paolo Bonzini <pbonz...@redhat.com>

On 12/06/2013 23:28, yongcheng...@i-soft.com.cn wrote:
 I have a question about shadow page tables: why are the pages containing
 the guest's PML4/PDPT/PDT (page directories) write-protected? In other
 words, do changes to the guest's page directories need to be
 synchronized?

Shadow page tables are the combination of both the host and guest page
tables into a single translation.  So they need to be updated every time
the host or the guest change the page tables.  Updates for the host page
tables are tracked with MMU notifiers; updates for the guest page tables
are tracked with write protection.

Paolo
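Paolo's point can be illustrated with a toy model: a shadow entry caches the composition of the guest mapping (guest virtual to guest physical) and the host mapping (guest physical to host physical), and the guest page-table page is write-protected so that a guest update traps and the stale shadow entry is dropped. This is a hypothetical single-level sketch with made-up names; real shadow paging in KVM has multiple levels and unsync pages, and tracks the host side with MMU notifiers, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 8	/* hypothetical tiny address space, in page frame numbers */

struct toy_mmu {
	uint32_t guest_pt[NPAGES];	/* guest page table: gva pfn -> gpa pfn */
	uint32_t host_map[NPAGES];	/* host side: gpa pfn -> hpa pfn */
	int32_t  shadow[NPAGES];	/* combined gva pfn -> hpa pfn, -1 = absent */
	bool     guest_pt_write_protected;
	unsigned traps;			/* write faults taken on the guest PT */
};

static void toy_init(struct toy_mmu *m)
{
	for (int i = 0; i < NPAGES; i++) {
		m->guest_pt[i] = (uint32_t)i;
		m->host_map[i] = (uint32_t)(i + 100);
		m->shadow[i] = -1;
	}
	m->guest_pt_write_protected = false;
	m->traps = 0;
}

/* Shadow fault: build the combined translation and protect the guest PT. */
static uint32_t toy_translate(struct toy_mmu *m, uint32_t gva_pfn)
{
	assert(gva_pfn < NPAGES);
	if (m->shadow[gva_pfn] < 0) {
		uint32_t gpa = m->guest_pt[gva_pfn];
		m->shadow[gva_pfn] = (int32_t)m->host_map[gpa];
		m->guest_pt_write_protected = true;
	}
	return (uint32_t)m->shadow[gva_pfn];
}

/*
 * The guest writes its own page table. Because that page is write-protected,
 * the write traps and the affected shadow entry is dropped before it lands.
 */
static void toy_guest_pt_write(struct toy_mmu *m, uint32_t gva_pfn,
			       uint32_t new_gpa)
{
	assert(gva_pfn < NPAGES && new_gpa < NPAGES);
	if (m->guest_pt_write_protected) {
		m->traps++;
		m->shadow[gva_pfn] = -1;
	}
	m->guest_pt[gva_pfn] = new_gpa;
}
```

Without the trap, the shadow entry for page 2 would keep pointing at the old host frame after the guest remapped it.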


Book3s_hv KVM HTAB bug

2013-06-13 Thread Alexander Graf
Hi Paul,

We've just seen another KVM bug with 3.8 on p7. It looks as if for some reason 
a bolted HTAB entry for the kernel got evicted.

[   16s] booting kvm ...
[   16s] /usr/bin/qemu-system-ppc64 -no-reboot -nographic -vga none -net none -enable-kvm -M pseries -cpu host -kernel /boot/vmlinux -initrd /boot/initrd -append root=/dev/vda panic=1 quiet no-kvmclock nmi_watchdog=0 rw elevator=noop console=hvc0 init=/.build/build -m 3072 -drive file=/obs/worker/root_1/root,if=virtio,cache=unsafe -drive file=/obs/worker/root_1/root,if=ide,index=0,cache=unsafe -drive file=/obs/worker/root_1.swap,if=virtio,cache=unsafe -smp 1
[   16s] 
[   16s] 
[   16s] SLOF 
**
[   16s] QEMU Starting
[   16s]  Build Date = Jun 10 2013 17:00:23
[   16s]  FW Version = git-f564e52f4418d308
[   16s]  Press s to enter Open Firmware.

[   16s] 
[   16s] C
[   16s] C0100
[   17s] C0120
[   17s] C0140
[   17s] C0200
[   17s] C0201
[   17s] C0220
[   17s] C0240
[   17s] C0260
[   17s] C0270
[   17s] C02E0
[   17s] C0300
[   17s] C0320
[   17s] C0360
[   17s] C0370
[   17s] C0371
[   17s] C0372
[   17s] C0373
[   17s] C0374
[   17s] C0390
[   17s] C03F0
[   17s] C0400
[   17s] C0480
[   17s] C04C0
[   17s] C04D0
[   17s] C0500
[   17s] Populating /vdevice methods
[   17s] Populating /vdevice/vty@7100
[   18s] Populating /vdevice/nvram@7101
[   18s] 
[   18s] NVRAM: size=65536, fetch=200E, store=200F
[   18s] Populating /vdevice/v-scsi@7102
[   18s] VSCSI: Initializing
[   18s] VSCSI: Looking for devices
[   18s]   8200 CD-ROM   : QEMU QEMU CD-ROM  1.5.
[   18s] C0580
[   18s] C05A0
[   18s] Populating /pci@8002000
[   18s]  Adapters on 08002000
[   18s]  00  (D) : 1af4 1001virtio [ block ]
[   18s]  00 0800 (D) : 1af4 1001virtio [ block ]
[   18s] C0600
[   18s] C0640
[   18s] C0690
[   18s] C06A0
[   18s] C06A8
[   18s] C06B0
[   18s] C06B8
[   18s] C06C0
[   18s] C06E0
[   18s] C0700
[   18s] C0800
[   18s] C0880
[   18s] No NVRAM common partition, re-initializing...
[   18s] C0890
[   18s] C08A0
[   19s] C08A8
[   19s] C08B0
[   19s] C08C0
[   19s] C08D0
[   19s] Using default console: /vdevice/vty@7100
[   19s] C08E0
[   19s] C08E8
[   19s] Detected RAM kernel at 40 (16185f0 bytes) C08FF
[   19s]  
[   19s]   Welcome to Open Firmware
[   19s] 
[   19s]   Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
[   19s]   This program and the accompanying materials are made available
[   19s]   under the terms of the BSD License available at
[   19s]   http://www.opensource.org/licenses/bsd-license.php
[   19s] 
[   19s] Booting from memory...
[   19s] OF stdout device is: /vdevice/vty@7100
[   19s] Preparing to boot Linux version 3.8.0-2-default (geeko@buildhost) (gcc version 4.5.0 20100414 [gcc-4_5-branch revision 158342] (SUSE Linux) ) #1 SMP Wed Feb 20 02:54:06 UTC 2013 (e252f7f)
[   19s] Detected machine type: 0101
[   19s] Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
[   19s] Calling ibm,client-architecture-support... not implemented
[   19s] couldn't open /packages/elf-loader
[   19s] command line: root=/dev/vda panic=1 quiet no-kvmclock nmi_watchdog=0 rw elevator=noop console=hvc0 init=/.build/build
[   19s] memory layout at init:
[   19s]   memory_limit :  (16 MB aligned)
[   19s]   alloc_bottom : 01a3
[   19s]   alloc_top: 1000
[   19s]   alloc_top_hi : c000
[   19s]   rmo_top  : 1000
[   19s]   ram_top  : c000
[   19s] instantiating rtas at 0x0dbf... done
[   19s] Querying for OPAL presence... not there.
[   19s] boot cpu hw idx 0
[   19s] copying OF device tree...
[   19s] Building dt strings...
[   19s] Building dt structure...
[   19s] Device tree strings 0x01d4 - 0x01d40774
[   19s] Device tree struct  0x01d5 - 0x01d6
[   19s] Calling quiesce...
[   19s] returning from prom_init
[20500s] QEMU 1.5.0 monitor - type 'help' for more information
[20500s] (qemu)
[20505s] (qemu) info registers
[20505s] NIP 0410   LR 00b31240 CTR  XER 

[20505s] MSR 80001000 HID0   HF  idx 1
[20505s] TB   DECR 
[20505s] GPR00 00b31240 c128bde0 01288c40 

[20505s] GPR04  c128bcc0 4001438795007015 
70001194
[20505s] GPR08  2288 b0001032 
c0005d00
[20505s] GPR12 800040001032 cff2  

[20505s] GPR16    

[20505s] GPR20    

[20505s] GPR24 4000 c000 

Re: Book3s_hv KVM HTAB bug

2013-06-13 Thread Paul Mackerras
On Thu, Jun 13, 2013 at 02:34:56PM +0200, Alexander Graf wrote:
 Hi Paul,
 
 We've just seen another KVM bug with 3.8 on p7. It looks as if for some 
 reason a bolted HTAB entry for the kernel got evicted.
 
...
 (gdb) x /i 0xc0005d00
0xc0005d00 instruction_access_common:andi.   r10,r12,16384
 (qemu) xp /i 0x5d00
0x5d00:  andi.   r10,r12,16384
 (qemu) info tlb
SLBESIDVSID
3  0xc800  0xc00838795000
 
 So for some reason QEMU can still resolve the virtual address using the guest
 HTAB, but the CPU cannot. Otherwise the guest wouldn't get a 0x400 when
 accessing that page.

When I've seen this sort of thing it has usually been that we failed
to insert a HPTE in htab_bolt_mapping(), called from
htab_initialize().  When that happens we BUG_ON(), which is stupid
because it causes a program interrupt, and the first thing we do is
turn the MMU on, but we don't have a linear mapping set up, so we
start taking continual instruction storage interrupts (because the ISI
handler also wants to turn on interrupts).  Ben has an idea to fix
that, which is to have IR and DR off in paca->kernel_msr until we're
ready to turn the MMU on.  That might help debuggability in the case
you're hitting, whether or not it's htab_bolt_mapping failing.

Are you *absolutely* sure that QEMU is using the guest HTAB to
translate the 0xc... addresses?  If it is actually doing so it would
need to be using the relatively new KVM_PPC_GET_HTAB_FD ioctl, and I
thought the only place that was used was in the migration code.

To debug this sort of thing, what I usually do is patch the guest
kernel to put a branch to self at 0x400.  Then when it hangs you have
some chance of sorting out what happened using info registers etc.
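For the branch-to-self trick, the instruction to place at 0x400 is a PowerPC I-form branch with zero displacement ("b ."), which encodes as 0x48000000. A small encoder sketch for building such words when patching a guest image; ppc_branch_rel() is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode a PowerPC I-form relative branch "b target" (AA = 0, LK = 0).
 * Primary opcode 18 goes in the top 6 bits; the word-aligned, signed
 * displacement goes in the 24-bit LI field (instruction bits 2..25).
 */
static uint32_t ppc_branch_rel(int32_t offset)
{
	assert((offset & 3) == 0);				/* word aligned */
	assert(offset >= -(1 << 25) && offset < (1 << 25));	/* fits in LI */
	return (UINT32_C(18) << 26) | ((uint32_t)offset & UINT32_C(0x03fffffc));
}
```

On a big-endian guest such as this 3.8 kernel, the resulting word would be stored at 0x400 most-significant byte first.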

I would be very interested to know how big a HPT the host kernel
allocated for the guest and what was in it.  The host kernel prints a
message telling you the size and location of the HPT, and in this sort
of situation I find it helpful to take a copy of it with dd and dump
it with hexdump.

Also, what page size are you using in the host kernel?  If it's 4k,
then the guest kernel is limited to using 4k pages for the linear
mapping, which can mean it runs out of space in the HPT for the linear
mapping more easily.

Since you don't have my patch to add a flexible allocator for the HPT
and RMA areas (you rejected it, if you recall), you'll be limited to
what you can allocate from the page allocator, which is usually 16MB,
but may be less if free memory is low and/or fragmented.  16MB should
be enough for a 3GB guest, particularly if you're using 64k pages in
the host, but if the host was only able to allocate a much smaller
HPT, that might explain the problem.

Let me know if you discover anything further...

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Book3s_hv KVM HTAB bug

2013-06-13 Thread Alexander Graf

On 14.06.2013, at 01:20, Paul Mackerras wrote:

 On Thu, Jun 13, 2013 at 02:34:56PM +0200, Alexander Graf wrote:
 Hi Paul,
 
 We've just seen another KVM bug with 3.8 on p7. It looks as if for some 
 reason a bolted HTAB entry for the kernel got evicted.
 
 ...
 (gdb) x /i 0xc0005d00
   0xc0005d00 instruction_access_common:andi.   r10,r12,16384
 (qemu) xp /i 0x5d00
   0x5d00:  andi.   r10,r12,16384
 (qemu) info tlb
   SLBESIDVSID
   3  0xc800  0xc00838795000
 
 So for some reason QEMU can still resolve the virtual address using the
 guest HTAB, but the CPU cannot. Otherwise the guest wouldn't get a
 0x400 when accessing that page.
 
 When I've seen this sort of thing it has usually been that we failed
 to insert a HPTE in htab_bolt_mapping(), called from
 htab_initialize().  When that happens we BUG_ON(), which is stupid
 because it causes a program interrupt, and the first thing we do is
 turn the MMU on, but we don't have a linear mapping set up, so we
 start taking continual instruction storage interrupts (because the ISI
 handler also wants to turn on interrupts).  Ben has an idea to fix

Ok, that makes sense and sounds like a reasonable possible failure scenario. 
Unfortunately the guest already got killed and right now everything's running 
again without any guest hanging.

However, I did forget to also paste the dump of log_buf on my last email. Does 
that log coincide with what you would expect at this point?

  00 00 00 00 00 00 00 00  00 4c 00 39 00 00 00 37  |.L.9...7|
0010  41 6c 6c 6f 63 61 74 65  64 20 39 31 37 35 30 34  |Allocated 917504|
0020  20 62 79 74 65 73 20 66  6f 72 20 31 30 32 34 20  | bytes for 1024 |
0030  70 61 63 61 73 20 61 74  20 63 30 30 30 30 30 30  |pacas at c00|
0040  30 30 66 66 32 30 30 30  30 00 00 00 00 00 00 00  |00ff2...|
0050  00 00 00 00 00 34 00 21  00 00 00 36 55 73 69 6e  |.4.!...6Usin|
0060  67 20 70 53 65 72 69 65  73 20 6d 61 63 68 69 6e  |g pSeries machin|
0070  65 20 64 65 73 63 72 69  70 74 69 6f 6e 00 00 00  |e description...|
0080  00 00 00 00 00 00 00 00  00 48 00 37 00 00 00 37  |.H.7...7|
0090  50 61 67 65 20 6f 72 64  65 72 73 3a 20 6c 69 6e  |Page orders: lin|
00a0  65 61 72 20 6d 61 70 70  69 6e 67 20 3d 20 31 36  |ear mapping = 16|
00b0  2c 20 76 69 72 74 75 61  6c 20 3d 20 31 36 2c 20  |, virtual = 16, |
00c0  69 6f 20 3d 20 31 32 00  00 00 00 00 00 00 00 00  |io = 12.|
00d0  00 24 00 12 00 00 00 36  55 73 69 6e 67 20 31 54  |.$.6Using 1T|
00e0  42 20 73 65 67 6d 65 6e  74 73 00 00 00 00 00 00  |B segments..|
00f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
000186a0

 that, which is to have IR and DR off in paca->kernel_msr until we're
 ready to turn the MMU on.  That might help debuggability in the case
 you're hitting, whether or not it's htab_bolt_mapping failing.
 
 Are you *absolutely* sure that QEMU is using the guest HTAB to
 translate the 0xc... addresses?  If it is actually doing so it would

No, you're right. I got confused with PR KVM. I'm surprised QEMU is able to 
resolve anything at all really, without access to the HTAB.

But it probably just saw that MSR.DR=0, so it simply used the real mode 
algorithm to read the data which happened to work correctly, as the virtual 
address is a valid real mode address as well.

Sorry for the incorrect assumption.

 need to be using the relatively new KVM_PPC_GET_HTAB_FD ioctl, and I
 thought the only place that was used was in the migration code.
 
 To debug this sort of thing, what I usually do is patch the guest
 kernel to put a branch to self at 0x400.  Then when it hangs you have
 some chance of sorting out what happened using info registers etc.

Now if only it would happen a bit more often ;).

 I would be very interested to know how big a HPT the host kernel
 allocated for the guest and what was in it.  The host kernel prints a
 message telling you the size and location of the HPT, and in this sort

Yes. Unfortunately it doesn't tell me the PID though, so I have a hard time 
correlating the dmesg output with the VM. However, I'm pretty sure it's this 
one:

Jun 13 06:31:16 build65 kernel: KVM guest htab at c0012ae0 (order 19), LPID 4

That's a 512KB map, right? Sounds too small to me :).
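The sizes in this exchange can be checked with a little arithmetic, assuming the POWER hashed page table format (16-byte HPTEs, eight per PTEG), the 64KB linear-mapping pages reported in the guest's log ("linear mapping = 16") and the 3072MB given to the guest with -m 3072. The helpers are hypothetical and only do the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define HPTE_BYTES     16u	/* one hashed page table entry: two doublewords */
#define HPTES_PER_PTEG 8u	/* entries per page table entry group */

/* Size in bytes of an HPT of the given order (log2 of its size in bytes). */
static uint64_t hpt_bytes(unsigned order)
{
	return UINT64_C(1) << order;
}

static uint64_t hpt_entries(unsigned order)
{
	return hpt_bytes(order) / HPTE_BYTES;
}

static uint64_t hpt_ptegs(unsigned order)
{
	return hpt_entries(order) / HPTES_PER_PTEG;
}

/* Bolted entries needed to map ram_bytes linearly with 2^page_order pages. */
static uint64_t linear_map_entries(uint64_t ram_bytes, unsigned page_order)
{
	return ram_bytes >> page_order;
}
```

Under these assumptions an order-19 HPT is 512KB with 32768 HPTEs in 4096 PTEGs, fewer than the 49152 bolted entries a 64KB-page linear mapping of 3GB needs even before hash-bucket collisions, whereas a 16MB (order 24) HPT holds 1048576. That is consistent with the htab_bolt_mapping() failure described above, though it does not prove it.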

 of situation I find it helpful to take a copy of it with dd and dump
 it with hexdump.

Too late this time around. I'll try to do it next time I see this happening :).

 
 Also, what page size are you using in the host kernel?  If it's 4k,
 then the guest kernel is limited to using 4k pages for the linear
 mapping, which can mean it runs out of space in the HPT for the linear
 mapping more easily.

In this case the host is running on 64k pages.

 Since you don't have my patch to add a flexible allocator for the HPT
 and RMA areas (you rejected it, if you recall), 

Re: Book3s_hv KVM HTAB bug

2013-06-13 Thread Benjamin Herrenschmidt
On Fri, 2013-06-14 at 01:58 +0200, Alexander Graf wrote:
 I don't think that preallocating yet another potentially fragmented
 pool of bigger memory chunks - which your patch did - is the answer to
 this problem. We just need to defragment normal system memory and
 delay HPT creation until it's ready. It can't be that hard.
   ^
Right

Ben.

