date:20140625

Hi Deng-Cheng,

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
 @@ -2213,8 +2209,8 @@ enum emulation_result kvm_mips_check_privilege(unsigned long cause,
    * address error exception to the guest
    */
   if (badvaddr >= (unsigned long) KVM_GUEST_KSEG0) {
 -     printk("%s: LD MISS @ %#lx\n", __func__,
 -            badvaddr);
 +     kvm_err("%s: LD MISS @ %#lx\n", __func__,
 +             badvaddr);

This should probably be kvm_debug since it isn't fatal to the whole VM
(the exception gets passed on to the guest kernel to handle), otherwise
guest userland could maliciously spam the host log by repeatedly trying
to access beyond the TE useg.

Same goes for the other printks in this function

It probably was only useful to sanity check that userland wasn't trying
to access memory that would be accessible on a normal MIPS core but
isn't with the TE segment layout.

Otherwise this patch looks okay to me.

Cheers
James
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 3/9] MIPS: KVM: Simplify functions by removing redundancy

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
 From: Deng-Cheng Zhu dengcheng@imgtec.com
 
 No logic changes inside.
 
 Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

I'm indifferent to many of the changes, but still,
Reviewed-by: James Hogan james.ho...@imgtec.com

Thanks
James

 ---
 Changes:
 v3 - v2:
 o Add err removal in kvm_arch_commit_memory_region().
 o Revert the changes to kvm_arch_vm_ioctl().
 
  arch/mips/include/asm/kvm_host.h  |  2 +-
  arch/mips/kvm/kvm_mips.c  | 18 --
  arch/mips/kvm/kvm_mips_commpage.c |  2 --
  arch/mips/kvm/kvm_mips_emul.c | 34 +++---
  arch/mips/kvm/kvm_mips_stats.c|  4 +---
  5 files changed, 17 insertions(+), 43 deletions(-)
 
 diff --git a/arch/mips/include/asm/kvm_host.h 
 b/arch/mips/include/asm/kvm_host.h
 index 3f813f2..7a3fc67 100644
 --- a/arch/mips/include/asm/kvm_host.h
 +++ b/arch/mips/include/asm/kvm_host.h
 @@ -764,7 +764,7 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t 
 *opc,
  struct kvm_vcpu *vcpu);
  
  /* Misc */
 -extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 +extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
  extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
  
  
 diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
 index bdca619..cabcac0a 100644
 --- a/arch/mips/kvm/kvm_mips.c
 +++ b/arch/mips/kvm/kvm_mips.c
 @@ -97,9 +97,7 @@ void kvm_arch_hardware_unsetup(void)
  
  void kvm_arch_check_processor_compat(void *rtn)
  {
 - int *r = (int *)rtn;
 - *r = 0;
 - return;
 + *(int *)rtn = 0;
  }
  
  static void kvm_mips_init_tlbs(struct kvm *kvm)
 @@ -225,7 +223,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
  enum kvm_mr_change change)
  {
   unsigned long npages = 0;
 - int i, err = 0;
 + int i;
  
   kvm_debug("%s: kvm: %p slot: %d, GPA: %llx, size: %llx, QVA: %llx\n",
 __func__, kvm, mem->slot, mem->guest_phys_addr,
 @@ -243,8 +241,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
  
   if (!kvm->arch.guest_pmap) {
   kvm_err("Failed to allocate guest PMAP");
 - err = -ENOMEM;
 - goto out;
 + return;
   }
  
   kvm_debug("Allocated space for Guest PMAP Table (%ld pages) @ %p\n",
 @@ -255,8 +252,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
   kvm->arch.guest_pmap[i] = KVM_INVALID_PAGE;
   }
   }
 -out:
 - return;
  }
  
  void kvm_arch_flush_shadow_all(struct kvm *kvm)
 @@ -845,16 +840,12 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int 
 ioctl, unsigned long arg)
  
  int kvm_arch_init(void *opaque)
  {
 - int ret;
 -
   if (kvm_mips_callbacks) {
   kvm_err("kvm: module already exists\n");
   return -EEXIST;
   }
  
 - ret = kvm_mips_emulation_init(&kvm_mips_callbacks);
 -
 - return ret;
 + return kvm_mips_emulation_init(&kvm_mips_callbacks);
  }
  
  void kvm_arch_exit(void)
 @@ -1008,7 +999,6 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
  
  void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
  {
 - return;
  }
  
  int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 diff --git a/arch/mips/kvm/kvm_mips_commpage.c 
 b/arch/mips/kvm/kvm_mips_commpage.c
 index ab7096e..4b5612b 100644
 --- a/arch/mips/kvm/kvm_mips_commpage.c
 +++ b/arch/mips/kvm/kvm_mips_commpage.c
 @@ -33,6 +33,4 @@ void kvm_mips_commpage_init(struct kvm_vcpu *vcpu)
   /* Specific init values for fields */
   vcpu->arch.cop0 = &page->cop0;
   memset(vcpu->arch.cop0, 0, sizeof(struct mips_coproc));
 -
 - return;
  }
 diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
 index 262ce3e..e5862bc 100644
 --- a/arch/mips/kvm/kvm_mips_emul.c
 +++ b/arch/mips/kvm/kvm_mips_emul.c
 @@ -761,8 +761,6 @@ enum emulation_result kvm_mips_emul_eret(struct kvm_vcpu 
 *vcpu)
  
  enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
  {
 - enum emulation_result er = EMULATE_DONE;
 -
   kvm_debug("[%#lx] !!!WAIT!!! (%#lx)\n", vcpu->arch.pc,
 vcpu->arch.pending_exceptions);
  
 @@ -782,7 +780,7 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu 
 *vcpu)
   }
   }
  
 - return er;
 + return EMULATE_DONE;
  }
  
  /*
 @@ -792,11 +790,10 @@ enum emulation_result kvm_mips_emul_wait(struct 
 kvm_vcpu *vcpu)
  enum emulation_result kvm_mips_emul_tlbr(struct kvm_vcpu *vcpu)
  {
   struct mips_coproc *cop0 = vcpu->arch.cop0;
 - enum emulation_result er = EMULATE_FAIL;
   uint32_t pc = vcpu->arch.pc;
  
   kvm_err("[%#x] COP0_TLBR [%ld]\n", pc, kvm_read_c0_guest_index(cop0));
 - return er;
 + return EMULATE_FAIL;
  }
  
  /* Write Guest TLB Entry @ Index */
 @@ -804,7 +801,6

Re: [PATCH v3 4/9] MIPS: KVM: Remove unneeded volatile

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
 From: Deng-Cheng Zhu dengcheng@imgtec.com
 
 The keyword volatile for idx in the TLB functions is unnecessary.
 
 Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Reviewed-by: James Hogan james.ho...@imgtec.com

Cheers
James

 ---
  arch/mips/kvm/kvm_tlb.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/arch/mips/kvm/kvm_tlb.c b/arch/mips/kvm/kvm_tlb.c
 index 29a5bdb..bbcd822 100644
 --- a/arch/mips/kvm/kvm_tlb.c
 +++ b/arch/mips/kvm/kvm_tlb.c
 @@ -201,7 +201,7 @@ int kvm_mips_host_tlb_write(struct kvm_vcpu *vcpu, 
 unsigned long entryhi,
  {
   unsigned long flags;
   unsigned long old_entryhi;
 - volatile int idx;
 + int idx;
  
   local_irq_save(flags);
  
 @@ -426,7 +426,7 @@ EXPORT_SYMBOL(kvm_mips_guest_tlb_lookup);
  int kvm_mips_host_tlb_lookup(struct kvm_vcpu *vcpu, unsigned long vaddr)
  {
   unsigned long old_entryhi, flags;
 - volatile int idx;
 + int idx;
  
   local_irq_save(flags);
  
 

Re: [PATCH v3 5/9] MIPS: KVM: Rename files to remove the prefix kvm_ and kvm_mips_

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
 From: Deng-Cheng Zhu dengcheng@imgtec.com
 
 Since all the files are in arch/mips/kvm/, there's no need of the prefixes
 kvm_ and kvm_mips_.
 
 Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Thanks for this cleanup! (hopefully with git's help it won't make
backporting patches a pain).

Reviewed-by: James Hogan james.ho...@imgtec.com

Cheers
James

Re: [PATCH v3 6/9] MIPS: KVM: Restore correct value for WIRED at TLB uninit

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
 From: Deng-Cheng Zhu dengcheng@imgtec.com
 
 At TLB initialization, the commpage TLB entry is reserved on top of the
 existing WIRED entries (whose number is not necessarily 0).
 
 Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com
 ---
  arch/mips/kvm/mips.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
 index 27250ee..3d53d34 100644
 --- a/arch/mips/kvm/mips.c
 +++ b/arch/mips/kvm/mips.c
 @@ -170,7 +170,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
  static void kvm_mips_uninit_tlbs(void *arg)
  {
   /* Restore wired count */
 - write_c0_wired(0);
 + write_c0_wired(read_c0_wired() - 1);
   mtc0_tlbw_hazard();
   /* Clear out all the TLBs */
   kvm_local_flush_tlb_all();

kvm_local_flush_tlb_all blasts all the entries away regardless of wired,
so I don't think this is an improvement.

I suspect to really be safe/correct in the presence of other dynamic
users of wired it would have to either manage arbitrary
allocation/deallocation of per-cpu tlb entries correctly from a single
place, or abandon the use of wired altogether.
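
To make the concern concrete, here is a minimal userspace sketch of the two
uninit strategies under discussion. It is not kernel code: a plain variable
stands in for one CPU's CP0 Wired register (real code would go through
read_c0_wired()/write_c0_wired()), and every sketch_* name is hypothetical.

```c
#include <assert.h>

/*
 * Stand-in for one CPU's CP0 Wired register. Real kernel code would use
 * read_c0_wired()/write_c0_wired(); everything named sketch_* is made up.
 */
static unsigned int sketch_wired;
static unsigned int sketch_saved_wired;

/* Init: remember the current count, then reserve one more entry. */
static void sketch_init_tlbs(void)
{
	sketch_saved_wired = sketch_wired;
	sketch_wired = sketch_saved_wired + 1;
}

/* Uninit, variant A: put back the value saved at init time. */
static void sketch_uninit_restore_saved(void)
{
	sketch_wired = sketch_saved_wired;
}

/* Uninit, variant B: decrement, as the patch under review does. */
static void sketch_uninit_decrement(void)
{
	assert(sketch_wired > 0);
	sketch_wired -= 1;
}
```

If another user bumps the count between init and uninit, variant A silently
drops that user's reservation, while variant B keeps the count right but
says nothing about which index is being released. Neither amounts to
managing allocation and deallocation from a single place.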

Cheers
James

Re: [PATCH v3 7/9] MIPS: KVM: Fix memory leak on VCPU

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
 From: Deng-Cheng Zhu dengcheng@imgtec.com
 
 kvm_arch_vcpu_free() is called in 2 code paths:
 
 1) kvm_vm_ioctl()
    kvm_vm_ioctl_create_vcpu()
    kvm_arch_vcpu_destroy()
    kvm_arch_vcpu_free()
 2) kvm_put_kvm()
    kvm_destroy_vm()
    kvm_arch_destroy_vm()
    kvm_mips_free_vcpus()
    kvm_arch_vcpu_free()
 
 Neither of the paths handles VCPU free. We need to do it in
 kvm_arch_vcpu_free() corresponding to the memory allocation in
 kvm_arch_vcpu_create().
 
 Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Reviewed-by: James Hogan james.ho...@imgtec.com

Maybe worth adding Cc: sta...@vger.kernel.org and moving this to the
beginning of the patchset to avoid conflicts.

Cheers
James

Re: [PATCH v3 8/9] MIPS: KVM: Skip memory cleaning in kvm_mips_commpage_init()

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
 From: Deng-Cheng Zhu dengcheng@imgtec.com
 
 The commpage is allocated using kzalloc(), so there's no need of cleaning
 the memory of the kvm_mips_commpage struct and its internal mips_coproc.
 
 Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Reviewed-by: James Hogan james.ho...@imgtec.com

Cheers
James
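
The reasoning in the commit message (zeroing allocator, so no explicit
clearing needed) can be illustrated in userspace, with calloc() playing the
role of kzalloc(). This is only an analogy under that assumption; the
*_sketch structs and both helpers are hypothetical and much simpler than the
kernel's kvm_mips_commpage and mips_coproc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct coproc_sketch {
	unsigned long reg[32][8];
};

struct commpage_sketch {
	struct coproc_sketch cop0;
};

/* calloc(), like kzalloc() in the kernel, hands back zero-filled memory. */
static struct commpage_sketch *alloc_commpage(void)
{
	return calloc(1, sizeof(struct commpage_sketch));
}

/* Return 1 if the n bytes starting at p are all zero, otherwise 0. */
static int is_all_zero(const void *p, size_t n)
{
	const unsigned char *b = p;
	size_t i;

	assert(p != NULL);
	for (i = 0; i < n; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}
```

Since the embedded cop0 member lives inside the zero-filled allocation, a
memset() over it right after allocation would change nothing.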

[PULL] vhost: cleanups and fixes

2014-06-25 Thread Michael S. Tsirkin

The following changes since commit a497c3ba1d97fc69c1e78e7b96435ba8c2cb42ee:

  Linux 3.16-rc2 (2014-06-21 19:02:54 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 68404441557d8db5ac853379a4fb9c1adedea4fd:

  vhost-scsi: don't open-code kvfree (2014-06-23 09:22:48 +0300)


vhost: infrastructure fixes for 3.16

Two cleanup patches removing code duplication that got introduced by changes in
rc1.  Not fixing crashes, but I'd rather not carry the duplicate code until the
next merge window.

Signed-off-by: Michael S. Tsirkin m...@redhat.com


Michael S. Tsirkin (1):
  vhost-scsi: don't open-code kvfree

Romain Francoise (1):
  vhost-net: don't open-code kvfree

 drivers/vhost/net.c  | 12 ++--
 drivers/vhost/scsi.c | 12 ++--
 2 files changed, 4 insertions(+), 20 deletions(-)

[patch added to the 3.12 stable tree] MIPS: KVM: Allocate at least 16KB for exception handlers

2014-06-25 Thread Jiri Slaby

From: James Hogan james.ho...@imgtec.com

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===

commit 7006e2dfda9adfa40251093604db76d7e44263b3 upstream.

Each MIPS KVM guest has its own copy of the KVM exception vector. This
contains the TLB refill exception handler at offset 0x000, the general
exception handler at offset 0x180, and interrupt exception handlers at
offset 0x200 in case Cause_IV=1. A common handler is copied to offset
0x2000 and offset 0x3000 is used for temporarily storing k1 during entry
from guest.

However the amount of memory allocated for this purpose is calculated as
0x200 rounded up to the next page boundary, which is insufficient if 4KB
pages are in use. This can lead to the common handler at offset 0x2000
being overwritten and infinitely recursive exceptions on the next exit
from the guest.

Increase the minimum size from 0x200 to 0x4000 to cover the full use of
the page.

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Gleb Natapov g...@kernel.org
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle r...@linux-mips.org
Cc: linux-m...@linux-mips.org
Cc: Sanjay Lal sanj...@kymasys.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Jiri Slaby jsl...@suse.cz
---
 arch/mips/kvm/kvm_mips.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index a7b044536de4..b31153969946 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -303,7 +303,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 
unsigned int id)
if (cpu_has_veic || cpu_has_vint) {
size = 0x200 + VECTORSPACING * 64;
} else {
-   size = 0x200;
+   size = 0x4000;
}
 
/* Save Linux EBASE */
-- 
2.0.0
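
The sizing problem described in the commit message comes down to a little
arithmetic, which can be checked in userspace. The following is a minimal
sketch, not kernel code: round_up_to_page() and covers_offset() are
hypothetical helpers, and the two offsets are the ones quoted in the commit
message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offsets quoted in the commit message. */
#define COMMON_HANDLER_OFFSET 0x2000u
#define K1_SCRATCH_OFFSET     0x3000u

/* Round size up to a multiple of page_size (must be a power of two). */
static size_t round_up_to_page(size_t size, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (size + page_size - 1) & ~(page_size - 1);
}

/* True if an allocation of alloc bytes contains the byte at offset. */
static bool covers_offset(size_t alloc, size_t offset)
{
	return offset < alloc;
}
```

With 4KB pages, round_up_to_page(0x200, 4096) is 0x1000, which does not
reach COMMON_HANDLER_OFFSET, let alone K1_SCRATCH_OFFSET. Starting from
0x4000 instead covers both. With 16KB pages, 0x200 already rounds up to
0x4000, so the old minimum would only have been too small on smaller page
sizes.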


Re: [PATCH V2] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value



On 15.06.14 20:47, Aneesh Kumar K.V wrote:

With guests supporting Multiple page size per segment (MPSS),
hpte_page_size returns the actual page size used. Add a new function to
return the base page size and use that to compare against the page size
calculated from the SLB. Without this patch a hpte lookup can fail since
we are comparing the wrong page size in kvmppc_hv_find_lock_hpte.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


Thanks, applied to for-3.16.


Alex


Re: [PATCH v4 01/19] remove unused files



On 09.06.14 10:11, Andrew Jones wrote:

On Fri, Jun 06, 2014 at 08:37:26PM +0200, Christoffer Dall wrote:

On Thu, Apr 10, 2014 at 06:56:42PM +0200, Andrew Jones wrote:

There are several unused files, primarily because powerpc is an unused
arch. The exceptions are config-ia64.mak, which is also an unused arch
file, lib/fwcfg.c, lib/panic.c, x86/print.h and x86/run-kvm-unit-tests,
which are just unused. Remove them all in order to tidy things up.

Signed-off-by: Andrew Jones drjo...@redhat.com

Sounds reasonable enough for me, but you probably want an acked-by from
the people who actually know if they should care about these files or
not.

Agreed. Alex? Paolo?


We haven't managed to revive the test cases in all the years, so yeah :(

Acked-by: Alexander Graf ag...@suse.de


Alex


Re: [PATCH v3 7/9] MIPS: KVM: Fix memory leak on VCPU

2014-06-25 Thread Paolo Bonzini


Il 25/06/2014 11:28, James Hogan ha scritto:

On 24/06/14 18:31, Deng-Cheng Zhu wrote:

From: Deng-Cheng Zhu dengcheng@imgtec.com

kvm_arch_vcpu_free() is called in 2 code paths:

1) kvm_vm_ioctl()
   kvm_vm_ioctl_create_vcpu()
   kvm_arch_vcpu_destroy()
   kvm_arch_vcpu_free()
2) kvm_put_kvm()
   kvm_destroy_vm()
   kvm_arch_destroy_vm()
   kvm_mips_free_vcpus()
   kvm_arch_vcpu_free()

Neither of the paths handles VCPU free. We need to do it in
kvm_arch_vcpu_free() corresponding to the memory allocation in
kvm_arch_vcpu_create().

Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com


Reviewed-by: James Hogan james.ho...@imgtec.com

Maybe worth adding Cc: sta...@vger.kernel.org and moving this to the
beginning of the patchset to avoid conflicts.

Cheers
James



I've queued this for 3.16.  It applies cleanly apart from the filename
change.


Paolo

Re: [PATCH v3 0/9] MIPS: KVM: Bugfixes and cleanups

2014-06-25 Thread Paolo Bonzini


Il 24/06/2014 19:31, Deng-Cheng Zhu ha scritto:

The patches are pretty straightforward.

Changes:
v3 - v2:
o In patch #2, change the use of kvm_[err|info|debug].
o In patch #3, add err removal in kvm_arch_commit_memory_region().
o In patch #3, revert the changes to kvm_arch_vm_ioctl().
o In patch #7, drop the merge of kvm_arch_vcpu_free() and pointer nullification.
o Add patch #9.
v2 - v1:
o In patch #1, don't change the opening comment mark for kernel-doc comments.
o In patch #1, to make long lines more readable, use local variables / macros.
o In patch #1, slight format adjustments are made.
o Use -M flag to generate patches (detect renames).
o Add patch #8.

Deng-Cheng Zhu (8):
  MIPS: KVM: Reformat code and comments
  MIPS: KVM: Use KVM internal logger
  MIPS: KVM: Simplify functions by removing redundancy
  MIPS: KVM: Remove unneeded volatile
  MIPS: KVM: Rename files to remove the prefix kvm_ and kvm_mips_
  MIPS: KVM: Restore correct value for WIRED at TLB uninit
  MIPS: KVM: Fix memory leak on VCPU
  MIPS: KVM: Skip memory cleaning in kvm_mips_commpage_init()

James Hogan (1):
  MIPS: KVM: Remove dead code of TLB index error in
kvm_mips_emul_tlbwr()

 arch/mips/include/asm/kvm_host.h                  |  12 +-
 arch/mips/include/asm/r4kcache.h                  |   3 +
 arch/mips/kvm/Makefile                            |   8 +-
 arch/mips/kvm/{kvm_cb.c => callback.c}            |   0
 arch/mips/kvm/commpage.c                          |  33 ++
 arch/mips/kvm/commpage.h                          |  24 +
 arch/mips/kvm/{kvm_mips_dyntrans.c => dyntrans.c} |  40 +-
 arch/mips/kvm/{kvm_mips_emul.c => emulate.c}      | 539 +++---
 arch/mips/kvm/{kvm_mips_int.c => interrupt.c}     |  47 +-
 arch/mips/kvm/{kvm_mips_int.h => interrupt.h}     |  22 +-
 arch/mips/kvm/kvm_mips_comm.h                     |  23 -
 arch/mips/kvm/kvm_mips_commpage.c                 |  37 --
 arch/mips/kvm/kvm_mips_opcode.h                   |  24 -
 arch/mips/kvm/{kvm_locore.S => locore.S}          |  55 ++-
 arch/mips/kvm/{kvm_mips.c => mips.c}              | 227 +
 arch/mips/kvm/opcode.h                            |  22 +
 arch/mips/kvm/{kvm_mips_stats.c => stats.c}       |  28 +-
 arch/mips/kvm/{kvm_tlb.c => tlb.c}                | 258 +--
 arch/mips/kvm/trace.h                             |  18 +-
 arch/mips/kvm/{kvm_trap_emul.c => trap_emul.c}    | 109 +++--
 20 files changed, 750 insertions(+), 779 deletions(-)
 rename arch/mips/kvm/{kvm_cb.c => callback.c} (100%)
 create mode 100644 arch/mips/kvm/commpage.c
 create mode 100644 arch/mips/kvm/commpage.h
 rename arch/mips/kvm/{kvm_mips_dyntrans.c => dyntrans.c} (79%)
 rename arch/mips/kvm/{kvm_mips_emul.c => emulate.c} (83%)
 rename arch/mips/kvm/{kvm_mips_int.c => interrupt.c} (85%)
 rename arch/mips/kvm/{kvm_mips_int.h => interrupt.h} (74%)
 delete mode 100644 arch/mips/kvm/kvm_mips_comm.h
 delete mode 100644 arch/mips/kvm/kvm_mips_commpage.c
 delete mode 100644 arch/mips/kvm/kvm_mips_opcode.h
 rename arch/mips/kvm/{kvm_locore.S => locore.S} (93%)
 rename arch/mips/kvm/{kvm_mips.c => mips.c} (83%)
 create mode 100644 arch/mips/kvm/opcode.h
 rename arch/mips/kvm/{kvm_mips_stats.c => stats.c} (63%)
 rename arch/mips/kvm/{kvm_tlb.c => tlb.c} (78%)
 rename arch/mips/kvm/{kvm_trap_emul.c => trap_emul.c} (83%)



I'll wait for v4 of these patches since James still had a few comments.

Paolo

[PATCH] arch: x86: kvm: x86.c: Cleaning up variable is set more than once

2014-06-25 Thread Rickard Strandqvist

A struct member variable is set to the same value more than once

This was found using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
---
 arch/x86/kvm/x86.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..0f48eb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4730,7 +4730,6 @@ static void emulator_set_segment(struct x86_emulate_ctxt 
*ctxt, u16 selector,
    if (desc->g)
        var.limit = (var.limit << 12) | 0xfff;
    var.type = desc->type;
-   var.present = desc->p;
    var.dpl = desc->dpl;
    var.db = desc->d;
    var.s = desc->s;
-- 
1.7.10.4


Re: [PATCH] arch: x86: kvm: x86.c: Cleaning up variable is set more than once

2014-06-25 Thread Paolo Bonzini


Il 25/06/2014 14:25, Rickard Strandqvist ha scritto:

A struct member variable is set to the same value more than once

This was found using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
---
 arch/x86/kvm/x86.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..0f48eb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4730,7 +4730,6 @@ static void emulator_set_segment(struct x86_emulate_ctxt 
*ctxt, u16 selector,
    if (desc->g)
        var.limit = (var.limit << 12) | 0xfff;
    var.type = desc->type;
-   var.present = desc->p;
    var.dpl = desc->dpl;
    var.db = desc->d;
    var.s = desc->s;



Thanks, applying this patch.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code

2014-06-25 Thread Marek Szyprowski


Hello,

On 2014-06-18 22:51, Andrew Morton wrote:

On Tue, 17 Jun 2014 10:25:07 +0900 Joonsoo Kim iamjoonsoo@lge.com wrote:

v2:
   - Although this patchset looks very different from v1, the end result,
   that is, mm/cma.c, is the same as v1's. So I carry the Acks over to patches 6-7.

This patchset is based on linux-next 20140610.

Thanks for taking care of this. I will test it with my setup and if
everything goes well, I will take it to my -next tree. If any branch
is required for anyone to continue his works on top of those patches,
let me know, I will also prepare it.

Hello,

I'm glad to hear that. :)
But there is one concern. As you already know, I am preparing further
patches (Aggressively allocate the pages on CMA reserved memory). They
may be closely related to the MM branch and also depend slightly on these CMA
changes. In this case, what is the best strategy for merging this
patchset? IMHO, Andrew's tree is the more appropriate branch. If there is
no issue in this case, I am willing to develop further patches based
on your tree.

That's probably easier.  Marek, I'll merge these into -mm (and hence
-next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
and shall hold them pending your review/ack/test/etc, OK?


Ok. I've tested them and they work fine. I'm sorry that you had to wait for
me for a few days. You can now add:

Acked-and-tested-by: Marek Szyprowski m.szyprow...@samsung.com

I've also rebased my pending patches onto this set (I will send them soon).

The question now is whether you want to keep the discussed patches in your
-mm tree, or whether I should take them into my -next branch. If you would
like to keep them, I assume you will also take the patches which depend on
the discussed changes.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland


no handler for some reasons to cause vmexit

2014-06-25 Thread Xuekun Hu

Hi, All

Some causes of a vmexit (e.g. LGDT, INVPCID) have no corresponding
handler in KVM. In general, what does the KVM hypervisor do in that
case? Does it do nothing and just resume the guest at the next vmentry?
From the guest's point of view, no state has changed, right?

Many thanks.

Thx, Xuekun

kvm perf question

2014-06-25 Thread Xuekun Hu

Hi, All

I started a VM with nothing running in it, then used “perf stat” to
collect some data. Interestingly, the count for “kvm_apic” is greater
than that for “kvm_exit”. My understanding is that “kvm:kvm_exit” counts
vmexits, while “kvm_apic” counts vmexits due to APIC accesses. Is my
understanding right? If so, under what conditions could the “kvm_apic”
count be greater than the “kvm_exit” count?

[root@centos_ivy ~]# perf stat -a -e 'kvm:kvm_exit' -e 'kvm:kvm_apic' -e kvm:kvm_apic_ipi sleep 1s

 Performance counter stats for 'sleep 1s':

    47,251 kvm:kvm_exit         [100.00%]
    52,650 kvm:kvm_apic         [100.00%]
     4,519 kvm:kvm_apic_ipi

       1.001805327 seconds time elapsed

My configuration is : ivybridge-EP, Centos, kernel 3.15.0.

Many thanks.

Thx, Xuekun

Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment

2014-06-25 Thread Marc Zyngier

On 25/06/14 15:56, Joel Schopp wrote:
 
 On 06/24/2014 05:28 PM, Peter Maydell wrote:
 On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:
 On 06/19/2014 04:21 AM, Marc Zyngier wrote:
 The GIC CPU interface is always 4k aligned. If the host is using
 64k pages, it is critical to place the guest's GICC interface at the
 same relative alignment as the host's GICV. Failure to do so results
 in an impossibility for the guest to deal with interrupts.

 Add a KVM_DEV_ARM_VGIC_GRP_ADDR_OFFSET attribute for the VGIC, allowing
 userspace to retrieve the GICV offset in a page. It becomes then trivial
 to adjust the GICC base address for the guest.

 Does this mean there is a corresponding patch for qemu?
 Not as far as I know. It's a bit awkward on the QEMU end because
 we really want to provide the guest a consistent memory map
 regardless of the host CPU. So at best we'd probably use it to
 say "sorry, can't run on this CPU/host kernel".
 I think most arm64 servers are going to run with 64k pages.  It seems 
 like a major problem to have qemu not work on these systems.

How many of them will be with the GICC *not* 64kB aligned?


 (That said, if you think you can make QEMU usefully use the
 information and want to write a QEMU patch I'm not averse
 to the idea.)
 I'll have to think about this approach some more, but I'm not opposed to 
 doing the work if I thought it was the right thing to do.
 

 kvmtool is probably better placed to take advantage of it since
 it takes more of a "deal with what the host provides you"
 philosophy.
 kvmtool is fun as a play toy, but in the real world nobody is building 
 clouds using kvmtool, they use kvm with qemu.

A play toy? Hmmm. Do you realise that most of KVM on arm64 has been
written using this play toy?

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment



On 06/25/2014 10:00 AM, Marc Zyngier wrote:

On 25/06/14 15:56, Joel Schopp wrote:

On 06/24/2014 05:28 PM, Peter Maydell wrote:

On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:

On 06/19/2014 04:21 AM, Marc Zyngier wrote:

The GIC CPU interface is always 4k aligned. If the host is using
64k pages, it is critical to place the guest's GICC interface at the
same relative alignment as the host's GICV. Failure to do so results
in an impossibility for the guest to deal with interrupts.

Add a KVM_DEV_ARM_VGIC_GRP_ADDR_OFFSET attribute for the VGIC, allowing
userspace to retrieve the GICV offset in a page. It becomes then trivial
to adjust the GICC base address for the guest.

Does this mean there is a corresponding patch for qemu?

Not as far as I know. It's a bit awkward on the QEMU end because
we really want to provide the guest a consistent memory map
regardless of the host CPU. So at best we'd probably use it to
say "sorry, can't run on this CPU/host kernel".

I think most arm64 servers are going to run with 64k pages.  It seems
like a major problem to have qemu not work on these systems.

How many of them will be with the GICC *not* 64kB aligned?


If I'm reading the Server Base System Architecture v2.2 Appendix F
correctly, all of them.  Here's the relevant quote: "In a 64KB
translation granule system this means that GICC needs to have its base
at 4KB below a 64KB boundary."
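
To illustrate what such a placement means for the in-page offset that the
patch exposes to userspace, here is a minimal sketch. The addresses used
with it below are made up for the example and do not come from any real
platform, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of addr inside its containing page; page_size is a power of two. */
static uint64_t offset_in_page(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & (page_size - 1);
}

/*
 * Pick a guest base inside guest_page that sits at the same sub-page
 * offset as host_addr does in its own page.
 */
static uint64_t guest_base_matching_host(uint64_t guest_page,
					 uint64_t host_addr,
					 uint64_t page_size)
{
	return (guest_page & ~(page_size - 1)) |
	       offset_in_page(host_addr, page_size);
}
```

For a hypothetical host GICV at 0x2c02f000, the offset is 0xf000 within a
64KB page but 0 within a 4KB page, so the question only arises with the
larger granule. A guest GICC would then have to sit at 0xf000 into whichever
64KB guest page is chosen, e.g. 0x0801f000 for a page at 0x08010000.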



(That said, if you think you can make QEMU usefully use the
information and want to write a QEMU patch I'm not averse
to the idea.)

I'll have to think about this approach some more, but I'm not opposed to
doing the work if I thought it was the right thing to do.


kvmtool is probably better placed to take advantage of it since
it takes more of a "deal with what the host provides you"
philosophy.

kvmtool is fun as a play toy, but in the real world nobody is building
clouds using kvmtool, they use kvm with qemu.

A play toy? Hmmm. Do you realise that most of KVM on arm64 has been
written using this play toy?


I meant no insult.  I really like kvmtool.  I'm just saying that the 
eventual end users of these systems will want to run qemu and not kvmtool.



Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment



On 06/24/2014 05:28 PM, Peter Maydell wrote:

On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:

On 06/19/2014 04:21 AM, Marc Zyngier wrote:

The GIC CPU interface is always 4k aligned. If the host is using
64k pages, it is critical to place the guest's GICC interface at the
same relative alignment as the host's GICV. Failure to do so results
in an impossibility for the guest to deal with interrupts.

Add a KVM_DEV_ARM_VGIC_GRP_ADDR_OFFSET attribute for the VGIC, allowing
userspace to retrieve the GICV offset in a page. It becomes then trivial
to adjust the GICC base address for the guest.


Does this mean there is a corresponding patch for qemu?

Not as far as I know. It's a bit awkward on the QEMU end because
we really want to provide the guest a consistent memory map
regardless of the host CPU. So at best we'd probably use it to
say "sorry, can't run on this CPU/host kernel".
I think most arm64 servers are going to run with 64k pages.  It seems 
like a major problem to have qemu not work on these systems.




(That said, if you think you can make QEMU usefully use the
information and want to write a QEMU patch I'm not averse
to the idea.)
I'll have to think about this approach some more, but I'm not opposed to 
doing the work if I thought it was the right thing to do.




kvmtool is probably better placed to take advantage of it since
it takes more of a "deal with what the host provides you"
philosophy.

kvmtool is fun as a play toy, but in the real world nobody is building
clouds using kvmtool, they use kvm with qemu.




thanks
-- PMM



__schedule #DF splat

2014-06-25 Thread Borislav Petkov

Hi guys,

so I'm looking at this splat below when booting current linus+tip/master
in a kvm guest. Initially I thought this is something related to the
PARAVIRT gunk but it happens with and without it.

So, from what I can see, we first #DF and then lockdep fires a deadlock
warning. That I can understand but what I can't understand is why we #DF
with this RIP:

[2.744062] RIP: 0010:[816139df]  [816139df] 
__schedule+0x28f/0xab0

disassembling this points to

/*
 * Since the runqueue lock will be released by the next
 * task (which is an invalid locking op but in the case
 * of the scheduler it's an obvious special-case), so we
 * do an early lockdep release here:
 */
#ifndef __ARCH_WANT_UNLOCKED_CTXSW
spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif

this call in context_switch() (provided this RIP is correct, of course).
(btw, various dumps at the end of this mail with the  faulting
marker).

And that's lock_release() in lockdep.c.

What's also interesting is that we have two __schedule calls on the stack
before #DF:

[2.744062]  [816139ce] ? __schedule+0x27e/0xab0
[2.744062]  [816139df] ? __schedule+0x28f/0xab0

The show_stack_log_lvl() I'm attributing to the userspace stack not
being mapped while we're trying to walk it (we do have a %cr3 write
shortly before the RIP we're faulting at) which is another snafu and
shouldn't happen, i.e., we should detect that and not walk it or
whatever...

Anyway, this is what I can see - any and all suggestions on how to debug
this further are appreciated. More info available upon request.

Thanks.

[1.932807] devtmpfs: mounted
[1.938324] Freeing unused kernel memory: 2872K (819ad000 - 
81c7b000)
[2.450824] udevd[814]: starting version 175
[2.743648] PANIC: double fault, error_code: 0x0
[2.743657] 
[2.744062] ==
[2.744062] [ INFO: possible circular locking dependency detected ]
[2.744062] 3.16.0-rc2+ #2 Not tainted
[2.744062] ---
[2.744062] vmmouse_detect/957 is trying to acquire lock:
[2.744062]  ((console_sem).lock){-.}, at: [81092dcd] 
down_trylock+0x1d/0x50
[2.744062] 
[2.744062] but task is already holding lock:
[2.744062]  (rq-lock){-.-.-.}, at: [8161382f] 
__schedule+0xdf/0xab0
[2.744062] 
[2.744062] which lock already depends on the new lock.
[2.744062] 
[2.744062] 
[2.744062] the existing dependency chain (in reverse order) is:
[2.744062] 
- #2 (rq-lock){-.-.-.}:
[2.744062][8109c0d9] lock_acquire+0xb9/0x200
[2.744062][81619111] _raw_spin_lock+0x41/0x80
[2.744062][8108090b] wake_up_new_task+0xbb/0x290
[2.744062][8104e847] do_fork+0x147/0x770
[2.744062][8104ee96] kernel_thread+0x26/0x30
[2.744062][8160e282] rest_init+0x22/0x140
[2.744062][81b82e3e] start_kernel+0x408/0x415
[2.744062][81b82463] x86_64_start_reservations+0x2a/0x2c
[2.744062][81b8255b] x86_64_start_kernel+0xf6/0xf9
[2.744062] 
- #1 (p-pi_lock){-.-.-.}:
[2.744062][8109c0d9] lock_acquire+0xb9/0x200
[2.744062][81619333] _raw_spin_lock_irqsave+0x53/0x90
[2.744062][810803b1] try_to_wake_up+0x31/0x450
[2.744062][810807f3] wake_up_process+0x23/0x40
[2.744062][816177ff] __up.isra.0+0x1f/0x30
[2.744062][81092fc1] up+0x41/0x50
[2.744062][810ac7b8] console_unlock+0x258/0x490
[2.744062][810acc81] vprintk_emit+0x291/0x610
[2.744062][8161185c] printk+0x4f/0x57
[2.744062][81486ad1] input_register_device+0x401/0x4d0
[2.744062][814909b4] atkbd_connect+0x2b4/0x2e0
[2.744062][81481a3b] serio_connect_driver+0x3b/0x60
[2.744062][81481a80] serio_driver_probe+0x20/0x30
[2.744062][813cd8e5] really_probe+0x75/0x230
[2.744062][813cdbc1] __driver_attach+0xb1/0xc0
[2.744062][813cb97b] bus_for_each_dev+0x6b/0xb0
[2.744062][813cd43e] driver_attach+0x1e/0x20
[2.744062][81482ded] serio_handle_event+0x14d/0x1f0
[2.744062][8106c9d7] process_one_work+0x1c7/0x680
[2.744062][8106d77b] worker_thread+0x6b/0x540
[2.744062][81072ec8] kthread+0x108/0x120
[2.744062][8161a3ac] ret_from_fork+0x7c/0xb0
[2.744062] 
- #0 ((console_sem).lock){-.}:
[2.744062][8109b564] __lock_acquire+0x1f14/0x2290
[2.744062][8109c0d9] lock_acquire+0xb9/0x200
[2.744062]

Re: [PATCH 1/2] docs: update ivshmem device spec

2014-06-25 Thread David Marchand

Hello Claudio,

Sorry for the delay.
I am a bit short on time and will be offline for a week starting tonight.

I agree there are points that must be more clearly described (and I
agree that ivshmem code will most likely have to be cleaned up after this).
Restructuring the documentation with an optional section is a good idea
too.

I will work on this at my return.

Anyway, thanks for the review.

--
David Marchand

On 06/23/2014 04:18 PM, Claudio Fontana wrote:

Hi,

we were reading through this quickly today, and these are some of the
questions that we think can come up when reading this. Answers to some of
these questions we think we have figured out, but I think it's important to
put this information into the documentation.

I will quote the file in its entirety, and insert some questions inline.

Device Specification for Inter-VM shared memory device
--

The Inter-VM shared memory device is designed to share a region of memory to
userspace in multiple virtual guests.

What does "to userspace" mean in this context? The userspace of the host, or
the userspace in the guest?

What about "The Inter-VM shared memory device is designed to share a memory region
(created on the host via the POSIX shared memory API) between multiple QEMU processes
running different guests. In order for all guests to be able to pick up the shared memory
area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI
BAR."

Whether in those guests the memory region is used in kernel space or userspace,
or there is even any meaning for those terms is guest-dependent I would think
(I think of an OSv here, where the application and kernel execute at the same
privilege level and in the same address space).

The memory region does not belong to any
guest, but is a POSIX memory object on the host.

Ok that's clear.
One thing I would ask is, but I don't know if it makes sense to mention here,
is who creates this memory object on the host?
I understand in some cases it's the contributed server (what you provide in contrib/), in some
cases it's the user of this device who has to write some server code for that, but
is it true that also the qemu process itself can create this memory object on its own, without
any external process needed? Is this the use case for host-guest only?

Optionally, the device may
support sending interrupts to other guests sharing the same memory region.

This opens a lot of questions here which are partly answered later (If I
understand correctly, not only interrupts are involved, but a complete
communication protocol involving registers in BAR0), but what about staying a
bit general here, like
"Optionally, the device may also provide a communication mechanism between
guests sharing the same memory region. More details about that in the section
'OPTIONAL ivshmem guest to guest communication protocol'."

Thinking out loud, I wonder if this communication mechanism should be part of
this device in QEMU, or it should be provided at another layer..

The Inter-VM PCI device
---

*BARs*

The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support
registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is
used to map the shared memory object from the host. The size of BAR2 is
specified when the guest is started and must be a power of 2 in size.

Are BAR0 and BAR1 optional? That's what I would think by reading the whole, but
I'm still not sure.
Am I forced to map BAR0 and BAR1 anyway? I don't think so, but..

If so, can we separate the explanation into the base shared memory feature, and
a separate section which explains the OPTIONAL communication mechanism, and the
OPTIONAL MSI-X BAR?

For example, say that I am a potential ivshmem user (which I am), and I am
interested in the shared memory but I want to use my own communication
mechanism and protocol between guests, can we make it so that I don't have to
wonder whether some of the info I read applies or not?
The solution to that I think is to put all the OPTIONAL parts into separate
sections.

*Registers*

Ok, so this should I think go into one such OPTIONAL sections.

The device currently supports 4 registers of 32-bits each. Registers
are used for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).

So use of BAR0 goes together with interrupts, and goes together with the shared
memory server (is it the one contributed in contrib/?)

The server assigns each VM an ID number and sends this ID number to the QEMU
process when the guest starts.

enum ivshmem_registers {
IntrMask = 0,
IntrStatus = 4,
IVPosition = 8,
Doorbell = 12
};

The first two registers are the interrupt mask and status registers. Mask and
status are only used with pin-based interrupts. They are unused with MSI
interrupts.
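
As a rough illustration of the register layout above, here is a minimal
sketch in C of reading and writing the four 32-bit registers through a BAR0
mapping. The helper names ivshmem_read32 and ivshmem_write32 are hypothetical
(they are not part of QEMU or of any ivshmem driver), and the sketch assumes
the guest has already mapped BAR0 by whatever means its environment provides.

```c
#include <stdint.h>

/* Register offsets within BAR0, as listed in the spec quoted above. */
enum ivshmem_registers {
    IntrMask   = 0,
    IntrStatus = 4,
    IVPosition = 8,
    Doorbell   = 12
};

/*
 * Read one 32-bit register from a BAR0 mapping.
 * 'bar0' is assumed to point at the start of the mapped MMIO region;
 * how that mapping is obtained is guest-specific and not shown here.
 */
static inline uint32_t ivshmem_read32(const volatile void *bar0,
                                      enum ivshmem_registers reg)
{
    const volatile uint8_t *base = bar0;
    return *(const volatile uint32_t *)(base + reg);
}

/* Write one 32-bit register in the same mapping. */
static inline void ivshmem_write32(volatile void *bar0,
                                   enum ivshmem_registers reg, uint32_t val)
{
    volatile uint8_t *base = bar0;
    *(volatile uint32_t *)(base + reg) = val;
}
```

With these helpers, ivshmem_read32(bar0, IVPosition) would read the register
at offset 8, and ivshmem_write32(bar0, IntrMask, 1) would write the mask
register at offset 0.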

[Bug 25332] When a VM is rebooted, assigned devices do not get RESET ...

2014-06-25 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=25332

xerofo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xerofo...@gmail.com

--- Comment #3 from xerofo...@gmail.com ---
Please test against a newer kernel to see if it's fixed.
Thanks Nick

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 40542] overflow/panic on KVM hipervizor

2014-06-25 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=40542

xerofo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xerofo...@gmail.com

--- Comment #14 from xerofo...@gmail.com ---
This bug is outdated, please test against a newer kernel.
Cheers Nick


[Bug 42082] 3.1.0-rc2 block related lockdep report.

2014-06-25 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=42082

xerofo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xerofo...@gmail.com

--- Comment #1 from xerofo...@gmail.com ---
Please test this bug against a newer kernel to see if it's fixed.
Cheers Nick


Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment

2014-06-25 Thread Peter Maydell

On 25 June 2014 15:56, Joel Schopp joel.sch...@amd.com wrote:
 On 06/24/2014 05:28 PM, Peter Maydell wrote:
 On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:
 Does this mean there is a corresponding patch for qemu?

 Not as far as I know. It's a bit awkward on the QEMU end because
 we really want to provide the guest a consistent memory map
 regardless of the host CPU. So at best we'd probably use it to
 say "sorry, can't run on this CPU/host kernel".

 I think most arm64 servers are going to run with 64k pages.  It seems like a
 major problem to have qemu not work on these systems.

QEMU should already work fine on servers with 64K pages;
you just need to have the host offset of the GICV within the 64K page
and the guest offset of the GICC within the 64K page be the same
(and at the moment both must also be zero, which I believe is true
for all of them at the moment except possibly the AEM model;
counterexamples welcome). Disclaimer: I haven't personally
tested this, but on the other hand I don't think anybody's
reported it as not working either.

Notice that we don't care at all about the host's GICC offset,
because it's the GICV we're going to use as the guest GICC.
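
The constraint described above can be sketched in a few lines of C. This is
only an illustration under stated assumptions: the function names are
hypothetical, the page size is passed in as a parameter, and the check simply
compares the sub-page offsets of the host GICV and the guest GICC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of a physical address within its page (page_size is a power of 2). */
static inline uint64_t page_offset(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & (page_size - 1);
}

/*
 * The guest GICC is backed by the host GICV, and the mapping is done in
 * units of host pages, so both addresses must sit at the same offset
 * inside their respective pages.
 */
static inline bool gicc_placement_ok(uint64_t host_gicv, uint64_t guest_gicc,
                                     uint64_t page_size)
{
    return page_offset(host_gicv, page_size) ==
           page_offset(guest_gicc, page_size);
}
```

With 4K host pages any 4K-aligned pair passes the check; with 64K host pages
a GICV at offset 0xf000 only matches a guest GICC that is also at offset
0xf000 within its 64K page.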

That said, yes, QEMU ought really to be able to provide
support for "use what the host provides", in the same way
that we support -cpu host to mean 'virtualize whatever CPU
the host has'. It's just a little awkward because you're working
against the grain of some of QEMU's design; but it ought
to be usable for things like the "virt" machine model.

For the cases where QEMU is being used to emulate
specific hardware to the guest (which we don't do right
now because we don't model any 64 bit boards other than
"virt"), we could use this ioctl to say "can't run this guest
on this host"; this is basically diagnosing a case in the
same class as "can't run a guest with a GICv2 if your
host's GICv3 doesn't implement v2 compatibility mode".

thanks
-- PMM

[PATCH 1/2] vfio-pci: Fix MSI/X debug code

2014-06-25 Thread Alex Williamson

Use the correct MSI message function for debug info.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/misc/vfio.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 7437c2e..6fbd47e 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -641,9 +641,9 @@ static void vfio_msi_interrupt(void *opaque)
     MSIMessage msg;
 
     if (vdev->interrupt == VFIO_INT_MSIX) {
-        msg = msi_get_message(&vdev->pdev, nr);
-    } else if (vdev->interrupt == VFIO_INT_MSI) {
         msg = msix_get_message(&vdev->pdev, nr);
+    } else if (vdev->interrupt == VFIO_INT_MSI) {
+        msg = msi_get_message(&vdev->pdev, nr);
     } else {
         abort();
     }


[PATCH 0/2] vfio-pci: MSI-X fixes

2014-06-25 Thread Alex Williamson

One debug-only and one pretty significant performance fix for older
guests.  I'd like to do a pull request for these prior to the 2.1
hard freeze, let me know if there are any objections.  Thanks,

Alex

---

Alex Williamson (2):
  vfio-pci: Fix MSI-X masking performance
  vfio-pci: Fix MSI/X debug code


 hw/misc/vfio.c |  237 +++-
 1 file changed, 133 insertions(+), 104 deletions(-)

[PATCH 2/2] vfio-pci: Fix MSI-X masking performance

2014-06-25 Thread Alex Williamson

There are still old guests out there that over-exercise MSI-X masking.
The current code completely sets-up and tears-down an MSI-X vector on
the use and release callbacks.  While this is functional, it can
slow an old guest to a crawl.  We can easily skip the KVM parts of
this so that we keep the MSI route and irqfd setup.  We do however
need to switch VFIO to trigger a different eventfd while masked.
Actually, we have the option of continuing to use -1 to disable the
trigger, but by using another EventNotifier we can allow the MSI-X
core to emulate pending bits and re-fire the vector once unmasked.
MSI code gets updated as well to use the same setup and teardown
structures and functions.
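
To make the eventfd choice concrete, here is a simplified, standalone sketch
of the per-vector selection that the vfio_enable_vectors hunk below performs.
The struct and function names are hypothetical stand-ins (plain file
descriptors instead of QEMU's EventNotifier), so this is an illustration of
the idea and not the patch itself.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for QEMU's VFIOMSIVector. */
struct msi_vector_sketch {
    bool use;          /* vector is currently in use by the guest */
    int  virq;         /* KVM irqchip route, or -1 if there is none */
    int  interrupt_fd; /* eventfd handled in QEMU userspace */
    int  kvm_fd;       /* eventfd wired to a KVM irqfd */
};

/*
 * Pick the eventfd VFIO should signal for this vector.
 * Returns -1 to disable the trigger for an unused vector.
 */
static int vector_trigger_fd(const struct msi_vector_sketch *v)
{
    if (!v->use) {
        return -1;
    }
    if (v->virq >= 0) {
        return v->kvm_fd;       /* fast path: KVM injects the interrupt */
    }
    return v->interrupt_fd;     /* fallback: QEMU handles the eventfd */
}
```

Keeping the two eventfds separate is what lets the KVM route stay in place
while the vector is temporarily pointed at the userspace eventfd.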

Prior to this change, an igbvf assigned to a RHEL5 guest gets about
20Mbps and 50 transactions/s with netperf (remote or VF-PF).  With
this change, we get line rate and 3k transactions/s remote or 2Gbps
and 6k+ transactions/s to the PF.  No significant change is expected
for newer guests with more well behaved MSI-X support.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/misc/vfio.c |  233 +++-
 1 file changed, 131 insertions(+), 102 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 6fbd47e..8965e01 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -120,6 +120,7 @@ typedef struct VFIOINTx {
 
 typedef struct VFIOMSIVector {
     EventNotifier interrupt; /* eventfd triggered on interrupt */
+    EventNotifier kvm_interrupt; /* eventfd triggered for KVM irqfd bypass */
     struct VFIODevice *vdev; /* back pointer to device */
     MSIMessage msg; /* cache the MSI message so we know when it changes */
     int virq; /* KVM irqchip route for QEMU bypass */
@@ -681,10 +682,11 @@ static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
     for (i = 0; i < vdev->nr_vectors; i++) {
         if (!vdev->msi_vectors[i].use) {
             fds[i] = -1;
-            continue;
+        } else if (vdev->msi_vectors[i].virq >= 0) {
+            fds[i] = event_notifier_get_fd(&vdev->msi_vectors[i].kvm_interrupt);
+        } else {
+            fds[i] = event_notifier_get_fd(&vdev->msi_vectors[i].interrupt);
         }
-
-        fds[i] = event_notifier_get_fd(&vdev->msi_vectors[i].interrupt);
     }
 
     ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
@@ -694,6 +696,52 @@ static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
     return ret;
 }
 
+static void vfio_add_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage *msg,
+                                  bool msix)
+{
+    int virq;
+
+    if ((msix && !VFIO_ALLOW_KVM_MSIX) ||
+        (!msix && !VFIO_ALLOW_KVM_MSI) || !msg) {
+        return;
+    }
+
+    if (event_notifier_init(&vector->kvm_interrupt, 0)) {
+        return;
+    }
+
+    virq = kvm_irqchip_add_msi_route(kvm_state, *msg);
+    if (virq < 0) {
+        event_notifier_cleanup(&vector->kvm_interrupt);
+        return;
+    }
+
+    if (kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->kvm_interrupt,
+                                       NULL, virq) < 0) {
+        kvm_irqchip_release_virq(kvm_state, virq);
+        event_notifier_cleanup(&vector->kvm_interrupt);
+        return;
+    }
+
+    vector->msg = *msg;
+    vector->virq = virq;
+}
+
+static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
+{
+    kvm_irqchip_remove_irqfd_notifier(kvm_state, &vector->kvm_interrupt,
+                                      vector->virq);
+    kvm_irqchip_release_virq(kvm_state, vector->virq);
+    vector->virq = -1;
+    event_notifier_cleanup(&vector->kvm_interrupt);
+}
+
+static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg)
+{
+    kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg);
+    vector->msg = msg;
+}
+
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
@@ -706,30 +754,32 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             vdev->host.function, nr);
 
     vector = &vdev->msi_vectors[nr];
-    vector->vdev = vdev;
-    vector->use = true;
-
-    msix_vector_use(pdev, nr);
 
-    if (event_notifier_init(&vector->interrupt, 0)) {
-        error_report("vfio: Error: event_notifier_init failed");
+    if (!vector->use) {
+        vector->vdev = vdev;
+        vector->virq = -1;
+        if (event_notifier_init(&vector->interrupt, 0)) {
+            error_report("vfio: Error: event_notifier_init failed");
+        }
+        vector->use = true;
+        msix_vector_use(pdev, nr);
     }
 
+    qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
+                        handler, NULL, vector);
+
     /*
      * Attempt to enable route through KVM irqchip,
      * default to userspace handling if unavailable.
      */
-    vector->virq = msg && VFIO_ALLOW_KVM_MSIX ?
-                   kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
-    if (vector->virq < 0 ||
-

Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment



On 06/25/2014 12:34 PM, Peter Maydell wrote:

On 25 June 2014 15:56, Joel Schopp joel.sch...@amd.com wrote:

On 06/24/2014 05:28 PM, Peter Maydell wrote:

On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:

Does this mean there is a corresponding patch for qemu?

Not as far as I know. It's a bit awkward on the QEMU end because
we really want to provide the guest a consistent memory map
regardless of the host CPU. So at best we'd probably use it to
say "sorry, can't run on this CPU/host kernel".

I think most arm64 servers are going to run with 64k pages.  It seems like a
major problem to have qemu not work on these systems.

QEMU should already work fine on servers with 64K pages;
you just need to have the host offset of the GICV within the 64K page
and the guest offset of the GICC within the 64K page be the same
(and at the moment both must also be zero, which I believe is true
for all of them at the moment except possibly the AEM model;
counterexamples welcome). Disclaimer: I haven't personally
tested this, but on the other hand I don't think anybody's
reported it as not working either.


It doesn't work for me.  Maybe I'm doing something wrong, but I can't
see what.  I am unique in that I'm running a gic-400 (gicv2m) on aarch64
hardware with 64k pages.  I'm also unique in that my hardware maps each
4K gic entry to a 64K page (aliasing each 4k of gic 16 times in a 64K
page, ie the gic virtual ic is at 0xe1140000 and 0xe1141000 and
0xe1142000, etc).  This is in line with appendix F of the server base
system architecture.  This is inconvenient when the size is 0x2000
(8K).  As a result all the offsets in the device tree entries are to the
last 4K in the page so that an 8K read will read the last 4k from one
page and the first 4k from the next and actually get 8k of the gic.



gic: interrupt-controller@e1101000 {
        compatible = "arm,gic-400";
        #interrupt-cells = <3>;
        #address-cells = <0>;
        interrupt-controller;
        msi-controller;
        reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
              <0x0 0xe112f000 0 0x2000>, /* gic cpu */
              <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
              <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
              <0x0 0xe1180000 0 0x1000>; /* gic msi */

        interrupts = <1 8 0xf04>;
};


My concern here is that if userspace is going to look at 8k starting at
the beginning of the page, guest offset 0 in your terminology, (say
0xe1140000) instead of starting at the last 4k of the page, offset
0xf000 (say 0xe114f000) it is going to get the second 4k wrong by
reading 0xe1141000 instead of 0xe1150000.
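
The aliasing described above can be modelled with a small C sketch. It
assumes the layout as I read it from this message: each 4K GIC frame gets its
own 64K page and is repeated 16 times within it, and the addresses that
appear shortened in the archive are 64K-aligned bases such as 0xe1140000.
The function name gic_region_offset is hypothetical.

```c
#include <stdint.h>

#define FRAME_SIZE 0x1000ULL   /* one 4K GIC frame */
#define ALIAS_SIZE 0x10000ULL  /* each frame is repeated across a 64K page */

/*
 * Translate a physical address into an offset inside a multi-frame GIC
 * region, for hardware that gives every 4K frame its own 64K page.
 * 'region_base' is the 64K-aligned address of the region's first frame.
 */
static uint64_t gic_region_offset(uint64_t region_base, uint64_t addr)
{
    uint64_t frame  = (addr - region_base) / ALIAS_SIZE; /* which 4K frame */
    uint64_t within = addr & (FRAME_SIZE - 1);           /* offset in frame */

    return frame * FRAME_SIZE + within;
}
```

Under this model 0xe114f000 and 0xe1141000 both decode to offset 0 of the
region (they alias the first frame), while 0xe1150000 decodes to offset
0x1000, which is why an 8K window has to start in the last 4K of the first
64K page.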



Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code

2014-06-25 Thread Andrew Morton

On Wed, 25 Jun 2014 14:33:56 +0200 Marek Szyprowski m.szyprow...@samsung.com 
wrote:

  That's probably easier.  Marek, I'll merge these into -mm (and hence
  -next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
  and shall hold them pending your review/ack/test/etc, OK?
 
 Ok. I've tested them and they work fine. I'm sorry that you had to wait for
 me for a few days. You can now add:
 
 Acked-and-tested-by: Marek Szyprowski m.szyprow...@samsung.com

Thanks.

 I've also rebased my pending patches onto this set (I will send them soon).
 
 The question is now if you want to keep the discussed patches in your 
 -mm tree,
 or should I take them to my -next branch. If you like to keep them, I assume
 you will also take the patches which depend on the discussed changes.

Yup, that works.

Re: __schedule #DF splat

2014-06-25 Thread Borislav Petkov

On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote:
 Hi guys,
 
 so I'm looking at this splat below when booting current linus+tip/master
 in a kvm guest. Initially I thought this is something related to the
 PARAVIRT gunk but it happens with and without it.

Ok, here's a cleaner splat. I went and rebuilt qemu to latest master
from today to rule out some breakage there but it still fires.

Paolo, any ideas why would kvm+qemu trigger a #DF in the guest? I guess
I should dust off my old kvm/qemu #DF debugging patch I had somewhere...

I did try to avoid the invalid stack issue by doing:

---
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 1abcb50b48ae..dd8e0eec071e 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -286,7 +286,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			}
 		if (i && ((i % STACKSLOTS_PER_LINE) == 0))
 			pr_cont("\n");
-		pr_cont(" %016lx", *stack++);
+		pr_cont(" %016lx", (((unsigned long)stack <= 0x00007fffffffffffUL) ? -1 : *stack++));
 		touch_nmi_watchdog();
 	}
 	preempt_enable();
---

but that didn't work either - see second splat at the end.
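
For reference, the range check that the hack above relies on can be written
as a standalone helper. This is only a sketch: the name looks_like_user_addr
is hypothetical, and the limit assumes the usual x86-64 layout in which user
addresses occupy the lower canonical half (up to 0x00007fffffffffff).

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest address of the lower canonical half on x86-64 (assumed layout). */
#define USER_ADDR_MAX 0x00007fffffffffffULL

/* True if 'addr' falls in the range normally used by userspace. */
static inline bool looks_like_user_addr(uint64_t addr)
{
    return addr <= USER_ADDR_MAX;
}
```

An RSP such as 0x00007fff99e51100 is classified as a user address by this
test, whereas a kernel stack pointer in the 0xffff8800... range is not.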

[2.704184] PANIC: double fault, error_code: 0x0
[2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[2.708132] task: 880079c78000 ti: 880079c74000 task.ti: 
880079c74000
[2.708132] RIP: 0010:[8161130f]  [8161130f] 
__schedule+0x28f/0xab0
[2.708132] RSP: 002b:7fff99e51100  EFLAGS: 00013082
[2.708132] RAX: 7b206000 RBX: 88007b526f80 RCX: 0028
[2.708132] RDX: 816112fe RSI: 0001 RDI: 88007c5d3c58
[2.708132] RBP: 7fff99e511f0 R08:  R09: 
[2.708132] R10: 0001 R11: 0019 R12: 88007c5d3c40
[2.708132] R13: 880079c84e40 R14:  R15: 880079c78000
[2.708132] FS:  7ff252c6d700() GS:88007c40() 
knlGS:
[2.708132] CS:  0010 DS:  ES:  CR0: 80050033
[2.708132] CR2: 7fff99e510f8 CR3: 7b206000 CR4: 06e0
[2.708132] Stack:
[2.708132] BUG: unable to handle kernel paging request at 7fff99e51100
[2.708132] IP: [81005bbc] show_stack_log_lvl+0x11c/0x1d0
[2.708132] PGD 7b20d067 PUD 0 
[2.708132] Oops:  [#1] PREEMPT SMP 
[2.708132] Modules linked in:
[2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[2.708132] task: 880079c78000 ti: 880079c74000 task.ti: 
880079c74000
[2.708132] RIP: 0010:[81005bbc]  [81005bbc] 
show_stack_log_lvl+0x11c/0x1d0
[2.708132] RSP: 002b:88007c405e58  EFLAGS: 00013046
[2.708132] RAX: 7fff99e51108 RBX:  RCX: 88007c403fc0
[2.708132] RDX: 7fff99e51100 RSI: 88007c40 RDI: 81846aba
[2.708132] RBP: 88007c405ea8 R08: 88007c3fffc0 R09: 
[2.708132] R10: 7c40 R11:  R12: 88007c405f58
[2.708132] R13:  R14: 818136fc R15: 
[2.708132] FS:  7ff252c6d700() GS:88007c40() 
knlGS:
[2.708132] CS:  0010 DS:  ES:  CR0: 80050033
[2.708132] CR2: 7fff99e51100 CR3: 7b206000 CR4: 06e0
[2.708132] Stack:
[2.708132]  0008 88007c405eb8 88007c405e70 
7b206000
[2.708132]  7fff99e51100 88007c405f58 7fff99e51100 
0040
[2.708132]  0ac0 880079c78000 88007c405f08 
81005d10
[2.708132] Call Trace:
[2.708132]  #DF 
[2.708132]  [81005d10] show_regs+0xa0/0x280
[2.708132]  [8103d143] df_debug+0x23/0x40
[2.708132]  [81003b6d] do_double_fault+0x5d/0x80
[2.708132]  [816194c7] double_fault+0x27/0x30
[2.708132]  [816112fe] ? __schedule+0x27e/0xab0
[2.708132]  [8161130f] ? __schedule+0x28f/0xab0
[2.708132]  EOE 
[2.708132]  UNK Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 
0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 
48 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7 
[2.708132] RIP  [81005bbc] show_stack_log_lvl+0x11c/0x1d0
[2.708132]  RSP 88007c405e58
[2.708132] CR2: 7fff99e51100
[2.708132] ---[ end trace 749cd02c31c493a0 ]---
[2.708132] note: vmmouse_detect[959] exited with

Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment

2014-06-25 Thread Peter Maydell

On 25 June 2014 20:34, Joel Schopp joel.sch...@amd.com wrote:
 It doesn't work for me.  Maybe I'm doing something wrong, but I can't see
 what.  I am unique in that I'm running a gic-400 (gicv2m) on aarch64
 hardware with 64k pages.  I'm also unique in that my hardware maps each 4K
 gic entry to a 64K page (aliasing each 4k of gic 16 times in a 64K page, ie
 the gic virtual ic is at 0xe1140000 and 0xe1141000 and 0xe1142000, etc).

 This is in line with appendix F of the server base system architecture.  This
 is inconvenient when the size is 0x2000 (8K).  As a result all the offsets
 in the device tree entries are to the last 4K in the page so that an 8K read
 will read the last 4k from one page and the first 4k from the next and
 actually get 8k of the gic.


 gic: interrupt-controller@e1101000 {
         compatible = "arm,gic-400";
         #interrupt-cells = <3>;
         #address-cells = <0>;
         interrupt-controller;
         msi-controller;
         reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
               <0x0 0xe112f000 0 0x2000>, /* gic cpu */
               <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
               <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
               <0x0 0xe1180000 0 0x1000>; /* gic msi */

Right, this is the oddball case we don't yet support for 64K pages
(though as you say it is a permitted configuration per the SBSA).

         interrupts = <1 8 0xf04>;
 };


 My concern here is that if userspace is going to look at 8k starting at the
 beginning of the page, guest offset 0 in your terminology, (say 0xe1140000)
 instead of starting at the last 4k of the page, offset 0xf000 (say
 0xe114f000) it is going to get the second 4k wrong by reading 0xe1141000
 instead of 0xe1150000.

Userspace doesn't actually look at anything in the GICC. It just asks
the kernel to put the guest GICC (ie the mapping of the host GICV)
at a particular base address which happens to be a multiple of 64K.
In this case if the host kernel is using 64K pages then the KVM
kernel code ought to say "sorry, can't do that" when we tell it the
base address. (That is, it's impossible to give the guest a VM
where the GICC it sees is at a 64K boundary on your hardware
and host kernel config, and hopefully we report that in a not totally
opaque fashion.)

If you hack QEMU's memory map for the virt board so instead of
    [VIRT_GIC_CPU] = { 0x8010000, 0x10000 },
we have
    [VIRT_GIC_CPU] = { 0x801f000, 0x2000 },

does it work? If QEMU supported this VGIC_GRP_ADDR_OFFSET
query then all it would do would be to change that offset and size.
It would be good to know if there are other problems beyond that...
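
As a sketch of what "change that offset and size" would amount to, the
following C fragment derives the adjusted region from a 64K slot base and the
host's GICV sub-page offset. The struct and function names are hypothetical,
the 8K size comes from the GICC size discussed in this thread, and the slot
base 0x8010000 is my reading of the shortened value in the archive.

```c
#include <stdint.h>

/* Hypothetical stand-in for one entry of a board memory map. */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/*
 * Place an 8K guest GICC inside a 64K-aligned slot so that it starts at
 * the same sub-page offset as the host GICV.
 */
static struct mem_region gicc_region(uint64_t slot_base,
                                     uint64_t host_gicv_offset)
{
    struct mem_region r = { slot_base + host_gicv_offset, 0x2000 };

    return r;
}
```

For a host GICV at offset 0xf000, gicc_region(0x8010000, 0xf000) gives base
0x801f000 and size 0x2000, matching the hand-edited entry above.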

(Conveniently, Linux guests won't currently try to look at the second
4K page of their GICC...)

thanks
-- PMM

Re: [PATCH 0/3] Prepare for in-kernel VFIO DMA operations acceleration



On 06.06.14 02:20, Alexey Kardashevskiy wrote:

On 06/05/2014 09:57 PM, Alexander Graf wrote:

On 05.06.14 09:25, Alexey Kardashevskiy wrote:

This reserves 2 capability numbers.

This implements an extended version of KVM_CREATE_SPAPR_TCE_64 ioctl.

Please advise how to proceed with these patches as I suspect that
first two should go via Paolo's tree while the last one via Alex Graf's tree
(correct?).

They would just go via my tree, but only be actually allocated (read:
mergable to qemu) when they hit Paolo's tree.

In fact, I don't think it makes sense to split them off at all.


So? Are these patches going anywhere? Thanks.


So? Are you going to address the comments?


Alex


Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment



On 06/25/2014 03:45 PM, Peter Maydell wrote:

On 25 June 2014 20:34, Joel Schopp joel.sch...@amd.com wrote:

It doesn't work for me.  Maybe I'm doing something wrong, but I can't see
what.  I am unique in that I'm running a gic-400 (gicv2m) on aarch64
hardware with 64k pages.  I'm also unique in that my hardware maps each 4K
gic entry to a 64K page (aliasing each 4k of gic 16 times in a 64K page, ie
the gic virtual ic is at 0xe114 and 0xe1141000 and 0xe1142000, etc).

This is inline with appendix F of the server base system architecture.  This
is inconvenient when the size is 0x2000 (8K).  As a result all the offsets
in the device tree entries are to the last 4K in the page so that an 8K read
will read the last 4k from one page and the first 4k from the next and
actually get 8k of the gic.


 gic: interrupt-controller@e1101000 {
     compatible = "arm,gic-400";
     #interrupt-cells = <3>;
     #address-cells = <0>;
     interrupt-controller;
     msi-controller;
     reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
           <0x0 0xe112f000 0 0x2000>, /* gic cpu */
           <0x0 0xe114f000 0 0x2000>, /* gic virtual ic */
           <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu */
           <0x0 0xe1180000 0 0x1000>; /* gic msi */

Right, this is the oddball case we don't yet support for 64K pages
(though as you say it is a permitted configuration per the SBSA).

At least I know I'm not going crazy.



     interrupts = <1 8 0xf04>;
 };


My concern here is that if userspace is going to look at 8k starting at the
beginning of the page, guest offset 0 in your terminology, (say 0xe1140000)
instead of starting at the last 4k of the page, offset 0xf000 (say
0xe114f000) it is going to get the second 4k wrong by reading 0xe1141000
instead of 0xe1150000.
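
To make the address arithmetic concrete, here is a minimal sketch in C. It assumes the aliasing described above (each 4K GIC frame repeated 16 times across its own 64K page) and takes the virtual interface control region to start at the 64K page 0xe1140000. The helper names are hypothetical; they are not kernel or QEMU APIs.

```c
#include <stdint.h>

#define SZ_4K   0x1000ULL
#define SZ_64K  0x10000ULL

/*
 * Assumption: frame n of a GIC region is aliased across the whole 64K page
 * at first_page + n * 64K. Returns the frame index an address decodes to.
 */
static unsigned int gic_frame_index(uint64_t first_page, uint64_t addr)
{
	return (unsigned int)((addr - first_page) / SZ_64K);
}

/*
 * Start address at which a contiguous 8K window covers frame 0 followed by
 * frame 1: the last 4K alias inside the first 64K page.
 */
static uint64_t gic_contiguous_8k_base(uint64_t first_page)
{
	return first_page + SZ_64K - SZ_4K;
}
```

With first_page = 0xe1140000, gic_contiguous_8k_base() gives 0xe114f000. The second 4K of that window is 0xe1150000, which decodes to frame 1, whereas 0xe1141000 is still an alias of frame 0.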

Userspace doesn't actually look at anything in the GICC. It just asks
the kernel to put the guest GICC (ie the mapping of the host GICV)
at a particular base address which happens to be a multiple of 64K.
In this case if the host kernel is using 64K pages then the KVM
kernel code ought to say "sorry, can't do that" when we tell it the
base address. (That is, it's impossible to give the guest a VM
where the GICC it sees is at a 64K boundary on your hardware
and host kernel config, and hopefully we report that in a not totally
opaque fashion.)
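
The constraint described here can be sketched as a page-offset comparison. This is only an illustration under the assumption that stage-2 mappings are made in units of host pages, so a host frame can appear in the guest only at an address with the same offset within the page. The function names are made up for the example and do not correspond to the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of an address within a page; page_size must be a power of two. */
static uint64_t page_offset(uint64_t addr, uint64_t page_size)
{
	return addr & (page_size - 1);
}

/*
 * Hypothetical check: the guest GICC base is only mappable if it sits at
 * the same offset within a host page as the host GICV it is backed by.
 */
static bool gicc_base_mappable(uint64_t host_gicv, uint64_t guest_gicc,
			       uint64_t host_page_size)
{
	return page_offset(host_gicv, host_page_size) ==
	       page_offset(guest_gicc, host_page_size);
}
```

With a host GICV at 0xe116f000 and 64K host pages, a guest base of 0x8010000 fails the check (offset 0 versus 0xf000), while 0x801f000 passes. With 4K host pages both addresses pass, which matches the observation that the problem only shows up with 64K pages.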

The errors I'm seeing look like:
from qemu:
error: kvm run failed Bad address
Aborted (core dumped)

from kvm:
[ 7931.722965] kvm [1208]: Unsupported fault status: EC=0x20 DFCS=0x14

from lkvm (kvmtool):
  Warning: /extra/rootfs/boot/Image is not a bzImage. Trying to load it as a flat binary...

  Info: Loaded kernel to 0x8008 (10212384 bytes)
  Info: Placing fdt at 0x8fe0 - 0x8fff
  Info: virtio-mmio.devices=0x200@0x1:36

KVM_RUN failed: Bad address




If you hack QEMU's memory map for the virt board so instead of
 [VIRT_GIC_CPU] = { 0x8010000, 0x10000 },
we have
 [VIRT_GIC_CPU] = { 0x801f000, 0x2000 },
No change in result, not to say that this wouldn't work if some other 
unknown problem were fixed.


does it work? If QEMU supported this VGIC_GRP_ADDR_OFFSET
query then all it would do would be to change that offset and size.
It would be good to know if there are other problems beyond that...

(Conveniently, Linux guests won't currently try to look at the second
4K page of their GICC...)

That's handy.

Re: [PATCH 0/3] New PAPR hypercall plus individual hypercall enables, v3



On 02.06.14 03:02, Paul Mackerras wrote:

This patch series adds a way for userspace to control which sPAPR
hypercalls get handled by kernel handlers vs. being sent up to
userspace, and then adds an implementation of a new hypercall,
H_SET_MODE.

This version updates the documentation in api.txt as requested.

The series is against the queue branch of the kvm tree.  I would like
these patches to go into 3.16 if possible.


Thanks, applied to kvm-ppc-queue. I don't think there's a bug fix in 
here that would warrant them in 3.16 still :).



Alex


[PULL 16/19] target-i386: block migration and savevm if invariant tsc is exposed

2014-06-25 Thread Andreas Färber

From: Marcelo Tosatti mtosa...@redhat.com

Invariant TSC documentation mentions that invariant TSC will run at a
constant rate in all ACPI P-, C-, and T-states.

This is not the case if migration to a host with different TSC frequency
is allowed, or if savevm is performed. So block migration/savevm.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Reviewed-by: Juan Quintela quint...@redhat.com
[AF+mtosatti: Updated error message]
Signed-off-by: Andreas Färber afaer...@suse.de
---
 target-i386/cpu-qom.h |  2 +-
 target-i386/kvm.c | 15 +++
 target-i386/machine.c |  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index ff3a5de..71a1b97 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -121,7 +121,7 @@ static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
 #define ENV_OFFSET offsetof(X86CPU, env)
 
 #ifndef CONFIG_USER_ONLY
-extern const struct VMStateDescription vmstate_x86_cpu;
+extern struct VMStateDescription vmstate_x86_cpu;
 #endif
 
 /**
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4bf0ac9..097fe11 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -35,6 +35,8 @@
 #include "exec/ioport.h"
 #include <asm/hyperv.h>
 #include "hw/pci/pci.h"
+#include "migration/migration.h"
+#include "qapi/qmp/qerror.h"
 
 //#define DEBUG_KVM
 
@@ -448,6 +450,8 @@ static bool hyperv_enabled(X86CPU *cpu)
             cpu->hyperv_relaxed_timing);
 }
 
+static Error *invtsc_mig_blocker;
+
 #define KVM_MAX_CPUID_ENTRIES  100
 
 int kvm_arch_init_vcpu(CPUState *cs)
@@ -705,6 +709,17 @@ int kvm_arch_init_vcpu(CPUState *cs)
                   !!(c->ecx & CPUID_EXT_SMX);
     }
 
+    c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
+    if (c && (c->edx & 1<<8) && invtsc_mig_blocker == NULL) {
+        /* for migration */
+        error_setg(&invtsc_mig_blocker,
+                   "State blocked by non-migratable CPU device"
+                   " (invtsc flag)");
+        migrate_add_blocker(invtsc_mig_blocker);
+        /* for savevm */
+        vmstate_x86_cpu.unmigratable = 1;
+    }
+
     cpuid_data.cpuid.padding = 0;
     r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
     if (r) {
diff --git a/target-i386/machine.c b/target-i386/machine.c
index b8dcd2f..16d2f6a 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -603,7 +603,7 @@ static const VMStateDescription vmstate_msr_hyperv_time = {
 }
 };
 
-const VMStateDescription vmstate_x86_cpu = {
+VMStateDescription vmstate_x86_cpu = {
     .name = "cpu",
 .version_id = 12,
 .minimum_version_id = 3,
-- 
1.8.4.5
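
For readers who want the check in isolation: the invariant TSC flag is bit 8 of EDX in CPUID leaf 0x80000007. The following is a minimal standalone sketch of the test and of the "add the blocker only once" guard, since the init code runs per vCPU. The struct and function names are invented for illustration and are not QEMU's.

```c
#include <stdbool.h>
#include <stdint.h>

#define APM_LEAF        0x80000007u  /* CPUID Advanced Power Management leaf */
#define APM_EDX_INVTSC  (1u << 8)    /* EDX bit 8: invariant TSC */

/* True if the EDX value of leaf APM_LEAF advertises invariant TSC. */
static bool edx_has_invtsc(uint32_t edx)
{
	return (edx & APM_EDX_INVTSC) != 0;
}

/* Hypothetical stand-in for the global migration-blocker state. */
struct mig_state {
	bool blocked;
	int blockers_added;
};

/* Register the blocker the first time invtsc is seen, and never again. */
static void maybe_block_migration(struct mig_state *s, uint32_t edx)
{
	if (edx_has_invtsc(edx) && !s->blocked) {
		s->blocked = true;
		s->blockers_added++;
	}
}
```

Calling maybe_block_migration() once per vCPU with the same EDX value leaves blockers_added at 1, which mirrors the invtsc_mig_blocker == NULL test in the patch.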


Re: [PATCH 0/3] New PAPR hypercall plus individual hypercall enables, v3

2014-06-25 Thread Paul Mackerras

On Wed, Jun 25, 2014 at 11:46:10PM +0200, Alexander Graf wrote:
 
> On 02.06.14 03:02, Paul Mackerras wrote:
> > This patch series adds a way for userspace to control which sPAPR
> > hypercalls get handled by kernel handlers vs. being sent up to
> > userspace, and then adds an implementation of a new hypercall,
> > H_SET_MODE.
> >
> > This version updates the documentation in api.txt as requested.
> >
> > The series is against the queue branch of the kvm tree.  I would like
> > these patches to go into 3.16 if possible.
>
> Thanks, applied to kvm-ppc-queue. I don't think there's a bug fix in here
> that would warrant them in 3.16 still :).

I agree.  It would be good to get a stable assignment of the number
for KVM_CAP_PPC_ENABLE_HCALL so we can start getting the qemu patches
upstream, though.

Paul.

[PATCH 2/2 v2] ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping

2014-06-25 Thread Kim Phillips

From: Kim Phillips kim.phill...@linaro.org

A userspace process can map device MMIO memory via VFIO or /dev/mem,
e.g., for platform device passthrough support in QEMU.

During early development, we found the PAGE_S2 memory type being used
for MMIO mappings.  This patch corrects that by using the more strongly
ordered memory type for device MMIO mappings: PAGE_S2_DEVICE.

Signed-off-by: Kim Phillips kim.phill...@linaro.org
Acked-by: Christoffer Dall christoffer.d...@linaro.org
---
Hi, here's a v2, upon request:

- rebased onto today's mainline ToT
- mmu.o-build tested only (ToT build doesn't complete)
- made commit text less terse
- added Christoffer's ack

Cheers,

Kim

 arch/arm/kvm/mmu.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 16f8049..69af021 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -748,6 +748,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
 	pfn_t pfn;
+	pgprot_t mem_type = PAGE_S2;
 
 	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -798,6 +799,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (is_error_pfn(pfn))
 		return -EFAULT;
 
+	if (kvm_is_mmio_pfn(pfn))
+		mem_type = PAGE_S2_DEVICE;
+
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
@@ -805,7 +809,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
 	if (hugetlb) {
-		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
+		pmd_t new_pmd = pfn_pmd(pfn, mem_type);
 		new_pmd = pmd_mkhuge(new_pmd);
 		if (writable) {
 			kvm_set_s2pmd_writable(&new_pmd);
@@ -814,13 +818,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
-		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
+		pte_t new_pte = pfn_pte(pfn, mem_type);
 		if (writable) {
 			kvm_set_s2pte_writable(&new_pte);
 			kvm_set_pfn_dirty(pfn);
 		}
 		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
-		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
+		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
+				     mem_type == PAGE_S2_DEVICE);
 	}
 
 
-- 
2.0.0
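
The decision the patch adds can be reduced to a small selection function. The sketch below is a toy model rather than kernel code: the enum values stand in for PAGE_S2 and PAGE_S2_DEVICE, and the iomap field stands in for the last argument the patch passes to stage2_set_pte(). All names are hypothetical.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's PAGE_S2 and PAGE_S2_DEVICE protections. */
enum s2_mem_type { S2_NORMAL, S2_DEVICE };

struct s2_mapping {
	enum s2_mem_type type; /* memory type used to build the PTE/PMD */
	bool iomap;            /* flag forwarded to the PTE install step */
};

/* Default to normal memory; switch to device memory for MMIO pfns. */
static struct s2_mapping choose_s2_mapping(bool pfn_is_mmio)
{
	struct s2_mapping m = { S2_NORMAL, false };

	if (pfn_is_mmio) {
		m.type = S2_DEVICE;
		m.iomap = true;
	}
	return m;
}
```

The point of deriving both fields from one predicate is that the memory type and the iomap flag cannot disagree, which is what the patch achieves by passing mem_type == PAGE_S2_DEVICE.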


Re: [PATCH] driver core: platform: add device binding path 'driver_override'

2014-06-25 Thread Kim Phillips

On Mon, 2 Jun 2014 21:28:42 -0700
Greg KH gre...@linuxfoundation.org wrote:

> On Mon, Jun 02, 2014 at 07:42:58PM -0500, Kim Phillips wrote:
> > You are the platform driver core maintainer: can you apply this to
> > your driver-core tree now?
>
> Yes, I will after this merge window ends, it's too late for 3.16-rc1
> with the window opening up a week early, sorry.

How about now?  fwiw, I just checked: it still applies cleanly.

Thanks,

Kim

Re: [PATCH] Bugfix: powerpc/eeh: Create eeh sysfs entry in post_init()

2014-06-25 Thread Wei Yang

On Wed, Jun 25, 2014 at 03:33:12PM +1000, Gavin Shan wrote:
On Tue, Jun 24, 2014 at 11:32:07PM -0400, Mike Qiu wrote:

[ cc Richard ]

Eeh sysfs entry created must be after EEH_ENABLED been set
in eeh_subsystem_flags.

In PowerNV platform, it try to create sysfs entry before
EEH_ENABLED been set, when boot up. So nothing will be
created for eeh in sysfs.


Could you please make the commit log more clear? :-)

I guess the issue is introduced by commit 2213fb1 (
powerpc/eeh: Skip eeh sysfs when eeh is disabled). The
commit checks EEH is enabled while creating PCI device
EEH sysfs files. If not, the sysfs files won't be created.
That's to avoid warning reported during PCI hotplug.

The problem you're reporting (if I understand correctly):
You don't see the sysfs files after the system boots up.
If it's the case, you probably need following changes in
arch/powerpc/platforms/powernv/pci.c::pnv_pci_ioda_fixup().
Could you have a try with it?

#ifdef CONFIG_EEH
   eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
-  eeh_addr_cache_build();
   eeh_init();
+  eeh_addr_cache_build();
#endif


I think this is a more proper fix.

BTW, I have one confusion in this mode set.

eeh_init()
  -> eeh_ops->dev_probe()
     -> powernv_eeh_dev_probe()
        -> eeh_set_enable(true)   <- here the eeh is marked enabled

We can see this flag would be set for each pci_dev. So is it possible to make
this set only once?

Eventually PowerNV/pSeries have same function call sequence:

- Set EEH probe mode
- Doing probe (with device node or PCI device)
- Build address cache.

Signed-off-by: Mike Qiu qiud...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 8ad0c5b..5f95581 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -136,6 +136,9 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	struct pnv_phb *phb = hose->private_data;
 	int ret;

+	/* Create sysfs after EEH_ENABLED been set */
+	eeh_add_sysfs_files(hose->bus);
+
 	/* Register OPAL event notifier */
 	if (!ioda_eeh_nb_init) {
 		ret = opal_notifier_register(&ioda_eeh_nb);

Thanks,
Gavin

-- 
Richard Yang
Help you, Help me

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Bugfix: powerpc/eeh: Create eeh sysfs entry in post_init()

2014-06-25 Thread Gavin Shan

On Wed, Jun 25, 2014 at 02:23:53PM +0800, Wei Yang wrote:
On Wed, Jun 25, 2014 at 03:33:12PM +1000, Gavin Shan wrote:
On Tue, Jun 24, 2014 at 11:32:07PM -0400, Mike Qiu wrote:

[ cc Richard ]

Eeh sysfs entry created must be after EEH_ENABLED been set
in eeh_subsystem_flags.

In PowerNV platform, it try to create sysfs entry before
EEH_ENABLED been set, when boot up. So nothing will be
created for eeh in sysfs.


Could you please make the commit log more clear? :-)

I guess the issue is introduced by commit 2213fb1 (
powerpc/eeh: Skip eeh sysfs when eeh is disabled). The
commit checks EEH is enabled while creating PCI device
EEH sysfs files. If not, the sysfs files won't be created.
That's to avoid warning reported during PCI hotplug.

The problem you're reporting (if I understand correctly):
You don't see the sysfs files after the system boots up.
If it's the case, you probably need following changes in
arch/powerpc/platforms/powernv/pci.c::pnv_pci_ioda_fixup().
Could you have a try with it?

#ifdef CONFIG_EEH
  eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
- eeh_addr_cache_build();
  eeh_init();
+ eeh_addr_cache_build();
#endif


I think this is a more proper fix.

BTW, I have one confusion in this mode set.

eeh_init()
  -> eeh_ops->dev_probe()
     -> powernv_eeh_dev_probe()
        -> eeh_set_enable(true)   <- here the eeh is marked enabled

We can see this flag would be set for each pci_dev. So is it possible to make
this set only once?


It shouldn't be a problem because there might not be any PCI devices
supporting EEH in the guest. All PCI devices are emulated.

Eventually PowerNV/pSeries have same function call sequence:

- Set EEH probe mode
- Doing probe (with device node or PCI device)
- Build address cache.
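
The ordering issue can be shown with a toy model. It assumes, as the discussion above implies, that sysfs file creation happens as part of the address cache build and is skipped while the enabled flag is still clear. The types and function names are invented for the illustration and are not the kernel's.

```c
#include <stdbool.h>

/* Toy state: the EEH enabled flag and a count of sysfs files created. */
struct eeh_model {
	bool enabled;
	int sysfs_files;
};

/* Models the probe step, which marks EEH as enabled. */
static void model_probe(struct eeh_model *m)
{
	m->enabled = true;
}

/* Models the cache build, which creates sysfs files only when enabled. */
static void model_cache_build(struct eeh_model *m)
{
	if (!m->enabled)
		return;
	m->sysfs_files++;
}

/* Runs the two steps in either order and reports how many files exist. */
static int model_boot(bool probe_first)
{
	struct eeh_model m = { false, 0 };

	if (probe_first) {
		model_probe(&m);
		model_cache_build(&m);
	} else {
		model_cache_build(&m);
		model_probe(&m);
	}
	return m.sysfs_files;
}
```

model_boot(false) corresponds to the current PowerNV order and yields no sysfs files; model_boot(true) corresponds to the proposed reordering (probe first, then build the address cache) and yields one.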

Signed-off-by: Mike Qiu qiud...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 8ad0c5b..5f95581 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -136,6 +136,9 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	struct pnv_phb *phb = hose->private_data;
 	int ret;

+	/* Create sysfs after EEH_ENABLED been set */
+	eeh_add_sysfs_files(hose->bus);
+
 	/* Register OPAL event notifier */
 	if (!ioda_eeh_nb_init) {
 		ret = opal_notifier_register(&ioda_eeh_nb);

Thanks,
Gavin


Re: [PATCH V2] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value



On 15.06.14 20:47, Aneesh Kumar K.V wrote:

With guests supporting Multiple page size per segment (MPSS),
hpte_page_size returns the actual page size used. Add a new function to
return base page size and use that to compare against the page size
calculated from SLB. Without this patch a hpte lookup can fail since
we are comparing wrong page size in kvmppc_hv_find_lock_hpte.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
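
As a rough illustration of why the comparison matters, consider a toy model in which an HPTE carries both a base page size (the segment's) and an actual page size, and the SLB entry carries only the base size. The struct layout and names below are assumptions made for the sketch and do not reflect the real HPTE encoding or the kernel helpers.

```c
#include <stdbool.h>

/* Toy HPTE: page sizes expressed as shifts (12 = 4K, 16 = 64K). */
struct hpte_model {
	unsigned int base_shift;   /* base page size of the segment */
	unsigned int actual_shift; /* page size actually used by the mapping */
};

/* Comparison against the base page size, as the patch proposes. */
static bool hpte_matches_slb(const struct hpte_model *h,
			     unsigned int slb_shift)
{
	return h->base_shift == slb_shift;
}

/* Comparison against the actual page size, which can miss under MPSS. */
static bool hpte_matches_slb_actual(const struct hpte_model *h,
				    unsigned int slb_shift)
{
	return h->actual_shift == slb_shift;
}
```

For a 64K page mapped in a 4K-base segment (base_shift 12, actual_shift 16) and an SLB entry with shift 12, the first comparison matches and the second does not, which is the failed lookup described in the commit message.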


Thanks, applied to for-3.16.


Alex


Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code

2014-06-25 Thread Marek Szyprowski


Hello,

On 2014-06-18 22:51, Andrew Morton wrote:

On Tue, 17 Jun 2014 10:25:07 +0900 Joonsoo Kim iamjoonsoo@lge.com wrote:

v2:
   - Although this patchset looks very different with v1, the end result,
   that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.

This patchset is based on linux-next 20140610.

Thanks for taking care of this. I will test it with my setup and if
everything goes well, I will take it to my -next tree. If any branch
is required for anyone to continue his works on top of those patches,
let me know, I will also prepare it.

Hello,

I'm glad to hear that. :)
But, there is one concern. As you already know, I am preparing further
patches (Aggressively allocate the pages on CMA reserved memory). It
may be highly related to MM branch and also slightly depends on these CMA
changes. In this case, what is the best strategy to merge this
patchset? IMHO, Andrew's tree is more appropriate branch. If there is
no issue in this case, I am willing to develop further patches based
on your tree.

That's probably easier.  Marek, I'll merge these into -mm (and hence
-next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
and shall hold them pending your review/ack/test/etc, OK?


Ok. I've tested them and they work fine. I'm sorry that you had to wait for
me for a few days. You can now add:

Acked-and-tested-by: Marek Szyprowski m.szyprow...@samsung.com

I've also rebased my pending patches onto this set (I will send them soon).

The question is now if you want to keep the discussed patches in your -mm
tree, or should I take them to my -next branch. If you like to keep them, I
assume you will also take the patches which depend on the discussed changes.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code

2014-06-25 Thread Andrew Morton

On Wed, 25 Jun 2014 14:33:56 +0200 Marek Szyprowski m.szyprow...@samsung.com wrote:

> > That's probably easier.  Marek, I'll merge these into -mm (and hence
> > -next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
> > and shall hold them pending your review/ack/test/etc, OK?
>
> Ok. I've tested them and they work fine. I'm sorry that you had to wait for
> me for a few days. You can now add:
>
> Acked-and-tested-by: Marek Szyprowski m.szyprow...@samsung.com

Thanks.

> I've also rebased my pending patches onto this set (I will send them soon).
>
> The question is now if you want to keep the discussed patches in your
> -mm tree, or should I take them to my -next branch. If you like to keep
> them, I assume you will also take the patches which depend on the
> discussed changes.

Yup, that works.

Re: [PATCH 0/3] New PAPR hypercall plus individual hypercall enables, v3