buildbot failure in kvm on next-i386

2014-06-13 Thread kvm
The Buildbot has detected a new failure on builder next-i386 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-i386/builds/1260

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot


Re: Using virtio for inter-VM communication

2014-06-13 Thread Jan Kiszka
On 2014-06-13 02:47, Rusty Russell wrote:
 Jan Kiszka jan.kis...@siemens.com writes:
 On 2014-06-12 04:27, Rusty Russell wrote:
 Henning Schild henning.sch...@siemens.com writes:
 It was also never implemented, and remains a thought experiment.
 However, implementing it in lguest should be fairly easy.

 The reason why a trusted helper, i.e. additional logic in the
 hypervisor, is not our favorite solution is that we'd like to keep the
 hypervisor as small as possible. I wouldn't exclude such an approach
 categorically, but we have to weigh the costs (lines of code, additional
 hypervisor interface) carefully against the gain (existing
 specifications and guest driver infrastructure).
 
 Reasonable, but I think you'll find it is about the minimal
 implementation in practice.  Unfortunately, I don't have time during the
 next 6 months to implement it myself :(
 
 Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA
 working group discussion): What speaks against introducing an
 alternative encoding of addresses inside virtio data structures? The
 idea of this flag was to replace guest-physical addresses with offsets
 into a shared memory region associated with or part of a virtio
 device.
 
 We would also need a way of defining the shared memory region.  But
 that's not the problem.  If such a feature is not accepted by the guest,
 how do you fall back?

Depends on the hypervisor and its scope, but it should be quite
straightforward: full-featured ones like KVM could fall back to slow
copying, specialized ones like Jailhouse would clear FEATURES_OK if the
guest driver does not accept it (because there would be no ring walking
or copying code in Jailhouse), thus refusing to activate the device. That
would be absolutely fine for application domains of specialized
hypervisors (often embedded, customized guests etc.).
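A minimal sketch of the device-side fallback described above, assuming virtio 1.0 style status negotiation: when the driver did not accept the shared-memory addressing feature and the hypervisor has no copying path, the device leaves FEATURES_OK clear, and the driver sees that on re-reading the status. VIRTIO_F_RING_SHMEM_ADDR has no assigned bit number, so bit 40 is a placeholder; struct shmem_dev and dev_write_status() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits from the virtio 1.0 specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE 1u
#define VIRTIO_STATUS_DRIVER      2u
#define VIRTIO_STATUS_FEATURES_OK 8u

/* HYPOTHETICAL: the proposed feature has no assigned bit; 40 is a placeholder. */
#define VIRTIO_F_RING_SHMEM_ADDR 40

struct shmem_dev {
    uint64_t device_features;  /* features the device offers (assumed to include shmem) */
    uint64_t driver_features;  /* features the guest driver accepted */
    bool can_copy;             /* hypervisor has a slow copying fallback (KVM-like) */
    uint8_t status;            /* current device status */
};

/* Handle a driver write to the status register and return the value the
 * driver will read back. The driver must re-read the status after setting
 * FEATURES_OK; finding the bit clear means the device refused the features. */
static uint8_t dev_write_status(struct shmem_dev *d, uint8_t new_status)
{
    uint64_t shmem_bit = UINT64_C(1) << VIRTIO_F_RING_SHMEM_ADDR;
    bool subset = (d->driver_features & ~d->device_features) == 0;
    bool shmem_ok = (d->driver_features & shmem_bit) != 0 || d->can_copy;

    if ((new_status & VIRTIO_STATUS_FEATURES_OK) && (!subset || !shmem_ok))
        new_status &= (uint8_t)~VIRTIO_STATUS_FEATURES_OK;  /* Jailhouse-like refusal */
    d->status = new_status;
    return d->status;
}
```

A KVM-like device would set can_copy and accept either outcome; a Jailhouse-like device leaves it false and simply refuses.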

The shared memory regions could be exposed as BARs (PCI) or additional
address ranges (device tree) and addressed in the redefined guest
address fields via some region index and offset.
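One possible encoding of "region index and offset" in the 64-bit virtio address fields, purely hypothetical: the top 8 bits select a region (a BAR or a device-tree range), the remaining 56 bits hold the byte offset, and every lookup is bounds-checked against the region size so a peer cannot point outside the shared area. shm_encode(), shm_translate() and struct shm_region are made-up names for illustration; nothing in the virtio spec defines this layout.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout: bits 63..56 = region index, bits 55..0 = offset. */
#define SHM_REGION_SHIFT 56
#define SHM_OFFSET_MASK  ((UINT64_C(1) << SHM_REGION_SHIFT) - 1)

struct shm_region {
    uint8_t *base;   /* where this side has the region mapped */
    uint64_t size;   /* region length in bytes */
};

static uint64_t shm_encode(unsigned region, uint64_t offset)
{
    return ((uint64_t)region << SHM_REGION_SHIFT) | (offset & SHM_OFFSET_MASK);
}

/* Translate a descriptor address into a local pointer, or NULL if the region
 * index is unknown or [offset, offset + len) does not fit in the region. */
static void *shm_translate(const struct shm_region *tbl, size_t nregions,
                           uint64_t addr, uint32_t len)
{
    size_t idx = (size_t)(addr >> SHM_REGION_SHIFT);
    uint64_t off = addr & SHM_OFFSET_MASK;

    if (idx >= nregions)
        return NULL;
    if (off > tbl[idx].size || len > tbl[idx].size - off)
        return NULL;
    return tbl[idx].base + off;
}
```

Checking `len > size - off` rather than `off + len > size` avoids integer overflow when a misbehaving peer supplies a huge offset.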

 
 We don't add features which unmake the standard.
 
 That would preserve zero-copy capabilities (as long as you can work
 against the shared mem directly, e.g. doing DMA from a physical NIC or
 storage device into it) and keep the hypervisor out of the loop.
 
 This seems ill thought out.  How will you program a NIC via the virtio
 protocol without a hypervisor?  And how will you make it safe?  You'll
 need an IOMMU.  But if you have an IOMMU you don't need shared memory.

Scenarios behind this are things like driver VMs: You pass through the
physical hardware to a driver guest that talks to the hardware and
relays data via one or more virtual channels to other VMs. This confines
a certain set of security and stability risks to the driver VM.

 
 Is it
 too invasive to existing infrastructure or does it have some other pitfalls?
 
 You'll have to convince every vendor to implement your addition to the
 standard.  Which is easier than inventing a completely new system, but
 it's not quite virtio.

It would be an optional addition, a feature all three sides (host and
the communicating guests) would have to agree on. I think we would only
have to agree on extending the spec to enable this - after demonstrating
it via an implementation, of course.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
With guests supporting multiple page sizes per segment (MPSS),
hpte_page_size() returns the actual page size used. Add a new function to
return the base page size and use that to compare against the page size
calculated from the SLB.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +++++++++++++++++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
 {
+
 	int size, a_psize;
 	/* Look at the 8 bit LP value */
 	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 				continue;
 
 			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1)
+			if (a_psize != -1) {
+				if (is_base_size)
+					return 1ul << mmu_psize_defs[size].shift;
 				return 1ul << mmu_psize_defs[a_psize].shift;
+			}
 		}
 
 	}
 	return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 				goto out;
 			}
 			if (!rma_setup && is_vrma_hpte(v)) {
-				unsigned long psize = hpte_page_size(v, r);
+				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			 * to check against the actual page size.
 			 */
 			if ((v & valid) && (v & mask) == val &&
-			    hpte_page_size(v, r) == (1ul << pshift))
+			    hpte_base_page_size(v, r) == (1ul << pshift))
 				/* Return with the HPTE still locked */
 				return (hash << 3) + (i >> 1);
 
-- 
1.9.1

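A simplified, self-contained model (not kernel code) of what the patch changes: with MPSS, an HPTE in a segment whose base page size is 4K may map a 64K page, so comparing the actual size against the SLB-derived `1ul << pshift` fails, while the base size matches. The enum, psize_shift[] and struct hpte_model are hypothetical stand-ins for mmu_psize_defs[] and the LP decoding done by __hpte_actual_psize().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's page-size indices and shifts. */
enum { PSIZE_4K, PSIZE_64K, PSIZE_16M, PSIZE_COUNT };
static const unsigned psize_shift[PSIZE_COUNT] = { 12, 16, 24 };

/* HYPOTHETICAL pre-decoded HPTE; the real code derives both sizes from
 * the HPTE's L bit and LP field. */
struct hpte_model {
    int base_psize;    /* segment base page size, which the SLB encodes */
    int actual_psize;  /* size of the page actually mapped (larger with MPSS) */
};

/* Same selection as __hpte_page_size(): base size or actual size. */
static unsigned long model_page_size(const struct hpte_model *h, bool is_base_size)
{
    int idx = is_base_size ? h->base_psize : h->actual_psize;
    return 1ul << psize_shift[idx];
}

/* Mirrors the comparison in kvmppc_hv_find_lock_hpte(). */
static bool matches_slb(const struct hpte_model *h, unsigned pshift, bool use_base)
{
    return model_page_size(h, use_base) == (1ul << pshift);
}
```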


Re: Using virtio for inter-VM communication

2014-06-13 Thread Paolo Bonzini

On 13/06/2014 08:23, Jan Kiszka wrote:

That would preserve zero-copy capabilities (as long as you can work
against the shared mem directly, e.g. doing DMA from a physical NIC or
storage device into it) and keep the hypervisor out of the loop.


 This seems ill thought out.  How will you program a NIC via the virtio
 protocol without a hypervisor?  And how will you make it safe?  You'll
 need an IOMMU.  But if you have an IOMMU you don't need shared memory.

Scenarios behind this are things like driver VMs: You pass through the
physical hardware to a driver guest that talks to the hardware and
relays data via one or more virtual channels to other VMs. This confines
a certain set of security and stability risks to the driver VM.


I think implementing Xen hypercalls in Jailhouse for grant tables and
event channels would actually make a lot of sense.  The Xen
implementation is 2.5kLOC and I think it should be possible to compact
it noticeably, especially if you limit yourself to 64-bit guests.


It should also be almost enough to run Xen PVH guests as Jailhouse
partitions.


If later Xen starts to support virtio, you will get that for free.
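A rough sketch of the grant-table bookkeeping mentioned above. The structure layout and the GTF_permit_access/GTF_readonly bits follow Xen's public grant_entry_v1 definition; grant_access() and may_map() are hypothetical helpers showing what a hypervisor-side check might look like, not Xen's implementation. Real Xen also marks entries in use with atomic GTF_reading/GTF_writing updates, and the granting guest needs a write barrier before setting flags; this single-threaded sketch omits both.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t domid_t;

/* Layout of Xen's version-1 grant table entry. */
struct grant_entry_v1 {
    uint16_t flags;  /* GTF_* type and permission bits */
    domid_t  domid;  /* domain allowed to map the frame */
    uint32_t frame;  /* guest frame number being shared */
};

#define GTF_permit_access (1U << 0)
#define GTF_type_mask     (3U << 0)
#define GTF_readonly      (1U << 2)

/* Granting guest: publish access to `frame` for `peer`. flags is written
 * last; a real implementation needs a write barrier before that store. */
static void grant_access(struct grant_entry_v1 *e, domid_t peer,
                         uint32_t frame, bool readonly)
{
    e->domid = peer;
    e->frame = frame;
    e->flags = (uint16_t)(GTF_permit_access | (readonly ? GTF_readonly : 0));
}

/* HYPOTHETICAL hypervisor-side check for a map request by `requester`. */
static bool may_map(const struct grant_entry_v1 *e, domid_t requester,
                    bool want_write)
{
    uint16_t flags = e->flags;  /* read once; the granter may change it */

    if ((flags & GTF_type_mask) != GTF_permit_access)
        return false;
    if (e->domid != requester)
        return false;
    return !(want_write && (flags & GTF_readonly));
}
```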

Paolo


Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-13 Thread Markus Armbruster
Some dropped quoted text restored.

Vincent JARDIN vincent.jar...@6wind.com writes:

 Markus,

 see inline (I am not on all the mailing lists; please keep the Cc list).

 Sure!  The reasons for my dislike range from practical to
 philosophical.

 My practical concerns include:

 1. ivshmem code needs work, but has no maintainer
 See David's contributions:
   http://patchwork.ozlabs.org/patch/358750/

We're grateful for David's patch for qemu-char.c, but this isn't ivshmem
maintenance, yet.

   - Error handling is generally poor.  For instance, device_add
 ivshmem kills your guest instantly.

   - More subjectively, I don't trust the code to be robust against
 abuse by our own guest, or the other guests sharing the memory.
 Convincing me would take a code audit.

   - MAINTAINERS doesn't cover ivshmem.c.

   - The last non-trivial commit that isn't obviously part of some
 tree-wide infrastructure or cleanup work is from September 2012
 (commit c08ba66).

 2. There is no libvirt support

 One can use QEMU without libvirt.

You asked me for my reasons for disliking ivshmem.  This is one.

Sure, I can drink my water through a straw while standing on one foot,
but that doesn't mean I have to like it.  And me not liking it doesn't
mean the next guy shouldn't like it.  To each their own.

 3. Out-of-tree server program required for full functionality

   Interrupts require a shared memory server running in the host (see
   docs/specs/ivshmem_device_spec.txt).  It doesn't tell where to find
   one.  The initial commit 6cbf4c8 points to
   www.gitorious.org/nahanni.  That repository's last commit is from
   September 2012.  He's dead, Jim.

   ivshmem_device_spec.txt is silent on what the server is supposed to
   do.
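Assuming the usual ivshmem server behaviour of handing its clients descriptors (the shared memory object and the eventfds used for interrupts) over a Unix domain socket, the kernel mechanism underneath is SCM_RIGHTS ancillary data. A minimal POSIX sketch of both ends; send_fd() and recv_fd() are illustrative names, not functions from the Nahanni code.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket; 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 0;  /* at least one byte of ordinary data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = { 0 };

    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = { 0 };
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c && c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS &&
        c->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is why QEMU can mmap() the server's shared memory or signal its eventfds directly.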

 We have the source code; it provides the documentation needed to write
 our own, better server program.

Good for you.  Not good enough for the QEMU community.

QEMU features relying on out-of-tree software to be useful are fine,
as long as said out-of-tree software is readily available to QEMU
developers and users.

Free software with a community around it and packaged in major distros
qualifies.  If you haven't got that, talk to us to find out whether what
you've got qualifies, and if not, what you'd have to do to make it
qualify.

Back when we accepted ivshmem, the out-of-tree parts it needs were well
below the community & packaged bar.  But folks interested in it talked
to us, and the fact that it's in shows that QEMU maintainers decided
what they had then was enough.

Unfortunately, we now have considerably less: Nahanni appears to be
dead.

An apparently dead git repository you can study is not enough.  The fact
that you hold an improved reimplementation privately is immaterial.  So
is the (plausible) claim that others could also create a
reimplementation.

   If this server requires privileges: I don't trust it without an
   audit.

 4. Out-of-tree kernel uio driver required

 No, it is optional.

Good to know.  Would you be willing to send a patch to
ivshmem_device_spec.txt clarifying that?

   The device is intended to be used with the provided UIO driver
   (ivshmem_device_spec.txt again).  As far as I can tell, the provided
   UIO driver is the one in the dead Nahanni repo.

   By now, you should be expecting this: I don't trust that one either.

 These concerns are all fixable, but it'll take serious work, and time.
 Something like:

 * Find a maintainer for the device model
 I guess we can find one in the DPDK.org community.
 * Review and fix its code

 * Get the required kernel module upstream

 Which module? uio? It is not required.

 * Get all the required parts outside QEMU packaged in major distros, or
absorbed into QEMU

 Red Hat did disable it. Why? It is there in QEMU.

Up to now, I've been wearing my QEMU hat.  Let me exchange it for my Red
one for a bit.

We (Red Hat) don't just package & ship metric tons of random free
software.  We package & ship useful free software we can support for
many, many years.

Sometimes, we find that we have to focus serious development resources
on making something useful supportable (Paolo mentioned qcow2).  We
obviously can't focus on everything, though.

Anyway, ivshmem didn't make the cut for RHEL-7.0.  Sorry if that
inconveniences you.  To get it into RHEL, you need to show it's both
useful and supportable.  Building a community around it would go a long
way towards that.

If you want to discuss this in more detail with us, you may want to try
communication channels provided by your RHEL subscription in addition to
the QEMU development mailing list.  Don't be shy, you're paying for it!

As always, I'm speaking for myself, not my employer.

Okay, wearing my QEMU hat again.

 In short, create a viable community around ivshmem, either within the
 QEMU community, or separately but cooperating.

 At least the DPDK.org community is a community using it.

Using something isn't the same as maintaining something.  But it's a
necessary first step.

Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-13 Thread Vincent JARDIN

(+merging with Paolo's email because of overlaps)


see inline (I am not on all the mailing lists; please keep the Cc list).




1. ivshmem code needs work, but has no maintainer

See David's contributions:
   http://patchwork.ozlabs.org/patch/358750/


We're grateful for David's patch for qemu-char.c, but this isn't ivshmem
maintenance, yet.


Others can follow (documentation); see below.


2. There is no libvirt support


One can use QEMU without libvirt.


You asked me for my reasons for disliking ivshmem.  This is one.

Sure, I can drink my water through a straw while standing on one foot,
but that doesn't mean I have to like it.  And me not liking it doesn't
mean the next guy shouldn't like it.  To each their own.


I like using QEMU without libvirt; libvirt is not part of QEMU.
Let's avoid trolling about it ;)


Back when we accepted ivshmem, the out-of-tree parts it needs were well
below the community & packaged bar.  But folks interested in it talked
to us, and the fact that it's in shows that QEMU maintainers decided
what they had then was enough.

Unfortunately, we now have considerably less: Nahanni appears to be
dead.


Agreed, and too bad it is dead. We should let Nahanni go, since ivshmem is
a QEMU topic now; see below. Does that make sense?




An apparently dead git repository you can study is not enough.  The fact
that you hold an improved reimplementation privately is immaterial.  So
is the (plausible) claim that others could also create a
reimplementation.


Got the point. What about a patch to
docs/specs/ivshmem_device_spec.txt that improves it?


I can make qemu's ivshmem better:
  - keep explaining memnic for instance,
  - explain how to write other ivshmem.

does it help?


4. Out-of-tree kernel uio driver required


No, it is optional.


Good to know.  Would you be willing to send a patch to
ivshmem_device_spec.txt clarifying that?


Got the point, yes.


* Get all the required parts outside QEMU packaged in major distros, or
absorbed into QEMU


Red Hat did disable it. Why? It is there in QEMU.


Up to now, I've been wearing my QEMU hat.  Let me exchange it for my Red
one for a bit.

We (Red Hat) don't just package & ship metric tons of random free
software.  We package & ship useful free software we can support for
many, many years.

Sometimes, we find that we have to focus serious development resources
on making something useful supportable (Paolo mentioned qcow2).  We
obviously can't focus on everything, though.


Good open technology should rule. ivshmem has use cases. And I do agree
with you: it is like the phoenix, it has to be re-explained/documented
to come back to life. I was not aware that the QEMU community was missing
ivshmem contributors (my bad, I did not check MAINTAINERS).



Anyway, ivshmem didn't make the cut for RHEL-7.0.  Sorry if that
inconveniences you.  To get it into RHEL, you need to show it's both
useful and supportable.  Building a community around it would go a long
way towards that.


understood.


If you want to discuss this in more detail with us, you may want to try
communication channels provided by your RHEL subscription in addition to
the QEMU development mailing list.  Don't be shy, you're paying for it!


Done. I was focusing on DPDK.org and was unaware of QEMU's status, thinking
Red Hat was covering it. How can one know which parts of an open source
project are and are not included in Red Hat's products? Sales don't know ;).
Red Hat seemingly randomly disables some files at compile time (for good
reasons, I guess, but with no public rationale, or I am missing something).


Feel free to open this PR to anyone:
  https://bugzilla.redhat.com/show_bug.cgi?id=1088332


In short, create a viable community around ivshmem, either within the
QEMU community, or separately but cooperating.


At least the DPDK.org community is a community using it.


Using something isn't the same as maintaining something.  But it's a
necessary first step.


Understood; after David's patch, documentation will follow.

(now Paolo's email since there were some overlaps)

 Markus especially referred to parts *outside* QEMU: the server, the
 uio driver, etc.  These out-of-tree, non-packaged parts of ivshmem
 are one of the reasons why Red Hat has disabled ivshmem in RHEL7.

You made the right choices, these out-of-tree packages are not required. 
You can use QEMU's ivshmem without any of the out-of-tree packages. The 
out-of-tree packages are just some examples of using ivshmem.


 He also listed many others.  Basically for parts of QEMU that are not
 of high quality, we either fix them (this is for example what we did
 for qcow2) or disable them.  Not just ivshmem suffered this fate, for
 example many network cards, sound cards, SCSI storage adapters.

David (Cc) and I are working on making it better based on the issues
that are found.


 Now, vhost-user is in the process of being merged for 2.1.  Compared 
to the DPDK solution:


Now, you cannot compare vhost-user to DPDK/ivshmem; both should exist
because they have different scope and use cases.

Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-13 Thread Jobin Raju George
Nahanni's lack of active development, coupled with virtio's promising
expansion, is what encouraged us to explore virtio-serial [1] for
inter-virtual machine communication. Although virtio-serial as it is
isn't suitable for inter-VM communication, it can be made so with some
work, and this is exactly what we (two of my classmates and I)
accomplished.

We haven't published it yet, since it still needs polishing before
upstreaming; we plan to do so in the near future.

[1]: http://fedoraproject.org/wiki/Features/VirtioSerial


mips: Accidental removal of paravirt_cpus_done?

2014-06-13 Thread Geert Uytterhoeven
Hi Ralf,

It seems you accidentally assimilated an (unwanted?) kvm change in my
patch:

On Tue, Jun 10, 2014 at 3:31 AM, Linux Kernel Mailing List
linux-ker...@vger.kernel.org wrote:
 Gitweb: http://git.kernel.org/linus/;a=commit;h=5e888e8fb55cf3da870b85d04fef6bfe0d57c974
 Commit: 5e888e8fb55cf3da870b85d04fef6bfe0d57c974
 Parent: a1eace4ba53546bc7a6670b1c380cd5c1287ae8b
 Refname:refs/heads/master
 Author: Geert Uytterhoeven ge...@linux-m68k.org
 AuthorDate: Tue Apr 22 12:51:13 2014 +0200
 Committer:  Ralf Baechle r...@linux-mips.org
 CommitDate: Mon Jun 2 16:34:41 2014 +0200

 mips: Update the email address of Geert Uytterhoeven

 All my Sony addresses are defunct.

 Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
 Cc: linux-m...@linux-mips.org
 Patchwork: https://patchwork.linux-mips.org/patch/6817/
 Signed-off-by: Ralf Baechle r...@linux-mips.org
 ---
  arch/mips/include/asm/nile4.h     | 2 +-
  arch/mips/paravirt/paravirt-smp.c | 5 -----
  arch/mips/pci/ops-pmcmsp.c        | 2 +-
  arch/mips/pci/ops-tx3927.c        | 2 +-
  4 files changed, 3 insertions(+), 8 deletions(-)

 diff --git a/arch/mips/paravirt/paravirt-smp.c b/arch/mips/paravirt/paravirt-smp.c
 index 73a123e..0164b0c 100644
 --- a/arch/mips/paravirt/paravirt-smp.c
 +++ b/arch/mips/paravirt/paravirt-smp.c
 @@ -99,10 +99,6 @@ static void paravirt_smp_finish(void)
  	local_irq_enable();
  }
  
 -static void paravirt_cpus_done(void)
 -{
 -}
 -
  static void paravirt_boot_secondary(int cpu, struct task_struct *idle)
  {
  	paravirt_smp_gp[cpu] = (unsigned long)task_thread_info(idle);
 @@ -141,7 +137,6 @@ struct plat_smp_ops paravirt_smp_ops = {
  	.send_ipi_mask  = paravirt_send_ipi_mask,
  	.init_secondary = paravirt_init_secondary,
  	.smp_finish = paravirt_smp_finish,
 -	.cpus_done  = paravirt_cpus_done,
  	.boot_secondary = paravirt_boot_secondary,
  	.smp_setup  = paravirt_smp_setup,
  	.prepare_cpus   = paravirt_prepare_cpus,
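Why a dropped hook can matter: if the generic SMP code calls an ops-table entry without a NULL check, removing `.cpus_done` turns an empty stub into a call through a NULL pointer. The sketch assumes that calling convention (verify it against arch/mips/kernel/smp.c) and uses a hypothetical cut-down ops struct; the guarded caller shown is one alternative to keeping the empty stub.

```c
#include <stddef.h>

/* HYPOTHETICAL cut-down analogue of struct plat_smp_ops. */
struct smp_ops_model {
    void (*smp_finish)(void);
    void (*cpus_done)(void);
};

static int cpus_done_calls;

static void stub_cpus_done(void)
{
    cpus_done_calls++;  /* stands in for the empty paravirt_cpus_done() */
}

/* An unconditional ops->cpus_done() would jump through NULL when the entry
 * is missing; checking first makes the hook genuinely optional. */
static void call_cpus_done(const struct smp_ops_model *ops)
{
    if (ops->cpus_done)
        ops->cpus_done();
}
```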

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds


Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Alexander Graf


On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guests supporting multiple page sizes per segment (MPSS),
hpte_page_size() returns the actual page size used. Add a new function to
return the base page size and use that to compare against the page size
calculated from the SLB.


Why? What does this fix? Is this a bug fix, an enhancement? Don't 
describe only what you do, but also why you do it.



Alex





Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-13 Thread Paolo Bonzini

On 13/06/2014 11:26, Vincent JARDIN wrote:

Markus especially referred to parts *outside* QEMU: the server, the
uio driver, etc.  These out-of-tree, non-packaged parts of ivshmem
are one of the reasons why Red Hat has disabled ivshmem in RHEL7.


You made the right choices, these out-of-tree packages are not required.
You can use QEMU's ivshmem without any of the out-of-tree packages. The
out-of-tree packages are just some examples of using ivshmem.


Fine, however Red Hat would also need a way to test ivshmem code, with 
proper quality assurance (that also benefits upstream, of course).  With 
ivshmem this is not possible without the out-of-tree packages.


Disabling all the unwanted devices is a lot of work and thankless too 
(you only get complaints, in fact!).  But we prefer to ship only what we 
know we can test, support and improve.  We do not want customers' bug 
reports to languish because they are using code that cannot really be fixed.


Note that we do take into account community contributions in choosing 
which new code can be supported.  For example most work on VMDK images 
was done by Fam when he was a student, libiscsi is mostly the work of 
Peter Lieven, and so on; both of them are supported in RHEL.  These 
people did/do a great job, and we were happy to embrace those features!


Now, putting back my QEMU hat...


He also listed many others.  Basically for parts of QEMU that are not
of high quality, we either fix them (this is for example what we did
for qcow2) or disable them.  Not just ivshmem suffered this fate, for
example many network cards, sound cards, SCSI storage adapters.


David (Cc) and I are working on making it better based on the issues
that are found.


Now, vhost-user is in the process of being merged for 2.1.  Compared
to the DPDK solution:

Now, you cannot compare vhost-user to DPDK/ivshmem; both should exist
because they have different scope and use cases. It is like comparing
two different(A) models of IPC:
  - vhost-user - networking use case specific


Not necessarily.  First and foremost, vhost-user defines an API for 
communication between QEMU and the host, including:


* file descriptor passing for the shared memory file

* mapping offsets in shared memory to physical memory addresses in the 
guests


* passing dirty memory information back and forth, so that migration is 
not prevented


* sending interrupts to a device

* setting up ring buffers in the shared memory


None of these is virtio specific, except the last (even then, you could 
repurpose the messages to pass the address of the whole shared memory 
area, instead of the vrings only).
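To make the second item above concrete (mapping offsets in shared memory to guest-physical addresses), here is a minimal C sketch of the lookup a back end performs after it has mmap()ed the file descriptors it received. The struct is a simplified, hypothetical mirror of the regions carried by vhost-user's SET_MEM_TABLE message (guest_phys_addr, memory_size, mmap_offset); it is not the exact wire format, and the function name is our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory-table entry: where a guest-physical
 * range lives inside a shared-memory file mapped at mmap_base. */
struct mem_region {
    uint64_t guest_phys_addr;  /* first guest-physical address covered */
    uint64_t memory_size;      /* length of the region in bytes */
    uint64_t mmap_offset;      /* offset of the region inside the file */
    uint8_t *mmap_base;        /* where this process mapped the file */
};

/* Translate [gpa, gpa + len) into a pointer in this process,
 * or return NULL if no single region covers the whole range. */
void *gpa_to_host(const struct mem_region *regions, size_t nregions,
                  uint64_t gpa, uint64_t len)
{
    assert(regions != NULL || nregions == 0);
    for (size_t i = 0; i < nregions; i++) {
        const struct mem_region *r = &regions[i];
        if (gpa < r->guest_phys_addr)
            continue;
        uint64_t off = gpa - r->guest_phys_addr;
        /* Reject accesses that start or end outside this region. */
        if (off >= r->memory_size || len > r->memory_size - off)
            continue;
        return r->mmap_base + r->mmap_offset + off;
    }
    return NULL;
}
```

The bounds check is written as `len > size - off` rather than `off + len > size` so that a guest-supplied length cannot overflow the addition.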


Yes, the only front-end for vhost-user, right now, is a network device. 
 But it is possible to connect vhost-scsi to vhost-user as well, it is 
possible to develop a vhost-serial as well, and it is possible to only 
use the RPC and develop arbitrary shared-memory based tools using this 
API.  It's just that no one has done it yet.


Also, vhost-user is documented! See here: 
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg00581.html


The only part of ivshmem that vhost doesn't include is the n-way 
inter-guest doorbell.  This is the part that requires a server and uio 
driver.  vhost only supports host-guest and guest-host doorbells.
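For comparison, the n-way doorbell on the ivshmem side: as we read the ivshmem device spec of the time, a guest rings a peer by writing a 32-bit value to the device's Doorbell register, with the destination peer ID in the upper 16 bits and the interrupt vector in the lower 16 bits; the server and the eventfds it distributes then deliver the interrupt. Only that bit layout is taken from the spec; the helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a doorbell value: destination peer ID in bits 31..16,
 * interrupt vector in bits 15..0 (layout assumed from the ivshmem spec). */
uint32_t ivshmem_doorbell_value(uint16_t peer_id, uint16_t vector)
{
    return ((uint32_t)peer_id << 16) | vector;
}

/* Inverse helpers, e.g. for a test harness or an emulated device. */
uint16_t doorbell_peer(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

uint16_t doorbell_vector(uint32_t value)
{
    return (uint16_t)(value & 0xffffu);
}
```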



* it doesn't require hugetlbfs (which only enabled shared memory by
chance in older QEMU releases; this was never documented)


ivshmem does not require hugetlbfs. It is optional.


* it doesn't require the kernel driver from the DPDK sample


ivshmem does not require the DPDK kernel driver. See memnic's PMD:
  http://dpdk.org/browse/memnic/tree/pmd/pmd_memnic.c


You're right, I was confusing memnic and the vhost example in DPDK.

Paolo


Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-13 Thread Olivier MATZ

Hello,

On 06/13/2014 11:26 AM, Vincent JARDIN wrote:

ivshmem does not require hugetlbfs. It is optional.

  * it doesn't require ivshmem (it does require shared memory, which
  will also be added to 2.1)


Right, hugetlbfs is not required. A POSIX shared memory object or tmpfs
can be used instead. For instance, to use /dev/shm/foobar:

  qemu-system-x86_64 -enable-kvm -cpu host [...] \
 -device ivshmem,size=16,shm=foobar
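On the host side, another process can share that memory with the guest simply by mapping the same object. On Linux the POSIX shared-memory object named foobar is the file /dev/shm/foobar, so a minimal sketch opens it, sets its size and maps it MAP_SHARED. The function name is hypothetical, and we assume size=16 means 16 MiB (ivshmem took the size in megabytes by default at the time).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open (creating if needed) the file backing a shared-memory object,
 * set it to 'size' bytes and map it shared. Returns MAP_FAILED on error.
 * Example: map_shared_file("/dev/shm/foobar", 16u << 20). */
void *map_shared_file(const char *path, size_t size)
{
    assert(path != NULL && size > 0);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return MAP_FAILED;
    /* Make the file exactly as large as the region the guest sees. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

Because both mappings are MAP_SHARED views of the same file, a store through one is visible through the other, which is exactly what the guest's ivshmem BAR and the host process rely on.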


Regards,
Olivier


Re: mips: Accidental removal of paravirt_cpus_done?

2014-06-13 Thread Andreas Herrmann
On Fri, Jun 13, 2014 at 12:02:30PM +0200, Geert Uytterhoeven wrote:
 Hi Ralf,
 
 It seems you accidentally assimilated an (unwanted?) kvm change in my
 patch:

Hi Geert,

Actually this change was wanted. After Ralf informed me about a
compile error in linux-next, I sent him an update for one of my
mips-paravirt patches.

Unfortunately, that ended up in your (unrelated) patch.


Andreas

 On Tue, Jun 10, 2014 at 3:31 AM, Linux Kernel Mailing List
 linux-ker...@vger.kernel.org wrote:
  Gitweb: 
  http://git.kernel.org/linus/;a=commit;h=5e888e8fb55cf3da870b85d04fef6bfe0d57c974
  Commit: 5e888e8fb55cf3da870b85d04fef6bfe0d57c974
  Parent: a1eace4ba53546bc7a6670b1c380cd5c1287ae8b
  Refname:refs/heads/master
  Author: Geert Uytterhoeven ge...@linux-m68k.org
  AuthorDate: Tue Apr 22 12:51:13 2014 +0200
  Committer:  Ralf Baechle r...@linux-mips.org
  CommitDate: Mon Jun 2 16:34:41 2014 +0200
 
  mips: Update the email address of Geert Uytterhoeven
 
  All my Sony addresses are defunct.
 
  Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
  Cc: linux-m...@linux-mips.org
  Patchwork: https://patchwork.linux-mips.org/patch/6817/
  Signed-off-by: Ralf Baechle r...@linux-mips.org
  ---
   arch/mips/include/asm/nile4.h |2 +-
   arch/mips/paravirt/paravirt-smp.c |5 -
   arch/mips/pci/ops-pmcmsp.c|2 +-
   arch/mips/pci/ops-tx3927.c|2 +-
   4 files changed, 3 insertions(+), 8 deletions(-)
 
  diff --git a/arch/mips/paravirt/paravirt-smp.c b/arch/mips/paravirt/paravirt-smp.c
  index 73a123e..0164b0c 100644
  --- a/arch/mips/paravirt/paravirt-smp.c
  +++ b/arch/mips/paravirt/paravirt-smp.c
  @@ -99,10 +99,6 @@ static void paravirt_smp_finish(void)
  local_irq_enable();
   }
 
  -static void paravirt_cpus_done(void)
  -{
  -}
  -
   static void paravirt_boot_secondary(int cpu, struct task_struct *idle)
   {
  paravirt_smp_gp[cpu] = (unsigned long)task_thread_info(idle);
  @@ -141,7 +137,6 @@ struct plat_smp_ops paravirt_smp_ops = {
  .send_ipi_mask  = paravirt_send_ipi_mask,
  .init_secondary = paravirt_init_secondary,
  .smp_finish = paravirt_smp_finish,
  -   .cpus_done  = paravirt_cpus_done,
  .boot_secondary = paravirt_boot_secondary,
  .smp_setup  = paravirt_smp_setup,
  .prepare_cpus   = paravirt_prepare_cpus,
 
 Gr{oetje,eeting}s,
 
 Geert
 
 --
 Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
 ge...@linux-m68k.org
 
 In personal conversations with technical people, I call myself a hacker. But
 when I'm talking to journalists I just say programmer or something like 
 that.
 -- Linus Torvalds


Re: [PATCH v1] vhost: avoid large order allocations

2014-06-13 Thread Michael Mueller
On Tue, 13 May 2014 18:15:27 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On Tue, May 13, 2014 at 04:29:58PM +0200, Romain Francoise wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   Please don't do this; extra indirection hurts performance.
   Instead, please change vhost_net_open and scsi to allocate the whole
   structure with vmalloc if kmalloc fails, along the lines of
   74d332c13b2148ae934ea94dac1745ae92efe8e5
  
  Back in January 2013, you didn't seem to think it was a good idea:
  
  https://lkml.org/lkml/2013/1/23/492
 
 Hmm true, and Dave thought the structure's too large.
 I'll have to do some benchmarks to see what the effect
 of Michael's patch is, performance-wise.
 If it's too expensive I can pick up your patch, no need to
 repost.
 

Hi Michael,

do you have any update on this for us?

Michael
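Michael S. Tsirkin's suggestion in the quoted exchange is to keep one allocation for the whole structure (so there is no extra pointer indirection) and only change where the memory comes from: kmalloc() first, vmalloc() if that fails. Kernel code cannot run standalone, so the sketch below is a userspace analogy under stated assumptions: calloc() plays kmalloc(), an anonymous mmap() plays vmalloc(), and CONTIG_ALLOC_MAX is a made-up threshold standing in for large contiguous allocations failing. The caller records which allocator was used so that the matching free routine is called, much as kernel code must choose between kfree() and vfree().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical cap standing in for "high-order allocations may fail". */
#define CONTIG_ALLOC_MAX (64 * 1024)

struct whole_alloc {
    void *ptr;
    size_t size;
    int from_mmap;  /* which allocator to free with */
};

/* Allocate the whole object in one piece, zero-filled.
 * Returns 0 on success, -1 on failure. */
int alloc_whole(struct whole_alloc *a, size_t size)
{
    assert(a != NULL && size > 0);
    a->size = size;
    a->from_mmap = 0;
    a->ptr = size <= CONTIG_ALLOC_MAX ? calloc(1, size) : NULL;
    if (a->ptr != NULL)
        return 0;
    /* Fallback: page-granular mapping, zero-filled like calloc(). */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->ptr = p;
    a->from_mmap = 1;
    return 0;
}

void free_whole(struct whole_alloc *a)
{
    if (a->from_mmap)
        munmap(a->ptr, a->size);
    else
        free(a->ptr);
    a->ptr = NULL;
}
```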



[Bug 54521] nVMX: accurately emulate VMXON region

2014-06-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=54521

Paolo Bonzini bonz...@gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bonz...@gnu.org
     Kernel Version|                            |3.16
         Resolution|---                         |CODE_FIX

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 53601] nVMX meta-bug

2014-06-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53601

Bug 53601 depends on bug 54521, which changed state.

Bug 54521 Summary: nVMX: accurately emulate VMXON region
https://bugzilla.kernel.org/show_bug.cgi?id=54521

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX



Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-13 Thread Vincent JARDIN

Fine, however Red Hat would also need a way to test ivshmem code, with
proper quality assurance (that also benefits upstream, of course).  With
ivshmem this is not possible without the out-of-tree packages.


You did not reply to my question: how can we get the list of things that 
are/will be disabled by Red Hat?


About Red Hat's QA, I do not care.
About QEMU's QA, I do care ;)

I guess we can combine both. What about something like:
  tests/virtio-net-test.c # qtest_add_func( is a nop)
but for ivshmem
  test/ivshmem-test.c
?

Would it have any value?

If not, what do you use at Red Hat to test QEMU?


Now, you cannot compare vhost-user to DPDK/ivshmem; both should exist
because they have different scopes and use cases. It is like comparing
two different models of IPC:


I do repeat this use case that you had removed because vhost-user does 
not solve it yet:


  - ivshmem - a generic framework providing shared memory for many
 use cases (HPC, in-memory databases, and networking too, like memnic).


  - vhost-user - networking use case specific


Not necessarily.  First and foremost, vhost-user defines an API for
communication between QEMU and the host, including:
* file descriptor passing for the shared memory file
* mapping offsets in shared memory to physical memory addresses in the
guests
* passing dirty memory information back and forth, so that migration is
not prevented
* sending interrupts to a device
* setting up ring buffers in the shared memory


Yes, I do agree that it is promising.
And of course some tests are here:
  https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg00584.html
for some of the bullets you are listing (not all yet).


Also, vhost-user is documented! See here:
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg00581.html


as I told you, we'll send a contribution with ivshmem's documentation.


The only part of ivshmem that vhost doesn't include is the n-way
inter-guest doorbell.  This is the part that requires a server and uio
driver.  vhost only supports host-guest and guest-host doorbells.


Agreed: both will need it. vhost and ivshmem both require a doorbell for 
VM2VM, but then we'll have a security issue to be managed by QEMU for 
both vhost and ivshmem.
I'll be pleased to contribute to it for ivshmem in a thread other than 
this one.



ivshmem does not require the DPDK kernel driver. See memnic's PMD:
  http://dpdk.org/browse/memnic/tree/pmd/pmd_memnic.c


You're right, I was confusing memnic and the vhost example in DPDK.


Definitely, it shows a lack of documentation. You're welcome. Olivier 
did explain it:



ivshmem does not require hugetlbfs. It is optional.

  * it doesn't require ivshmem (it does require shared memory, which
  will also be added to 2.1)


Right, hugetlbfs is not required. A POSIX shared memory object or tmpfs
can be used instead. For instance, to use /dev/shm/foobar:

  qemu-system-x86_64 -enable-kvm -cpu host [...] \
 -device ivshmem,size=16,shm=foobar



Best regards,
  Vincent


Re: mips: Accidental removal of paravirt_cpus_done?

2014-06-13 Thread Ralf Baechle
On Fri, Jun 13, 2014 at 12:02:30PM +0200, Geert Uytterhoeven wrote:

 It seems you accidentally assimilated an (unwanted?) kvm change in my
 patch:

I must have accidentally done a git commit --amend with the wrong
patch on top; sorry about that.  The change itself was intentional.

  Ralf


Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-13 Thread Paolo Bonzini

Il 13/06/2014 15:41, Vincent JARDIN ha scritto:

Fine, however Red Hat would also need a way to test ivshmem code, with
proper quality assurance (that also benefits upstream, of course).  With
ivshmem this is not possible without the out-of-tree packages.


You did not reply to my question: how to get the list of things that
are/will be disabled by Redhat?


I don't know exactly what the answer is, and this is probably not the 
right list to discuss it.  I guess there are partnership programs with 
Red Hat that I don't know the details of, but these are more for 
management folks and not really for developers.


ivshmem in particular was disabled even in RHEL7 beta, so you could have 
found out about this in December and opened a bug in Bugzilla about it.



I guess we can combine both. What's about something like:
  tests/virtio-net-test.c # qtest_add_func( is a nop)
but for ivshmem
  test/ivshmem-test.c
?

would it have any values?


The first things to do are:

1) try to understand if there is any value in a simplified shared memory 
device with no interrupts (and thus no eventfd or uio dependencies, not 
even optionally).  You are not using them because DPDK only does polling 
and basically reserves a core for the NIC code. If so, this would be a 
very simple device, just 100 or so lines of code.  We could get this 
in upstream, and it would be likely enabled in RHEL too.


2) if not, get the server and uio driver merged into the QEMU tree, and 
document the protocol in docs/specs/ivshmem_device_spec.txt.  It doesn't 
matter if the code comes from the Nahanni repository or from your own 
implementation.  Also start fixing bugs such as the ones that Markus 
reported (removing all exit() invocations).
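Option 1) describes a shared-memory device with no interrupts at all, where the consumer simply polls, as DPDK does with a dedicated core. A minimal sketch of what could sit at the start of such a region is a single-producer/single-consumer ring built on C11 atomics. The layout (struct shm_ring, RING_SLOTS) is hypothetical, and using it across processes or guests assumes 32-bit atomics are lock-free on the platform, as they are on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 256u  /* must be a power of two */

/* Only the producer writes 'head'; only the consumer writes 'tail'.
 * Both counters run freely and are masked when indexing. */
struct shm_ring {
    _Atomic uint32_t head;  /* next slot the producer fills */
    _Atomic uint32_t tail;  /* next slot the consumer reads */
    uint64_t slots[RING_SLOTS];
};

bool ring_push(struct shm_ring *r, uint64_t v)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;  /* full */
    r->slots[head & (RING_SLOTS - 1)] = v;
    /* Publish the slot contents before the new head value. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Called in a busy loop by the consumer: no interrupt, no eventfd. */
bool ring_poll(struct shm_ring *r, uint64_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;  /* empty */
    *out = r->slots[tail & (RING_SLOTS - 1)];
    /* Hand the slot back to the producer only after reading it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```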


Writing testcases using the qtest framework would also be useful, but 
first of all it is important to make ivshmem easier to use.



If not, what do you use at Redhat to test Qemu?


We do integration testing using autotest/virt-test (QEMU and KVM 
developers for upstream use it too) and also some manual functional tests.


Contributing ivshmem tests to the virt-test would also be helpful in 
demonstrating your interest in maintaining ivshmem.  The repository and 
documentation are at https://github.com/autotest/virt-test/ (a bit 
Fedora-centric).



I do repeat this use case that you had removed because vhost-user does
not solve it yet:


 - ivshmem - a generic framework providing shared memory for many
use cases (HPC, in-memory databases, and networking too, like memnic).


Right, ivshmem is better for guest-to-guest.  vhost-user is not 
restricted to networking, but it is indeed more focused on 
guest-to-host.  ivshmem is usable for guest-to-host, but I would still 
prefer some hybrid that uses vhost-like messages to pass the shared 
memory fds to the external program.
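Passing the shared-memory fds to an external program, as in the hybrid described above, relies on the standard Unix mechanism that vhost-user also uses: sending descriptors over an AF_UNIX socket as SCM_RIGHTS ancillary data. A minimal sketch follows; send_fd() and recv_fd() are our own names, not part of any QEMU API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one fd. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send 'fd' over a connected AF_UNIX socket with one data byte.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    assert(c != NULL);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns a new descriptor in this process, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number referring to the same open file, so it can mmap() the shared memory directly.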


Paolo


Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 13.06.14 09:23, Aneesh Kumar K.V wrote:
 With guest supporting Multiple page size per segment (MPSS),
 hpte_page_size returns actual page size used. Add a new function to
 return base page size and use that to compare against the page size
 calculated from SLB

 Why? What does this fix? Is this a bug fix, an enhancement? Don't 
 describe only what you do, but also why you do it.



This could result in page fault failures (unhandled page fault) because
even though we have a valid hpte entry mapping a 16MB page, since we
were comparing actual page size against page size calculated from SLB
bits kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe
a failure in practice; the bug was found during a code audit. That could be
because with THP we have guest ram backed by hugetlbfs and we always
find the page in the host linux page table. That will result in do_h_enter always
inserting HPTE_V_VALID entry and hence we might not really end up calling
kvmppc_hv_find_lock_hpte.
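A toy model of the mismatch described above, assuming example values of a 64K base page size (what the SLB encodes for the segment) and a 16M page actually mapped by the HPTE under MPSS. The struct and functions are hypothetical; the real code decodes both sizes from the HPTE's LP bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of one HPTE: with MPSS a segment whose base page
 * size is 64K (shift 16) may hold an entry mapping 16M (shift 24). */
struct hpte_model {
    unsigned int base_shift;    /* base page size of the segment */
    unsigned int actual_shift;  /* page size the entry really maps */
};

static unsigned long page_size(unsigned int shift)
{
    return 1ul << shift;
}

/* 'pshift' is derived from the SLB entry, i.e. from the base page size. */
bool matches_slb_actual(const struct hpte_model *h, unsigned int pshift)
{
    return page_size(h->actual_shift) == page_size(pshift);  /* old check */
}

bool matches_slb_base(const struct hpte_model *h, unsigned int pshift)
{
    return page_size(h->base_shift) == page_size(pshift);    /* patched */
}
```

With the old comparison a perfectly valid 16M entry never matches, so the lookup falls through and returns -1; comparing the base size restores the match.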

-aneesh



RE: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

2014-06-13 Thread mihai.cara...@freescale.com
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, June 12, 2014 8:05 PM
 To: Caraman Mihai Claudiu-B02008
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Wood Scott-B07421
 Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
 on vcpu schedule
 
 On 06/12/2014 04:00 PM, Mihai Caraman wrote:
  On vcpu schedule, the condition checked for tlb pollution is too tight.
  The tlb entries of one vcpu are polluted when a different vcpu from the
  same partition runs in-between. Relax the current tlb invalidation
  condition taking into account the lpid.
 
  Signed-off-by: Mihai Caraman mihai.caraman at freescale.com
 
 Your mailer is broken? :)
 This really should be an @.
 
 I think this should work. Scott, please ack.

Alex, you were right. I screwed up the patch description by inverting relax
and tight terms :) It should have been more like this:

KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

On vcpu schedule, the condition checked for tlb pollution is too loose.
The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu
within the same logical partition runs in-between. Optimize the tlb invalidation
condition taking into account the lpid.
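The corrected condition can be modeled as follows: on vcpu load, flush only if the vcpu last ran on a different physical cpu, or if some other vcpu with the same LPID ran on this cpu since. As we understand it, entries are tagged with the LPID, so vcpus of other partitions can evict this vcpu's entries but cannot create ones that would match. This is a hypothetical userspace model (NR_LPIDS, struct cpu_model and needs_tlb_flush are invented names), not the e500mc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_LPIDS 64  /* hypothetical size of the LPID space */

struct vcpu_model {
    int id;
    unsigned int lpid;  /* logical partition the vcpu belongs to */
    int last_cpu;       /* physical cpu it last ran on, -1 if never */
};

/* Per physical cpu: which vcpu last ran here, tracked per LPID. */
struct cpu_model {
    int id;
    const struct vcpu_model *last_vcpu_of_lpid[NR_LPIDS];
};

/* Returns true if v's TLB entries on 'cpu' may be polluted or stale,
 * and records v as the last vcpu of its LPID on this cpu. */
bool needs_tlb_flush(struct cpu_model *cpu, struct vcpu_model *v)
{
    assert(v->lpid < NR_LPIDS);
    bool flush = v->last_cpu != cpu->id ||
                 cpu->last_vcpu_of_lpid[v->lpid] != v;
    cpu->last_vcpu_of_lpid[v->lpid] = v;
    v->last_cpu = cpu->id;
    return flush;
}
```

The third call in the usage below is the case the patch optimizes: a vcpu of another partition ran in between, and no flush is needed.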

-Mike


Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Alexander Graf


On 13.06.14 16:28, Aneesh Kumar K.V wrote:

Alexander Graf ag...@suse.de writes:


On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guest supporting Multiple page size per segment (MPSS),
hpte_page_size returns actual page size used. Add a new function to
return base page size and use that to compare against the page size
calculated from SLB

Why? What does this fix? Is this a bug fix, an enhancement? Don't
describe only what you do, but also why you do it.



This could result in page fault failures (unhandled page fault) because
even though we have a valid hpte entry mapping a 16MB page, since we
were comparing actual page size against page size calculated from SLB
bits kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe
a failure in practice; the bug was found during a code audit. That could be
because with THP we have guest ram backed by hugetlbfs and we always
find the page in the host linux page table. That will result in do_h_enter always
inserting HPTE_V_VALID entry and hence we might not really end up calling
kvmppc_hv_find_lock_hpte.


So why do we need to override to base page size for the VRMA region? 
Also I think you want to change the comment above the line in 
find_lock_hpte you're changing.



Alex



Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

2014-06-13 Thread Alexander Graf


On 13.06.14 16:43, mihai.cara...@freescale.com wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, June 12, 2014 8:05 PM
To: Caraman Mihai Claudiu-B02008
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
d...@lists.ozlabs.org; Wood Scott-B07421
Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
on vcpu schedule

On 06/12/2014 04:00 PM, Mihai Caraman wrote:

On vcpu schedule, the condition checked for tlb pollution is too tight.
The tlb entries of one vcpu are polluted when a different vcpu from the
same partition runs in-between. Relax the current tlb invalidation
condition taking into account the lpid.

Signed-off-by: Mihai Caraman mihai.caraman at freescale.com

Your mailer is broken? :)
This really should be an @.

I think this should work. Scott, please ack.

Alex, you were right. I screwed up the patch description by inverting relax
and tight terms :) It should have been more like this:

KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

On vcpu schedule, the condition checked for tlb pollution is too loose.
The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu
within the same logical partition runs in-between. Optimize the tlb invalidation
condition taking into account the lpid.


Can't we give every vcpu its own lpid? Or don't we trap on global 
invalidates?



Alex



Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 13.06.14 16:28, Aneesh Kumar K.V wrote:
 Alexander Graf ag...@suse.de writes:

 On 13.06.14 09:23, Aneesh Kumar K.V wrote:
 With guest supporting Multiple page size per segment (MPSS),
 hpte_page_size returns actual page size used. Add a new function to
 return base page size and use that to compare against the page size
 calculated from SLB
 Why? What does this fix? Is this a bug fix, an enhancement? Don't
 describe only what you do, but also why you do it.


 This could result in page fault failures (unhandled page fault) because
 even though we have a valid hpte entry mapping a 16MB page, since we
 were comparing actual page size against page size calculated from SLB
 bits kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe
 a failure in practice; the bug was found during a code audit. That could be
 because with THP we have guest ram backed by hugetlbfs and we always
 find the page in the host linux page table. That will result in do_h_enter always
 inserting HPTE_V_VALID entry and hence we might not really end up calling
 kvmppc_hv_find_lock_hpte.

 So why do we need to override to base page size for the VRMA region?

The SLB encoding should be derived from the base page size.

 Also I think you want to change the comment above the line in 
 find_lock_hpte you're changing.


Will do that.

-aneesh



Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

2014-06-13 Thread Scott Wood
On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote:
 On 13.06.14 16:43, mihai.cara...@freescale.com wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, June 12, 2014 8:05 PM
  To: Caraman Mihai Claudiu-B02008
  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
  d...@lists.ozlabs.org; Wood Scott-B07421
  Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
  on vcpu schedule
 
  On 06/12/2014 04:00 PM, Mihai Caraman wrote:
  On vcpu schedule, the condition checked for tlb pollution is too tight.
  The tlb entries of one vcpu are polluted when a different vcpu from the
  same partition runs in-between. Relax the current tlb invalidation
  condition taking into account the lpid.

Can you quantify the performance improvement from this?  We've had bugs
in this area before, so let's make sure it's worth it before making this
more complicated.

  Signed-off-by: Mihai Caraman mihai.caraman at freescale.com
  Your mailer is broken? :)
  This really should be an @.
 
  I think this should work. Scott, please ack.
  Alex, you were right. I screwed up the patch description by inverting relax
  and tight terms :) It should have been more like this:
 
  KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
 
  On vcpu schedule, the condition checked for tlb pollution is too loose.
  The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu
  within the same logical partition runs in-between. Optimize the tlb invalidation
  condition taking into account the lpid.
 
 Can't we give every vcpu its own lpid? Or don't we trap on global 
 invalidates?

That would significantly increase the odds of exhausting LPIDs,
especially on large chips like t4240 with similarly large VMs.  If we
were to do that, the LPIDs would need to be dynamically assigned (like
PIDs), and should probably be a separate number space per physical core.
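If LPIDs were handed out per vcpu, they would have to be assigned dynamically like PIDs, in a separate number space per physical core. A minimal bitmap allocator for one core's space is sketched below; CORE_LPIDS is a hypothetical size, and what to do on exhaustion (recycling an LPID and flushing its TLB entries) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define CORE_LPIDS 64  /* hypothetical number of LPIDs per core */

/* One instance per physical core: bit n set means LPID n is in use. */
struct lpid_space {
    uint64_t used;
};

/* Returns the lowest free LPID, or -1 when this core's space is
 * exhausted (a real implementation would have to recycle one). */
int lpid_alloc(struct lpid_space *s)
{
    for (int n = 0; n < CORE_LPIDS; n++) {
        uint64_t bit = UINT64_C(1) << n;
        if (!(s->used & bit)) {
            s->used |= bit;
            return n;
        }
    }
    return -1;
}

void lpid_free(struct lpid_space *s, int lpid)
{
    assert(lpid >= 0 && lpid < CORE_LPIDS);
    s->used &= ~(UINT64_C(1) << lpid);
}
```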

-Scott




Re: [PATCH 4/4 v3] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-06-13 Thread Scott Wood
On Thu, 2014-06-12 at 18:04 +0200, Alexander Graf wrote:
 On 06/02/2014 05:50 PM, Mihai Caraman wrote:
  On book3e, KVM uses the dedicated load external pid (lwepx) instruction to read
  the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
  and LRAT), generated by loading a guest address, need to be handled by KVM.
  These exceptions are generated in a substituted guest translation context
  (EPLC[EGS] = 1) from host context (MSR[GS] = 0).
 
  Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
  doing minimal checks on the fast path to avoid host performance degradation.
  lwepx exceptions originate from host state (MSR[GS] = 0), which implies
  additional checks in the DO_KVM macro (besides the current MSR[GS] = 1) by looking
  at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context
  Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obviously
  too intrusive for the host.
 
  Read the guest's last instruction from kvmppc_load_last_inst() by searching for the
  physical address and kmapping it. This addresses the TODO for TLB eviction and
  execute-but-not-read entries, and allows us to get rid of lwepx until we are
  able to handle failures.
 
  A simple stress benchmark shows a 1% sys performance degradation compared
  with the previous approach (lwepx without failure handling):
 
  time for i in `seq 1 1`; do /bin/echo > /dev/null; done
 
  real0m 8.85s
  user0m 4.34s
  sys 0m 4.48s
 
  vs
 
  real0m 8.84s
  user0m 4.36s
  sys 0m 4.44s
 
  An alternative solution, handling lwepx exceptions in KVM, is to temporarily
  hijack the interrupt vector from the host. Some cores share host IVOR registers
  between hardware threads, as is the case for the FSL e6500, which imposes additional
  synchronization logic for this solution to work. This optimized solution can
  be developed later on top of this patch.
 
  Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
  ---
  v3:
- reworked patch description
- use unaltered kmap addr for kunmap
    - get last instruction before being preempted
 
  v2:
- reworked patch description
- used pr_* functions
- addressed cosmetic feedback
 
arch/powerpc/kvm/booke.c  | 32 
arch/powerpc/kvm/bookehv_interrupts.S | 37 --
arch/powerpc/kvm/e500_mmu_host.c  | 93 
  +++
3 files changed, 134 insertions(+), 28 deletions(-)
 
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index 34a42b9..4ef52a8 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -880,6 +880,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
  int r = RESUME_HOST;
  int s;
  int idx;
  +   u32 last_inst = KVM_INST_FETCH_FAILED;
  +   enum emulation_result emulated = EMULATE_DONE;

  /* update before a new last_exit_type is rewritten */
  kvmppc_update_timing_stats(vcpu);
  @@ -887,6 +889,15 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
  /* restart interrupts if they were meant for the host */
  kvmppc_restart_interrupt(vcpu, exit_nr);

  +   /*
  +* get last instruction before being preempted
  +* TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR  ESR_DATA
  +*/
  +   if (exit_nr == BOOKE_INTERRUPT_DATA_STORAGE ||
  +   exit_nr == BOOKE_INTERRUPT_DTLB_MISS ||
  +   exit_nr == BOOKE_INTERRUPT_HV_PRIV)
 
 Please make this a switch() - that's easier to read.
 
  +   emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
  +
  local_irq_enable();

  trace_kvm_exit(exit_nr, vcpu);
  @@ -895,6 +906,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
  run->exit_reason = KVM_EXIT_UNKNOWN;
  run->ready_for_interrupt_injection = 1;

  +   switch (emulated) {
  +   case EMULATE_AGAIN:
  +   r = RESUME_GUEST;
  +   goto out;
  +
  +   case EMULATE_FAIL:
  +   pr_debug("%s: emulation at %lx failed (%08x)\n",
  +  __func__, vcpu->arch.pc, last_inst);
  +   /* For debugging, encode the failing instruction and
  +* report it to userspace. */
  +   run->hw.hardware_exit_reason = ~0ULL << 32;
  +   run->hw.hardware_exit_reason |= last_inst;
  +   kvmppc_core_queue_program(vcpu, ESR_PIL);
  +   r = RESUME_HOST;
  +   goto out;
  +
  +   default:
  +   break;
  +   }
 
 I think you can just put this into a function.
 
 Scott, I think the patch overall looks quite good. Can you please check 
 as well and if you agree give it your reviewed-by? Mike, when Scott 
 gives you a reviewed-by, please include it for the next version.
 
 
 Alex
 
  +
  switch (exit_nr) {
  case BOOKE_INTERRUPT_MACHINE_CHECK:
  printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR));
  @@ -1184,6 +1215,7 @@ int 
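Alex's two review comments above (turn the exit_nr test into a switch() and move the emulation-result handling into a function) could look like the sketch below. It uses hypothetical stand-in enums rather than the kernel's BOOKE_INTERRUPT_* and EMULATE_* constants and omits the kvmppc_core_queue_program() call, so it only shows the shape of the refactoring and the hardware_exit_reason encoding from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's exit codes and results. */
enum exit_code { EXIT_DATA_STORAGE, EXIT_DTLB_MISS, EXIT_HV_PRIV, EXIT_OTHER };
enum emu_result { EMU_DONE, EMU_AGAIN, EMU_FAIL };
enum resume { RES_GUEST, RES_HOST, RES_CONTINUE };

/* The exit_nr check from the patch, written as a switch(). */
bool exit_needs_last_inst(enum exit_code exit_nr)
{
    switch (exit_nr) {
    case EXIT_DATA_STORAGE:
    case EXIT_DTLB_MISS:
    case EXIT_HV_PRIV:
        return true;
    default:
        return false;
    }
}

/* The emulation-result handling as a helper. On failure it encodes the
 * instruction like the patch: upper 32 bits all ones, lower 32 bits the
 * instruction (the real code also queues a program interrupt). */
enum resume handle_last_inst_result(enum emu_result emulated,
                                    uint32_t last_inst,
                                    uint64_t *hardware_exit_reason)
{
    switch (emulated) {
    case EMU_AGAIN:
        return RES_GUEST;
    case EMU_FAIL:
        *hardware_exit_reason = ~0ULL << 32;
        *hardware_exit_reason |= last_inst;
        return RES_HOST;
    default:
        return RES_CONTINUE;  /* fall through to the normal exit switch */
    }
}
```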

[PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
With guest supporting Multiple page size per segment (MPSS),
hpte_page_size returns actual page size used. Add a new function to
return base page size and use that to compare against the page size
calculated from SLB

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+bool is_base_size)
 {
+
int size, a_psize;
/* Look at the 8 bit LP value */
unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
continue;
 
a_psize = __hpte_actual_psize(lp, size);
-   if (a_psize != -1)
+   if (a_psize != -1) {
+   if (is_base_size)
+   return 1ul << mmu_psize_defs[size].shift;
return 1ul << mmu_psize_defs[a_psize].shift;
+   }
}
 
}
return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+   return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+   return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
goto out;
}
if (!rma_setup && is_vrma_hpte(v)) {
-   unsigned long psize = hpte_page_size(v, r);
+   unsigned long psize = hpte_base_page_size(v, r);
unsigned long senc = slb_pgsize_encoding(psize);
unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 * to check against the actual page size.
 */
if ((v & valid) && (v & mask) == val &&
-   hpte_page_size(v, r) == (1ul << pshift))
+   hpte_base_page_size(v, r) == (1ul << pshift))
/* Return with the HPTE still locked */
return (hash << 3) + (i >> 1);
 
-- 
1.9.1
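To illustrate the distinction the patch draws, here is a minimal user-space sketch. It does not reproduce the kernel's mmu_psize_defs[] or __hpte_actual_psize(); instead it assumes a made-up LP table in which one encoding describes a segment with a 64K base page size whose HPTE actually maps a 16M page (struct mpss_entry, mpss_table and model_hpte_page_size are hypothetical names). The is_base_size flag plays the same role as in __hpte_page_size().

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified MPSS table: each entry pairs the base page
 * shift (what the SLB describes) with the actual page shift (what the
 * HPTE maps) for a made-up LP value. The kernel derives these from the
 * HPTE LP bits instead.
 */
struct mpss_entry {
	unsigned int lp;           /* hypothetical LP encoding */
	unsigned int base_shift;   /* segment base page size */
	unsigned int actual_shift; /* page size actually mapped */
};

static const struct mpss_entry mpss_table[] = {
	{ 0x0, 24, 24 },  /* 16M base, 16M actual */
	{ 0x1, 16, 16 },  /* 64K base, 64K actual */
	{ 0x2, 16, 24 },  /* 64K base, 16M actual (MPSS) */
};

/* Return the base or actual page size for an LP value, or 0 if unknown. */
static unsigned long model_hpte_page_size(unsigned int lp, bool is_base_size)
{
	for (size_t i = 0; i < sizeof(mpss_table) / sizeof(mpss_table[0]); i++) {
		if (mpss_table[i].lp != lp)
			continue;
		if (is_base_size)
			return 1ul << mpss_table[i].base_shift;
		return 1ul << mpss_table[i].actual_shift;
	}
	return 0;
}
```

With this table, model_hpte_page_size(0x2, true) gives 64K while model_hpte_page_size(0x2, false) gives 16M; only the former matches what the SLB describes for that segment.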

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Alexander Graf


On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guests supporting multiple page sizes per segment (MPSS),
hpte_page_size returns the actual page size used. Add a new function to
return the base page size and use that to compare against the page size
calculated from the SLB.


Why? What does this fix? Is this a bug fix, an enhancement? Don't 
describe only what you do, but also why you do it.



Alex



Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
  arch/powerpc/include/asm/kvm_book3s_64.h | 19 +--
  arch/powerpc/kvm/book3s_64_mmu_hv.c  |  2 +-
  arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  2 +-
  3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
return rb;
  }
  
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+bool is_base_size)
  {
+
int size, a_psize;
/* Look at the 8 bit LP value */
unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
continue;
  
a_psize = __hpte_actual_psize(lp, size);
-   if (a_psize != -1)
+   if (a_psize != -1) {
+   if (is_base_size)
+   return 1ul << mmu_psize_defs[size].shift;
return 1ul << mmu_psize_defs[a_psize].shift;
+   }
}
  
}
return 0;
  }
  
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+   return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+   return __hpte_page_size(h, l, 1);
+}
+
  static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
  {
return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
goto out;
}
if (!rma_setup && is_vrma_hpte(v)) {
-   unsigned long psize = hpte_page_size(v, r);
+   unsigned long psize = hpte_base_page_size(v, r);
unsigned long senc = slb_pgsize_encoding(psize);
unsigned long lpcr;
  
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 * to check against the actual page size.
 */
if ((v & valid) && (v & mask) == val &&
-   hpte_page_size(v, r) == (1ul << pshift))
+   hpte_base_page_size(v, r) == (1ul << pshift))
/* Return with the HPTE still locked */
return (hash << 3) + (i >> 1);
  




Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
Alexander Graf <ag...@suse.de> writes:

 On 13.06.14 09:23, Aneesh Kumar K.V wrote:
 With guests supporting multiple page sizes per segment (MPSS),
 hpte_page_size returns the actual page size used. Add a new function to
 return the base page size and use that to compare against the page size
 calculated from the SLB.

 Why? What does this fix? Is this a bug fix, an enhancement? Don't 
 describe only what you do, but also why you do it.



This could result in page fault failures (unhandled page faults): even
though we have a valid HPTE mapping a 16MB page, kvmppc_hv_find_lock_hpte
will fail and return -1, because we were comparing the actual page size
against the page size calculated from the SLB bits. I did not observe a
failure in practice; the bug was found during a code audit. That could be
because with THP we have guest RAM backed by hugetlbfs, so we always find
the page in the host Linux page table. That results in do_h_enter always
inserting an HPTE_V_VALID entry, so we might never actually end up calling
kvmppc_hv_find_lock_hpte.
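A hedged sketch of the failing comparison, assuming the SLB for the segment encodes a 64K base page (pshift = 16) while the HPTE maps a 16M page through MPSS. old_check and new_check are hypothetical names that mimic the final test in kvmppc_hv_find_lock_hpte before and after the patch; they are not kernel functions.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the last check in kvmppc_hv_find_lock_hpte().
 * pshift is the page shift decoded from the SLB, i.e. the segment's
 * base page size.
 */
static bool old_check(unsigned int base_shift, unsigned int actual_shift,
		      unsigned int pshift)
{
	(void)base_shift;
	/* Before the patch: the actual (MPSS) page size is compared. */
	return (1ul << actual_shift) == (1ul << pshift);
}

static bool new_check(unsigned int base_shift, unsigned int actual_shift,
		      unsigned int pshift)
{
	(void)actual_shift;
	/* After the patch: the base page size is compared, as the SLB encodes it. */
	return (1ul << base_shift) == (1ul << pshift);
}
```

Here old_check(16, 24, 16) is false, so the lookup would skip a valid entry and eventually return -1, whereas new_check(16, 24, 16) is true. For entries whose base and actual sizes are equal, both checks agree.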

-aneesh



RE: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

2014-06-13 Thread mihai.cara...@freescale.com
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, June 12, 2014 8:05 PM
 To: Caraman Mihai Claudiu-B02008
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Wood Scott-B07421
 Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
 on vcpu schedule
 
 On 06/12/2014 04:00 PM, Mihai Caraman wrote:
  On vcpu schedule, the condition checked for tlb pollution is too tight.
  The tlb entries of one vcpu are polluted when a different vcpu from the
  same partition runs in-between. Relax the current tlb invalidation
  condition taking into account the lpid.
 
  Signed-off-by: Mihai Caraman mihai.caraman at freescale.com
 
 Your mailer is broken? :)
 This really should be an @.
 
 I think this should work. Scott, please ack.

Alex, you were right. I screwed up the patch description by inverting relax
and tight terms :) It should have been more like this:

KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

On vcpu schedule, the condition checked for tlb pollution is too loose.
The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu
within the same logical partition runs in-between. Optimize the tlb invalidation
condition taking into account the lpid.
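The corrected condition can be modeled as below. This is a sketch under stated assumptions, not the e500mc code: it assumes a per-physical-CPU table recording, for each LPID, which vcpu last ran there (last_vcpu_of_lpid, struct model_vcpu and model_vcpu_load are hypothetical names), and it assumes a vcpu that migrates between physical CPUs must still flush. A vcpu from a different partition runs under a different LPID, so it does not force a flush.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL  4   /* hypothetical number of physical CPUs */
#define NR_LPIDS_MODEL 64  /* hypothetical number of LPIDs */

struct model_vcpu {
	int id;
	int lpid;      /* logical partition this vcpu belongs to */
	int last_cpu;  /* physical CPU it last ran on, -1 if never */
};

/* Per physical CPU: the vcpu of each LPID that last ran there (NULL if none). */
static struct model_vcpu *last_vcpu_of_lpid[NR_CPUS_MODEL][NR_LPIDS_MODEL];

/*
 * Called when @vcpu is scheduled on physical CPU @cpu. Returns true if its
 * TLB entries must be invalidated: it moved from another CPU, or another
 * vcpu of the same partition ran here in between.
 */
static bool model_vcpu_load(struct model_vcpu *vcpu, int cpu)
{
	bool flush = vcpu->last_cpu != cpu ||
		     last_vcpu_of_lpid[cpu][vcpu->lpid] != vcpu;

	last_vcpu_of_lpid[cpu][vcpu->lpid] = vcpu;
	vcpu->last_cpu = cpu;
	return flush;
}
```

For example, if vcpu a (LPID 1) runs, then a vcpu of LPID 2 runs, then a runs again on the same CPU, no flush is needed; if a vcpu b of LPID 1 runs in between instead, a must flush.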

-Mike


Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Alexander Graf


On 13.06.14 16:28, Aneesh Kumar K.V wrote:

Alexander Graf <ag...@suse.de> writes:


On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guests supporting multiple page sizes per segment (MPSS),
hpte_page_size returns the actual page size used. Add a new function to
return the base page size and use that to compare against the page size
calculated from the SLB.

Why? What does this fix? Is this a bug fix, an enhancement? Don't
describe only what you do, but also why you do it.



This could result in page fault failures (unhandled page faults): even
though we have a valid HPTE mapping a 16MB page, kvmppc_hv_find_lock_hpte
will fail and return -1, because we were comparing the actual page size
against the page size calculated from the SLB bits. I did not observe a
failure in practice; the bug was found during a code audit. That could be
because with THP we have guest RAM backed by hugetlbfs, so we always find
the page in the host Linux page table. That results in do_h_enter always
inserting an HPTE_V_VALID entry, so we might never actually end up calling
kvmppc_hv_find_lock_hpte.


So why do we need to override to base page size for the VRMA region? 
Also I think you want to change the comment above the line in 
find_lock_hpte you're changing.



Alex



Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

2014-06-13 Thread Alexander Graf


On 13.06.14 16:43, mihai.cara...@freescale.com wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, June 12, 2014 8:05 PM
To: Caraman Mihai Claudiu-B02008
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
d...@lists.ozlabs.org; Wood Scott-B07421
Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
on vcpu schedule

On 06/12/2014 04:00 PM, Mihai Caraman wrote:

On vcpu schedule, the condition checked for tlb pollution is too tight.
The tlb entries of one vcpu are polluted when a different vcpu from the
same partition runs in-between. Relax the current tlb invalidation
condition taking into account the lpid.

Signed-off-by: Mihai Caraman mihai.caraman at freescale.com

Your mailer is broken? :)
This really should be an @.

I think this should work. Scott, please ack.

Alex, you were right. I screwed up the patch description by inverting relax
and tight terms :) It should have been more like this:

KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

On vcpu schedule, the condition checked for tlb pollution is too loose.
The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu
within the same logical partition runs in-between. Optimize the tlb invalidation
condition taking into account the lpid.


Can't we give every vcpu its own lpid? Or don't we trap on global 
invalidates?



Alex



Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
Alexander Graf <ag...@suse.de> writes:

 On 13.06.14 16:28, Aneesh Kumar K.V wrote:
 Alexander Graf ag...@suse.de writes:

 On 13.06.14 09:23, Aneesh Kumar K.V wrote:
 With guests supporting multiple page sizes per segment (MPSS),
 hpte_page_size returns the actual page size used. Add a new function to
 return the base page size and use that to compare against the page size
 calculated from the SLB.
 Why? What does this fix? Is this a bug fix, an enhancement? Don't
 describe only what you do, but also why you do it.


 This could result in page fault failures (unhandled page faults): even
 though we have a valid HPTE mapping a 16MB page, kvmppc_hv_find_lock_hpte
 will fail and return -1, because we were comparing the actual page size
 against the page size calculated from the SLB bits. I did not observe a
 failure in practice; the bug was found during a code audit. That could be
 because with THP we have guest RAM backed by hugetlbfs, so we always find
 the page in the host Linux page table. That results in do_h_enter always
 inserting an HPTE_V_VALID entry, so we might never actually end up calling
 kvmppc_hv_find_lock_hpte.

 So why do we need to override to base page size for the VRMA region?

The SLB encoding should be derived from the base page size.
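To show why the size passed in matters, here is a simplified, hypothetical encoding helper in the spirit of slb_pgsize_encoding(). The constants are placeholders chosen for illustration, not values taken from the kernel headers. For an HPTE whose base page size is 64K but which actually maps a 16M page, passing the actual size sets only the large-page bit, while passing the base size also sets the 64K LP bits, so only the base size reproduces the segment's SLB encoding.

```c
/* Hypothetical stand-ins for the SLB page-size bits (placeholder values). */
#define HYP_SLB_VSID_L     0x100ul /* "large page" bit */
#define HYP_SLB_VSID_LP_01 0x010ul /* selects the 64K large-page size */

/* Sketch: derive SLB page-size bits from a page size in bytes. */
static unsigned long hyp_slb_pgsize_encoding(unsigned long psize)
{
	unsigned long senc = 0;

	if (psize > 0x1000) {          /* anything above 4K is "large" */
		senc = HYP_SLB_VSID_L;
		if (psize == 0x10000)      /* 64K */
			senc |= HYP_SLB_VSID_LP_01;
	}
	return senc;
}
```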

 Also I think you want to change the comment above the line in 
 find_lock_hpte you're changing.


Will do that.

-aneesh



Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

2014-06-13 Thread Scott Wood
On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote:
 On 13.06.14 16:43, mihai.cara...@freescale.com wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, June 12, 2014 8:05 PM
  To: Caraman Mihai Claudiu-B02008
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
  d...@lists.ozlabs.org; Wood Scott-B07421
  Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
  on vcpu schedule
 
  On 06/12/2014 04:00 PM, Mihai Caraman wrote:
  On vcpu schedule, the condition checked for tlb pollution is too tight.
  The tlb entries of one vcpu are polluted when a different vcpu from the
  same partition runs in-between. Relax the current tlb invalidation
  condition taking into account the lpid.

Can you quantify the performance improvement from this?  We've had bugs
in this area before, so let's make sure it's worth it before making this
more complicated.

  Signed-off-by: Mihai Caraman mihai.caraman at freescale.com
  Your mailer is broken? :)
  This really should be an @.
 
  I think this should work. Scott, please ack.
  Alex, you were right. I screwed up the patch description by inverting relax
  and tight terms :) It should have been more like this:
 
  KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
 
  On vcpu schedule, the condition checked for tlb pollution is too loose.
  The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu
  within the same logical partition runs in-between. Optimize the tlb invalidation
  condition taking into account the lpid.
 
 Can't we give every vcpu its own lpid? Or don't we trap on global 
 invalidates?

That would significantly increase the odds of exhausting LPIDs,
especially on large chips like t4240 with similarly large VMs.  If we
were to do that, the LPIDs would need to be dynamically assigned (like
PIDs), and should probably be a separate numberspace per physical core.
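Scott's suggestion resembles generation-based PID/ASID allocation. A minimal sketch of a per-core allocator along those lines follows. Everything in it (struct core_lpids, struct vcpu_lpid, core_get_lpid, the pool size of 8, and LPID 0 being reserved for the host) is a hypothetical illustration of the idea, not existing KVM code. A real implementation would also need one vcpu_lpid record per physical core, locking, and an actual TLB flush where the sketch only sets a flag.

```c
#include <stdbool.h>

#define MODEL_NR_LPIDS 8 /* hypothetical: tiny pool to show recycling */

/* Per physical core: its own LPID number space. Cores start at generation 1. */
struct core_lpids {
	unsigned long generation;
	int next;                    /* next free LPID in this generation */
};

/* Per vcpu (per core): generation 0 means "never allocated". */
struct vcpu_lpid {
	unsigned long generation;    /* generation the LPID was issued in */
	int lpid;                    /* valid only if generation matches */
};

/* Return the vcpu's LPID on this core; *flush is set when the TLB must be flushed. */
static int core_get_lpid(struct core_lpids *core, struct vcpu_lpid *v, bool *flush)
{
	*flush = false;
	if (v->generation == core->generation)
		return v->lpid;          /* still valid on this core */

	if (core->next == MODEL_NR_LPIDS) {
		core->generation++;      /* pool exhausted: start a new generation */
		core->next = 1;          /* LPID 0 assumed reserved for the host */
		*flush = true;           /* old LPIDs may still tag TLB entries */
	}
	v->generation = core->generation;
	v->lpid = core->next++;
	return v->lpid;
}
```

The design choice here is that exhausting the pool costs one flush per core rather than failing VM creation, which is the trade-off that dynamic assignment would be buying.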

-Scott




Re: [PATCH 4/4 v3] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-06-13 Thread Scott Wood
On Thu, 2014-06-12 at 18:04 +0200, Alexander Graf wrote:
 On 06/02/2014 05:50 PM, Mihai Caraman wrote:
  On book3e, KVM uses the dedicated load external PID (lwepx) instruction to
  read the guest's last instruction on the exit path. lwepx exceptions
  (DTLB_MISS, DSI and LRAT), generated by loading a guest address, need to be
  handled by KVM. These exceptions are generated in a substituted guest
  translation context (EPLC[EGS] = 1) from host context (MSR[GS] = 0).
 
  Currently, KVM hooks only interrupts generated from guest context
  (MSR[GS] = 1), doing minimal checks on the fast path to avoid host
  performance degradation. lwepx exceptions originate from host state
  (MSR[GS] = 0), which implies additional checks in the DO_KVM macro (besides
  the current MSR[GS] = 1 check) by looking at the Exception Syndrome Register
  (ESR[EPID]) and the External PID Load Context Register (EPLC[EGS]). Doing
  this on every Data TLB miss exception is obviously too intrusive for the
  host.
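Expressed in C purely for readability (DO_KVM itself is an assembly macro), the difference in the fast-path test looks roughly like this. The bit masks and function names are hypothetical placeholders, not the definitions from reg_booke.h; the point is that the lwepx-aware variant has to consult ESR and EPLC on every host-state miss, not just MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit masks (placeholders, not the reg_booke.h values). */
#define HYP_MSR_GS   (1u << 28) /* interrupted context was guest state */
#define HYP_ESR_EPID (1u << 6)  /* fault came from an external-PID access */
#define HYP_EPLC_EGS (1u << 29) /* external load context is guest state */

/* Today: a single MSR test decides whether KVM takes the exception. */
static bool exit_is_for_kvm_today(uint32_t msr)
{
	return msr & HYP_MSR_GS;
}

/*
 * What catching lwepx faults would add: a fault taken in host state, but
 * on behalf of a guest translation, also belongs to KVM.
 */
static bool exit_is_for_kvm_with_lwepx(uint32_t msr, uint32_t esr, uint32_t eplc)
{
	if (msr & HYP_MSR_GS)
		return true;
	return (esr & HYP_ESR_EPID) && (eplc & HYP_EPLC_EGS);
}
```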
 
  Read the guest's last instruction from kvmppc_load_last_inst() by searching
  for its physical address and mapping it with kmap(). This addresses the TODO
  for TLB eviction and execute-but-not-read entries, and allows us to get rid
  of lwepx until we are able to handle failures.
 
  A simple stress benchmark shows a 1% sys performance degradation compared
  with the previous approach (lwepx without failure handling):
 
  time for i in `seq 1 1`; do /bin/echo > /dev/null; done
 
  real    0m 8.85s
  user    0m 4.34s
  sys     0m 4.48s
 
  vs
 
  real    0m 8.84s
  user    0m 4.36s
  sys     0m 4.44s
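Taking the first set of timings as the new approach and the second as the previous lwepx one (the order the description implies), the relative cost works out as:

```latex
\frac{4.48\,\mathrm{s} - 4.44\,\mathrm{s}}{4.44\,\mathrm{s}} \approx 0.9\,\% \ (\text{sys}),
\qquad
\frac{8.85\,\mathrm{s} - 8.84\,\mathrm{s}}{8.84\,\mathrm{s}} \approx 0.1\,\% \ (\text{real})
```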
 
  An alternative solution for handling lwepx exceptions in KVM is to
  temporarily hijack the interrupt vector from the host. Some cores share host
  IVOR registers between hardware threads, as is the case on FSL e6500, which
  imposes additional synchronization logic for this solution to work. This
  optimized solution can be developed later on top of this patch.
 
  Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
  ---
  v3:
- reworked patch description
- use unaltered kmap addr for kunmap
- get last instruction before being preempted
 
  v2:
- reworked patch description
- used pr_* functions
- addressed cosmetic feedback
 
arch/powerpc/kvm/booke.c  | 32 
arch/powerpc/kvm/bookehv_interrupts.S | 37 --
arch/powerpc/kvm/e500_mmu_host.c  | 93 +++
3 files changed, 134 insertions(+), 28 deletions(-)
 
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index 34a42b9..4ef52a8 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -880,6 +880,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
  int r = RESUME_HOST;
  int s;
  int idx;
  +   u32 last_inst = KVM_INST_FETCH_FAILED;
  +   enum emulation_result emulated = EMULATE_DONE;

  /* update before a new last_exit_type is rewritten */
  kvmppc_update_timing_stats(vcpu);
  @@ -887,6 +889,15 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
  /* restart interrupts if they were meant for the host */
  kvmppc_restart_interrupt(vcpu, exit_nr);

  +   /*
  +* get last instruction before being preempted
  +* TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR  ESR_DATA
  +*/
  +   if (exit_nr == BOOKE_INTERRUPT_DATA_STORAGE ||
  +   exit_nr == BOOKE_INTERRUPT_DTLB_MISS ||
  +   exit_nr == BOOKE_INTERRUPT_HV_PRIV)
 
 Please make this a switch() - that's easier to read.
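A sketch of the switch() form suggested here, with hypothetical MODEL_INTERRUPT_* values standing in for the BOOKE_INTERRUPT_* constants so that it compiles on its own; exit_needs_last_inst is likewise a made-up name.

```c
#include <stdbool.h>

/* Hypothetical exit numbers standing in for BOOKE_INTERRUPT_* constants. */
enum {
	MODEL_INTERRUPT_MACHINE_CHECK = 1,
	MODEL_INTERRUPT_DATA_STORAGE,
	MODEL_INTERRUPT_DTLB_MISS,
	MODEL_INTERRUPT_HV_PRIV,
};

/* Exits for which the guest's last instruction is fetched before preemption. */
static bool exit_needs_last_inst(int exit_nr)
{
	switch (exit_nr) {
	case MODEL_INTERRUPT_DATA_STORAGE:
	case MODEL_INTERRUPT_DTLB_MISS:
	case MODEL_INTERRUPT_HV_PRIV:
		return true;
	default:
		return false;
	}
}
```

Inside kvmppc_handle_exit() the same switch could equally be written inline, assigning emulated in the shared case body.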
 
  +   emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
  +
  local_irq_enable();

  trace_kvm_exit(exit_nr, vcpu);
  @@ -895,6 +906,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
  run->exit_reason = KVM_EXIT_UNKNOWN;
  run->ready_for_interrupt_injection = 1;

  +   switch (emulated) {
  +   case EMULATE_AGAIN:
  +   r = RESUME_GUEST;
  +   goto out;
  +
  +   case EMULATE_FAIL:
  +   pr_debug("%s: emulation at %lx failed (%08x)\n",
  +  __func__, vcpu->arch.pc, last_inst);
  +   /* For debugging, encode the failing instruction and
  +* report it to userspace. */
  +   run->hw.hardware_exit_reason = ~0ULL << 32;
  +   run->hw.hardware_exit_reason |= last_inst;
  +   kvmppc_core_queue_program(vcpu, ESR_PIL);
  +   r = RESUME_HOST;
  +   goto out;
  +
  +   default:
  +   break;
  +   }
 
 I think you can just put this into a function.
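One possible shape for such a function, as a self-contained sketch: the enum, struct model_run, the MODEL_RESUME_* values and handle_last_inst_result() are stand-ins assumed for illustration, and the kvmppc_core_queue_program() call from the patch is left out because it needs a real vcpu. The helper returns true when the caller should jump to out with *r set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins so the sketch builds outside the kernel; values are arbitrary. */
enum emulation_result { EMULATE_DONE, EMULATE_AGAIN, EMULATE_FAIL };
enum { MODEL_RESUME_GUEST = 0, MODEL_RESUME_HOST = 1 };

struct model_run {
	uint64_t hardware_exit_reason;
};

/*
 * Handle the result of fetching the last instruction. Returns true if the
 * exit handler should stop here with *r as its result, false if it should
 * go on to the per-exit switch.
 */
static bool handle_last_inst_result(enum emulation_result emulated,
				    struct model_run *run, uint32_t last_inst,
				    int *r)
{
	switch (emulated) {
	case EMULATE_AGAIN:
		*r = MODEL_RESUME_GUEST;
		return true;
	case EMULATE_FAIL:
		/* Encode the failing instruction for userspace, as the patch does. */
		run->hardware_exit_reason = ~0ULL << 32;
		run->hardware_exit_reason |= last_inst;
		/* The patch also queues a program interrupt here (omitted). */
		*r = MODEL_RESUME_HOST;
		return true;
	default:
		return false;
	}
}
```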
 
 Scott, I think the patch overall looks quite good. Can you please check 
 as well and if you agree give it your reviewed-by? Mike, when Scott 
 gives you a reviewed-by, please include it for the next version.
 
 
 Alex
 
  +
  switch (exit_nr) {
  case BOOKE_INTERRUPT_MACHINE_CHECK:
  printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR));
  @@ -1184,6 +1215,7 @@ int