Re: [PATCH 4/5] KVM: x86: RSI/RDI/RCX are zero-extended when affected by string ops

2014-04-22 Thread Nadav Amit

Gleb,

On 4/20/14, 12:26 PM, Gleb Natapov wrote:

On Fri, Apr 18, 2014 at 07:11:33AM +0300, Nadav Amit wrote:

When using the address-size override prefix with string instructions in long mode,
ESI/EDI/ECX are zero-extended if they are affected by the instruction
(incremented/decremented).  Currently, the KVM emulator does not do so.

In addition, although it is not well documented, when the address-size override
prefix is used with a REP-string instruction, the high half of RCX is zeroed even
if ECX was zero on the first iteration. Therefore, the emulator should clear the
upper part of RCX in this case, as x86 CPUs do.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
:100644 100644 69e2636... a69ed67... M  arch/x86/kvm/emulate.c
  arch/x86/kvm/emulate.c | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 69e2636..a69ed67 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -491,6 +491,8 @@ register_address_increment(struct x86_emulate_ctxt *ctxt, unsigned long *reg, int inc)
 	else
 		mask = ad_mask(ctxt);
 	masked_increment(reg, mask, inc);
+	if (ctxt->ad_bytes == 4)
+		*reg &= 0xffffffff;

*reg=(u32)*reg; and you can do it inside else part.

register_address_increment() is used also by jmp_rel and loop instructions;
is this correct for both of those too? Probably yes.


It appears to be so.
Results of 32-bit operations are implicitly zero-extended to 64-bit
values, and this applies to all 32-bit operations, including the
implicit register updates these instructions perform.



  }

  static void rsp_increment(struct x86_emulate_ctxt *ctxt, int inc)
@@ -4567,6 +4569,8 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
	if (ctxt->rep_prefix && (ctxt->d & String)) {
		/* All REP prefixes have the same first termination condition */
		if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
+			if (ctxt->ad_bytes == 4)
+				*reg_write(ctxt, VCPU_REGS_RCX) = 0;

Does zero extension happen even if ECX was zero at the beginning of an
instruction, or only during ECX modification? If the latter, it is already
covered in register_address_increment, no?
The observed behaviour of the Sandy-Bridge I use is that even if ECX is
zero on the first iteration, the high half of RCX is zeroed. Therefore,
this is a different case, which was not covered in
register_address_increment. I agree it is totally undocumented.
Following your previous comment - I may have missed the case in which a
loop instruction is executed with ECX = 0 while RCX != 0 and the address
size is 32 bit. I will test this case soon (yet, it is lower on my
priority list).


Nadav



Re: [PATCH RFC untested] kvm/x86: implement hv EOI assist

2014-04-22 Thread Michael S. Tsirkin
On Mon, Apr 21, 2014 at 06:40:17PM -0300, Marcelo Tosatti wrote:
 On Sun, Apr 13, 2014 at 04:10:22PM +0300, Michael S. Tsirkin wrote:
  It seems that it's easy to implement the EOI assist
  on top of the PV EOI feature: simply convert the
  page address to the format expected by PV EOI.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 Looks alright except:
 
 - There is no handling of PV EOI data to be performed at HV_X64_MSR_EOI write
 path?
 - Migration fails with PV-EOI not enabled in CPUID ? (perhaps could
   require it for PV-EOI over HV-APIC-ASSIST).
 - MS docs mention "No EOI required" is set only if the injected interrupt is
   edge-triggered.

Hmm, I thought level interrupts go through the IOAPIC, so that's already true,
isn't it?

	if (!pv_eoi_enabled(vcpu) ||
	    /* IRR set or many bits in ISR: could be nested. */
	    apic->irr_pending ||
	    /* Cache not set: could be safe but we don't bother. */
	    apic->highest_isr_cache == -1 ||
--->	    /* Need EOI to update ioapic. */
	    kvm_ioapic_handles_vector(vcpu->kvm, apic->highest_isr_cache)) {
		/*
		 * PV EOI was disabled by apic_sync_pv_eoi_from_guest
		 * so we need not do anything here.
		 */
		return;
	}

In any case if some interrupt handler ignores this bit because it's
level, that's harmless since it will do EOI and then we'll clear the
bit, right?
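
For readers following the idea of the patch under discussion (convert
the assist-page address into what PV EOI expects), a heavily hedged
standalone sketch -- the names and the bit-layout assumptions here are
mine, not taken from the patch:

    #include <stdint.h>

    #define HV_ASSIST_PAGE_ENABLE 0x1ull   /* assumed enable bit in the MSR */
    #define PV_EOI_ENABLE         0x1ull   /* KVM_MSR_ENABLED-style bit */

    /* Map a Hyper-V assist-page MSR value to a PV-EOI-style value:
     * keep the page-aligned GPA, translate the enable bit. */
    static uint64_t hv_assist_msr_to_pv_eoi(uint64_t msr)
    {
        if (!(msr & HV_ASSIST_PAGE_ENABLE))
            return 0;                          /* disable PV EOI too */
        return (msr & ~0xfffull) | PV_EOI_ENABLE;
    }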

-- 
MST


commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Christian Borntraeger
While preparing/testing some KVM on s390 patches for the next merge window
(target is kvm/next which is based on 3.15-rc1) I faced a very severe
performance hiccup on guest paging (all anonymous memory).

All memory bound guests are in D state now and the system is barely usable.

Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
"mm: vmscan: do not swap anon pages just because free+file is low" makes the
problem go away.

According to /proc/vmstat the system is now in direct reclaim almost all the
time for every page fault (more than 10x more direct reclaims than kswapd
reclaims).
With the patch being reverted everything is fine again.

Any ideas?

Christian



[GIT PULL 10/54] KVM: s390: Exploiting generic userspace interface for cmma

2014-04-22 Thread Christian Borntraeger
From: Dominik Dingel din...@linux.vnet.ibm.com

To enable CMMA and to reset its state we use the vm kvm_device ioctls,
encapsulating attributes within the KVM_S390_VM_MEM_CTRL group.

Signed-off-by: Dominik Dingel din...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 Documentation/virtual/kvm/devices/vm.txt | 16 ++++++++++++++++
 arch/s390/include/uapi/asm/kvm.h         |  7 +++++++
 arch/s390/kvm/kvm-s390.c                 | 43 ++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/vm.txt b/Documentation/virtual/kvm/devices/vm.txt
index 562bee6..0d16f96 100644
--- a/Documentation/virtual/kvm/devices/vm.txt
+++ b/Documentation/virtual/kvm/devices/vm.txt
@@ -8,3 +8,19 @@ and controls.
 
 The groups and attributes per virtual machine, if any, are architecture
 specific.
+
+1. GROUP: KVM_S390_VM_MEM_CTRL
+Architectures: s390
+
+1.1. ATTRIBUTE: KVM_S390_VM_MEM_ENABLE_CMMA
+Parameters: none
+Returns: -EBUSY if a vcpu is already defined, otherwise 0
+
+Enables CMMA for the virtual machine
+
+1.2. ATTRIBUTE: KVM_S390_VM_MEM_CLR_CMMA
+Parameters: none
+Returns: 0
+
+Clear the CMMA status for all guest pages, so any pages the guest marked
+as unused are again used and may not be reclaimed by the host.
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index c003c6a..e35c798 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -54,6 +54,13 @@ struct kvm_s390_io_adapter_req {
__u64 addr;
 };
 
+/* kvm attr_group on vm fd */
+#define KVM_S390_VM_MEM_CTRL   0
+
+/* kvm attributes for mem_ctrl */
+#define KVM_S390_VM_MEM_ENABLE_CMMA0
+#define KVM_S390_VM_MEM_CLR_CMMA   1
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
/* general purpose regs for s390 */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index fc2fe49..fe2396c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -258,11 +258,43 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
return r;
 }
 
+static int kvm_s390_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	int ret;
+	unsigned int idx;
+	switch (attr->attr) {
+	case KVM_S390_VM_MEM_ENABLE_CMMA:
+		ret = -EBUSY;
+		mutex_lock(&kvm->lock);
+		if (atomic_read(&kvm->online_vcpus) == 0) {
+			kvm->arch.use_cmma = 1;
+			ret = 0;
+		}
+		mutex_unlock(&kvm->lock);
+		break;
+	case KVM_S390_VM_MEM_CLR_CMMA:
+		mutex_lock(&kvm->lock);
+		idx = srcu_read_lock(&kvm->srcu);
+		page_table_reset_pgste(kvm->arch.gmap->mm, 0, TASK_SIZE, false);
+		srcu_read_unlock(&kvm->srcu, idx);
+		mutex_unlock(&kvm->lock);
+		ret = 0;
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+	return ret;
+}
+
 static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 {
int ret;
 
	switch (attr->group) {
+   case KVM_S390_VM_MEM_CTRL:
+   ret = kvm_s390_mem_control(kvm, attr);
+   break;
default:
ret = -ENXIO;
break;
@@ -281,6 +313,17 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 	int ret;
 
 	switch (attr->group) {
+	case KVM_S390_VM_MEM_CTRL:
+		switch (attr->attr) {
+   case KVM_S390_VM_MEM_ENABLE_CMMA:
+   case KVM_S390_VM_MEM_CLR_CMMA:
+   ret = 0;
+   break;
+   default:
+   ret = -ENXIO;
+   break;
+   }
+   break;
default:
ret = -ENXIO;
break;
-- 
1.8.4.2
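
As a usage note, a hypothetical userspace sketch (not part of the
patch) driving the attribute above through the generic
KVM_SET_DEVICE_ATTR vm ioctl; the ioctl and struct kvm_device_attr are
the existing UAPI, the helper and its error handling are illustrative:

    #include <linux/kvm.h>     /* KVM_SET_DEVICE_ATTR, struct kvm_device_attr */
    #include <string.h>
    #include <sys/ioctl.h>

    /* Returns 0 on success; fails with errno EBUSY once a vcpu exists. */
    static int enable_cmma(int vm_fd)
    {
        struct kvm_device_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.group = KVM_S390_VM_MEM_CTRL;        /* from the uapi hunk above */
        attr.attr  = KVM_S390_VM_MEM_ENABLE_CMMA;
        return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
    }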



[GIT PULL 08/54] KVM: s390: Per-vm kvm device controls

2014-04-22 Thread Christian Borntraeger
From: Dominik Dingel din...@linux.vnet.ibm.com

We sometimes need to get/set attributes specific to a virtual machine
and so need something else than ONE_REG.

Let's copy the KVM_DEVICE approach, and define the respective ioctls
for the vm file descriptor.

Signed-off-by: Dominik Dingel din...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Acked-by: Alexander Graf ag...@suse.de
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt        |  8 ++++----
 Documentation/virtual/kvm/devices/vm.txt | 10 ++++++++++
 arch/s390/kvm/kvm-s390.c                 | 54 ++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h                 |  1 +
 4 files changed, 69 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/vm.txt

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a9380ba5..2014ff1 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2314,8 +2314,8 @@ struct kvm_create_device {
 
 4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
 
-Capability: KVM_CAP_DEVICE_CTRL
-Type: device ioctl
+Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
+Type: device ioctl, vm ioctl
 Parameters: struct kvm_device_attr
 Returns: 0 on success, -1 on error
 Errors:
@@ -2340,8 +2340,8 @@ struct kvm_device_attr {
 
 4.81 KVM_HAS_DEVICE_ATTR
 
-Capability: KVM_CAP_DEVICE_CTRL
-Type: device ioctl
+Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
+Type: device ioctl, vm ioctl
 Parameters: struct kvm_device_attr
 Returns: 0 on success, -1 on error
 Errors:
diff --git a/Documentation/virtual/kvm/devices/vm.txt b/Documentation/virtual/kvm/devices/vm.txt
new file mode 100644
index 000..562bee6
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/vm.txt
@@ -0,0 +1,10 @@
+Generic vm interface
+
+
+The virtual machine device also accepts the ioctls KVM_SET_DEVICE_ATTR,
+KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same
+struct kvm_device_attr as other devices, but targets VM-wide settings
+and controls.
+
+The groups and attributes per virtual machine, if any, are architecture
+specific.
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 346a347..c335a2e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -162,6 +162,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_IOEVENTFD:
case KVM_CAP_DEVICE_CTRL:
case KVM_CAP_ENABLE_CAP_VM:
+   case KVM_CAP_VM_ATTRIBUTES:
r = 1;
break;
case KVM_CAP_NR_VCPUS:
@@ -257,11 +258,43 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
return r;
 }
 
+static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+   int ret;
+
+	switch (attr->group) {
+   default:
+   ret = -ENXIO;
+   break;
+   }
+
+   return ret;
+}
+
+static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+   return -ENXIO;
+}
+
+static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+   int ret;
+
+	switch (attr->group) {
+   default:
+   ret = -ENXIO;
+   break;
+   }
+
+   return ret;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
		       unsigned int ioctl, unsigned long arg)
 {
	struct kvm *kvm = filp->private_data;
	void __user *argp = (void __user *)arg;
+	struct kvm_device_attr attr;
int r;
 
switch (ioctl) {
@@ -294,6 +327,27 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
break;
}
+	case KVM_SET_DEVICE_ATTR: {
+		r = -EFAULT;
+		if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
+			break;
+		r = kvm_s390_vm_set_attr(kvm, &attr);
+		break;
+	}
+	case KVM_GET_DEVICE_ATTR: {
+		r = -EFAULT;
+		if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
+			break;
+		r = kvm_s390_vm_get_attr(kvm, &attr);
+		break;
+	}
+	case KVM_HAS_DEVICE_ATTR: {
+		r = -EFAULT;
+		if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
+			break;
+		r = kvm_s390_vm_has_attr(kvm, &attr);
+		break;
+	}
default:
r = -ENOTTY;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a8f4ee5..90acfe4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -743,6 +743,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_IOAPIC_POLARITY_IGNORED 97
 #define KVM_CAP_ENABLE_CAP_VM 98
 #define KVM_CAP_S390_IRQCHIP 99
+#define KVM_CAP_VM_ATTRIBUTES 100
 
 #ifdef 
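
A hypothetical userspace probe for the new vm-level interface might
look like this (KVM_HAS_DEVICE_ATTR and struct kvm_device_attr are the
existing UAPI; the helper itself is illustrative):

    #include <linux/kvm.h>     /* KVM_HAS_DEVICE_ATTR, struct kvm_device_attr */
    #include <string.h>
    #include <sys/ioctl.h>

    /* Returns 0 if the vm fd knows the group/attr pair,
     * -1 with errno ENXIO otherwise. */
    static int vm_has_attr(int vm_fd, unsigned int group,
                           unsigned long long attr_id)
    {
        struct kvm_device_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.group = group;
        attr.attr  = attr_id;
        return ioctl(vm_fd, KVM_HAS_DEVICE_ATTR, &attr);
    }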

[GIT PULL 0/54] KVM: s390: Features and Fixes for 3.16

2014-04-22 Thread Christian Borntraeger
Marcelo, Gleb, Paolo,

here is an updated PULL request combining the previous pull requests
rebased against the new kvm/next.

As follow-up emails, only the patches containing the device API documentation
are sent; everything else is the same or contains minor fixups/rebase changes.

See the tag message below for a summary of changes.

Christian

The following changes since commit 0f689a33ad17845363acdc6d52783befd6ad116c:

  Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux (2014-04-16 11:28:25 
-0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
tags/kvm-s390-20140422

for you to fetch changes up to e325fe69aa37b485635521568651642791d6d140:

  KVM: s390: Factor out handle_itdb to handle TX aborts (2014-04-22 13:24:54 
+0200)


Lazy storage key handling
-
Linux does not use the ACC and F bits of the storage key. Newer Linux
versions also do not use the storage keys for dirty and reference
tracking. We can optimize the guest handling for those guests for faults
as well as page-in and page-out by simply not caring about the guest
visible storage key. We trap guest storage key instructions to enable
those keys only on demand.

Migration bitmap

Until now s390 never provided a proper dirty bitmap.  Let's provide a
proper migration bitmap for s390. We also change the user dirty tracking
to a fault based mechanism. This makes the host completely independent
from the storage keys. Long term this will allow us to back guest memory
with large pages.

per-VM device attributes

To avoid the introduction of new ioctls, let's provide the
attribute semantics also on the VM device.

Userspace controlled CMMA
-
The CMMA assist is changed from "always on" to "on if requested" via
per-VM device attributes. In addition a callback to reset all usage
states is provided.

Proper guest DAT handling for intercepts

While instructions handled by SIE take care of all addressing aspects,
KVM/s390 currently does not care about guest address translation of
intercepts. This worked out fine, because
- the s390 Linux kernel has a 1:1 mapping between kernel virtual and real
  addresses for all pages up to memory size
- intercepts happen only for a small number of cases
- all of these intercepts happen to be in the kernel text for current
  distros

Of course we need to be better for other intercepts, kernel modules etc.
We provide the infrastructure and rework all in-kernel intercepts to work
on logical addresses (paging etc) instead of real ones. The code has
been running internally for several months now, so it is time for going
public.

GDB support
---
We provide breakpoints, single stepping and watchpoints.

Fixes/Cleanups
--
- Improve program check delivery
- Factor out the handling of transactional memory on program checks
- Use the existing define __LC_PGM_TDB
- Several cleanups in the lowcore structure
- Documentation

NOTES
-
- All patches touching base s390 are either ACKed or written by the s390
  maintainers
- One base KVM patch ("KVM: add kvm_is_error_gpa() helper")
- One patch introduces the notion of VM device attributes


Christian Borntraeger (1):
  KVM: s390: Drop pending interrupts on guest exit

Cornelia Huck (1):
  KVM: s390: reinject io interrupt on tpi failure

David Hildenbrand (8):
  KVM: s390: extract irq parameters of intercepted program irqs
  KVM: s390: deliver program irq parameters and use correct ilc
  KVM: s390: emulate stctl and stctg
  KVM: s390: kernel header addition for guest debugging
  KVM: s390: hardware support for guest debugging
  KVM: s390: add documentation for diag 501
  KVM: s390: move timer interrupt checks into own functions
  KVM: s390: no timer interrupts when single-stepping a guest

Dominik Dingel (8):
  KVM: s390: Adding skey bit to mmu context
  KVM: s390: Clear storage keys
  KVM: s390: Allow skeys to be enabled for the current process
  KVM: s390: Don't enable skeys by default
  KVM: s390/mm: new gmap_test_and_clear_dirty function
  KVM: s390: Per-vm kvm device controls
  KVM: s390: make cmma usage conditionally
  KVM: s390: Exploiting generic userspace interface for cmma

Heiko Carstens (27):
  KVM: add kvm_is_error_gpa() helper
  KVM: s390: export test_vfacility()
  s390/ptrace: add struct psw and accessor function
  s390/ctl_reg: add union type for control register 0
  KVM: s390: add kvm_s390_logical_to_effective() helper
  KVM: s390: add 'pgm' member to kvm_vcpu_arch and helper function
  KVM: s390: add lowcore access functions
  KVM: s390: add architecture compliant guest access functions
  KVM: s390/sclp: correctly set eca siif bit

Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Christian Borntraeger
On 22/04/14 12:55, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hiccup on guest paging (all anonymous memory).
 
 All memory bound guests are in D state now and the system is barely usable.
 
 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 "mm: vmscan: do not swap anon pages just because free+file is low" makes the 
 problem go away.
 
 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswapd 
 reclaims).
 With the patch being reverted everything is fine again.
 
 Any ideas?

Here is an idea to tackle my problem and the original problem:

reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d and checking against the low
watermark also seems to make my system usable.

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 */
 	if (global_reclaim(sc)) {
 		free = zone_page_state(zone, NR_FREE_PAGES);
-		if (unlikely(file + free <= high_wmark_pages(zone))) {
+		if (unlikely(file + free <= low_wmark_pages(zone))) {
 			scan_balance = SCAN_ANON;
 			goto out;
 		}
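
As background for the one-line change above (illustrative C, not the
kernel code): per-zone watermarks are ordered min <= low <= high, so
comparing against the low watermark makes the forced-anon branch fire
in strictly fewer states than the high watermark did:

    struct wmarks { unsigned long min, low, high; };   /* min <= low <= high */

    /* The guard from get_scan_count(), parameterized by watermark choice. */
    static int force_anon_scan(const struct wmarks *w,
                               unsigned long file, unsigned long free,
                               int against_low)
    {
        unsigned long limit = against_low ? w->low : w->high;
        return file + free <= limit;   /* almost no file pages left */
    }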



Re: Bug - IP Address

2014-04-22 Thread Stefan Hajnoczi
On Sat, Apr 19, 2014 at 11:41:33AM +0200, Ivan Stojcevic (Tronyx) wrote:
 I just found a very sensitive bug in KVM and Xen platforms. Easily, if you 
 have a VPS hosted on Xen or KVM, you can assign yourself an IPv4 address for free 
 and bypass the regular billing system.
 I tried this on many VPS hosting companies and it works on all. If you would 
 like to talk with me about this, you can get me on skype: ivans2901

This doesn't sound like a bug in Xen or KVM.  Rather it's an issue with
the VPS providers you tested.  They should lock down their network
appropriately (i.e. only allow MACs and IPs assigned to the guest).

Similar issues can also happen with dedicated servers if the provider
has not configured their routers correctly.

Please get in touch with the VPS providers or post more details here if
you think the issue lies in QEMU/KVM.

Stefan


Re: Who signed gemu-1.7.1.tar.bz2?

2014-04-22 Thread Stefan Hajnoczi
On Wed, Apr 02, 2014 at 05:40:23PM -0700, Alex Davis wrote:
 and where is their gpg key?

Michael Roth mdr...@linux.vnet.ibm.com is doing releases:

http://pgp.mit.edu/pks/lookup?op=vindex&search=0x3353C9CEF108B584

$ gpg --verify qemu-2.0.0.tar.bz2.sig 
gpg: Signature made Thu 17 Apr 2014 03:49:55 PM CEST using RSA key ID
F108B584
gpg: Good signature from Michael Roth fluks...@gmail.com
gpg: aka Michael Roth mdr...@utexas.edu
gpg: aka Michael Roth mdr...@linux.vnet.ibm.com

Stefan


Re: [Qemu-devel] Who signed gemu-1.7.1.tar.bz2?

2014-04-22 Thread Peter Maydell
On 22 April 2014 14:31, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, Apr 02, 2014 at 05:40:23PM -0700, Alex Davis wrote:
 and where is their gpg key?

 Michael Roth mdr...@linux.vnet.ibm.com is doing releases:

 http://pgp.mit.edu/pks/lookup?op=vindex&search=0x3353C9CEF108B584

 $ gpg --verify qemu-2.0.0.tar.bz2.sig
 gpg: Signature made Thu 17 Apr 2014 03:49:55 PM CEST using RSA key ID
 F108B584
 gpg: Good signature from Michael Roth fluks...@gmail.com
 gpg: aka Michael Roth mdr...@utexas.edu
 gpg: aka Michael Roth mdr...@linux.vnet.ibm.com

NB that this is different from the key used to sign the 2.0 release tags
in git; that's expected since I did the tagging and Michael did the
tarballs.

thanks
-- PMM


Re: Who signed gemu-1.7.1.tar.bz2?

2014-04-22 Thread Michael Roth
Quoting Stefan Hajnoczi (2014-04-22 08:31:08)
 On Wed, Apr 02, 2014 at 05:40:23PM -0700, Alex Davis wrote:
  and where is their gpg key?
 
 Michael Roth mdr...@linux.vnet.ibm.com is doing releases:
 
 http://pgp.mit.edu/pks/lookup?op=vindex&search=0x3353C9CEF108B584
 
 $ gpg --verify qemu-2.0.0.tar.bz2.sig 
 gpg: Signature made Thu 17 Apr 2014 03:49:55 PM CEST using RSA key ID
 F108B584
 gpg: Good signature from Michael Roth fluks...@gmail.com
 gpg: aka Michael Roth mdr...@utexas.edu
 gpg: aka Michael Roth mdr...@linux.vnet.ibm.com

Missed the context, but if this is specifically about 1.7.1:

1.7.1 was prior to me handling the release tarballs, Anthony actually
did the signing and uploading for that one. I'm a bit confused though,
as the key ID on that tarball is:

mdroth@loki:~/Downloads$ gpg --verify qemu-1.7.1.tar.bz2.sig 
gpg: Signature made Tue 25 Mar 2014 09:03:24 AM CDT using RSA key ID ADF0D2D9
gpg: Can't check signature: public key not found

I can't seem to locate ADF0D2D9 though:

  http://pgp.mit.edu/pks/lookup?search=0xADF0D2D9&op=vindex

Anthony's normal key (for 1.6.0 and 1.7.0 at least) was 7C18C076:

  http://pgp.mit.edu/pks/lookup?search=0x7C18C076&op=vindex

I think Anthony might've signed it with a separate local key?

 
 Stefan



[kvm-kmod PATCH 2/2] sync: copy linux/vfio.h from kvm source tree

2014-04-22 Thread gsomlo
Signed-off-by: Gabriel Somlo so...@cmu.edu
---

vfio.c gets copied by sync, and it needs vfio.h. I don't think there's
an easy way to #define ourselves out of this one, copying vfio.h into
kvm-kmod/include/linux/ seems to be the path of least resistance...

Thanks,
   Gabriel

P.S. I'm not a native Python speaker, so, while the glob() line does the
job, it may not be the most beautiful way to express the new requirement :)

 sync | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sync b/sync
index e447ec4..9902a11 100755
--- a/sync
+++ b/sync
@@ -433,6 +433,7 @@ def header_sync(arch):
     T = 'header'
     rmtree(T)
     for file in (glob('%(linux)s/include/linux/kvm*.h' % { 'linux': linux }) +
+                 glob('%(linux)s/include/linux/vfio.h' % { 'linux': linux }) +
                  glob('%(linux)s/include/uapi/linux/kvm*.h' % { 'linux': linux })):
         out = ('%(T)s/include/linux/%(name)s'
                % { 'T': T, 'name': os.path.basename(file) })
-- 
1.9.0



[kvm-kmod PATCH 1/2] Provide VFIO_DMA_CC_IOMMU and *cpu_notifier* functions for kernels < 3.15

2014-04-22 Thread gsomlo
Signed-off-by: Gabriel Somlo so...@cmu.edu
---

Jan,

I upgraded my fedora 20 box to 3.13.10-200.fc20.x86_64 today, and
noticed that in order to build the latest kvm via kvm-kmod, I need
these two patches (second patch to follow shortly).

Thanks,
   Gabriel

 x86/external-module-compat.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/x86/external-module-compat.h b/x86/external-module-compat.h
index 3c41c74..40c31f2 100644
--- a/x86/external-module-compat.h
+++ b/x86/external-module-compat.h
@@ -1472,6 +1472,16 @@ static inline void update_debugctlmsr(unsigned long 
debugctlmsr)
 #define POSTED_INTR_VECTOR 0xf2
 #endif
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,15,0)
+#define VFIO_DMA_CC_IOMMU  4
+static inline void cpu_notifier_register_begin(void) {}
+static inline void cpu_notifier_register_done(void) {}
+static inline int __register_hotcpu_notifier(struct notifier_block *nb)
+{
+   return 0;
+}
+#endif
+
 #ifndef MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
 #endif
-- 
1.9.0



Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Rafael Aquini
On Tue, Apr 22, 2014 at 10:40:17AM -0400, Rik van Riel wrote:
 On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
  On 22/04/14 12:55, Christian Borntraeger wrote:
  While preparing/testing some KVM on s390 patches for the next merge window 
  (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
  performance hiccup on guest paging (all anonymous memory).
 
  All memory bound guests are in D state now and the system is barely usable.
 
  Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
  "mm: vmscan: do not swap anon pages just because free+file is low" makes 
  the problem go away.
 
  According to /proc/vmstat the system is now in direct reclaim almost all 
  the time for every page fault (more than 10x more direct reclaims than 
  kswapd reclaims).
  With the patch being reverted everything is fine again.
 
  Any ideas?
  
  Here is an idea to tackle my problem and the original problem:
  
  reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d and checking against the
  low watermark also seems to make my system usable.
  --- a/mm/vmscan.c
  +++ b/mm/vmscan.c
  @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
   	 */
   	if (global_reclaim(sc)) {
   		free = zone_page_state(zone, NR_FREE_PAGES);
  -		if (unlikely(file + free <= high_wmark_pages(zone))) {
  +		if (unlikely(file + free <= low_wmark_pages(zone))) {
   			scan_balance = SCAN_ANON;
   			goto out;
   		}
  
 
 Looks reasonable to me.
+1



Re: [kvm-kmod PATCH 2/2] sync: copy linux/vfio.h from kvm source tree

2014-04-22 Thread Jan Kiszka
On 2014-04-22 16:52, gso...@gmail.com wrote:
 Signed-off-by: Gabriel Somlo so...@cmu.edu
 ---
 
 vfio.c gets copied by sync, and it needs vfio.h. I don't think there's
 an easy way to #define ourselves out of this one, copying vfio.h into
 kvm-kmod/include/linux/ seems to be the path of least resistance...
 
 Thanks,
Gabriel
 
 P.S. I'm not a native Python speaker, so, while the glob() line does the
 job, it may not be the most beautiful way to express the new requirement :)

I've a different mother language as well ;).

Did you try if ...linux/{kvm*,vfio}.h works? If we have shell power here
for pattern matching, it should.

Jan

 
  sync | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/sync b/sync
 index e447ec4..9902a11 100755
 --- a/sync
 +++ b/sync
 @@ -433,6 +433,7 @@ def header_sync(arch):
      T = 'header'
      rmtree(T)
      for file in (glob('%(linux)s/include/linux/kvm*.h' % { 'linux': linux }) +
 +                 glob('%(linux)s/include/linux/vfio.h' % { 'linux': linux }) +
                   glob('%(linux)s/include/uapi/linux/kvm*.h' % { 'linux': linux })):
          out = ('%(T)s/include/linux/%(name)s'
                 % { 'T': T, 'name': os.path.basename(file) })
 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Johannes Weiner
Hi Christian,

On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hiccup on guest paging (all anonymous memory).
 
 All memory bound guests are in D state now and the system is barely usable.
 
 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 "mm: vmscan: do not swap anon pages just because free+file is low" makes the 
 problem go away.
 
 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswapd 
 reclaims).
 With the patch being reverted everything is fine again.

Ouch.  Yes, I think we have to revert this for now.

How about this?

---
From: Johannes Weiner han...@cmpxchg.org
Subject: [patch] Revert "mm: vmscan: do not swap anon pages just because
 free+file is low"

This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
just because free+file is low") because it introduced a regression in
mostly-anonymous workloads, where reclaim would become ineffective and
trap every allocating task in direct reclaim.

The problem is that there is a runaway feedback loop in the scan
balance between file and anon, where the balance tips heavily towards
a tiny thrashing file LRU and anonymous pages are no longer being
looked at.  The commit in question removed the safe guard that would
detect such situations and respond with forced anonymous reclaim.

This commit was part of a series to fix premature swapping in loads
with relatively little cache, and while it made a small difference,
the cure is obviously worse than the disease.  Revert it.

Reported-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Johannes Weiner han...@cmpxchg.org
Cc: sta...@kernel.org [3.12+]
---
 mm/vmscan.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9b6497eda806..169acb8e31c9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		get_lru_size(lruvec, LRU_INACTIVE_FILE);
 
 	/*
+	 * Prevent the reclaimer from falling into the cache trap: as
+	 * cache pages start out inactive, every cache fault will tip
+	 * the scan balance towards the file LRU.  And as the file LRU
+	 * shrinks, so does the window for rotation from references.
+	 * This means we have a runaway feedback loop where a tiny
+	 * thrashing file LRU becomes infinitely more attractive than
+	 * anon pages.  Try to detect this based on file LRU size.
+	 */
+	if (global_reclaim(sc)) {
+		unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
+
+		if (unlikely(file + free <= high_wmark_pages(zone))) {
+			scan_balance = SCAN_ANON;
+			goto out;
+		}
+	}
+
+	/*
 	 * There is enough inactive page cache, do not reclaim
 	 * anything from the anonymous working set right now.
 	 */
-- 
1.9.2



Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Rik van Riel
On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
 On 22/04/14 12:55, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hiccup on guest paging (all anonymous memory).

 All memory bound guests are in D state now and the system is barely usable.

 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 "mm: vmscan: do not swap anon pages just because free+file is low" makes the 
 problem go away.

 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswapd 
 reclaims).
 With the patch being reverted everything is fine again.

 Any ideas?
 
 Here is an idea to tackle my problem and the original problem:
 
 reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d and checking against the
 low watermark also seems to make my system usable.
 
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
  	 */
  	if (global_reclaim(sc)) {
  		free = zone_page_state(zone, NR_FREE_PAGES);
 -		if (unlikely(file + free <= high_wmark_pages(zone))) {
 +		if (unlikely(file + free <= low_wmark_pages(zone))) {
  			scan_balance = SCAN_ANON;
  			goto out;
  		}
 

Looks reasonable to me.  Johannes?

-- 
All rights reversed


Re: [PATCH v5] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-04-22 Thread Ben Hutchings
On Mon, 2014-04-21 at 12:26 +0100, Luis Henriques wrote:
 Hi David,
 
 On Thu, Mar 27, 2014 at 03:29:56PM -0400, David Miller wrote:
  From: Zoltan Kiss zoltan.k...@citrix.com
  Date: Wed, 26 Mar 2014 22:37:45 +
  
   skb_zerocopy can copy elements of the frags array between skbs, but it 
   doesn't
   orphan them. Also, it doesn't handle errors, so this patch takes care of 
   that
   as well, and modify the callers accordingly. skb_tx_error() is also added 
   to
   the callers so they will signal the failed delivery towards the creator 
   of the
   skb.
   
   Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com
  
  Fingers crossed :-)  Applied, thanks Zoltan.
 
 Could you please queue this for stable as well?  It has CVE-2014-2568
 assigned to it and I believe it is applicable to some of the trees.

It was applied to 3.13, but needs backporting to earlier versions.  I
posted my attempt in
1397429860.10849.86.ca...@deadeye.wl.decadent.org.uk but it needs
testing/reviewing.

Ben.

-- 
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein




Re: [PATCH v2] KVM: x86: Check for host supported fields in shadow vmcs

2014-04-22 Thread Bandan Das
Paolo Bonzini pbonz...@redhat.com writes:

 On 21/04/2014 15:20, Bandan Das wrote:
 +for (i = j = 0; i < max_shadow_read_write_fields; i++) {
 +

 Extra empty line.  Not a big deal, but...

 +switch (shadow_read_write_fields[i]) {
 +case GUEST_BNDCFGS:
 +if (!vmx_mpx_supported())
 +continue;
 +break;
 +default:
 +break;
 +}
 +
 +if (j < i)
 +shadow_read_write_fields[j] =
 +shadow_read_write_fields[i];
 +j++;

 ... you need to respin anyway because the j++ is wrong.  It should be
 inside the if.  If you prefer, you can put it in the lhs of the

j++ outside the if looks right to me. Can you please explain why
you think it shouldn't be that way?

 assignment, otherwise a separate statement is fine by me too.

 Paolo


Re: [PATCH v5] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-04-22 Thread Zoltan Kiss

On 22/04/14 16:38, Ben Hutchings wrote:

On Mon, 2014-04-21 at 12:26 +0100, Luis Henriques wrote:

Hi David,

On Thu, Mar 27, 2014 at 03:29:56PM -0400, David Miller wrote:

From: Zoltan Kiss zoltan.k...@citrix.com
Date: Wed, 26 Mar 2014 22:37:45 +


skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com


Fingers crossed :-)  Applied, thanks Zoltan.


Could you please queue this for stable as well?  It has CVE-2014-2568
assigned to it and I believe it is applicable to some of the trees.


It was applied to 3.13, but needs backporting to earlier versions.  I
posted my attempt in
1397429860.10849.86.ca...@deadeye.wl.decadent.org.uk but it needs
testing/reviewing.


This one is a different issue, although it is very similar.

Zoli



Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Christian Borntraeger
On 22/04/14 17:06, Johannes Weiner wrote:
 Hi Christian,
 
 On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hiccup on guest paging (all anonymous memory).

 All memory bound guests are in D state now and the system is barely usable.

 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 "mm: vmscan: do not swap anon pages just because free+file is low" makes the 
 problem go away.

 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswapd 
 reclaims).
 With the patch being reverted everything is fine again.
 
 Ouch.  Yes, I think we have to revert this for now.
 
 How about this?
 
 ---
 From: Johannes Weiner han...@cmpxchg.org
 Subject: [patch] Revert "mm: vmscan: do not swap anon pages just because
  free+file is low"
 
 This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
 just because free+file is low") because it introduced a regression in
 mostly-anonymous workloads, where reclaim would become ineffective and
 trap every allocating task in direct reclaim.
 
 The problem is that there is a runaway feedback loop in the scan
 balance between file and anon, where the balance tips heavily towards
 a tiny thrashing file LRU and anonymous pages are no longer being
 looked at.  The commit in question removed the safe guard that would
 detect such situations and respond with forced anonymous reclaim.
 
 This commit was part of a series to fix premature swapping in loads
 with relatively little cache, and while it made a small difference,
 the cure is obviously worse than the disease.  Revert it.
 
 Reported-by: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Johannes Weiner han...@cmpxchg.org
 Cc: sta...@kernel.org   [3.12+]



This is certainly safer than my hack with low_wmark_pages. We have several 
cases where increasing min_free_kbytes avoids going into direct reclaim for 
large host systems with heavy paging. So I guess my patch is just a trade-off 
between the two cases, but it actually makes it still more likely to go into 
direct reclaim than your revert, so I prefer your revert.

Acked-by: Christian Borntraeger borntrae...@de.ibm.com

 ---
  mm/vmscan.c | 18 ++
  1 file changed, 18 insertions(+)
 
 diff --git a/mm/vmscan.c b/mm/vmscan.c
 index 9b6497eda806..169acb8e31c9 100644
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
  		get_lru_size(lruvec, LRU_INACTIVE_FILE);
  
  	/*
 +	 * Prevent the reclaimer from falling into the cache trap: as
 +	 * cache pages start out inactive, every cache fault will tip
 +	 * the scan balance towards the file LRU.  And as the file LRU
 +	 * shrinks, so does the window for rotation from references.
 +	 * This means we have a runaway feedback loop where a tiny
 +	 * thrashing file LRU becomes infinitely more attractive than
 +	 * anon pages.  Try to detect this based on file LRU size.
 +	 */
 +	if (global_reclaim(sc)) {
 +		unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
 +
 +		if (unlikely(file + free <= high_wmark_pages(zone))) {
 +			scan_balance = SCAN_ANON;
 +			goto out;
 +		}
 +	}
 +
 +	/*
  	 * There is enough inactive page cache, do not reclaim
  	 * anything from the anonymous working set right now.
  	 */
 



Re: [kvm-kmod PATCH 2/2] sync: copy linux/vfio.h from kvm source tree

2014-04-22 Thread Gabriel L. Somlo
On Tue, Apr 22, 2014 at 04:57:32PM +0200, Jan Kiszka wrote:
 On 2014-04-22 16:52, gso...@gmail.com wrote:
  Signed-off-by: Gabriel Somlo so...@cmu.edu
  ---
  
  vfio.c gets copied by sync, and it needs vfio.h. I don't think there's
  an easy way to #define ourselves out of this one, copying vfio.h into
  kvm-kmod/include/linux/ seems to be the path of least resistance...
  
  Thanks,
 Gabriel
  
  P.S. I'm not a native Python speaker, so, while the glob() line does the
  job, it may not be the most beautiful way to express the new requirement :)
 
 I've a different mother languages as well ;).
 
 Did you try if ...linux/{kvm*,vfio}.h works? If we have shell power here
 for pattern matching, it should.

I played around a bit, and I couldn't find a way to do full regex for
a glob() argument.

You can pick from a set of characters at a time:

'foo[0-9]bar' would match 'foo0bar', 'foo1bar', etc.

But nothing I found allows you to pick from a set of *substrings*,
which is what we'd need:

'foo(xyz|abc)bar' to match 'fooxyzbar' and 'fooabcbar' (but *not* say,
'foo123bar').

I tried parentheses, curly braces, with and without '\', with no success.

Not sure at this point there *is* a more eloquent way to express it
than what I sent you originally.

Learning Python *is* on my bucket list, just not right this minute... :)

Thanks,
--Gabriel

  
   sync | 1 +
   1 file changed, 1 insertion(+)
  
  diff --git a/sync b/sync
  index e447ec4..9902a11 100755
  --- a/sync
  +++ b/sync
   @@ -433,6 +433,7 @@ def header_sync(arch):
       T = 'header'
       rmtree(T)
       for file in (glob('%(linux)s/include/linux/kvm*.h' % { 'linux': linux }) +
   +                 glob('%(linux)s/include/linux/vfio.h' % { 'linux': linux }) +
                    glob('%(linux)s/include/uapi/linux/kvm*.h' % { 'linux': linux })):
           out = ('%(T)s/include/linux/%(name)s'
                  % { 'T': T, 'name': os.path.basename(file) })
  
 
 -- 
 Siemens AG, Corporate Technology, CT RTC ITP SES-DE
 Corporate Competence Center Embedded Linux


Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Rafael Aquini
On Tue, Apr 22, 2014 at 11:06:56AM -0400, Johannes Weiner wrote:
 Hi Christian,
 
 On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
  While preparing/testing some KVM on s390 patches for the next merge window 
  (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
  performance hiccup on guest paging (all anonymous memory).
  
  All memory bound guests are in D state now and the system is barely usable.
  
  Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
  "mm: vmscan: do not swap anon pages just because free+file is low" makes 
  the problem go away.
  
  According to /proc/vmstat the system is now in direct reclaim almost all 
  the time for every page fault (more than 10x more direct reclaims than 
  kswapd reclaims).
  With the patch being reverted everything is fine again.
 
 Ouch.  Yes, I think we have to revert this for now.
 
 How about this?
 
 ---
 From: Johannes Weiner han...@cmpxchg.org
 Subject: [patch] Revert "mm: vmscan: do not swap anon pages just because
  free+file is low"
 
 This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
 just because free+file is low") because it introduced a regression in
 mostly-anonymous workloads, where reclaim would become ineffective and
 trap every allocating task in direct reclaim.
 
 The problem is that there is a runaway feedback loop in the scan
 balance between file and anon, where the balance tips heavily towards
 a tiny thrashing file LRU and anonymous pages are no longer being
 looked at.  The commit in question removed the safe guard that would
 detect such situations and respond with forced anonymous reclaim.
 
 This commit was part of a series to fix premature swapping in loads
 with relatively little cache, and while it made a small difference,
 the cure is obviously worse than the disease.  Revert it.
 
 Reported-by: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Johannes Weiner han...@cmpxchg.org
 Cc: sta...@kernel.org   [3.12+]
 ---
  mm/vmscan.c | 18 ++
  1 file changed, 18 insertions(+)
 
 diff --git a/mm/vmscan.c b/mm/vmscan.c
 index 9b6497eda806..169acb8e31c9 100644
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
  		get_lru_size(lruvec, LRU_INACTIVE_FILE);
  
  	/*
 +	 * Prevent the reclaimer from falling into the cache trap: as
 +	 * cache pages start out inactive, every cache fault will tip
 +	 * the scan balance towards the file LRU.  And as the file LRU
 +	 * shrinks, so does the window for rotation from references.
 +	 * This means we have a runaway feedback loop where a tiny
 +	 * thrashing file LRU becomes infinitely more attractive than
 +	 * anon pages.  Try to detect this based on file LRU size.
 +	 */
 +	if (global_reclaim(sc)) {
 +		unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
 +
 +		if (unlikely(file + free <= high_wmark_pages(zone))) {
 +			scan_balance = SCAN_ANON;
 +			goto out;
 +		}
 +	}
 +
 +	/*
  	 * There is enough inactive page cache, do not reclaim
  	 * anything from the anonymous working set right now.
  	 */
 -- 
 1.9.2
 
Acked-by: Rafael Aquini aqu...@redhat.com


[patch 0/2] expose invariant tsc flag for kvm guests

2014-04-22 Thread Marcelo Tosatti
see patches for details.




Re: [PATCH v2] KVM: x86: Check for host supported fields in shadow vmcs

2014-04-22 Thread Paolo Bonzini

Il 22/04/2014 12:25, Bandan Das wrote:

 +  if (j < i)
 +  shadow_read_write_fields[j] =
 +  shadow_read_write_fields[i];
 +  j++;


 ... you need to respin anyway because the j++ is wrong.  It should be
 inside the if.  If you prefer, you can put it in the lhs of the

j++ outside the if looks right to me. Can you please explain why
you think it shouldn't be that way?



The way you wrote it, j will always be equal to i.

Paolo
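
To make the disputed loop easy to check, here is a small standalone
program modelled on the compaction idiom in the patch (illustrative,
not the kvm code; the value 99 stands in for a field that gets
skipped). Note that `continue` bypasses the trailing j++ as well:

    #include <stdio.h>

    int main(void)
    {
        int a[] = { 10, 11, 99, 12, 13 };
        int n = 5, i, j;

        for (i = j = 0; i < n; i++) {
            if (a[i] == 99)
                continue;          /* skips the j++ below, too */
            if (j < i)             /* only copy once an element was skipped */
                a[j] = a[i];
            j++;
        }
        for (i = 0; i < j; i++)
            printf("%d ", a[i]);   /* prints: 10 11 12 13 */
        printf("\n");
        return 0;
    }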


[patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-22 Thread Marcelo Tosatti
The invariant TSC documentation mentions that the invariant TSC will run at a
constant rate in all ACPI P-, C-, and T-states.

This is not the case if migration to a host with a different TSC frequency
is allowed, or if savevm is performed. So block migration/savevm.

Also do not expose invariant tsc flag by default.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-invariant-tsc/target-i386/kvm.c
===
--- qemu-invariant-tsc.orig/target-i386/kvm.c
+++ qemu-invariant-tsc/target-i386/kvm.c
@@ -33,6 +33,8 @@
 #include "exec/ioport.h"
 #include <asm/hyperv.h>
 #include "hw/pci/pci.h"
+#include "migration/migration.h"
+#include "qapi/qmp/qerror.h"
 
 //#define DEBUG_KVM
 
@@ -447,6 +449,8 @@ static bool hyperv_enabled(X86CPU *cpu)
            cpu->hyperv_relaxed_timing);
 }
 
+Error *invtsc_mig_blocker;
+
 #define KVM_MAX_CPUID_ENTRIES  100
 
 int kvm_arch_init_vcpu(CPUState *cs)
@@ -702,6 +706,16 @@ int kvm_arch_init_vcpu(CPUState *cs)
           !!(c->ecx & CPUID_EXT_SMX);
 }
 
+    c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
+    if (c && (c->edx & (1 << 8))) {
+        /* for migration */
+        invtsc_mig_blocker = NULL;
+        error_set(&invtsc_mig_blocker, QERR_MIGRATION_NOT_SUPPORTED, "cpu");
+        migrate_add_blocker(invtsc_mig_blocker);
+        /* for savevm */
+        vmstate_x86_cpu.unmigratable = 1;
+    }
+
 cpuid_data.cpuid.padding = 0;
 r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, cpuid_data);
 if (r) {
Index: qemu-invariant-tsc/target-i386/cpu-qom.h
===
--- qemu-invariant-tsc.orig/target-i386/cpu-qom.h
+++ qemu-invariant-tsc/target-i386/cpu-qom.h
@@ -116,7 +116,7 @@ static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
 #define ENV_OFFSET offsetof(X86CPU, env)
 
 #ifndef CONFIG_USER_ONLY
-extern const struct VMStateDescription vmstate_x86_cpu;
+extern struct VMStateDescription vmstate_x86_cpu;
 #endif
 
 /**
Index: qemu-invariant-tsc/target-i386/machine.c
===
--- qemu-invariant-tsc.orig/target-i386/machine.c
+++ qemu-invariant-tsc/target-i386/machine.c
@@ -613,7 +613,7 @@ static const VMStateDescription vmstate_
 }
 };
 
-const VMStateDescription vmstate_x86_cpu = {
+VMStateDescription vmstate_x86_cpu = {
     .name = "cpu",
 .version_id = 12,
 .minimum_version_id = 3,
Index: qemu-invariant-tsc/target-i386/cpu.c
===
--- qemu-invariant-tsc.orig/target-i386/cpu.c
+++ qemu-invariant-tsc/target-i386/cpu.c
@@ -1230,6 +1230,8 @@ static void host_x86_cpu_initfn(Object *obj)
 
     for (w = 0; w < FEATURE_WORDS; w++) {
         FeatureWordInfo *wi = &feature_word_info[w];
+        if (w == FEAT_8000_0007_EDX)
+            continue;
         env->features[w] =
             kvm_arch_get_supported_cpuid(s, wi->cpuid_eax, wi->cpuid_ecx,
                                          wi->cpuid_reg);




[patch 1/2] target-i386: support invariant tsc flag

2014-04-22 Thread Marcelo Tosatti
Expose the invariant TSC flag, if KVM is enabled. From the Intel documentation:

17.13.1 Invariant TSC. The time stamp counter in newer processors may
support an enhancement, referred to as invariant TSC. Processor’s
support for invariant TSC is indicated by CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-,
and T-states. This is the architectural behavior moving forward. On
processors with invariant TSC support, the OS may use the TSC for wall
clock timer services (instead of ACPI or HPET timers). TSC reads are
much more efficient and do not incur the overhead associated with a ring
transition or access to a platform resource.
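
A quick host-side check of that CPUID bit (a sketch using the GCC/Clang
<cpuid.h> helper; not part of this patch):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* __get_cpuid() returns 0 if the leaf is not available. */
        if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
            return 1;
        printf("invariant TSC: %s\n", (edx & (1u << 8)) ? "yes" : "no");
        return 0;
    }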

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-invariant-tsc/target-i386/cpu.c
===
--- qemu-invariant-tsc.orig/target-i386/cpu.c
+++ qemu-invariant-tsc/target-i386/cpu.c
@@ -262,6 +262,17 @@ static const char *cpuid_7_0_ebx_feature_name[] = {
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 };
 
+static const char *cpuid_apm_edx_feature_name[] = {
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+invtsc, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};
+
 typedef struct FeatureWordInfo {
 const char **feat_names;
 uint32_t cpuid_eax;   /* Input EAX for CPUID */
@@ -305,6 +316,11 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .cpuid_needs_ecx = true, .cpuid_ecx = 0,
 .cpuid_reg = R_EBX,
 },
+    [FEAT_8000_0007_EDX] = {
+        .feat_names = cpuid_apm_edx_feature_name,
+        .cpuid_eax = 0x80000007,
+        .cpuid_reg = R_EDX,
+    },
 };
 
 typedef struct X86RegisterInfo32 {
@@ -1740,6 +1756,7 @@ static void x86_cpu_parse_featurestr(CPU
     env->features[FEAT_1_ECX] |= plus_features[FEAT_1_ECX];
     env->features[FEAT_8000_0001_EDX] |= plus_features[FEAT_8000_0001_EDX];
     env->features[FEAT_8000_0001_ECX] |= plus_features[FEAT_8000_0001_ECX];
+    env->features[FEAT_8000_0007_EDX] |= plus_features[FEAT_8000_0007_EDX];
     env->features[FEAT_C000_0001_EDX] |= plus_features[FEAT_C000_0001_EDX];
     env->features[FEAT_KVM] |= plus_features[FEAT_KVM];
     env->features[FEAT_SVM] |= plus_features[FEAT_SVM];
@@ -1748,6 +1765,7 @@ static void x86_cpu_parse_featurestr(CPU
     env->features[FEAT_1_ECX] &= ~minus_features[FEAT_1_ECX];
     env->features[FEAT_8000_0001_EDX] &= ~minus_features[FEAT_8000_0001_EDX];
     env->features[FEAT_8000_0001_ECX] &= ~minus_features[FEAT_8000_0001_ECX];
+    env->features[FEAT_8000_0007_EDX] &= ~minus_features[FEAT_8000_0007_EDX];
     env->features[FEAT_C000_0001_EDX] &= ~minus_features[FEAT_C000_0001_EDX];
     env->features[FEAT_KVM] &= ~minus_features[FEAT_KVM];
     env->features[FEAT_SVM] &= ~minus_features[FEAT_SVM];
@@ -2333,6 +2351,17 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (AMD_ENC_ASSOC(L3_ASSOCIATIVITY) << 12) | \
                (L3_LINES_PER_TAG << 8) | (L3_LINE_SIZE);
         break;
+    case 0x80000007:
+        *eax = 0;
+        *ebx = 0;
+        *ecx = 0;
+
+        if (kvm_enabled()) {
+            *edx = env->features[FEAT_8000_0007_EDX];
+        } else {
+            *edx = 0;
+        }
+        break;
     case 0x80000008:
         /* virtual & phys address size in low 2 bytes. */
 /* XXX: This value must match the one used in the MMU code. */
Index: qemu-invariant-tsc/target-i386/cpu.h
===
--- qemu-invariant-tsc.orig/target-i386/cpu.h
+++ qemu-invariant-tsc/target-i386/cpu.h
@@ -398,6 +398,7 @@ typedef enum FeatureWord {
 FEAT_7_0_EBX,   /* CPUID[EAX=7,ECX=0].EBX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_C000_0001_EDX, /* CPUID[C000_0001].EDX */
 FEAT_KVM,   /* CPUID[4000_0001].EAX (KVM_CPUID_FEATURES) */
 FEAT_SVM,   /* CPUID[8000_000A].EDX */




[PATCH 3.13] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-04-22 Thread Ben Hutchings
From: Zoltan Kiss zoltan.k...@citrix.com

commit 36d5fe6a000790f56039afe26834265db0a3ad4c upstream.

skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modifies the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com
Signed-off-by: David S. Miller da...@davemloft.net
[bwh: Backported to 3.13: skb_zerocopy() is new in 3.14, but was moved from a
 static function in nfnetlink_queue.  We need to patch that and its caller, but
 not openvswitch.]
Signed-off-by: Ben Hutchings b...@decadent.org.uk
---
On Tue, 2014-04-22 at 19:02 +0100, Zoltan Kiss wrote:
 On 22/04/14 16:38, Ben Hutchings wrote:
  On Mon, 2014-04-21 at 12:26 +0100, Luis Henriques wrote:
  Hi David,
 
  On Thu, Mar 27, 2014 at 03:29:56PM -0400, David Miller wrote:
  From: Zoltan Kiss zoltan.k...@citrix.com
  Date: Wed, 26 Mar 2014 22:37:45 +
 
  skb_zerocopy can copy elements of the frags array between skbs, but it 
  doesn't
  orphan them. Also, it doesn't handle errors, so this patch takes care of 
  that
  as well, and modify the callers accordingly. skb_tx_error() is also 
  added to
  the callers so they will signal the failed delivery towards the creator 
  of the
  skb.
 
  Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com
 
  Fingers crossed :-)  Applied, thanks Zoltan.
 
  Could you please queue this for stable as well?  It has CVE-2014-2568
  assigned to it and I believe it is applicable to some of the trees.
 
  It was applied to 3.13, but needs backporting to earlier versions.  I
  posted my attempt in
  1397429860.10849.86.ca...@deadeye.wl.decadent.org.uk but it needs
  testing/reviewing.
 
 This one is a different issue, although it is very similar.

Sorry for mixing these up.  I believe this bug has been present since
zerocopy was added to nfqueue in Linux 3.10 (commit ae08ce002108).  This
is the version I used for Debian's 3.13 branch, which might be usable
for older stable branches too.

Ben.

---
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -235,22 +235,23 @@ nfqnl_flush(struct nfqnl_instance *queue
spin_unlock_bh(&queue->lock);
 }
 
-static void
+static int
 nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;
 
/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
-   }
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);
 
if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -268,6 +269,11 @@ nfqnl_zcopy(struct sk_buff *to, const st
to->len += len + plen;
to->data_len += len + plen;
 
+   if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
+   skb_tx_error(from);
+   return -ENOMEM;
+   }
+
for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
if (!len)
break;
@@ -278,6 +284,8 @@ nfqnl_zcopy(struct sk_buff *to, const st
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
 }
 
 static int
@@ -374,13 +382,16 @@ nfqnl_build_packet_message(struct net *n
 
skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
  GFP_ATOMIC);
-   if (!skb)
+   if (!skb) {
+   skb_tx_error(entskb);
return NULL;
+   }
 
nlh = nlmsg_put(skb, 0, 0,
NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET,
sizeof(struct nfgenmsg), 0);
if (!nlh) {
+   skb_tx_error(entskb);
kfree_skb(skb);
return NULL;
}
@@ -504,13 +515,15 @@ nfqnl_build_packet_message(struct net *n
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
 
-   nfqnl_zcopy(skb, entskb, data_len, hlen);
+   if (nfqnl_zcopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}
 
nlh->nlmsg_len = skb->len;
return skb;
 
 nla_put_failure:
+   skb_tx_error(entskb);
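
For context, the error pattern the hunks above establish, reporting the failed
delivery on the source skb and dropping the partially built message, reduces
to the following caller-side sketch (illustrative, not a literal excerpt):

	/* Hypothetical caller of the now-failable zero-copy helper. */
	if (nfqnl_zcopy(skb, entskb, data_len, hlen)) {
		skb_tx_error(entskb);	/* signal failed delivery to the skb's creator */
		kfree_skb(skb);		/* drop the partial netlink message */
		return NULL;
	}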

Re: [PATCH RFC untested] kvm/x86: implement hv EOI assist

2014-04-22 Thread Marcelo Tosatti
On Tue, Apr 22, 2014 at 12:11:47PM +0300, Michael S. Tsirkin wrote:
 On Mon, Apr 21, 2014 at 06:40:17PM -0300, Marcelo Tosatti wrote:
  On Sun, Apr 13, 2014 at 04:10:22PM +0300, Michael S. Tsirkin wrote:
   It seems that it's easy to implement the EOI assist
   on top of the PV EOI feature: simply convert the
   page address to the format expected by PV EOI.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
  
  Looks alright except:
  
  - There is no handling of PV EOI data to be performed at HV_X64_MSR_EOI 
  write
  path?
  - Migration fails with PV-EOI not enabled in CPUID ? (perhaps could
require it for PV-EOI over HV-APIC-ASSIST).
  - MS docs mention No EOI required is set only if interrupt injected is 
  edge
triggered.
 
 Hmm I thought level interrupts are going through IOAPIC so that's already 
 true isn't it?
 
 if (!pv_eoi_enabled(vcpu) ||
 /* IRR set or many bits in ISR: could be nested. */
 apic->irr_pending ||
 /* Cache not set: could be safe but we don't bother. */
 apic->highest_isr_cache == -1 ||
 --->/* Need EOI to update ioapic. */
 kvm_ioapic_handles_vector(vcpu->kvm, apic->highest_isr_cache)) {

Right.

 /*
  * PV EOI was disabled by apic_sync_pv_eoi_from_guest
  * so we need not do anything here.
  */
 return;
 }
 
 In any case if some interrupt handler ignores this bit because it's
 level, that's harmless since it will do EOI and then we'll clear the
 bit, right?

Yes.



Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-22 Thread Marcelo Tosatti
On Tue, Apr 22, 2014 at 04:10:44PM -0300, Marcelo Tosatti wrote:
 Invariant TSC documentation mentions that invariant TSC will run at a
 constant rate in all ACPI P-, C-. and T-states.
 
 This is not the case if migration to a host with different TSC frequency 
 is allowed, or if savevm is performed. So block migration/savevm.
 
 Also do not expose invariant tsc flag by default.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: qemu-invariant-tsc/target-i386/kvm.c
 ===
 --- qemu-invariant-tsc.orig/target-i386/kvm.c
 +++ qemu-invariant-tsc/target-i386/kvm.c
 @@ -33,6 +33,8 @@
  #include "exec/ioport.h"
  #include <asm/hyperv.h>
  #include "hw/pci/pci.h"
 +#include "migration/migration.h"
 +#include "qapi/qmp/qerror.h"
  
  //#define DEBUG_KVM
  
 @@ -447,6 +449,8 @@ static bool hyperv_enabled(X86CPU *cpu)
cpu->hyperv_relaxed_timing);
  }
  
 +Error *invtsc_mig_blocker;
 +
  #define KVM_MAX_CPUID_ENTRIES  100
  
  int kvm_arch_init_vcpu(CPUState *cs)
 @@ -702,6 +706,16 @@ int kvm_arch_init_vcpu(CPUState *cs)
!!(c->ecx & CPUID_EXT_SMX);
  }
  
 +c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
 +if (c && (c->edx & 1 << 8)) {
 +/* for migration */
 +invtsc_mig_blocker = NULL;
 +error_set(&invtsc_mig_blocker, QERR_MIGRATION_NOT_SUPPORTED, "cpu");
 +migrate_add_blocker(invtsc_mig_blocker);
 +/* for savevm */
 +vmstate_x86_cpu.unmigratable = 1;
 +}

Oops, broken with multiple CPUs. Please review the rest.
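
One way to repair that, sketched against the patch's own globals (the guard
and the error_set() arguments are assumptions, not tested code), is to
register the blocker only once, for the first VCPU that exposes invtsc:

    /* Sketch: add the migration blocker once, not per VCPU. */
    c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
    if (c && (c->edx & 1 << 8) && invtsc_mig_blocker == NULL) {
        error_set(&invtsc_mig_blocker, QERR_MIGRATION_NOT_SUPPORTED, "cpu");
        migrate_add_blocker(invtsc_mig_blocker);
        vmstate_x86_cpu.unmigratable = 1;   /* blocks savevm as well */
    }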



Re: [PATCH v2] KVM: x86: Check for host supported fields in shadow vmcs

2014-04-22 Thread Bandan Das
Paolo Bonzini pbonz...@redhat.com writes:

 Il 22/04/2014 12:25, Bandan Das ha scritto:
  +   if (j < i)
  +   shadow_read_write_fields[j] =
  +   shadow_read_write_fields[i];
  +   j++;
 
  ... you need to respin anyway because the j++ is wrong.  It should be
  inside the if.  If you prefer, you can put it in the lhs of the
 j++ outside the if looks right to me. Can you please explain why
 you think it shouldn't be that way?


 The way you wrote it, j will always be equal to i.

Right, and that's what we want unless we come across an unsupported
field. We then overwrite the unsupported field with the next supported
field; this way, j keeps track of the real length.
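
Pulled out of its context, this is the classic in-place compaction loop; a
standalone sketch with illustrative names (not the patch's):

#include <stdbool.h>

/* Keep only entries accepted by supported(), preserving order. j trails i
 * once an entry has been skipped; at the end it holds the number of
 * entries kept, i.e. the new length.
 */
static int compact(unsigned short *fields, int n,
		   bool (*supported)(unsigned short))
{
	int i, j = 0;

	for (i = 0; i < n; i++) {
		if (!supported(fields[i]))
			continue;		/* skip: j now lags behind i */
		if (j < i)
			fields[j] = fields[i];	/* slide the kept entry down */
		j++;				/* counts every kept entry */
	}
	return j;
}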

 Paolo


Re: [PATCH 1/3] bridge: preserve random init MAC address

2014-04-22 Thread Luis R. Rodriguez
On Wed, Mar 19, 2014 at 7:05 PM, Luis R. Rodriguez mcg...@suse.com wrote:
 On Tue, Mar 18, 2014 at 08:10:56PM -0700, Stephen Hemminger wrote:
 On Wed, 12 Mar 2014 20:15:25 -0700
 Luis R. Rodriguez mcg...@do-not-panic.com wrote:

  As it is now, if you create a bridge it gets started
  with a random MAC address, and if you then add a net_device
  as a slave but later kick it out, you end up with a zero
  MAC address. Instead, preserve the original random MAC
  address and use it.

 What is supposed to happen is that the recalculate chooses
 the lowest MAC address of the slaves. If there are no slaves
 it might as well just calculate a new random value. There is
 not great merit in preserving the original defunct address.

 Or something like this
 --- a/net/bridge/br_stp_if.c  2014-02-12 08:21:56.733857356 -0800
 +++ b/net/bridge/br_stp_if.c  2014-03-18 20:09:09.334388826 -0700
 @@ -235,6 +235,9 @@ bool br_stp_recalculate_bridge_id(struct
addr = p->dev->dev_addr;

   }
 +
 + if (addr == br_mac_zero)
 + return false;  /* keep original address */

if (ether_addr_equal(br->bridge_id.addr, addr))
   return false;   /* no change */

 that just keeps the old value.
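
For context, the "lowest slave MAC wins" selection referred to above reduces
to something like this sketch of the loop in br_stp_recalculate_bridge_id()
(names as in the snippet above; illustrative, not a verbatim excerpt):

	/* Pick the numerically lowest MAC among the slave ports. */
	const unsigned char *addr = br_mac_zero;
	struct net_bridge_port *p;

	list_for_each_entry(p, &br->port_list, list) {
		if (addr == br_mac_zero ||
		    memcmp(p->dev->dev_addr, addr, ETH_ALEN) < 0)
			addr = p->dev->dev_addr;
	}
	/* addr is still br_mac_zero when the port list is empty. */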

 The old value could be a port which got root blocked, I think
 it can be confusing to see that happen. Either way feel free to
 make the call, I'll provide more details below on perhaps one reason
 to keep the original MAC address.

Stephen, I'd like to respin this series to address all pending
feedback, I'd still like your feedback / call / judgement on this
part. I'm fine either way, just wanted to ensure I highlight the
reasoning of why I kept the original random MAC address. Please keep
in mind that at this point I'm convinced bridging is the *wrong*
solution to networking with guests but it is being used in a lot of
current topologies, this would just help with smoothing out corner
cases.

 The bridge is in a meaningless state when there are no ports,

 Some virtualization topologies may want a backend with no link (or
 perhaps one which is dynamic, with the option to have none) to the
 internet but just a bridge so guests sharing the bridge can talk to
 each other. In this case bridging can be used only to link the
 batch of guests.

 In this case the bridge simply acts as a switch, but also can be used as the
 interface for DHCP, for example. In such a case the guests will be doing
 ARP to get to the DHCP server. There's a flurry of ways one can try to get
 all this meshed together including enabling an ARP proxy but I'm looking
 at ways to optimize this further -- but I'd like to address the current
 usage cases first.

 and when the first port is added back it will be used as the
 new bridge id.

 Sure. Let me know how you think we should proceed with this patch based
 on the above.

Thanks in advance.

  Luis


Re: [PATCH 2/3] bridge: trigger a bridge calculation upon port changes

2014-04-22 Thread Luis R. Rodriguez
On Tue, Mar 18, 2014 at 02:22:43PM -0700, Luis R. Rodriguez wrote:
 On Tue, Mar 18, 2014 at 01:46:49PM -0700, Cong Wang wrote:
  On Fri, Mar 14, 2014 at 6:39 PM, Luis R. Rodriguez mcg...@suse.com wrote:
   On Thu, Mar 13, 2014 at 11:26:25AM -0700, Cong Wang wrote:
   On Wed, Mar 12, 2014 at 8:15 PM, Luis R. Rodriguez
   mcg...@do-not-panic.com wrote:
spin_unlock_bh(&p->br->lock);
+   if (changed)
+   call_netdevice_notifiers(NETDEV_CHANGEADDR,
+p->br->dev);
+   netdev_update_features(p->br->dev);
  
   I think this is supposed to be in netdev event handler of br-dev
   instead of here.
  
   Do you mean netdev_update_features() ? I mimic'd what was being done on
   br_del_if() given that root blocking is doing something similar. If
   we need to change something for the above then I suppose it means we need
   to change br_del_if() too. Let me know if you see any reason for something
   else.
  
  
  Yeah, for me it looks like it's better to call netdev_update_features()
  in the event handler of br->dev, rather than where calling
  call_netdevice_notifiers(..., br->dev);.
 
 I still don't see why, in fact trying to verify this I am wondering now
 if instead we should actually fix br_features_recompute() to take into
 consideration BR_ROOT_BLOCK as below. Notice how netdev_update_features()
 is called above even if the MAC address did not change, just as is done
 on br_del_if(). There is an NETDEV_FEAT_CHANGE event so would it be more
 appropriate we just call
 
 call_netdevice_notifiers(NETDEV_FEAT_CHANGE, p->br->dev)
 
 for both the above then and also br_del_if()? How about the below
 change?
 
 diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
 index 54d207d..dcd9378 100644
 --- a/net/bridge/br_if.c
 +++ b/net/bridge/br_if.c
 @@ -315,6 +315,8 @@ netdev_features_t br_features_recompute(struct net_bridge 
 *br,
  features &= ~NETIF_F_ONE_FOR_ALL;
  
  list_for_each_entry(p, &br->port_list, list) {
 + if (p->flags & BR_ROOT_BLOCK)
 + continue;
  features = netdev_increment_features(features,
 p->dev->features, mask);
   }

Cong, can you provide feedback on this? I tried to grow confidence on the
hunk above but its not clear but the other points still hold and I'd love
your feedback on those.

  Luis


Re: [PATCH 1/2] KVM: async_pf: kill the unnecessary use_mm/unuse_mm async_pf_execute()

2014-04-22 Thread Christian Borntraeger
On 21/04/14 15:25, Oleg Nesterov wrote:
 async_pf_execute() has no reason to adopt apf->mm, gup(current, mm)
 should work just fine even if current has another or NULL ->mm.
 
 Recently kvm_async_page_present_sync() was added inside the use_mm
 section, but it seems that it doesn't need current->mm too.
 
 Signed-off-by: Oleg Nesterov o...@redhat.com

Indeed, use/unuse_mm should only be necessary for copy_to/from_user etc.
This is fine for s390, but it seems that x86 kvm_arch_async_page_not_present
might call apf_put_user which might call copy_to_user, so this is not ok, I 
guess.
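
The distinction can be made concrete: get_user_pages() takes the mm as an
explicit argument, while copy_to_user() resolves addresses against
current->mm. A sketch of the contrast using the 3.x-era APIs discussed here
(illustrative, not from the patch):

/* Fine from a kernel thread without adopting the mm: gup() is told
 * explicitly which mm to walk.
 */
static void worker_gup(struct mm_struct *mm, unsigned long addr)
{
	down_read(&mm->mmap_sem);
	get_user_pages(current, mm, addr, 1, 1, 0, NULL, NULL);
	up_read(&mm->mmap_sem);
}

/* Not fine without use_mm(): copy_to_user() works on current->mm. */
static int worker_put(struct mm_struct *mm, void __user *dst,
		      const void *src, size_t len)
{
	int ret;

	use_mm(mm);			/* adopt the target mm */
	ret = copy_to_user(dst, src, len) ? -EFAULT : 0;
	unuse_mm(mm);
	return ret;
}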


 ---
  virt/kvm/async_pf.c |2 --
  1 files changed, 0 insertions(+), 2 deletions(-)
 
 diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
 index 10df100..0ced4f3 100644
 --- a/virt/kvm/async_pf.c
 +++ b/virt/kvm/async_pf.c
 @@ -80,12 +80,10 @@ static void async_pf_execute(struct work_struct *work)
 
   might_sleep();
 
 - use_mm(mm);
   down_read(&mm->mmap_sem);
   get_user_pages(current, mm, addr, 1, 1, 0, NULL, NULL);
   up_read(&mm->mmap_sem);
   kvm_async_page_present_sync(vcpu, apf);
 - unuse_mm(mm);
 
   spin_lock(&vcpu->async_pf.lock);
   list_add_tail(&apf->link, &vcpu->async_pf.done);
 



Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-22 Thread Eduardo Habkost
On Tue, Apr 22, 2014 at 04:10:44PM -0300, Marcelo Tosatti wrote:
 Invariant TSC documentation mentions that invariant TSC will run at a
 constant rate in all ACPI P-, C-. and T-states.
 
 This is not the case if migration to a host with different TSC frequency 
 is allowed, or if savevm is performed. So block migration/savevm.
 
 Also do not expose invariant tsc flag by default.

What do you mean by "do not expose invtsc by default", exactly? It is
already not exposed by default, because the default CPU model is qemu64
and qemu64 doesn't have it enabled.

 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: qemu-invariant-tsc/target-i386/kvm.c
 ===
 --- qemu-invariant-tsc.orig/target-i386/kvm.c
 +++ qemu-invariant-tsc/target-i386/kvm.c
 @@ -33,6 +33,8 @@
  #include exec/ioport.h
  #include asm/hyperv.h
  #include hw/pci/pci.h
 +#include migration/migration.h
 +#include qapi/qmp/qerror.h
  
  //#define DEBUG_KVM
  
 @@ -447,6 +449,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_relaxed_timing);
  }
  
 +Error *invtsc_mig_blocker;
 +
  #define KVM_MAX_CPUID_ENTRIES  100
  
  int kvm_arch_init_vcpu(CPUState *cs)
 @@ -702,6 +706,16 @@ int kvm_arch_init_vcpu(CPUState *cs)
!!(c->ecx & CPUID_EXT_SMX);
  }
  
 +c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
 +if (c && (c->edx & 1 << 8)) {
 +/* for migration */
 +invtsc_mig_blocker = NULL;
 +error_set(&invtsc_mig_blocker, QERR_MIGRATION_NOT_SUPPORTED, "cpu");
 +migrate_add_blocker(invtsc_mig_blocker);
 +/* for savevm */
 +vmstate_x86_cpu.unmigratable = 1;
 +}
 +
  cpuid_data.cpuid.padding = 0;
  r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, cpuid_data);
  if (r) {
 Index: qemu-invariant-tsc/target-i386/cpu-qom.h
 ===
 --- qemu-invariant-tsc.orig/target-i386/cpu-qom.h
 +++ qemu-invariant-tsc/target-i386/cpu-qom.h
 @@ -116,7 +116,7 @@ static inline X86CPU *x86_env_get_cpu(CP
  #define ENV_OFFSET offsetof(X86CPU, env)
  
  #ifndef CONFIG_USER_ONLY
 -extern const struct VMStateDescription vmstate_x86_cpu;
 +extern struct VMStateDescription vmstate_x86_cpu;
  #endif
  
  /**
 Index: qemu-invariant-tsc/target-i386/machine.c
 ===
 --- qemu-invariant-tsc.orig/target-i386/machine.c
 +++ qemu-invariant-tsc/target-i386/machine.c
 @@ -613,7 +613,7 @@ static const VMStateDescription vmstate_
  }
  };
  
 -const VMStateDescription vmstate_x86_cpu = {
 +VMStateDescription vmstate_x86_cpu = {
  .name = cpu,
  .version_id = 12,
  .minimum_version_id = 3,
 Index: qemu-invariant-tsc/target-i386/cpu.c
 ===
 --- qemu-invariant-tsc.orig/target-i386/cpu.c
 +++ qemu-invariant-tsc/target-i386/cpu.c
 @@ -1230,6 +1230,8 @@ static void host_x86_cpu_initfn(Object *
  
  for (w = 0; w < FEATURE_WORDS; w++) {
  FeatureWordInfo *wi = &feature_word_info[w];
 +if (w == FEAT_8000_0007_EDX)
 +continue;

This breaks the assumption that -cpu host contains all features that
can be enabled in a given host. IIRC, Igor even has plans to make
kvm_check_features_against_host() use the feature data from the host
CPU model instead of calling kvm_arch_get_supported_cpuid() directly.

-cpu host is already very likely to break migration, and I was already
willing to block migration when -cpu host was used. So I really don't
think that having migration prevented as a side-effect of using -cpu
host would be a problem.

In other words: what about simply dropping this hunk and letting invtsc
be enabled when using -cpu host?

  env->features[w] =
  kvm_arch_get_supported_cpuid(s, wi->cpuid_eax, wi->cpuid_ecx,
   wi->cpuid_reg);
 
 

-- 
Eduardo


Re: [PATCH 2/2] KVM: async_pf: mm-mm_users can not pin apf-mm

2014-04-22 Thread Christian Borntraeger
On 21/04/14 15:26, Oleg Nesterov wrote:
 get_user_pages(mm) is simply wrong if mm->mm_users == 0 and exit_mmap/etc
 was already called (or is in progress), mm->mm_count can only pin mm->pgd
 and mm_struct itself.
 
 Change kvm_setup_async_pf/async_pf_execute to inc/dec mm->mm_users.
 
 kvm_create_vm/kvm_destroy_vm play with ->mm_count too but this case looks
 fine at first glance, it seems that this ->mm is only used to verify that
 current->mm == kvm->mm.
 
 Signed-off-by: Oleg Nesterov o...@redhat.com

This looks fine, from what I can tell. The old code has the problem mentioned
in your patch description, so your version is probably better.
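
The mm_count/mm_users distinction the patch relies on, condensed into a
sketch (3.x-era API names; illustrative only):

/* mm_count pins only the mm_struct and its pgd allocation; exit_mmap()
 * may already have torn the address space down, so page-table walks such
 * as get_user_pages() are not safe under it.
 */
static void pin_struct_only(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_count);
	/* ... */
	mmdrop(mm);
}

/* mm_users keeps the address space itself alive: while it is held, the
 * mappings stay valid and gup() on that mm remains safe.
 */
static void pin_address_space(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_users);
	/* ... */
	mmput(mm);
}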


 ---
  virt/kvm/async_pf.c |8 
  1 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
 index 0ced4f3..cda703e 100644
 --- a/virt/kvm/async_pf.c
 +++ b/virt/kvm/async_pf.c
 @@ -99,7 +99,7 @@ static void async_pf_execute(struct work_struct *work)
   if (waitqueue_active(&vcpu->wq))
   wake_up_interruptible(&vcpu->wq);
 
 - mmdrop(mm);
 + mmput(mm);
   kvm_put_kvm(vcpu->kvm);
  }
 
 @@ -116,7 +116,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu 
 *vcpu)
   flush_work(&work->work);
  #else
   if (cancel_work_sync(&work->work)) {
 - mmdrop(work->mm);
 + mmput(work->mm);
   kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
   kmem_cache_free(async_pf_cache, work);
   }
 @@ -181,7 +181,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, 
 unsigned long hva,
   work->addr = hva;
   work->arch = *arch;
   work->mm = current->mm;
 - atomic_inc(&work->mm->mm_count);
 + atomic_inc(&work->mm->mm_users);
   kvm_get_kvm(work->vcpu->kvm);
 
   /* this can't really happen otherwise gfn_to_pfn_async
 @@ -199,7 +199,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, 
 unsigned long hva,
   return 1;
  retry_sync:
   kvm_put_kvm(work->vcpu->kvm);
 - mmdrop(work->mm);
 + mmput(work->mm);
   kmem_cache_free(async_pf_cache, work);
   return 0;
  }
 



Re: [PATCH 1/2] KVM: async_pf: kill the unnecessary use_mm/unuse_mm async_pf_execute()

2014-04-22 Thread Christian Borntraeger
On 22/04/14 22:15, Christian Borntraeger wrote:
 On 21/04/14 15:25, Oleg Nesterov wrote:
  async_pf_execute() has no reason to adopt apf->mm, gup(current, mm)
  should work just fine even if current has another or NULL ->mm.
 
  Recently kvm_async_page_present_sync() was added inside the use_mm
  section, but it seems that it doesn't need current->mm too.

 Signed-off-by: Oleg Nesterov o...@redhat.com
 
 Indeed, use/unuse_mm should only be necessary for copy_to/from_user etc.
 This is fine for s390, but it seems that x86 kvm_arch_async_page_not_present
 might call apf_put_user which might call copy_to_user, so this is not ok, I 
 guess.

I wanted to say kvm_arch_async_page_not_present, but I have to correct myself.
x86 does the "page is there" work in the cpu loop, not in the worker. The cpu
loop does have a valid mm. So this patch should be ok as well.
 
 
 ---
  virt/kvm/async_pf.c |2 --
  1 files changed, 0 insertions(+), 2 deletions(-)

 diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
 index 10df100..0ced4f3 100644
 --- a/virt/kvm/async_pf.c
 +++ b/virt/kvm/async_pf.c
 @@ -80,12 +80,10 @@ static void async_pf_execute(struct work_struct *work)

  might_sleep();

 -use_mm(mm);
  down_read(&mm->mmap_sem);
  get_user_pages(current, mm, addr, 1, 1, 0, NULL, NULL);
  up_read(&mm->mmap_sem);
  kvm_async_page_present_sync(vcpu, apf);
 -unuse_mm(mm);

  spin_lock(&vcpu->async_pf.lock);
  list_add_tail(&apf->link, &vcpu->async_pf.done);

 
 



Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-22 Thread Marcelo Tosatti
On Tue, Apr 22, 2014 at 05:38:07PM -0300, Eduardo Habkost wrote:
 On Tue, Apr 22, 2014 at 04:10:44PM -0300, Marcelo Tosatti wrote:
  Invariant TSC documentation mentions that invariant TSC will run at a
  constant rate in all ACPI P-, C-. and T-states.
  
  This is not the case if migration to a host with different TSC frequency 
  is allowed, or if savevm is performed. So block migration/savevm.
  
  Also do not expose invariant tsc flag by default.
 
 What do you mean by "do not expose invtsc by default", exactly? It is
 already not exposed by default, because the default CPU model is qemu64
 and qemu64 doesn't have it enabled.

Should be "do not expose invtsc by default with -cpu host".

Since it blocks migration, i considered it a special flag that should 
be set when user is aware that migration is going to be blocked.

Makes sense?



Re: [PATCH v3 0/3] nVMX: Fixes to run Xen as L1

2014-04-22 Thread Marcelo Tosatti
On Sat, Apr 19, 2014 at 06:17:43PM -0400, Bandan Das wrote:
 Minor changes to enable Xen as an L1 hypervisor.
 
 Tested with a Haswell host, Xen-4.3 L1 and debian6 L2

Applied, thanks.



Re: [PATCH RFC untested] kvm/x86: implement hv EOI assist

2014-04-22 Thread Michael S. Tsirkin
On Tue, Apr 22, 2014 at 04:26:48PM -0300, Marcelo Tosatti wrote:
 On Tue, Apr 22, 2014 at 12:11:47PM +0300, Michael S. Tsirkin wrote:
  On Mon, Apr 21, 2014 at 06:40:17PM -0300, Marcelo Tosatti wrote:
   On Sun, Apr 13, 2014 at 04:10:22PM +0300, Michael S. Tsirkin wrote:
It seems that it's easy to implement the EOI assist
on top of the PV EOI feature: simply convert the
page address to the format expected by PV EOI.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
   
   Looks alright except:
   
   - There is no handling of PV EOI data to be performed at HV_X64_MSR_EOI 
   write
   path?

I thought HV_X64_MSR_EOI is a separate optimization - it's an X2APIC
replacement that lets you do EOI with an MSR and not IO.
No?


   - Migration fails with PV-EOI not enabled in CPUID ? (perhaps could
 require it for PV-EOI over HV-APIC-ASSIST).

Did not try migration yet - could you explain the issue please?
HV-APIC-ASSIST MSR is in the list of saved MSRs ...

   - MS docs mention No EOI required is set only if interrupt injected is 
   edge
 triggered.
  
  Hmm I thought level interrupts are going through IOAPIC so that's already 
  true isn't it?
  
  if (!pv_eoi_enabled(vcpu) ||
  /* IRR set or many bits in ISR: could be nested. */
   apic->irr_pending ||
   /* Cache not set: could be safe but we don't bother. */
   apic->highest_isr_cache == -1 ||
   --->/* Need EOI to update ioapic. */
   kvm_ioapic_handles_vector(vcpu->kvm, apic->highest_isr_cache)) {
 
 Right.
 
  /*
   * PV EOI was disabled by apic_sync_pv_eoi_from_guest
   * so we need not do anything here.
   */
  return;
  }
  
  In any case if some interrupt handler ignores this bit because it's
  level, that's harmless since it will do EOI and then we'll clear the
  bit, right?
 
 Yes.


[PATCH v3 0/4] live migration dirty bitmap support for ARMv7

2014-04-22 Thread Mario Smarduch
Hi,
 this is the third iteration of live migration support, for the time being on
ARMv7. The patches depend on Eric Auger's patch for memory regions.

Changes since v2:
- move initial VM write protect to memory region architecture prepare function
  (needed to make dirty logging function generic) 
- added stage2_mark_pte_ro() - to mark ptes ro - Marc's comment
- optimized initial VM memory region write protect to do fewer table lookups -
  applied Marc's comment for walking dirty bitmap mask
- added pud_addr_end() for stage2 tables, to make the walk 4-level
- added kvm_flush_remote_tlbs() to use ARM TLB invalidation, made the generic
  one weak - Marc's comment on the generic dirty bitmap log function
- optimized walking the dirty bitmap mask to skip upper tables - Marc's comment
- deleted x86, arm kvm_vm_ioctl_get_dirty_log(), moved to kvm_main.c, tagged
  the function weak - Marc's comment
- changed Data Abort handler pte index handling - Marc's comment


Mario Smarduch (4):
  add ARMv7 HYP API to flush VM TLBs without address param
  live migration support for initial write protect of VM to track dirty
pages
  live migration support for VM dirty log management
  add 2nd stage page fault handling during live migration

 arch/arm/include/asm/kvm_asm.h  |1 +
 arch/arm/include/asm/kvm_host.h |   13 ++
 arch/arm/kvm/arm.c  |8 +-
 arch/arm/kvm/interrupts.S   |5 +
 arch/arm/kvm/mmu.c  |  303 ++-
 arch/x86/kvm/x86.c  |   78 --
 virt/kvm/kvm_main.c |   87 ++-
 7 files changed, 409 insertions(+), 86 deletions(-)

-- 
1.7.9.5



[PATCH v3 1/4] add ARMv7 HYP API to flush VM TLBs without address param

2014-04-22 Thread Mario Smarduch
Add HYP interface for global VM TLB invalidation without address
parameter.

- Added ARM version of kvm_flush_remote_tlbs()

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/include/asm/kvm_asm.h  |1 +
 arch/arm/include/asm/kvm_host.h |2 ++
 arch/arm/kvm/interrupts.S   |5 +
 arch/arm/kvm/mmu.c  |   10 ++
 4 files changed, 18 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 661da11..0eeaca1 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -76,6 +76,7 @@ extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 098f7dd..1e739f9 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -228,4 +228,6 @@ int kvm_perf_teardown(void);
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
+void kvm_tlb_flush_vmid(struct kvm *kvm);
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 0d68d40..8620280 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -45,8 +45,12 @@ __kvm_hyp_code_start:
  *
  * As v7 does not support flushing per IPA, just nuke the whole TLB
  * instead, ignoring the ipa value.
+ *
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm) - alias on ARMv7 for global VM TLB
+ * flush with no address parameters.
  */
 ENTRY(__kvm_tlb_flush_vmid_ipa)
+ENTRY(__kvm_tlb_flush_vmid)
push{r2, r3}
 
dsb ishst
@@ -65,6 +69,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
pop {r2, r3}
bx  lr
 ENDPROC(__kvm_tlb_flush_vmid_ipa)
+ENDPROC(__kvm_tlb_flush_vmid)
 
 /
  * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index e8580e2..7ab77f3 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -56,6 +56,16 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, 
phys_addr_t ipa)
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+/* Function reuses __kvm_tlb_flush_vmid_ipa() HYP interface without additional
+ * address argument to flush entire VM TLBs.
+ */
+void kvm_flush_remote_tlbs(struct kvm *kvm)
+{
+   if (kvm)
+   kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
  int min, int max)
 {
-- 
1.7.9.5



[PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-22 Thread Mario Smarduch


Support for live migration initial write protect.
- moved write protect to the architecture memory region prepare function. This
  way you can fail and abort migration without keeping track of migration status.
- The above also allows generalizing the read dirty log function with x86
- Added stage2_mark_pte_ro()
- optimized initial write protect, skip upper table lookups
- added stage2pmd_addr_end() to do a generic 4-level table walk
- changed kvm_flush_remote_tlbs() to a weak function

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/include/asm/kvm_host.h |8 ++
 arch/arm/kvm/arm.c  |3 +
 arch/arm/kvm/mmu.c  |  163 +++
 virt/kvm/kvm_main.c |5 +-
 4 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 1e739f9..9f827c8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -67,6 +67,12 @@ struct kvm_arch {
 
/* Interrupt controller */
struct vgic_distvgic;
+
+   /* Marks start of migration, used to handle 2nd stage page faults
+* during migration, prevent installing huge pages and split huge pages
+* to small pages.
+*/
+   int migration_in_progress;
 };
 
 #define KVM_NR_MEM_OBJS 40
@@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 
value);
 
 void kvm_tlb_flush_vmid(struct kvm *kvm);
 
+int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9a4bc10..b916478 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
   struct kvm_userspace_memory_region *mem,
   enum kvm_mr_change change)
 {
+   /* Request for migration issued by user, write protect memory slot */
+   if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
+   return kvm_mmu_slot_remove_write_access(kvm, mem->slot);
return 0;
 }
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7ab77f3..4d029a6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -31,6 +31,11 @@
 
 #include trace.h
 
+#define stage2pud_addr_end(addr, end)  \
+({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\
+   (__boundary - 1 < (end) - 1) ? __boundary : (end);  \
+})
+
 extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
 static pgd_t *boot_hyp_pgd;
@@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
return 0;
 }
 
+/* Write protect page */
+static void stage2_mark_pte_ro(pte_t *pte)
+{
+   pte_t new_pte;
+
+   new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2);
+   *pte = new_pte;
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
@@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
phys_addr_t *ipap)
return false;
 }
 
+/**
+ * split_pmd - splits huge pages to small pages, required to keep a dirty log 
of
+ *  smaller memory granules, otherwise huge pages would need to be
+ *  migrated. Practically an idle system has problems migrating with
+ *  huge pages.  Called during WP of entire VM address space, done
+ *  initially when  migration thread isses the KVM_MEM_LOG_DIRTY_PAGES
+ *  ioctl.
+ *  The mmu_lock is held during splitting.
+ *
+ * @kvm:The KVM pointer
+ * @pmd:Pmd to 2nd stage huge page
+ * @addr:   Guest Physical Address
+ */
+int split_pmd(struct kvm *kvm, pmd_t *pmd, u64 addr)
+{
+   struct page *page;
+   pfn_t pfn = pmd_pfn(*pmd);
+   pte_t *pte;
+   int i;
+
+   page = alloc_page(GFP_KERNEL);
+   if (page == NULL)
+   return -ENOMEM;
+
+   pte = page_address(page);
+   /* cycle through ptes first, use pmd pfn */
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pte[i] = pfn_pte(pfn+i, 0);
+   stage2_mark_pte_ro(pte[i]);
+   }
+   kvm_clean_pte(pte);
+   /* After page table setup set pmd */
+   pmd_populate_kernel(NULL, pmd, pte);
+
+   /* get reference on pte page */
+   get_page(virt_to_page(pte));
+   return 0;
+}
+
+/**
+ * kvm_mmu_slot_remove_write_access - write protects entire VM address space.
+ *  Called at start of migration when KVM_MEM_LOG_DIRTY_PAGES ioctl is
+ *  issued. After this function returns all pages (minus the ones faulted
+ *  in when mmu_lock is released) must be write protected to keep track of
+ *  dirty pages to migrate on subsequent dirty log retrieval.
+ *  mmu_lock is held during write protecting, released on contention.
+ *
+ * @kvm:The KVM pointer
+ * @slot:   The memory slot the dirty log is retrieved for
+ */
+int 

[PATCH v3 4/4] add 2nd stage page fault handling during live migration

2014-04-22 Thread Mario Smarduch

- added pte_index() to add to pmd pfn

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/kvm/mmu.c |   31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 52d4dd6..61ee812 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -924,6 +924,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
phys_addr_t fault_ipa,
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
struct vm_area_struct *vma;
pfn_t pfn;
+   bool migration_active;

write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
if (fault_status == FSC_PERM && !write_fault) {
@@ -975,12 +976,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
phys_addr_t fault_ipa,
return -EFAULT;

spin_lock(&kvm->mmu_lock);
+   /* place inside lock to prevent race condition when whole VM is being
+* write protected. Prevent race of huge page install when migration is
+* active.
+*/
+   migration_active = vcpu->kvm->arch.migration_in_progress;
+
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;
-   if (!hugetlb && !force_pte)
+
+   /* During migration don't rebuild huge pages */
+   if (!hugetlb && !force_pte && !migration_active)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);

-   if (hugetlb) {
+   /* During migration don't install new huge pages */
+   if (hugetlb && !migration_active) {
pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
new_pmd = pmd_mkhuge(new_pmd);
if (writable) {
@@ -992,6 +1002,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
phys_addr_t fault_ipa,
} else {
pte_t new_pte = pfn_pte(pfn, PAGE_S2);
if (writable) {
+   /* First, convert a huge page pfn to a normal 4k page pfn
+* while migration is in progress.
+* Second, in migration mode, in the rare case where
+* splitting of huge pages fails, check if the pmd is
+* mapping a huge page; if it is, clear it so that
+* stage2_set_pte() can map in a small page.
+*/
+   if (migration_active  hugetlb) {
+   pmd_t *pmd;
+   pfn += pte_index(fault_ipa);
+   new_pte = pfn_pte(pfn, PAGE_S2);
+   pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
+   if (pmd && kvm_pmd_huge(*pmd))
+   clear_pmd_entry(kvm, pmd, fault_ipa);
+   }
kvm_set_s2pte_writable(new_pte);
kvm_set_pfn_dirty(pfn);
}
@@ -999,6 +1024,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
phys_addr_t fault_ipa,
ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
}

+   if (writable)
+   mark_page_dirty(kvm, gfn);

 out_unlock:
spin_unlock(&kvm->mmu_lock);
-- 
1.7.9.5



[PATCH v3 3/4] live migration support for VM dirty log management

2014-04-22 Thread Mario Smarduch

- made kvm_vm_ioctl_get_dirty_log() generic, moved it to kvm_main.c, deleted the
arm, x86 versions
- optimized kvm_mmu_write_protected_pt_masked() to skip upper table lookups


Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/include/asm/kvm_host.h |3 ++
 arch/arm/kvm/arm.c  |5 --
 arch/arm/kvm/mmu.c  |   99 +++
 arch/x86/kvm/x86.c  |   78 --
 virt/kvm/kvm_main.c |   82 
 5 files changed, 184 insertions(+), 83 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 9f827c8..c5c27d8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -237,5 +237,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 
value);
 void kvm_tlb_flush_vmid(struct kvm *kvm);
 
 int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+   struct kvm_memory_slot *slot,
+   gfn_t gfn_offset, unsigned long mask);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index b916478..6ca3e84 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -784,11 +784,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
}
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-   return -EINVAL;
-}
-
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
struct kvm_arm_device_addr *dev_addr)
 {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 4d029a6..52d4dd6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -812,6 +812,105 @@ int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int 
slot)
return 0;
 }
 
+/**
+ * kvm_mmu_write_protect_pt_masked - after the migration thread write protects
+ *  the entire VM address space, iterative calls are made to get the dirty
+ *  pages as the VM pages are being migrated. New dirty pages may be a subset
+ *  of the initially write-protected VM or new writes faulted in. Here write
+ *  protect new dirty pages again in preparation for the next dirty log read.
+ *  This function is called as a result of the KVM_GET_DIRTY_LOG ioctl, to
+ *  determine what pages need to be migrated.
+ *   'kvm->mmu_lock' must be held to protect against concurrent modification
+ *   of page tables (2nd stage fault, mmu modifiers, ...)
+ *
+ * @kvm:The KVM pointer
+ * @slot:   The memory slot the dirty log is retrieved for
+ * @gfn_offset: The gfn offset in the memory slot
+ * @mask:   The mask of dirty pages at offset 'gfn_offset' in this memory
+ *  slot to be write protected
+ */
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+   struct kvm_memory_slot *slot,
+   gfn_t gfn_offset, unsigned long mask)
+{
+   phys_addr_t ipa, next, offset_ipa;
+   pgd_t *pgdp = kvm->arch.pgd, *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   gfn_t gfnofst = slot->base_gfn + gfn_offset;
+   bool crosses_pmd;
+
+   ipa = (gfnofst + __ffs(mask)) << PAGE_SHIFT;
+   offset_ipa  = gfnofst << PAGE_SHIFT;
+   next = (gfnofst + (BITS_PER_LONG - 1)) << PAGE_SHIFT;
+
+   /* check if mask width crosses 2nd level page table range, and
+* possibly 3rd, 4th. If not, skip upper table lookups. Unlikely
+* to be true: machine memory regions tend to start on at least a
+* PMD boundary, and mask is a power of 2.
+*/
+   crosses_pmd = ((offset_ipa & PMD_MASK) ^ (next & PMD_MASK)) ? true :
+   false;
+   /* If pgd, pud, pmd not present and you cross pmd range check next
+* index. Unlikely that pgd and pud would be not present. Between
+* dirty page marking and now page tables may have been altered.
+*/
+   pgd = pgdp + pgd_index(ipa);
+   if (unlikely(crosses_pmd && !pgd_present(*pgd))) {
+   pgd = pgdp + pgd_index(next);
+   if (!pgd_present(*pgd))
+   return;
+   }
+
+   pud = pud_offset(pgd, ipa);
+   if (unlikely(crosses_pmd && !pud_present(*pud))) {
+   pud = pud_offset(pgd, next);
+   if (!pud_present(*pud))
+   return;
+   }
+
+   pmd = pmd_offset(pud, ipa);
+   if (unlikely(crosses_pmd && !pmd_present(*pmd))) {
+   pmd = pmd_offset(pud, next);
+   if (!pmd_present(*pmd))
+   return;
+   }
+
+   for (;;) {
+   pte = pte_offset_kernel(pmd, ipa);
+   if (!pte_present(*pte))
+   goto next_ipa;
+
+   if ((*pte & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY)
+   goto next_ipa;
+
+   stage2_mark_pte_ro(pte);
+
+next_ipa:
+   mask &= mask - 1;
+   if (!mask)
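
The loop tail above walks the dirty mask with the classic clear-lowest-set-bit
idiom; as a standalone sketch of the pattern (illustrative, not part of the
patch):

/* Visit every set bit in a dirty mask, lowest first.
 * mask &= mask - 1 clears the lowest set bit on each iteration.
 */
static void for_each_dirty_bit(unsigned long mask,
			       void (*visit)(unsigned int bit))
{
	while (mask) {
		visit(__ffs(mask));	/* index of the lowest set bit */
		mask &= mask - 1;
	}
}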

Re: [patch 1/2] target-i386: support invariant tsc flag

2014-04-22 Thread Paolo Bonzini

Il 22/04/2014 15:10, Marcelo Tosatti ha scritto:

+case 0x80000007:
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+
+if (kvm_enabled()) {
+*edx = env->features[FEAT_8000_0007_EDX];
+} else {
+*edx = 0;
+}


I think TCG is able to eliminate unsupported features too before they 
get in env->features[].  Or are those patches not in yet?  Eduardo?


Paolo


Re: [patch 1/2] target-i386: support invariant tsc flag

2014-04-22 Thread Eduardo Habkost
On Tue, Apr 22, 2014 at 08:26:36PM -0400, Paolo Bonzini wrote:
 Il 22/04/2014 15:10, Marcelo Tosatti ha scritto:
 +case 0x80000007:
 +*eax = 0;
 +*ebx = 0;
 +*ecx = 0;
 +
 +if (kvm_enabled()) {
 +*edx = env->features[FEAT_8000_0007_EDX];
 +} else {
 +*edx = 0;
 +}
 
 I think TCG is able to eliminate unsupported features too before they
 get in env->features[].  Or are those patches not in yet?  Eduardo?

They are not in yet, but while they are not included this filtering
(based on TCG_*_FEATURES macros) should be done in the !kvm_enabled()
block of x86_cpu_realizefn().
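
Concretely, that filtering would look something like the sketch below (the
mask name TCG_APM_FEATURES is an assumption here, by analogy with the
existing TCG_FEATURES/TCG_EXT_FEATURES masks):

    /* In x86_cpu_realizefn(), hypothetical TCG-side filtering: */
    if (!kvm_enabled()) {
        /* TCG does not emulate invtsc semantics, so mask the feature
         * word down to what TCG actually supports (assumed macro).
         */
        env->features[FEAT_8000_0007_EDX] &= TCG_APM_FEATURES;
    }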

-- 
Eduardo


Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-22 Thread Eduardo Habkost
On Tue, Apr 22, 2014 at 06:27:59PM -0300, Marcelo Tosatti wrote:
 On Tue, Apr 22, 2014 at 05:38:07PM -0300, Eduardo Habkost wrote:
  On Tue, Apr 22, 2014 at 04:10:44PM -0300, Marcelo Tosatti wrote:
   Invariant TSC documentation mentions that invariant TSC will run at a
   constant rate in all ACPI P-, C-. and T-states.
   
   This is not the case if migration to a host with different TSC frequency 
   is allowed, or if savevm is performed. So block migration/savevm.
   
   Also do not expose invariant tsc flag by default.
  
  What do you mean by "do not expose invtsc by default", exactly? It is
  already not exposed by default, because the default CPU model is qemu64
  and qemu64 doesn't have it enabled.
 
 Should be "do not expose invtsc by default with -cpu host".
 
 Since it blocks migration, i considered it a special flag that should 
 be set when user is aware that migration is going to be blocked.
 
 Makes sense?

Not for -cpu host. If somebody needs migration to work, they shouldn't
be using -cpu host anyway (I don't know if you have seen the other
comments in my message?).

-- 
Eduardo


Re: [PATCH RFC untested] kvm/x86: implement hv EOI assist

2014-04-22 Thread Marcelo Tosatti
On Wed, Apr 23, 2014 at 01:12:49AM +0300, Michael S. Tsirkin wrote:
 On Tue, Apr 22, 2014 at 04:26:48PM -0300, Marcelo Tosatti wrote:
  On Tue, Apr 22, 2014 at 12:11:47PM +0300, Michael S. Tsirkin wrote:
   On Mon, Apr 21, 2014 at 06:40:17PM -0300, Marcelo Tosatti wrote:
On Sun, Apr 13, 2014 at 04:10:22PM +0300, Michael S. Tsirkin wrote:
 It seems that it's easy to implement the EOI assist
 on top of the PV EOI feature: simply convert the
 page address to the format expected by PV EOI.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Looks alright except:

- There is no handling of PV EOI data to be performed at HV_X64_MSR_EOI 
write
path?
 
 I thought HV_X64_MSR_EOI is a separate optimization - it's an X2APIC
 replacement that lets you do EOI with an MSR and not IO.
 No?

Yes.

- Migration fails with PV-EOI not enabled in CPUID ? (perhaps could
  require it for PV-EOI over HV-APIC-ASSIST).
 
 Did not try migration yet - could you explain the issue please?
 HV-APIC-ASSIST MSR is in the list of saved MSRs ...

Restoration of MSR_KVM_PV_EOI_EN is required for migration
when PV-EOI is enabled?

- MS docs mention No EOI required is set only if interrupt injected 
is edge
  triggered.
   
   Hmm I thought level interrupts are going through IOAPIC so that's already 
   true isn't it?
   
   if (!pv_eoi_enabled(vcpu) ||
   /* IRR set or many bits in ISR: could be nested. */
    apic->irr_pending ||
    /* Cache not set: could be safe but we don't bother. */
    apic->highest_isr_cache == -1 ||
    --->/* Need EOI to update ioapic. */
    kvm_ioapic_handles_vector(vcpu->kvm, 
    apic->highest_isr_cache)) {
  
  Right.
  
   /*
* PV EOI was disabled by apic_sync_pv_eoi_from_guest
* so we need not do anything here.
*/
   return;
   }
   
   In any case if some interrupt handler ignores this bit because it's
   level, that's harmless since it will do EOI and then we'll clear the
   bit, right?
  
  Yes.


Re: [PATCH RFC untested] kvm/x86: implement hv EOI assist

2014-04-22 Thread Michael S. Tsirkin
On Tue, Apr 22, 2014 at 10:57:48PM -0300, Marcelo Tosatti wrote:
 On Wed, Apr 23, 2014 at 01:12:49AM +0300, Michael S. Tsirkin wrote:
  On Tue, Apr 22, 2014 at 04:26:48PM -0300, Marcelo Tosatti wrote:
   On Tue, Apr 22, 2014 at 12:11:47PM +0300, Michael S. Tsirkin wrote:
On Mon, Apr 21, 2014 at 06:40:17PM -0300, Marcelo Tosatti wrote:
 On Sun, Apr 13, 2014 at 04:10:22PM +0300, Michael S. Tsirkin wrote:
  It seems that it's easy to implement the EOI assist
  on top of the PV EOI feature: simply convert the
  page address to the format expected by PV EOI.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 Looks alright except:
 
 - There is no handling of PV EOI data to be performed at 
 HV_X64_MSR_EOI write
 path?
  
  I thought HV_X64_MSR_EOI is a separate optimization - it's an X2APIC
  replacement that lets you do EOI with an MSR and not IO.
  No?
 
 Yes.
 
 - Migration fails with PV-EOI not enabled in CPUID ? (perhaps could
   require it for PV-EOI over HV-APIC-ASSIST).
  
  Did not try migration yet - could you explain the issue please?
  HV-APIC-ASSIST MSR is in the list of saved MSRs ...
 
  Restoration of MSR_KVM_PV_EOI_EN is required for migration
  when PV-EOI is enabled?

That's what I don't get.
Since after this patch, setting HV_X64_MSR_APIC_ASSIST_PAGE sets
KVM_PV_EOI_EN internally, I think restoring HV_X64_MSR_APIC_ASSIST_PAGE
would be sufficient.
No?
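
The conversion being described amounts to re-encoding the assist-page MSR
value for the existing PV-EOI plumbing; roughly (a sketch using real KVM
identifiers, but with untested wiring):

/* Sketch: translate a Hyper-V APIC assist page MSR write into the
 * address format the KVM PV-EOI code expects.
 */
static int hv_set_apic_assist_page(struct kvm_vcpu *vcpu, u64 data)
{
	u64 gfn;

	if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE))
		/* guest disabled the assist page: disable PV EOI too */
		return kvm_lapic_enable_pv_eoi(vcpu, 0);

	gfn = data >> HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT;
	/* re-encode as a guest physical address plus the enable bit */
	return kvm_lapic_enable_pv_eoi(vcpu,
				       gfn_to_gpa(gfn) | KVM_MSR_ENABLED);
}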

 - MS docs mention No EOI required is set only if interrupt injected 
 is edge
   triggered.

Hmm I thought level interrupts are going through IOAPIC so that's 
already true isn't it?

if (!pv_eoi_enabled(vcpu) ||
/* IRR set or many bits in ISR: could be nested. */
apic->irr_pending ||
/* Cache not set: could be safe but we don't bother. */
apic->highest_isr_cache == -1 ||
--->/* Need EOI to update ioapic. */
kvm_ioapic_handles_vector(vcpu->kvm, 
apic->highest_isr_cache)) {
   
   Right.
   
/*
 * PV EOI was disabled by apic_sync_pv_eoi_from_guest
 * so we need not do anything here.
 */
return;
}

In any case if some interrupt handler ignores this bit because it's
level, that's harmless since it will do EOI and then we'll clear the
bit, right?
   
   Yes.