Re: virtio scsi host draft specification, v3

2011-07-01 Thread Paolo Bonzini

On 06/29/2011 11:39 AM, Stefan Hajnoczi wrote:

   Of course, when doing so we would lose the ability to freely remap
   LUNs. But then remapping LUNs doesn't gain you much imho.
   Plus you could always use qemu block backend here if you want
   to hide the details.

  And you could always use the QEMU block backend with scsi-generic if you
  want to remap LUNs, instead of true passthrough via the kernel target.

IIUC the in-kernel target always does remapping.  It passes through
individual LUNs rather than entire targets and you pick LU Numbers to
map to the backing storage (which may or may not be a SCSI
pass-through device).  Nicholas Bellinger can confirm whether this is
correct.


But then I don't understand.  If you pick LU numbers both with the 
in-kernel target and with QEMU, you do not need to use e.g. WWPNs with 
Fibre Channel, because we are not passing through the details of the 
transport protocol (one day we might have virtio-fc, but more likely 
not).  So the LUNs you use might as well be represented by hierarchical 
LUNs.


Using NPIV with KVM would be done by mapping the same virtual N_Port ID 
in the host(s) to the same LU number in the guest.  You might already do 
this now with virtio-blk, in fact.


Put in another way: the virtio-scsi device is itself a SCSI target, so 
yes, there is a single target port identifier in virtio-scsi.  But this 
SCSI target just passes requests down to multiple real targets, and so 
will let you do ALUA and all that.


Of course if I am dead wrong please correct me.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] tools/kvm: Use kernel header version of net/9p/9p.h

2011-07-01 Thread Aneesh Kumar K.V
don't do a copy of the kernel header

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---

NOTE: This patch is against -tip. 

 include/net/9p/9p.h   |2 +-
 tools/kvm/include/net/9p/9p.h |  734 -
 2 files changed, 1 insertions(+), 735 deletions(-)
 delete mode 100644 tools/kvm/include/net/9p/9p.h

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index 008711e..b7d83e9 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -561,7 +561,7 @@ struct p9_rauth {
 
 struct p9_rerror {
struct p9_str error;
-   u32 errno;  /* 9p2000.u extension */
+   u32 p9_errno;   /* 9p2000.u extension */
 };
 
 struct p9_tflush {
diff --git a/tools/kvm/include/net/9p/9p.h b/tools/kvm/include/net/9p/9p.h
deleted file mode 100644
index 61ecff3..000
--- a/tools/kvm/include/net/9p/9p.h
+++ /dev/null
@@ -1,734 +0,0 @@
-/*
- * include/net/9p/9p.h
- *
- * 9P protocol definitions.
- *
- *  Copyright (C) 2005 by Latchesar Ionkov lu...@ionkov.net
- *  Copyright (C) 2004 by Eric Van Hensbergen eri...@gmail.com
- *  Copyright (C) 2002 by Ron Minnich rminn...@lanl.gov
- *
- *  This program is free software; you can redistribute it and/or modify
- *  it under the terms of the GNU General Public License version 2
- *  as published by the Free Software Foundation.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; if not, write to:
- *  Free Software Foundation
- *  51 Franklin Street, Fifth Floor
- *  Boston, MA  02111-1301  USA
- *
- */
-
-#ifndef NET_9P_H
-#define NET_9P_H
-
-#pragma pack(1)
-
-/**
- * enum p9_debug_flags - bits for mount time debug parameter
- * @P9_DEBUG_ERROR: more verbose error messages including original error string
- * @P9_DEBUG_9P: 9P protocol tracing
- * @P9_DEBUG_VFS: VFS API tracing
- * @P9_DEBUG_CONV: protocol conversion tracing
- * @P9_DEBUG_MUX: trace management of concurrent transactions
- * @P9_DEBUG_TRANS: transport tracing
- * @P9_DEBUG_SLABS: memory management tracing
- * @P9_DEBUG_FCALL: verbose dump of protocol messages
- * @P9_DEBUG_FID: fid allocation/deallocation tracking
- * @P9_DEBUG_PKT: packet marshalling/unmarshalling
- * @P9_DEBUG_FSC: FS-cache tracing
- *
- * These flags are passed at mount time to turn on various levels of
- * verbosity and tracing which will be output to the system logs.
- */
-
-enum p9_debug_flags {
-   P9_DEBUG_ERROR =    (1<<0),
-   P9_DEBUG_9P =       (1<<2),
-   P9_DEBUG_VFS =      (1<<3),
-   P9_DEBUG_CONV =     (1<<4),
-   P9_DEBUG_MUX =      (1<<5),
-   P9_DEBUG_TRANS =    (1<<6),
-   P9_DEBUG_SLABS =    (1<<7),
-   P9_DEBUG_FCALL =    (1<<8),
-   P9_DEBUG_FID =      (1<<9),
-   P9_DEBUG_PKT =      (1<<10),
-   P9_DEBUG_FSC =      (1<<11),
-};
-
-#ifdef CONFIG_NET_9P_DEBUG
-extern unsigned int p9_debug_level;
-
-#define P9_DPRINTK(level, format, arg...) \
-do {  \
-   if ((p9_debug_level & level) == level) {\
-   if (level == P9_DEBUG_9P) \
-   printk(KERN_NOTICE "(%8.8d) " \
-   format , task_pid_nr(current) , ## arg); \
-   else \
-   printk(KERN_NOTICE "-- %s (%d): " \
-   format , __func__, task_pid_nr(current) , ## arg); \
-   } \
-} while (0)
-
-#else
-#define P9_DPRINTK(level, format, arg...)  do { } while (0)
-#endif
-
-#define P9_EPRINTK(level, format, arg...) \
-do { \
-   printk(level "9p: %s (%d): " \
-   format , __func__, task_pid_nr(current), ## arg); \
-} while (0)
-
-/**
- * enum p9_msg_t - 9P message types
- * @P9_TLERROR: not used
- * @P9_RLERROR: response for any failed request for 9P2000.L
- * @P9_TSTATFS: file system status request
- * @P9_RSTATFS: file system status response
- * @P9_TSYMLINK: make symlink request
- * @P9_RSYMLINK: make symlink response
- * @P9_TMKNOD: create a special file object request
- * @P9_RMKNOD: create a special file object response
- * @P9_TLCREATE: prepare a handle for I/O on an new file for 9P2000.L
- * @P9_RLCREATE: response with file access information for 9P2000.L
- * @P9_TRENAME: rename request
- * @P9_RRENAME: rename response
- * @P9_TMKDIR: create a directory request
- * @P9_RMKDIR: create a directory response
- * @P9_TVERSION: version handshake request
- * @P9_RVERSION: version handshake response
- * @P9_TAUTH: request to establish authentication channel
- * @P9_RAUTH: response with authentication information
- * @P9_TATTACH: establish user access to file service
- * @P9_RATTACH: response with top level handle to file hierarchy
- * @P9_TERROR: not used
- * @P9_RERROR: response for any failed request
- 

Re: virtio scsi host draft specification, v3

2011-07-01 Thread Hannes Reinecke

On 07/01/2011 08:41 AM, Paolo Bonzini wrote:

On 06/29/2011 11:39 AM, Stefan Hajnoczi wrote:

  Of course, when doing so we would lose the ability to freely remap
  LUNs. But then remapping LUNs doesn't gain you much imho.
  Plus you could always use qemu block backend here if you want
  to hide the details.

 And you could always use the QEMU block backend with scsi-generic if you
 want to remap LUNs, instead of true passthrough via the kernel target.


IIUC the in-kernel target always does remapping. It passes through
individual LUNs rather than entire targets and you pick LU Numbers to
map to the backing storage (which may or may not be a SCSI
pass-through device). Nicholas Bellinger can confirm whether this is
correct.


But then I don't understand. If you pick LU numbers both with the
in-kernel target and with QEMU, you do not need to use e.g. WWPNs
with Fibre Channel, because we are not passing through the details
of the transport protocol (one day we might have virtio-fc, but more
likely not). So the LUNs you use might as well be represented by
hierarchical LUNs.



Actually, the kernel does _not_ do a LUN remapping. It just so 
happens that most storage arrays will present the LUN starting with 
0, so normally you wouldn't notice.


However, some arrays have an array-wide LUN range, so you start 
seeing LUNs at odd places:


[3:0:5:0]    disk    LSI      INF-01-00        0750  /dev/sdw
[3:0:5:7]    disk    LSI      Universal Xport  0750  /dev/sdx
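
For readers unfamiliar with the bracketed tuple in the listing above: it
looks like lsscsi-style output, where the tuple is host:channel:target:LUN,
so the second device sits at LUN 7 of target 5 rather than at LUN 1. A
minimal sketch of splitting such an address in C follows; the struct and
function names are invented for this illustration and are not part of any
existing tool.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for a Linux-style SCSI address (illustrative names). */
struct hctl {
    unsigned host, channel, target, lun;
};

/*
 * Parse a leading "[H:C:T:L" tuple. Returns 0 if all four numbers were
 * read, -1 otherwise. The closing bracket is not checked.
 */
static int parse_hctl(const char *line, struct hctl *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "[%u:%u:%u:%u",
                   &out->host, &out->channel, &out->target, &out->lun);
    return n == 4 ? 0 : -1;
}
```

Feeding it the second line of the listing would yield host 3, channel 0,
target 5, LUN 7.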


Using NPIV with KVM would be done by mapping the same virtual N_Port
ID in the host(s) to the same LU number in the guest. You might
already do this now with virtio-blk, in fact.


The point here is not the mapping. The point is rescanning.

You can map existing NPIV devices already. But you _cannot_ rescan
the host/device whatever _from the guest_ to detect if new devices
are present.
That is the problem I'm trying to describe here.

To be more explicit:
Currently you have to map existing devices directly as individual 
block or scsi devices to the guest.
And a rescan within the guest can only be sent to that device, so the 
only information you will be able to gather is whether the device itself 
is still present.
You are unable to detect if there are other devices attached to your 
guest which you should connect to.


So we have to have an enclosing instance (ie the equivalent of a 
SCSI target), which is capable of telling us exactly this.



Put in another way: the virtio-scsi device is itself a SCSI target,
so yes, there is a single target port identifier in virtio-scsi. But
this SCSI target just passes requests down to multiple real targets,
and so will let you do ALUA and all that.


Argl. No way. The virtio-scsi device has to map to a single LUN.

I thought I mentioned this already, but I'd better clarify this again:

The SCSI spec itself only deals with LUNs, so anything you'll read 
in there obviously will only handle the interaction between the 
initiator (read: host) and the LUN itself. However, the actual 
command is sent via an intermediate target, hence you'll always see 
the reference to the ITL (initiator-target-lun) nexus.
The SCSI spec details discovery of the individual LUNs presented by 
a given target, it does _NOT_ detail the discovery of the targets 
themselves.
That is being delegated to the underlying transport, in most cases 
SAS or FibreChannel.
For the same reason the SCSI spec can afford to disdain any 
reference to path failure, device hot-plugging etc; all of these 
things are being delegated to the transport.


In our context the virtio-scsi device should map to the LUN, and the 
virtio-scsi _host_ backend should map to the target.

The virtio-scsi _guest_ driver will then map to the initiator.

So we should be able to attach more than one device to the backend,
which then will be presented to the initiator.

In the case of NPIV it would make sense to map the virtual SCSI host 
to the backend, so that all devices presented to the virtual SCSI 
host will be presented to the backend, too.
However, when doing so these devices will normally be referenced by 
their original LUN, as these will be presented to the guest via eg 
'REPORT LUNS'.
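
As an illustration of what the guest sees through 'REPORT LUNS' when the
LUNs are exposed unmodified, here is a minimal sketch of decoding the
response data in C. It assumes the usual SPC layout (4-byte big-endian
list length, 4 reserved bytes, then 8-byte LUN entries) and only handles
the peripheral and flat addressing methods of the first level. The function
names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of 8-byte LUN entries announced by a REPORT LUNS response,
 * clamped to what actually fits into the buffer.
 */
static size_t report_luns_count(const uint8_t *resp, size_t resp_len)
{
    assert(resp != NULL);
    if (resp_len < 8) {
        return 0;
    }
    uint32_t list_len = (uint32_t)resp[0] << 24 | (uint32_t)resp[1] << 16 |
                        (uint32_t)resp[2] << 8 | (uint32_t)resp[3];
    size_t claimed = list_len / 8;
    size_t avail = (resp_len - 8) / 8;
    return claimed < avail ? claimed : avail;
}

/*
 * Decode the first addressing level of entry 'idx'.
 * Returns the LUN number, or -1 for formats this sketch does not handle.
 */
static int report_luns_entry(const uint8_t *resp, size_t idx)
{
    assert(resp != NULL);
    const uint8_t *e = resp + 8 + idx * 8;

    switch (e[0] >> 6) {
    case 0: /* peripheral device addressing: bus id in e[0], LUN in e[1] */
        return (e[0] & 0x3f) == 0 ? e[1] : -1;
    case 1: /* flat space addressing: 14-bit LUN */
        return ((e[0] & 0x3f) << 8) | e[1];
    default:
        return -1;
    }
}
```

With the array from the example above, a response carrying LUNs 0 and 7
would decode to exactly those two numbers, which is what the guest would
end up using if nothing is remapped.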


The above thread now tries to figure out if we should remap those 
LUN numbers or just expose them as they are.
If we decide on remapping, we have to emulate _all_ commands 
referring explicitly to those LUN numbers (persistent reservations, 
anyone?). If we don't, we would expose some hardware detail to the 
guest, but would save us _a lot_ of processing.
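
To make the trade-off concrete, here is a rough sketch of the lookup a
remapping backend would need on every request. This is only an assumption
about how such a table could look, not code from any existing backend; all
names are hypothetical. Note that it covers only the addressing side: any
command whose payload carries LUN numbers would need the reverse
translation on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One guest-visible LUN and the backing LUN it stands for (hypothetical). */
struct lun_map_entry {
    uint32_t guest_lun;
    uint32_t host_lun;
};

struct lun_map {
    const struct lun_map_entry *entries;
    size_t count;   /* 0 means "no remapping": LUNs are exposed as they are */
};

/*
 * Translate a guest-visible LUN into the backing LUN.
 * Returns 0 on success, -1 if the guest LUN is not mapped.
 */
static int lun_map_translate(const struct lun_map *m, uint32_t guest_lun,
                             uint32_t *host_lun)
{
    assert(m != NULL && host_lun != NULL);

    if (m->count == 0) {
        *host_lun = guest_lun;      /* passthrough: identity mapping */
        return 0;
    }
    for (size_t i = 0; i < m->count; i++) {
        if (m->entries[i].guest_lun == guest_lun) {
            *host_lun = m->entries[i].host_lun;
            return 0;
        }
    }
    return -1;
}
```

In the passthrough case the table is empty and the function degenerates to
the identity, which is the "expose them as they are" option.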


I'm all for the latter.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Biweekly KVM Test report, kernel 2e0d8e28... qemu d5893103...

2011-07-01 Thread Ren, Yongjie
Hi All,
This is KVM test result against kvm.git 
2e0d8e289ef23d0e56923d778e9bea0601a0edb4 based on kernel 3.0.0-rc5+, and 
qemu-kvm.git d58931037dbb4fbc2fbb33858629d3fabfd1b0d4.

We found a make error in qemu-kvm.git.  This issue was reported in qemu’s 
bugzilla by someone else. 
I commented out 2 lines in /usr/include/pngconf.h to work around it in our 
kvm build system.
The issue also occurred at 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/888/steps/compile/logs/stdio
 

New issue:
1. qemu-kvm.git make error when ‘CC ui/vnc-enc-tight.o’
   https://bugs.launchpad.net/qemu/+bug/802588

Old Issues:
1. ltp diotest running time is 2.54 times what it was before
   https://sourceforge.net/tracker/?func=detail&aid=2723366&group_id=180599&atid=893831
2. perfctr wrmsr warning when booting 64bit RHEL5.3
   https://sourceforge.net/tracker/?func=detail&aid=2721640&group_id=180599&atid=893831
3. [vt-d] NIC assignment order in command line prevents some NICs from working
   https://bugs.launchpad.net/qemu/+bug/799036


Test environment:
==================================================================
  Platform      Westmere-EP     SandyBridge-EP
  CPU Cores     24              32
  Memory size   10G             32G

Report summary of IA32E on Westmere-EP:
Summary Test Report of Last Session
=====================================================================
                            Total   Pass    Fail    NoResult   Crash
=====================================================================
control_panel_ept_vpid      12      12      0       0          0
control_panel_ept           4       4       0       0          0
control_panel_vpid          3       3       0       0          0
control_panel               3       3       0       0          0
gtest_vpid                  1       1       0       0          0
gtest_ept                   1       1       0       0          0
gtest                       3       3       0       0          0
vtd_ept_vpid                3       2       1       0          0
gtest_ept_vpid              12      12      0       0          0
sriov_ept_vpid              6       6       0       0          0
=====================================================================
control_panel_ept_vpid      12      12      0       0          0
 :KVM_LM_Continuity_64_g3   1       1       0       0          0
 :KVM_four_dguest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
 :KVM_SR_SMP_64_g32e        1       1       0       0          0
 :KVM_LM_SMP_64_g32e        1       1       0       0          0
 :KVM_linux_win_64_g32e     1       1       0       0          0
 :KVM_two_winxp_64_g32e     1       1       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_256M_guest_64_gPAE    1       1       0       0          0
 :KVM_SR_Continuity_64_g3   1       1       0       0          0
 :KVM_256M_guest_64_g32e    1       1       0       0          0
 :KVM_four_sguest_64_g32e   1       1       0       0          0
control_panel_ept           4       4       0       0          0
 :KVM_linux_win_64_g32e     1       1       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
 :KVM_LM_SMP_64_g32e        1       1       0       0          0
control_panel_vpid          3       3       0       0          0
 :KVM_linux_win_64_g32e     1       1       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
control_panel               3       3       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
 :KVM_LM_SMP_64_g32e        1       1       0       0          0
gtest_vpid                  1       1       0       0          0
 :boot_smp_win7_ent_64_g3   1       1       0       0          0
gtest_ept                   1       1       0       0          0
 :boot_smp_win7_ent_64_g3   1       1       0       0          0
gtest                       3       3       0       0          0
 :boot_smp_win2008_64_g32   1       1       0       0          0
 :boot_smp_win7_ent_64_gP   1       1       0       0          0
 :boot_smp_vista_64_g32e    1       1       0       0          0
vtd_ept_vpid                3       2       1       0          0
 :one_pcie_smp_xp_64_g32e   1       1       0       0          0
 :one_pcie_smp_64_g32e      1       1       0       0          0
 :two_dev_smp_64_g32e       1       0       1       0          0
gtest_ept_vpid              12      12      0       0          0
 :boot_up_acpi_64_g32e      1       1       0       0          0

[PATCH 0/3] [v4] Megasas HBA emulation

2011-07-01 Thread Hannes Reinecke
Hi all,

thanks to Paolo and Stefan most of the SCSI patches are now in, so
I've made the next attempt at submitting my Megaraid SAS HBA emulation.

To do so, I've written two additional patches; both should be valid cleanups.

- Replace 'tag' by 'hba_private'
  The SCSIRequest structure has a 'tag', which is being used by the
  drivers to match the SCSIRequest to the internal request structure.
  The only driver actually to benefit from this is the lsi53c895a
  driver, everyone else either leaves it blank or uses some internal
  numbering here.
  So this patch converts the 'tag' to a 'hba_private' pointer, which
  allows the driver to store a pointer to the internal structure
  directly within the SCSIRequest. This saves the lookup and an
  additional field in the driver internal request structure.
- Add an 'offset' parameter to iov_to_buf()
  iov_from_buf() has it, but iov_to_buf() has it not. But we'll be
  needing it if the iovec is larger than the buffer. So there.
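
The difference between the two schemes in the first item can be sketched
in plain C as follows. The types below are simplified stand-ins, not the
real QEMU SCSIRequest or lsi_request structures, and the helper names are
invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an HBA's internal per-command state (hypothetical). */
struct hba_cmd {
    uint32_t tag;
    uint32_t pending;
};

/* Stand-in for the generic request; hba_private is opaque to the SCSI layer. */
struct scsi_req {
    uint32_t lun;
    void *hba_private;
};

/* Old scheme: the HBA searches its own commands for a matching tag. */
static struct hba_cmd *find_by_tag(struct hba_cmd *cmds, size_t n,
                                   uint32_t tag)
{
    assert(cmds != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].tag == tag) {
            return &cmds[i];
        }
    }
    return NULL;
}

/* New scheme: the request carries the pointer, so no search is needed. */
static struct hba_cmd *cmd_from_req(const struct scsi_req *req)
{
    assert(req != NULL);
    return req->hba_private;
}
```

Both helpers return the same command for a given request; the second one
does so without walking a list and without the HBA having to store a tag
of its own.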

And, of course, the megasas driver itself. Which has been modified
to work with the new interface; otherwise there have been no changes
to the previous submission.

Hannes Reinecke (3):
  iov: Add 'offset' parameter to iov_to_buf()
  scsi: replace 'tag' with 'hba_private' pointer
  megasas: LSI Megaraid SAS emulation

 Makefile.objs   |1 +
 default-configs/pci.mak |1 +
 hw/esp.c|2 +-
 hw/lsi53c895a.c |   17 +-
 hw/megasas.c| 1923 +++
 hw/mfi.h| 1197 +
 hw/pci_ids.h|3 +-
 hw/scsi-bus.c   |   22 +-
 hw/scsi-disk.c  |5 +-
 hw/scsi-generic.c   |4 +-
 hw/scsi.h   |8 +-
 hw/spapr_vscsi.c|   41 +-
 hw/usb-msd.c|   10 +-
 hw/virtio-net.c |2 +-
 hw/virtio-serial-bus.c  |2 +-
 iov.c   |   23 +-
 iov.h   |2 +-
 trace-events|   14 +-
 18 files changed, 3193 insertions(+), 84 deletions(-)
 create mode 100644 hw/megasas.c
 create mode 100644 hw/mfi.h



[PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()

2011-07-01 Thread Hannes Reinecke
Occasionally, the buffer needs to be placed at an offset within
the iovec when copying the buffer to the iovec.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 hw/virtio-net.c|2 +-
 hw/virtio-serial-bus.c |2 +-
 iov.c  |   23 ++-
 iov.h  |2 +-
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6997e02..a32cc01 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -657,7 +657,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 
 /* copy in packet.  ugh */
 len = iov_from_buf(sg, elem.in_num,
-   buf + offset, size - offset);
+   buf + offset, 0, size - offset);
 total += len;
 offset += len;
 /* If buffers can't be merged, at this point we
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 7f6db7b..53c58d0 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -103,7 +103,7 @@ static size_t write_to_port(VirtIOSerialPort *port,
 }
 
 len = iov_from_buf(elem.in_sg, elem.in_num,
-   buf + offset, size - offset);
+   buf + offset, 0, size - offset);
 offset += len;
 
 virtqueue_push(vq, elem, len);
diff --git a/iov.c b/iov.c
index 588cd04..9ead6ee 100644
--- a/iov.c
+++ b/iov.c
@@ -15,21 +15,26 @@
 #include "iov.h"
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-const void *buf, size_t size)
+const void *buf, size_t offset, size_t size)
 {
-size_t offset;
+size_t iov_off, buf_off;
 unsigned int i;
 
-offset = 0;
-for (i = 0; offset < size && i < iovcnt; i++) {
-size_t len;
+iov_off = 0;
+buf_off = 0;
+for (i = 0; i < iovcnt && size; i++) {
+if (offset < (iov_off + iov[i].iov_len)) {
+size_t len = MIN((iov_off + iov[i].iov_len) - offset, size);
 
-len = MIN(iov[i].iov_len, size - offset);
+memcpy(iov[i].iov_base + (offset - iov_off), buf + buf_off, len);
 
-memcpy(iov[i].iov_base, buf + offset, len);
-offset += len;
+buf_off += len;
+offset += len;
+size -= len;
+}
+iov_off += iov[i].iov_len;
 }
-return offset;
+return buf_off;
 }
 
 size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
diff --git a/iov.h b/iov.h
index 60a8547..2677527 100644
--- a/iov.h
+++ b/iov.h
@@ -13,7 +13,7 @@
 #include "qemu-common.h"
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-const void *buf, size_t size);
+const void *buf, size_t offset, size_t size);
 size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
   void *buf, size_t offset, size_t size);
 size_t iov_size(const struct iovec *iov, const unsigned int iovcnt);
-- 
1.6.0.2
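
For reference, the effect of the new parameter can be tried outside of
QEMU with the standalone sketch below. Note that the diff above adds the
offset to iov_from_buf(); iov_to_buf() already takes one. The function name
copy_buf_to_iov() is made up so it does not clash with the real helper,
and the body follows the loop in the patch under the assumption that the
offset counts bytes from the start of the whole iovec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy 'size' bytes from 'buf' into the scatter list, starting 'offset'
 * bytes into the list. Returns the number of bytes actually copied.
 */
static size_t copy_buf_to_iov(struct iovec *iov, unsigned int iovcnt,
                              const void *buf, size_t offset, size_t size)
{
    size_t iov_off = 0;   /* byte position at which iov[i] starts */
    size_t buf_off = 0;   /* bytes consumed from buf so far */

    assert(iov != NULL && buf != NULL);

    for (unsigned int i = 0; i < iovcnt && size; i++) {
        size_t end = iov_off + iov[i].iov_len;

        if (offset < end) {
            /* Copy up to the end of this element, or less if buf runs out. */
            size_t len = end - offset;
            if (len > size) {
                len = size;
            }
            memcpy((char *)iov[i].iov_base + (offset - iov_off),
                   (const char *)buf + buf_off, len);
            buf_off += len;
            offset += len;
            size -= len;
        }
        iov_off = end;
    }
    return buf_off;
}
```

With two 4-byte elements, copying three bytes at offset 3 fills the last
byte of the first element and the first two bytes of the second. A copy
that would run past the end of the list is cut short, and the return value
reports how much was written.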



[PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer

2011-07-01 Thread Hannes Reinecke
'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for drivers handling several commands
in parallel.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 hw/esp.c  |2 +-
 hw/lsi53c895a.c   |   17 -
 hw/scsi-bus.c |   22 +++---
 hw/scsi-disk.c|5 ++---
 hw/scsi-generic.c |4 ++--
 hw/scsi.h |8 
 hw/spapr_vscsi.c  |   41 -
 hw/usb-msd.c  |   10 +-
 trace-events  |   14 +++---
 9 files changed, 52 insertions(+), 71 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..912ff89 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)
 
 DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
 lun = busid & 7;
-s->current_req = scsi_req_new(s->current_dev, 0, lun);
+s->current_req = scsi_req_new(s->current_dev, lun, s);
 datalen = scsi_req_enqueue(s->current_req, buf);
 s->ti_size = datalen;
 if (datalen != 0) {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 940b43a..272e919 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -670,7 +670,7 @@ static void lsi_request_cancelled(SCSIRequest *req)
 return;
 }
 
-p = lsi_find_by_tag(s, req->tag);
+p = req->hba_private;
 if (p) {
 QTAILQ_REMOVE(&s->queue, p, next);
 scsi_req_unref(req);
@@ -680,18 +680,17 @@ static void lsi_request_cancelled(SCSIRequest *req)
 
 /* Record that data is available for a queued command.  Returns zero if
the device was reselected, nonzero if the IO is deferred.  */
-static int lsi_queue_tag(LSIState *s, uint32_t tag, uint32_t len)
+static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
 {
-lsi_request *p;
+lsi_request *p = req->hba_private;
 
-p = lsi_find_by_tag(s, tag);
 if (!p) {
-BADF("IO with unknown tag %d\n", tag);
+BADF("IO with unknown reference %p\n", req->hba_private);
 return 1;
 }
 
 if (p->pending) {
-BADF("Multiple IO pending for tag %d\n", tag);
+BADF("Multiple IO pending for request %p\n", p);
 }
 p->pending = len;
 /* Reselect if waiting for it, or if reselection triggers an IRQ
@@ -743,9 +742,9 @@ static void lsi_transfer_data(SCSIRequest *req, uint32_t len)
 LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
 int out;
 
-if (s->waiting == 1 || !s->current || req->tag != s->current->tag ||
+if (s->waiting == 1 || !s->current || req->hba_private != s->current ||
 (lsi_irq_on_rsl(s) && !(s->scntl1 & LSI_SCNTL1_CON))) {
-if (lsi_queue_tag(s, req->tag, len)) {
+if (lsi_queue_req(s, req, len)) {
 return;
 }
 }
@@ -789,7 +788,7 @@ static void lsi_do_command(LSIState *s)
 assert(s->current == NULL);
 s->current = qemu_mallocz(sizeof(lsi_request));
 s->current->tag = s->select_tag;
-s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun);
+s->current->req = scsi_req_new(dev, s->current_lun, s->current);
 
 n = scsi_req_enqueue(s->current->req, buf);
 if (n) {
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index ad6a730..d1fc481 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -131,7 +131,7 @@ int scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
 return res;
 }
 
-SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t lun, void *hba_private)
 {
 SCSIRequest *req;
 
@@ -139,16 +139,16 @@ SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t l
 req->refcount = 1;
 req->bus = scsi_bus_from_device(d);
 req->dev = d;
-req->tag = tag;
 req->lun = lun;
+req->hba_private = hba_private;
 req->status = -1;
-trace_scsi_req_alloc(req->dev->id, req->lun, req->tag);
+trace_scsi_req_alloc(req->dev->id, req->lun, req->hba_private);
 return req;
 }
 
-SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t lun, void *hba_private)
 {
-return d->info->alloc_req(d, tag, lun);
+return d->info->alloc_req(d, lun, hba_private);
 }
 }
 
 uint8_t *scsi_req_get_buf(SCSIRequest *req)
@@ -182,7 +182,7 @@ int32_t scsi_req_enqueue(SCSIRequest *req, uint8_t *buf)
 
 static void scsi_req_dequeue(SCSIRequest *req)
 {
-trace_scsi_req_dequeue(req->dev->id, req->lun, req->tag);
+trace_scsi_req_dequeue(req->dev->id, req->lun, req->hba_private);
 if (req->enqueued) {
 QTAILQ_REMOVE(&req->dev->requests, req, next);
 req->enqueued = false;
@@ -214,7 +214,7 @@ static int scsi_req_length(SCSIRequest *req, uint8_t *cmd)
 req->cmd.len = 12;
 break;
 default:
-trace_scsi_req_parse_bad(req->dev->id, req->lun, req->tag, cmd[0]);
+

[PATCH 0/3] KVM test: Windows install fixes

2011-07-01 Thread Lucas Meneghel Rodrigues
These 3 patches fix problems found when performing a full
round of windows installs.

Lucas Meneghel Rodrigues (3):
  KVM test: Render unattended files more properly
  KVM test: Update Win2003 CD info to match MSDN registers
  KVM test: Reformat sample windows ini style unattended files

 client/tests/kvm/tests/unattended_install.py |  191 +-
 client/tests/kvm/tests_base.cfg.sample   |   44 +--
 client/tests/kvm/unattended/win2000-32.sif   |   95 +++--
 client/tests/kvm/unattended/win2003-32.sif   |   78 ++--
 client/tests/kvm/unattended/win2003-64.sif   |   78 ++--
 client/tests/kvm/unattended/winxp32.sif  |   99 +++---
 client/tests/kvm/unattended/winxp64.sif  |   99 +++---
 7 files changed, 386 insertions(+), 298 deletions(-)

-- 
1.7.5.4



[PATCH 2/3] KVM test: Update Win2003 CD info to match MSDN registers

2011-07-01 Thread Lucas Meneghel Rodrigues
Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests_base.cfg.sample |   44 +++
 1 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index bdc9b6c..5313da1 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -2317,17 +2317,27 @@ variants:
 - 32:
 image_name += -32
 install:
-cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
-md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
-md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
+cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_with_sp2_cd1_x13-05460.iso
+md5sum_cd1 = 7c3bc891d20c7e6a110c4f1ad82952ba
+md5sum_1m_cd1 = b1671ecf47a270e49e04982bf1474ff9
+sha1sum_cd1 = ee11cc735c695501874d2fa123f7d78449b3de7c
+sha1sum_1m_cd1 = e2d49dc3fbe17a6b2ba1812543f2cc08ef9565c4
+#cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
+#md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
+#md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
 user = user
 steps = Win2003-32.steps
 setup:
 steps = Win2003-32-rss.steps
 unattended_install.cdrom, whql.support_vm_install:
-cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
-md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
-md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
+cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_with_sp2_cd1_x13-05460.iso
+md5sum_cd1 = 7c3bc891d20c7e6a110c4f1ad82952ba
+md5sum_1m_cd1 = b1671ecf47a270e49e04982bf1474ff9
+sha1sum_cd1 = ee11cc735c695501874d2fa123f7d78449b3de7c
+sha1sum_1m_cd1 = e2d49dc3fbe17a6b2ba1812543f2cc08ef9565c4
+#cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
+#md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
+#md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
 unattended_file = unattended/win2003-32.sif
 floppy = images/win2003-32/answer.vfd
 # Uncomment virtio_network_installer_path line if
@@ -2349,17 +2359,27 @@ variants:
 - 64:
 image_name += -64
 install:
-cdrom_cd1 = isos/windows/Windows2003-x64.iso
-md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
-md5sum_1m_cd1 = 439393c384116aa09e08a0ad047dcea8
+cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_x64_with_sp2_cd1_x13-06188.iso
+md5sum_cd1 = 09f4cb31796e9802dcc477e397868c9a
+md5sum_1m_cd1 = c11ebcf6c128d94c83fe623566eb29d7
+sha1sum_cd1 = d04c8f304047397be486c38a6b769f16993d4b39
+sha1sum_1m_cd1 = 3daf6fafda8ba48779df65e4713a3cdbd6c9d136
+#cdrom_cd1 = isos/windows/Windows2003-x64.iso
+#md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
+#md5sum_1m_cd1 = 439393c384116aa09e08a0ad047dcea8
 user = user
 steps = Win2003-64.steps
 setup:
 steps = Win2003-64-rss.steps
 unattended_install.cdrom, whql.support_vm_install:
-cdrom_cd1 = isos/windows/Windows2003-x64.iso
-md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
-md5sum_1m_cd1 = 439393c384116aa09e08a0ad047dcea8
+cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_x64_with_sp2_cd1_x13-06188.iso
+md5sum_cd1 = 09f4cb31796e9802dcc477e397868c9a
+md5sum_1m_cd1 = c11ebcf6c128d94c83fe623566eb29d7
+sha1sum_cd1 = d04c8f304047397be486c38a6b769f16993d4b39
+sha1sum_1m_cd1 = 3daf6fafda8ba48779df65e4713a3cdbd6c9d136
+#cdrom_cd1 = isos/windows/Windows2003-x64.iso
+#md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
+#md5sum_1m_cd1 = 

[PATCH 3/3] KVM test: Reformat sample windows ini style unattended files

2011-07-01 Thread Lucas Meneghel Rodrigues
If we prepend spaces on the key=value lines, ConfigParser will
fail to parse the file. So let's reformat the files in a way
that we won't have this problem again.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/unattended/win2000-32.sif |   95 ++-
 client/tests/kvm/unattended/win2003-32.sif |   78 +++---
 client/tests/kvm/unattended/win2003-64.sif |   78 +++---
 client/tests/kvm/unattended/winxp32.sif|   99 ++--
 client/tests/kvm/unattended/winxp64.sif|   99 ++--
 5 files changed, 225 insertions(+), 224 deletions(-)

diff --git a/client/tests/kvm/unattended/win2000-32.sif 
b/client/tests/kvm/unattended/win2000-32.sif
index 8720851..6aa1848 100644
--- a/client/tests/kvm/unattended/win2000-32.sif
+++ b/client/tests/kvm/unattended/win2000-32.sif
@@ -1,73 +1,76 @@
-;SetupMgrTag
 [Data]
-AutoPartition=1
-MsDosInitiated=0
-UnattendedInstall=Yes
+AutoPartition = 1
+MsDosInitiated = 0
+UnattendedInstall = Yes
 
 [Unattended]
-Repartition=Yes
-UnattendMode=FullUnattended
-OemSkipEula=Yes
-OemPreinstall=No
-TargetPath=\WINDOWS
-UnattendSwitch=Yes
-CrashDumpSetting=1
-DriverSigningPolicy=ignore
-WaitForReboot=no
+Repartition = Yes
+UnattendMode = FullUnattended
+OemSkipEula = Yes
+OemPreinstall = No
+TargetPath = \WINDOWS
+UnattendSwitch = Yes
+CrashDumpSetting = 1
+DriverSigningPolicy = ignore
+OemPnPDriversPath = KVM_TEST_NETWORK_DRIVER_PATH
+WaitForReboot = no
 
 [GuiUnattended]
-AdminPassword=1q2w3eP
-EncryptedAdminPassword=NO
-TimeZone=85
-OemSkipWelcome=1
-AutoLogon=Yes
-AutoLogonCount=1000
-OEMSkipRegional=1
+AdminPassword = 1q2w3eP
+EncryptedAdminPassword = NO
+TimeZone = 85
+OemSkipWelcome = 1
+AutoLogon = Yes
+AutoLogonCount = 1000
+OEMSkipRegional = 1
 
 [UserData]
-ProductKey=KVM_TEST_CDKEY
-FullName=Autotest Mindless Drone
-OrgName=Autotest
-ComputerName=*
+ProductKey = KVM_TEST_CDKEY
+FullName = Autotest Mindless Drone
+OrgName = Autotest
+ComputerName = *
 
 [Identification]
-JoinWorkgroup=WORKGROUP
+JoinWorkgroup = WORKGROUP
 
 [Networking]
-InstallDefaultComponents=Yes
+InstallDefaultComponents = Yes
 
 [Proxy]
-Proxy_Enable=0
-Use_Same_Proxy=0
+Proxy_Enable = 0
+Use_Same_Proxy = 0
 
 [Components]
-dialer=off
-media_clips=off
-media_utopia=off
-msnexplr=off
-netoc=off
-OEAccess=off
-templates=off
-WMAccess=off
-zonegames=off
+dialer = off
+media_clips = off
+media_utopia = off
+msnexplr = off
+netoc = off
+OEAccess = off
+templates = off
+WMAccess = off
+zonegames = off
 
 [TerminalServices]
-AllowConnections=1
+AllowConnections = 1
 
 [WindowsFirewall]
-Profiles=WindowsFirewall.TurnOffFirewall
+Profiles = WindowsFirewall.TurnOffFirewall
 
 [WindowsFirewall.TurnOffFirewall]
-Mode=0
+Mode = 0
 
 [Branding]
-BrandIEUsingUnattended=Yes
+BrandIEUsingUnattended = Yes
 
 [Display]
-Xresolution=1024
-YResolution=768
+Xresolution = 1024
+YResolution = 768
 
 [GuiRunOnce]
-   Command0=cmd /c E:\setuprss.bat
-   Command1=cmd /c netsh interface ip set address local dhcp
-   Command2=cmd /c A:\finish.exe
+Command0 = cmd /c KVM_TEST_VIRTIO_NETWORK_INSTALLER
+Command1 = cmd /c E:\setuprss.bat
+Command2 = cmd /c netsh interface ip set address local dhcp
+Command3 = cmd /c sc config tlntsvr start= auto
+Command4 = cmd /c net start telnet
+Command5 = cmd /c A:\finish.exe
diff --git a/client/tests/kvm/unattended/win2003-32.sif b/client/tests/kvm/unattended/win2003-32.sif
index 207cd2b..6e69b5e 100644
--- a/client/tests/kvm/unattended/win2003-32.sif
+++ b/client/tests/kvm/unattended/win2003-32.sif
@@ -1,66 +1,66 @@
 [Data]
-AutoPartition = 1
-MsDosInitiated = 0
-UnattendedInstall = Yes
+AutoPartition = 1
+MsDosInitiated = 0
+UnattendedInstall = Yes
 
 [Unattended]
-UnattendMode = FullUnattended
-OemSkipEula = Yes
-OemPreinstall = No
-UnattendSwitch = Yes
-CrashDumpSetting = 1
-DriverSigningPolicy = ignore
-OemPnPDriversPath=KVM_TEST_NETWORK_DRIVER_PATH
-WaitForReboot = no
-Repartition = yes
+UnattendMode = FullUnattended
+OemSkipEula = Yes
+OemPreinstall = No
+UnattendSwitch = Yes
+CrashDumpSetting = 1
+DriverSigningPolicy = ignore
+OemPnPDriversPath = KVM_TEST_NETWORK_DRIVER_PATH
+WaitForReboot = no
+Repartition = yes
 
 [GuiUnattended]
-AdminPassword = 1q2w3eP
-AutoLogon = Yes
-AutoLogonCount = 1000
-OEMSkipRegional = 1
-TimeZone = 85
-OemSkipWelcome = 1
+AdminPassword = 1q2w3eP
+AutoLogon = Yes
+AutoLogonCount = 1000
+OEMSkipRegional = 1
+TimeZone = 85
+OemSkipWelcome = 1
 
 [UserData]
-ProductKey=KVM_TEST_CDKEY
-FullName=Autotest Mindless Drone
-OrgName=Autotest
-ComputerName=*
+ProductKey = KVM_TEST_CDKEY
+FullName = Autotest Mindless Drone
+OrgName = Autotest
+ComputerName = *
 
 [LicenseFilePrintData]
-

Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 09:42, Hannes Reinecke wrote:

 Occasionally, the buffer needs to be placed at an offset within
 the iovec when copying the buffer to the iovec.

So this is a buffer into the iovec, right? Wouldn't it make sense to also 
modify iov_to_buf respectively then, so the API stays similar? Also, it'd be 
nice to give the parameter a more obvious name, so potential users can easily 
recognize what it offsets.


Alex

 
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
 hw/virtio-net.c|2 +-
 hw/virtio-serial-bus.c |2 +-
 iov.c  |   23 ++-
 iov.h  |2 +-
 4 files changed, 17 insertions(+), 12 deletions(-)
 
 diff --git a/hw/virtio-net.c b/hw/virtio-net.c
 index 6997e02..a32cc01 100644
 --- a/hw/virtio-net.c
 +++ b/hw/virtio-net.c
 @@ -657,7 +657,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, 
 const uint8_t *buf, size_
 
 /* copy in packet.  ugh */
 len = iov_from_buf(sg, elem.in_num,
 -   buf + offset, size - offset);
 +   buf + offset, 0, size - offset);
 total += len;
 offset += len;
 /* If buffers can't be merged, at this point we
 diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
 index 7f6db7b..53c58d0 100644
 --- a/hw/virtio-serial-bus.c
 +++ b/hw/virtio-serial-bus.c
 @@ -103,7 +103,7 @@ static size_t write_to_port(VirtIOSerialPort *port,
 }
 
 len = iov_from_buf(elem.in_sg, elem.in_num,
 -   buf + offset, size - offset);
 +   buf + offset, 0, size - offset);
 offset += len;
 
 virtqueue_push(vq, elem, len);
 diff --git a/iov.c b/iov.c
 index 588cd04..9ead6ee 100644
 --- a/iov.c
 +++ b/iov.c
 @@ -15,21 +15,26 @@
 #include "iov.h"
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
 -const void *buf, size_t size)
 +const void *buf, size_t offset, size_t size)
 {
 -size_t offset;
 +size_t iov_off, buf_off;
 unsigned int i;
 
 -offset = 0;
 -for (i = 0; offset < size && i < iovcnt; i++) {
 -size_t len;
 +iov_off = 0;
 +buf_off = 0;
 +for (i = 0; i < iovcnt && size; i++) {
 +if (offset < (iov_off + iov[i].iov_len)) {
 +size_t len = MIN((iov_off + iov[i].iov_len) - offset, size);
 
 -len = MIN(iov[i].iov_len, size - offset);
 +memcpy(iov[i].iov_base + (offset - iov_off), buf + buf_off, len);
 
 -memcpy(iov[i].iov_base, buf + offset, len);
 -offset += len;
 +buf_off += len;
 +offset += len;
 +size -= len;
 +}
 +iov_off += iov[i].iov_len;
 }
 -return offset;
 +return buf_off;
 }
 
 size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
 diff --git a/iov.h b/iov.h
 index 60a8547..2677527 100644
 --- a/iov.h
 +++ b/iov.h
 @@ -13,7 +13,7 @@
 #include "qemu-common.h"
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
 -const void *buf, size_t size);
 +const void *buf, size_t offset, size_t size);
 size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
   void *buf, size_t offset, size_t size);
 size_t iov_size(const struct iovec *iov, const unsigned int iovcnt);
 -- 
 1.6.0.2
 



Re: [PATCH kvm-unit-tests v2] access: check SMEP on prefetch pte path

2011-07-01 Thread Xiao Guangrong
On 06/29/2011 06:24 PM, Yang, Wei wrote:

 +
 + /*
 +  * Here we write the ro user page when
 +  * cr0.wp=0, then we execute it and SMEP
 +  * fault should happen.
 +  */
 + err_prepare_notwp = ac_test_do_access(at1);
 + if (!err_prepare_notwp) {
 + printf("%s: SMEP prepare fail\n", __FUNCTION__);
 + goto clean_up;
 + }
 +
 + at1.flags[AC_ACCESS_WRITE] = 0;
 + at1.flags[AC_ACCESS_FETCH] = 1;
 + ac_set_expected_status(at1);
 + err_smep_notwp = ac_test_do_access(at1);
 +

The address is accessed in the first test; is it really prefetched
in the second test?

  
  int ac_test_run(void)
 @@ -669,16 +765,22 @@ int ac_test_run(void)
  ac_test_t at;
  ac_pool_t pool;
  int i, tests, successes;
 +extern u64 ptl2[];
  
  printf("run\n");
  tests = successes = 0;
  ac_env_int(pool);
  ac_test_init(at, (void *)(0x1234 + 16 * smp_id()));
  do {
 + if (at.flags[AC_CPU_CR4_SMEP] && (ptl2[2] & 0x4))
 + ptl2[2] -= 0x4;

It seems you just clear the U/S bit but forget to restore it; that can
break the test if AC_ACCESS_USER && !SMEP
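
One way to avoid leaving the U/S bit cleared is to remember whether it was set and put it back once the SMEP case is done. A minimal sketch, assuming the page-table entry is a plain 64-bit word; the helper names pte_clear_user() and pte_restore_user() are made up for illustration and are not part of kvm-unit-tests:

```c
#include <stdint.h>

/* x86 page-table U/S (user/supervisor) flag: bit 2, i.e. 0x4. */
#define PT_USER_MASK (UINT64_C(1) << 2)

/* Clear the U/S bit; return 1 if it was set so the caller can restore it. */
int pte_clear_user(uint64_t *pte)
{
    int was_set = (*pte & PT_USER_MASK) != 0;

    *pte &= ~PT_USER_MASK;
    return was_set;
}

/* Put the U/S bit back if pte_clear_user() reported it as set. */
void pte_restore_user(uint64_t *pte, int was_set)
{
    if (was_set) {
        *pte |= PT_USER_MASK;
    }
}
```

Masking the bit out also does not rely on a preceding "is it set?" check the way subtracting 0x4 does.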



Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

  size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-const void *buf, size_t size)
+const void *buf, size_t offset, size_t size)


Wrong commit subject, it seems. :)

Paolo


Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()

2011-07-01 Thread Hannes Reinecke

On 07/01/2011 10:03 AM, Paolo Bonzini wrote:

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
- const void *buf, size_t size)
+ const void *buf, size_t offset, size_t size)


Wrong commit subject, it seems. :)


Bummer.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()

2011-07-01 Thread Hannes Reinecke

On 07/01/2011 10:02 AM, Alexander Graf wrote:


On 01.07.2011, at 09:42, Hannes Reinecke wrote:


Occasionally, the buffer needs to be placed at an offset within
the iovec when copying the buffer to the iovec.


So this is a buffer into the iovec, right? Wouldn't it make sense to
also modify iov_to_buf respectively then, so the API stays similar?


Ahem. That's exactly what the patch does, apart from the mixed-up
subject.

iov_to_buf() has an offset parameter; iov_from_buf() does not,
for no obvious reason.
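
A minimal sketch of what an offset-taking copy into an iovec can look like, assuming POSIX struct iovec. The function name iov_from_buf_at() is hypothetical and this is not the code from the patch; the parameter names iov_off and iov_cnt follow the naming floated in this thread:

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to `size` bytes from `buf` into the scatter list, starting
 * `iov_off` bytes into the area described by the iovec.
 * Returns the number of bytes actually copied.
 */
size_t iov_from_buf_at(struct iovec *iov, unsigned int iov_cnt,
                       const void *buf, size_t iov_off, size_t size)
{
    const char *src = buf;
    size_t done = 0;
    unsigned int i;

    for (i = 0; i < iov_cnt && done < size; i++) {
        size_t len;

        if (iov_off >= iov[i].iov_len) {
            /* The offset lies beyond this element: skip it entirely. */
            iov_off -= iov[i].iov_len;
            continue;
        }
        /* Room left in this element after the offset. */
        len = iov[i].iov_len - iov_off;
        if (len > size - done) {
            len = size - done;
        }
        memcpy((char *)iov[i].iov_base + iov_off, src + done, len);
        done += len;
        iov_off = 0;    /* later elements are written from their start */
    }
    return done;
}
```

With two 4-byte elements, copying 3 bytes at offset 3 writes one byte at the end of the first element and two at the start of the second; an offset at or past the total length copies nothing.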


Also, it'd be nice to give the parameter a more obvious name, so potential
users can easily recognize what it offsets.



Yes, that sounds reasonable.

What about 'iov_off' ?
(And possibly rename 'iovcnt' to 'iov_cnt' for consistency ?)

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


[PATCH 1/3] KVM test: Render unattended files more properly

2011-07-01 Thread Lucas Meneghel Rodrigues
Windows2008 install program does not behave well when
we specify dummy paths to alternate install drivers,
unlike Windows Vista and Windows 7. This is enough
motivation to rewrite the unattended install file
rendering code, now:

1) XML files will be properly modified using an XML API
2) ini files will be properly modified using ConfigParser
3) kickstart files use a simplified version of the old
logic (re.sub).

Tested with the guest OS that motivated the patch and
of course, other linux and windows guests, everything
looks good.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests/unattended_install.py |  191 +-
 1 files changed, 129 insertions(+), 62 deletions(-)

diff --git a/client/tests/kvm/tests/unattended_install.py b/client/tests/kvm/tests/unattended_install.py
index d1c700d..6d6ee07 100644
--- a/client/tests/kvm/tests/unattended_install.py
+++ b/client/tests/kvm/tests/unattended_install.py
@@ -1,4 +1,5 @@
 import logging, time, socket, re, os, shutil, tempfile, glob, ConfigParser
+import xml.dom.minidom
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
 from autotest_lib.client.virt import virt_vm, virt_utils
@@ -47,8 +48,8 @@ class Disk(object):
 self.path = None
 
 
-def setup_answer_file(self, filename, contents):
-utils.open_write_close(os.path.join(self.mount, filename), contents)
+def get_answer_file_path(self, filename):
+return os.path.join(self.mount, filename)
 
 
 def copy_to(self, src):
@@ -258,8 +259,7 @@ class UnattendedInstallConfig(object):
 self.image_path = os.path.dirname(self.kernel)
 
 
-@error.context_aware
-def render_answer_file(self):
+def answer_kickstart(self, answer_path):
 
 Replace KVM_TEST_CDKEY (in the unattended file) with the cdkey
 provided for this test and replace the KVM_TEST_MEDIUM with
@@ -267,17 +267,12 @@ class UnattendedInstallConfig(object):
 
 @return: Answer file contents
 
-error.base_context('Rendering final answer file')
-error.context('Reading answer file %s' % self.unattended_file)
-unattended_contents = open(self.unattended_file).read()
+contents = open(self.unattended_file).read()
+
 dummy_cdkey_re = r'\bKVM_TEST_CDKEY\b'
-if re.search(dummy_cdkey_re, unattended_contents):
+if re.search(dummy_cdkey_re, contents):
 if self.cdkey:
-unattended_contents = re.sub(dummy_cdkey_re, self.cdkey,
- unattended_contents)
-else:
-print (WARNING: 'cdkey' required but not specified for 
-   this unattended installation)
+contents = re.sub(dummy_cdkey_re, self.cdkey, contents)
 
 dummy_medium_re = r'\bKVM_TEST_MEDIUM\b'
 if self.medium == cdrom:
@@ -290,67 +285,135 @@ class UnattendedInstallConfig(object):
 else:
 raise ValueError(Unexpected installation medium %s % self.url)
 
-unattended_contents = re.sub(dummy_medium_re, content,
- unattended_contents)
+contents = re.sub(dummy_medium_re, content, contents)
 
-def replace_virtio_key(contents, dummy_re, attribute_name):
-
-Replace a virtio dummy string with contents.
+logging.debug(Unattended install contents:)
+for line in contents.splitlines():
+logging.debug(line)
 
-If install_virtio is not set, replace it with a dummy string.
+utils.open_write_close(answer_path, contents)
 
-@param contents: Contents of the unattended file
-@param dummy_re: Regular expression used to search on the.
-unattended file contents.
-@param env: Name of the environment variable.
-
-dummy_path = C:
-driver = getattr(self, attribute_name, '')
 
-if re.search(dummy_re, contents):
-if self.install_virtio == yes:
-if driver.endswith(msi):
-driver = 'msiexec /passive /package ' + driver
-else:
-try:
-# Let's escape windows style paths properly
-drive, path = driver.split(:)
-driver = drive + : + re.escape(path)
-except:
-pass
-contents = re.sub(dummy_re, driver, contents)
-else:
-contents = re.sub(dummy_re, dummy_path, contents)
-return contents
-
-vdict = {r'\bKVM_TEST_STORAGE_DRIVER_PATH\b':
- 'virtio_storage_path',
- r'\bKVM_TEST_NETWORK_DRIVER_PATH\b':
- 'virtio_network_path',
- 

Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 10:07, Hannes Reinecke wrote:

 On 07/01/2011 10:02 AM, Alexander Graf wrote:
 
 On 01.07.2011, at 09:42, Hannes Reinecke wrote:
 
 Occasionally, the buffer needs to be placed at an offset within
 the iovec when copying the buffer to the iovec.
 
 So this is a buffer into the iovec, right? Wouldn't it make sense
  to also modify iov_to_buf respectively then, so the API stays similar?
 
 Ahem. That's exactly what the patch does, apart from the mixed-up subject.
 
 iov_to_buf() has an offset parameter; iov_from_buf() does not,
 for no obvious reason.

Ah, I see. Please state this in your patch description :). It makes it a lot
easier to understand the rationale: you're merely moving the from API
towards the same parameters as the to one.

 
 Also, it'd be nice to give the parameter a more obvious name, so potential
  users can easily recognize what it offsets.
 
 Yes, that sounds reasonable.
 
 What about 'iov_off' ?
 (And possibly rename 'iovcnt' to 'iov_cnt' for consistency ?)

Yup, that'd be a lot more readable :)


Alex



Re: [PATCH v3 7/9] KVM-GST: KVM Steal time accounting

2011-07-01 Thread Peter Zijlstra
On Thu, 2011-06-30 at 23:50 -0300, Glauber Costa wrote:
 I was under the impression that the proper use of jump labels required 
 each label to be tied to a single location. If we make it inline, the 
 same key would point to multiple locations, and we would have trouble
 altering all of the locations. I might be wrong, of course. Isn't it the 
 case? 

Nope, you can have as many patch sites per key as you want.


Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for driver handling several commands
in parallel.


This makes tracing a bit harder to follow.  Perhaps you can keep the 
transport tag (a uint64_t) in the SCSIRequest for debugging purposes?
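
A rough sketch of the combination being suggested: the request carries an opaque pointer for the HBA and still keeps the transport tag for trace output. The structures and function names below are hypothetical, cut-down stand-ins and not QEMU's SCSIRequest API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-command state owned by the HBA emulation. */
struct hba_cmd {
    int index;
    int completed;
};

/* Hypothetical, cut-down request; not QEMU's SCSIRequest. */
struct scsi_request {
    uint64_t tag;        /* transport tag, kept only for tracing */
    void *hba_private;   /* opaque pointer handed back to the HBA */
};

void scsi_request_init(struct scsi_request *req, uint64_t tag,
                       void *hba_private)
{
    req->tag = tag;
    req->hba_private = hba_private;
}

/*
 * Completion path: the HBA gets its command back directly, with no
 * lookup by tag. Returns NULL for adapters that passed no pointer.
 */
struct hba_cmd *hba_complete(struct scsi_request *req)
{
    struct hba_cmd *cmd = req->hba_private;

    if (cmd != NULL) {
        cmd->completed = 1;
    }
    return cmd;
}
```

An adapter without tagged command queuing would simply pass NULL for the private pointer, as suggested for esp.c and usb-msd.c further down.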



Signed-off-by: Hannes Reineckeh...@suse.de
---
  hw/esp.c  |2 +-
  hw/lsi53c895a.c   |   17 -
  hw/scsi-bus.c |   22 +++---
  hw/scsi-disk.c|5 ++---
  hw/scsi-generic.c |4 ++--
  hw/scsi.h |8 
  hw/spapr_vscsi.c  |   41 -
  hw/usb-msd.c  |   10 +-
  trace-events  |   14 +++---
  9 files changed, 52 insertions(+), 71 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..912ff89 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)

  DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
  lun = busid & 7;
-s->current_req = scsi_req_new(s->current_dev, 0, lun);
+s->current_req = scsi_req_new(s->current_dev, lun, s);


Might as well pass NULL here.  The hba_private value is basically 
unnecessary when the adapter doesn't support tagged command queuing.



diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 86582cc..4e2ea03 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -216,8 +216,8 @@ static void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
  MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
  USBPacket *p = s->packet;

-if (req->tag != s->tag) {
-fprintf(stderr, "usb-msd: Unexpected SCSI Tag 0x%x\n", req->tag);
+if (req->hba_private != s) {
+fprintf(stderr, "usb-msd: Unexpected SCSI command 0x%p\n", req);
  }


Same here, just pass NULL and remove these ifs.

Otherwise looks like a very good idea.

Paolo


Re: [PATCH v3 7/9] KVM-GST: KVM Steal time accounting

2011-07-01 Thread Peter Zijlstra
On Thu, 2011-06-30 at 23:53 -0300, Glauber Costa wrote:
 On 06/30/2011 06:54 PM, Peter Zijlstra wrote:
  On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
  +   if (static_branch(paravirt_steal_enabled)) {
 
  How is that going to compile on !CONFIG_PARAVIRT or !x86 in general?
  Only x86-PARAVIRT will provide that variable.
 
 
 
 Good point. I'd wrap it into CONFIG_PARAVIRT.
 To be clear, the reason I did not put it inside 
 CONFIG_PARAVIRT_TIME_ACCOUNTING, is because I wanted to have the mere 
 display of steal time separated from the rest - unless, of course, you 
 object this idea.
 
 Using CONFIG_PARAVIRT achieves this goal well.

ia64 seems to also have CONFIG_PARAVIRT


Re: virtio scsi host draft specification, v3

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 09:14 AM, Hannes Reinecke wrote:

Actually, the kernel does _not_ do a LUN remapping.


Not the kernel, the in-kernel target.  The in-kernel target can and will
map hardware LUNs (target_lun in drivers/target/*) to arbitrary LUNs
(mapped_lun).


Put in another way: the virtio-scsi device is itself a SCSI
target,


Argl. No way. The virtio-scsi device has to map to a single LUN.


I think we are talking about different things. By virtio-scsi device
I meant the virtio-scsi HBA.  When I referred to a LUN as seen by the
guest, I was calling it a virtual SCSI device.  So yes, we were
calling things by different names.  Perhaps from now on
we can call them virtio-scsi {initiator,target,LUN} and have no
ambiguity?  I'll also modify the spec in this sense.


The SCSI spec itself only deals with LUNs, so anything you'll read in
there obviously will only handle the interaction between the
initiator (read: host) and the LUN itself. However, the actual
command is sent via an intermediate target, hence you'll always see
the reference to the ITL (initiator-target-lun) nexus.


Yes, this I understand.


The SCSI spec details discovery of the individual LUNs presented by a
given target, it does _NOT_ detail the discovery of the targets
themselves.  That is being delegated to the underlying transport


And in fact I have this in virtio-scsi too, since virtio-scsi _is_ a
transport:

When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
is sent for LUN 0, the driver should ask the initiator to rescan
the target, in order to detect the case when an entire target has
appeared or disappeared.

[If the device fails] to report an event due to missing buffers,
[...] the driver should poll the logical units for unit attention
conditions, and/or do whatever form of bus scan is appropriate for
the guest operating system.


In the case of NPIV it would make sense to map the virtual SCSI host
 to the backend, so that all devices presented to the virtual SCSI
host will be presented to the backend, too. However, when doing so
these devices will normally be referenced by their original LUN, as
these will be presented to the guest via eg 'REPORT LUNS'.


Right.


The above thread now tries to figure out if we should remap those LUN
numbers or just expose them as they are. If we decide on remapping,
we have to emulate _all_ commands referring explicitly to those LUN
numbers (persistent reservations, anyone?).


But it seems to me that commands referring explicitly to LUN numbers
most likely have to be reimplemented anyway for virtualization.  I'm
thinking exactly of persistent reservations.  If two guests on the same
host try a persistent reservation, they should conflict with each other.
If reservation commands were just passed through, they would be seen
as coming from the same initiator (the HBA driver or iSCSI initiator in
the host OS).

etc.


If we don't, we would expose some hardware detail to the guest, but
would save us _a lot_ of processing.


But can we afford it?  And would the architecture allow that at all?

Paolo


Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster

2011-07-01 Thread Dan Carpenter
On Thu, Jun 30, 2011 at 04:28:14PM -0700, Dan Magenheimer wrote:
 Hi Dan --
 
 Thanks for the careful review.  You're right... some
 of this was leftover from debugging an off-by-one error,
 though the code as is still works.
 
 OTOH, there's a good chance that much of this sysfs
 code will disappear before zcache would get promoted
 out of staging, since it is to help those experimenting
 with zcache to get more insight into what the underlying
 compression/accept-reject algorithms are doing.
 
 So I hope you (and GregKH) are OK that another version is
 not necessary at this time to fix these.

Off by one errors are kind of insidious.  People cut and paste them
and they spread.  If someone adds a new list of chunks then there
are now two examples that are correct and two which have an extra
element, so it's 50/50 that he'll copy the right one.

Btw, looking at it again, this seems like maybe a similar issue in
zbud_evict_zbpg():

   515  /* now try freeing unbuddied pages, starting with least space avail */
   516  for (i = 0; i < MAX_CHUNK; i++) {
   517  retry_unbud_list_i:
   517  retry_unbud_list_i:


MAX_CHUNK is NCHUNKS - 1.  Shouldn't that be i < NCHUNKS so that we
reach the last element in the list?
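
To make the concern concrete, here is a small stand-alone sketch. It assumes an array with NCHUNKS slots whose last slot (index MAX_CHUNK) can hold entries; the value 8 and the helper first_nonempty() are illustrative only and are not taken from zcache:

```c
#define NCHUNKS   8              /* illustrative size, not zcache's value */
#define MAX_CHUNK (NCHUNKS - 1)  /* index of the last slot */

/* Return the first index below `bound` whose list is non-empty, or -1. */
int first_nonempty(const unsigned int *count, int bound)
{
    int i;

    for (i = 0; i < bound; i++) {
        if (count[i] != 0) {
            return i;
        }
    }
    return -1;
}
```

With an entry only in the last slot, a bound of MAX_CHUNK never finds it, while a bound of NCHUNKS does.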

regards,
dan carpenter



VT-d2 support inside Linux kernel

2011-07-01 Thread Cyclonus J
Hi,

Is there any information I can find about VT-d2 support inside Linux
kernel? It was marked as WIP on KVM todo list. In current top of tree,
I am seeing some ats support for intel-iommu. Does that mean the ATS
part is already finished?

 git log -p -1 93a23a72
commit 93a23a7271dfb811b3adb72779054c3a24433112
Author: Yu Zhao yu.z...@intel.com
Date:   Mon May 18 13:51:37 2009 +0800

VT-d: support the device IOTLB

Enable the device IOTLB (i.e. ATS) for both the bare metal and KVM
environments.

Signed-off-by: Yu Zhao yu.z...@intel.com
Signed-off-by: David Woodhouse david.woodho...@intel.com

Thanks,
CJ


Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer

2011-07-01 Thread Hannes Reinecke

On 07/01/2011 10:27 AM, Paolo Bonzini wrote:

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for driver handling several commands
in parallel.


This makes tracing a bit harder to follow. Perhaps you can keep the
transport tag (a uint64_t) in the SCSIRequest for debugging purposes?


Sure. Anything to get the patches accepted :-)


Signed-off-by: Hannes Reineckeh...@suse.de
---
hw/esp.c | 2 +-
hw/lsi53c895a.c | 17 -
hw/scsi-bus.c | 22 +++---
hw/scsi-disk.c | 5 ++---
hw/scsi-generic.c | 4 ++--
hw/scsi.h | 8 
hw/spapr_vscsi.c | 41 -
hw/usb-msd.c | 10 +-
trace-events | 14 +++---
9 files changed, 52 insertions(+), 71 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..912ff89 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)

DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
lun = busid & 7;
- s->current_req = scsi_req_new(s->current_dev, 0, lun);
+ s->current_req = scsi_req_new(s->current_dev, lun, s);


Might as well pass NULL here. The hba_private value is basically
unnecessary when the adapter doesn't support tagged command queuing.


diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 86582cc..4e2ea03 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -216,8 +216,8 @@ static void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
USBPacket *p = s->packet;

- if (req->tag != s->tag) {
- fprintf(stderr, "usb-msd: Unexpected SCSI Tag 0x%x\n", req->tag);
+ if (req->hba_private != s) {
+ fprintf(stderr, "usb-msd: Unexpected SCSI command 0x%p\n", req);
}


Same here, just pass NULL and remove these ifs.

Otherwise looks like a very good idea.


Ok, I'll be resending both.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 09:42, Hannes Reinecke wrote:

 This patch adds an emulation for the LSI Megaraid SAS 8708EM2 HBA.
 
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
 Makefile.objs   |1 +
 default-configs/pci.mak |1 +
 hw/megasas.c| 1923 +++
 hw/mfi.h| 1197 +
 hw/pci_ids.h|3 +-
 5 files changed, 3124 insertions(+), 1 deletions(-)
 create mode 100644 hw/megasas.c
 create mode 100644 hw/mfi.h
 
 diff --git a/Makefile.objs b/Makefile.objs
 index cea15e4..6f5d113 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -258,6 +258,7 @@ hw-obj-$(CONFIG_AHCI) += ide/ich.o
 
 # SCSI layer
 hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
 +hw-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
 hw-obj-$(CONFIG_ESP) += esp.o
 
 hw-obj-y += dma-helpers.o sysbus.o isa-bus.o
 diff --git a/default-configs/pci.mak b/default-configs/pci.mak
 index 22bd350..fabb56c 100644
 --- a/default-configs/pci.mak
 +++ b/default-configs/pci.mak
 @@ -9,6 +9,7 @@ CONFIG_EEPRO100_PCI=y
 CONFIG_PCNET_PCI=y
 CONFIG_PCNET_COMMON=y
 CONFIG_LSI_SCSI_PCI=y
 +CONFIG_MEGASAS_SCSI_PCI=y
 CONFIG_RTL8139_PCI=y
 CONFIG_E1000_PCI=y
 CONFIG_IDE_CORE=y
 diff --git a/hw/megasas.c b/hw/megasas.c
 new file mode 100644
 index 000..75f9be3
 --- /dev/null
 +++ b/hw/megasas.c
 @@ -0,0 +1,1923 @@
 +/*
 + * QEMU MegaRAID SAS 8708EM2 Host Bus Adapter emulation
 + *
 + * Copyright (c) 2009-2011 Hannes Reinecke, SUSE Labs
 + *
 + * This code is licenced under the LGPL.

Please take a look at the license header of other LGPL code and just copy it :).

 + */
 +
 +#include <time.h>
 +#include <assert.h>

Are you sure you need to manually include those?

 +
 +#include "hw.h"
 +#include "pci.h"
 +#include "dma.h"
 +#include "iov.h"
 +#include "scsi.h"
 +#include "scsi-defs.h"
 +#include "block_int.h"
 +#ifdef __linux__
 +# include <scsi/sg.h>

Is this really necessary? Device code shouldn't be host dependent IMHO. I also 
haven't found any user of this in the actual code, so it might be as easy as 
merely removing the include :).

 +#endif
 +
 +#include "mfi.h"
 +
 +#define DEBUG_MEGASAS
 +#undef DEBUG_MEGASAS_REG
 +#undef DEBUG_MEGASAS_QUEUE
 +#undef DEBUG_MEGASAS_MFI
 +#undef DEBUG_MEGASAS_IO
 +#undef DEBUG_MEGASAS_DCMD
 +
 +#ifdef DEBUG_MEGASAS
 +#define DPRINTF(fmt, ...) \
 +do { printf("megasas: " fmt , ## __VA_ARGS__); } while (0)
 +#define BADF(fmt, ...) \
 +do { fprintf(stderr, "megasas: error: " fmt , ## __VA_ARGS__); exit(1);} while (0)
 +#ifdef DEBUG_MEGASAS_REG
 +#define DPRINTF_REG DPRINTF
 +#else
 +#define DPRINTF_REG(fmt, ...) do {} while(0)
 +#endif
 +#ifdef DEBUG_MEGASAS_QUEUE
 +#define DPRINTF_QUEUE DPRINTF
 +#else
 +#define DPRINTF_QUEUE(fmt, ...) do {} while(0)
 +#endif
 +#ifdef DEBUG_MEGASAS_MFI
 +#define DPRINTF_MFI DPRINTF
 +#else
 +#define DPRINTF_MFI(fmt, ...) do {} while(0)
 +#endif
 +#ifdef DEBUG_MEGASAS_IO
 +#define DPRINTF_IO DPRINTF
 +#else
 +#define DPRINTF_IO(fmt, ...) do {} while(0)
 +#endif
 +#ifdef DEBUG_MEGASAS_DCMD
 +#define DPRINTF_DCMD DPRINTF
 +#else
 +#define DPRINTF_DCMD(fmt, ...) do {} while(0)
 +#endif
 +#else
 +#define DPRINTF(fmt, ...) do {} while(0)
 +#define DPRINTF_REG DPRINTF
 +#define DPRINTF_QUEUE DPRINTF
 +#define DPRINTF_MFI DPRINTF
 +#define DPRINTF_IO DPRINTF
 +#define DPRINTF_DCMD DPRINTF
 +#define BADF(fmt, ...) \
 +do { fprintf(stderr, "megasas: error: " fmt , ## __VA_ARGS__);} while (0)
 +#endif
 +
 +/* Static definitions */
 +#define MEGASAS_VERSION "1.20"
 +#define MEGASAS_MAX_FRAMES 2048 /* Firmware limit at 65535 */
 +#define MEGASAS_DEFAULT_FRAMES 1000 /* Windows requires this */
 +#define MEGASAS_MAX_SGE 256 /* Firmware limit */
 +#define MEGASAS_DEFAULT_SGE 80
 +#define MEGASAS_MAX_SECTORS 0x  /* No real limit */
 +#define MEGASAS_MAX_ARRAYS 128
 +
 +const char *mfi_frame_desc[] = {
 +"MFI init", "LD Read", "LD Write", "LD SCSI", "PD SCSI",
 +"MFI Doorbell", "MFI Abort", "MFI SMP", "MFI Stop"};
 +
 +struct megasas_cmd_t {
 +int index;
 +int context;
 +int count;
 +
 +target_phys_addr_t pa;
 +target_phys_addr_t pa_size;
 +union mfi_frame *frame;
 +SCSIRequest *req;
 +struct iovec *iov;
 +void *iov_buf;
 +long iov_cnt;
 +long iov_size;
 +long iov_offset;

Why would anything be a long? It's either target_ulong or uintXX_t for device 
code usually :).

 +SCSIDevice *sdev;
 +struct megasas_state_t *state;
 +};
 +
 +typedef struct megasas_state_t {
 +PCIDevice dev;
 +int mmio_io_addr;
 +int io_addr;
 +int queue_addr;
 +uint32_t frame_hi;
 +
 +int fw_state;
 +uint32_t fw_sge;
 +uint32_t fw_cmds;
 +int fw_luns;
 +int intr_mask;
 +int doorbell;
 +int busy;
 +char *raid_mode_str;
 +int is_jbod;
 +
 +int event_count;
 +int shutdown_event;
 +int boot_event;
 +
 +uint64_t reply_queue_pa;
 +void *reply_queue;
 +int reply_queue_len;
 +int reply_queue_index;
 +  

[PATCH resend] compat_ioctl: fix warning caused by qemu

2011-07-01 Thread Johannes Stezenbach
On Linux x86_64 host with 32bit userspace, running
qemu or even just qemu-img create -f qcow2 some.img 1G
causes a kernel warning:

ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(5326){t:'S';sz:0} arg(7fff) on some.img
ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img

ioctl 5326 is CDROM_DRIVE_STATUS,
ioctl 801c0204 is FDGETPRM.

The warning appears because the Linux compat-ioctl handler for these
ioctls only applies to block devices, while qemu also uses the ioctls on
plain files.
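
The two command numbers in the warning can be decoded by hand. A small sketch, assuming the generic Linux ioctl encoding used on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction); struct ioc_fields and ioc_decode() are illustrative names, not kernel API:

```c
#include <stdint.h>

/* The four fields packed into a Linux ioctl command number. */
struct ioc_fields {
    unsigned int dir;   /* 0 = none, 1 = write, 2 = read, 3 = both */
    unsigned int size;  /* size of the argument structure in bytes */
    unsigned int type;  /* "magic" character identifying the driver */
    unsigned int nr;    /* command number within that type */
};

/* Split a command number using the generic (x86) bit layout. */
struct ioc_fields ioc_decode(uint32_t cmd)
{
    struct ioc_fields f;

    f.nr   = cmd & 0xff;
    f.type = (cmd >> 8) & 0xff;
    f.size = (cmd >> 16) & 0x3fff;
    f.dir  = (cmd >> 30) & 0x3;
    return f;
}
```

Decoding 0x801c0204 yields a read with type 0x02, number 0x04 and a 28-byte argument, which matches the {t:02;sz:28} shown in the log; 0x5326 has type 'S' and size 0.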

Signed-off-by: Johannes Stezenbach j...@sig21.net
---
(resend with Cc: suggested by get_maintainer.pl)

discussed in http://lkml.kernel.org/r/20110617090424.ga19...@sig21.net

Arnd, is this what you had in mind, or did you mean to move
all floppy compat definitions?  I decided to go with the
minimal change.  Tested on both 2.6.39.2 and 3.0-rc5-63-g0d72c6f.


diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
index cc3eb78..7b72502 100644
--- a/block/compat_ioctl.c
+++ b/block/compat_ioctl.c
@@ -208,19 +208,6 @@ static int compat_blkpg_ioctl(struct block_device *bdev, fmode_t mode,
 #define BLKBSZSET_32   _IOW(0x12, 113, int)
 #define BLKGETSIZE64_32 _IOR(0x12, 114, int)
 
-struct compat_floppy_struct {
-   compat_uint_t   size;
-   compat_uint_t   sect;
-   compat_uint_t   head;
-   compat_uint_t   track;
-   compat_uint_t   stretch;
-   unsigned char   gap;
-   unsigned char   rate;
-   unsigned char   spec1;
-   unsigned char   fmt_gap;
-   const compat_caddr_t name;
-};
-
 struct compat_floppy_drive_params {
char            cmos;
compat_ulong_t  max_dtr;
@@ -288,7 +275,6 @@ struct compat_floppy_write_errors {
 
 #define FDSETPRM32 _IOW(2, 0x42, struct compat_floppy_struct)
 #define FDDEFPRM32 _IOW(2, 0x43, struct compat_floppy_struct)
-#define FDGETPRM32 _IOR(2, 0x04, struct compat_floppy_struct)
 #define FDSETDRVPRM32 _IOW(2, 0x90, struct compat_floppy_drive_params)
 #define FDGETDRVPRM32 _IOR(2, 0x11, struct compat_floppy_drive_params)
 #define FDGETDRVSTAT32 _IOR(2, 0x12, struct compat_floppy_drive_struct)
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 61abb63..8be086e 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -68,6 +68,8 @@
 
 #ifdef CONFIG_BLOCK
 #include <linux/loop.h>
+#include <linux/cdrom.h>
+#include <linux/fd.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_ioctl.h>
 #include <scsi/sg.h>
@@ -944,6 +946,9 @@ COMPATIBLE_IOCTL(FIOQSIZE)
 IGNORE_IOCTL(LOOP_CLR_FD)
 /* md calls this on random blockdevs */
 IGNORE_IOCTL(RAID_VERSION)
+/* qemu/qemu-img might call these two on plain files for probing */
+IGNORE_IOCTL(CDROM_DRIVE_STATUS)
+IGNORE_IOCTL(FDGETPRM32)
 /* SG stuff */
 COMPATIBLE_IOCTL(SG_SET_TIMEOUT)
 COMPATIBLE_IOCTL(SG_GET_TIMEOUT)
diff --git a/include/linux/fd.h b/include/linux/fd.h
index f5d194a..c6a68d0 100644
--- a/include/linux/fd.h
+++ b/include/linux/fd.h
@@ -3,6 +3,7 @@
 
 #include <linux/ioctl.h>
 #include <linux/compiler.h>
+#include <linux/compat.h>
 
 /* New file layout: Now the ioctl definitions immediately follow the
  * definitions of the structures that they use */
@@ -377,4 +378,21 @@ struct floppy_raw_cmd {
 #define FDEJECT _IO(2, 0x5a)
 /* eject the disk */
 
+#ifdef CONFIG_COMPAT
+struct compat_floppy_struct {
+   compat_uint_t   size;
+   compat_uint_t   sect;
+   compat_uint_t   head;
+   compat_uint_t   track;
+   compat_uint_t   stretch;
+   unsigned char   gap;
+   unsigned char   rate;
+   unsigned char   spec1;
+   unsigned char   fmt_gap;
+   const compat_caddr_t name;
+};
+
+#define FDGETPRM32 _IOR(2, 0x04, struct compat_floppy_struct)
+#endif
+
 #endif


Re: [PATCH 01/17] KVM: PPC: Fix machine checks on 32-bit Book3S

2011-07-01 Thread Alexander Graf

On 29.06.2011, at 12:16, Paul Mackerras wrote:

 Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through
 function exports") resulted in vcpu->arch.trampoline_lowmem and
 vcpu->arch.trampoline_enter ending up with kernel virtual addresses
 rather than physical addresses.  This is OK on 64-bit Book3S machines,
 which ignore the top 4 bits of the effective address in real mode,
 but on 32-bit Book3S machines, accessing these addresses in real mode
 causes machine check interrupts, as the hardware uses the whole
 effective address as the physical address in real mode.
 
 This fixes the problem by using __pa() to convert these addresses
 to physical addresses.

Ouch. Thanks for the catch! I really need to include book3s_32 in my automated 
testing :(.
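
The difference between the two machine types can be shown with plain arithmetic. This is only a sketch: it assumes the kernel linear mapping starts at 0xc000000000000000 on 64-bit and at 0xc0000000 on 32-bit (common defaults, not stated in the patch), and all names are made up for illustration.

```c
#include <stdint.h>

/* Assumed linear-mapping bases (common defaults, not taken from the patch). */
#define PAGE_OFFSET_64 0xc000000000000000ULL
#define PAGE_OFFSET_32 0xc0000000UL

/* What __pa() amounts to for a linear-mapped kernel address. */
static uint64_t pa64(uint64_t va) { return va - PAGE_OFFSET_64; }
static uint32_t pa32(uint32_t va) { return va - PAGE_OFFSET_32; }

/* Address the hardware uses in real mode, following the description
 * in the commit message: 64-bit Book3S ignores the top 4 bits of the
 * effective address, 32-bit Book3S uses the whole address. */
static uint64_t real_mode_addr64(uint64_t ea) { return ea & ~(0xfULL << 60); }
static uint32_t real_mode_addr32(uint32_t ea) { return ea; }
```

Under these assumptions a kernel virtual address such as 0xc000000000001000 happens to resolve to the same location as its physical address on 64-bit, while 0xc0001000 on 32-bit does not, which is why the explicit __pa() conversion matters there.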


Alex



Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate

2011-07-01 Thread Paul Mackerras
On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
 On 06/29/2011 12:41 PM, Paul Mackerras wrote:
 +struct kvm_ppc_set_platform {
 +__u16 platform; /* defines the OS/hypervisor ABI */
 +__u16 guest_arch;   /* e.g. decimal 206 for v2.06 */
 +__u32 flags;
 
 Please add some padding so we can extend it later if necessary.
 
 +};
 +
 +/* Values for platform */
 +#define KVM_PPC_PV_NONE  0   /* bare-metal, non-paravirtualized */
 +#define KVM_PPC_PV_KVM   1   /* as defined in kvm_para.h */
 +#define KVM_PPC_PV_SPAPR 2   /* IBM Server PAPR (a la PowerVM) */
 
 We also support BookE which would be useful to also include in the list.
 Furthermore, KVM is more of a feature flag than a platform. We can
 easily support KVM extensions on an SPAPR platform, no?

Yes, I guess so.  The hypercall sequence will have to be different,
since ordinary system call interrupts go straight to the guest.  But I
guess you've allowed for that with the hypercall sequence property in
the device tree.

 This whole interface also could deprecate the PVR setting one, so we
 can simply include PVR as well and not require kernel space to jump
 through hoops to figure out its capabilities.

I debated about whether to include a PVR value in this structure.

The thing is that POWER7 has the Processor Compatibility Register
(PCR), which has a bit which makes the processor behave in user mode
as if it were a POWER6.  So, we could run a book3s_hv guest in POWER6
mode by setting this bit (which we might want to do to run older
distros).  However, this bit doesn't affect the PVR value that the
guest sees.  That's why I went for an architecture level rather than a
specific PVR value.

We could go with a PVR value and use the logical PVR values defined
in PAPR to represent architecture levels, e.g. 0x0f02 for
architecture v2.05 (POWER6).

 And we need to identify 32-bit BookS processors, so we can go into
 32-bit mode when necessary. That should also be a different
 guest_arch, right?

Right.  If we go with a PVR value then we just use the PVR value for a
suitable 32-bit processor.

 +
 +/* Values for flags */
 +#define KVM_PPC_CROSS_ARCH  1   /* guest architecture != host */
 
 User space shouldn't have to worry about this one. It's up to the
 kernel to decide that it's cross.

I put that in because we might want to force the use of book3s_pr, for
example if we know we're going to want to do emulated MMIO or
something else that isn't implemented in book3s_hv just yet.

Ultimately, yes, the kernel should be able to decide whether it's
cross or not.  However, I don't think we should make it completely
opaque to userspace as to whether the kernel is using _pr or _hv.
If nothing else, userspace should be able to find out and tell the
user so that performance expectations can be set correctly.

Paul.


Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 12:09, Paul Mackerras wrote:

 On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
 On 06/29/2011 12:41 PM, Paul Mackerras wrote:
 +struct kvm_ppc_set_platform {
 +   __u16 platform; /* defines the OS/hypervisor ABI */
 +   __u16 guest_arch;   /* e.g. decimal 206 for v2.06 */
 +   __u32 flags;
 
 Please add some padding so we can extend it later if necessary.
 
 +};
 +
 +/* Values for platform */
 +#define KVM_PPC_PV_NONE  0   /* bare-metal, non-paravirtualized */
 +#define KVM_PPC_PV_KVM   1   /* as defined in kvm_para.h */
 +#define KVM_PPC_PV_SPAPR 2   /* IBM Server PAPR (a la PowerVM) */
 
 We also support BookE which would be useful to also include in the list.
 Furthermore, KVM is more of a feature flag than a platform. We can
 easily support KVM extensions on an SPAPR platform, no?
 
 Yes, I guess so.  The hypercall sequence will have to be different,
 since ordinary system call interrupts go straight to the guest.  But I
 guess you've allowed for that with the hypercall sequence property in
 the device tree.
 
 This whole interface also could deprecate the PVR setting one, so we
 can simply include PVR as well and not require kernel space to jump
 through hoops to figure out its capabilities.
 
 I debated about whether to include a PVR value in this structure.
 
 The thing is that POWER7 has the Processor Compatibility Register
 (PCR), which has a bit which makes the processor behave in user mode
 as if it were a POWER6.  So, we could run a book3s_hv guest in POWER6
 mode by setting this bit (which we might want to do to run older
 distros).  However, this bit doesn't affect the PVR value that the
 guest sees.  That's why I went for an architecture level rather than a
 specific PVR value.
 
 We could go with a PVR value and use the logical PVR values defined
 in PAPR to represent architecture levels, e.g. 0x0f02 for
 architecture v2.05 (POWER6).

IIUC the PVR values are somewhat standardized to contain major and minor 
revision numbers. Can't we just mask out the minor ones and match for known 
good systems?
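
A rough sketch of what such masking could look like, assuming the usual PVR layout with the processor version in the upper 16 bits and the revision in the lower 16 bits. The table and the function name are hypothetical, and the two version numbers are given as assumed values for POWER6 and POWER7.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper 16 bits of the PVR: processor version (revision is masked out). */
#define PVR_VER(pvr) (((pvr) >> 16) & 0xffff)

/* Hypothetical table of known-good processor versions. */
static const uint16_t known_versions[] = {
    0x003e,  /* POWER6 (assumed value) */
    0x003f,  /* POWER7 (assumed value) */
};

/* Return 1 if the version part of the PVR is in the table, else 0. */
static int pvr_is_known(uint32_t pvr)
{
    size_t i;

    for (i = 0; i < sizeof(known_versions) / sizeof(known_versions[0]); i++) {
        if (PVR_VER(pvr) == known_versions[i])
            return 1;
    }
    return 0;
}
```

With this, two PVR values that differ only in the revision half match the same table entry.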

 
 And we need to identify 32-bit BookS processors, so we can go into
 32-bit mode when necessary. That should also be a different
 guest_arch, right?
 
 Right.  If we go with a PVR value then we just use the PVR value for a
 suitable 32-bit processor.

Well, we need to have some way of mapping PVR to arch then. KVM easily supports 
-cpu G3 and G4. We might also want to have some information on feature flags, 
such as Altivec or SPE mode available. Or paired singles :). I'm not sure I 
want to have all that mapping information inside the kernel.

So what we could do is we just provide as much information as we can from user 
space, including PVR, architecture (2.01 for example), features (32/64-bit, 
booke/books, fpu, altivec, spe, ...).
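
To make that concrete, here is a sketch of what such a descriptor might look like, including the padding asked for earlier. Every name and the layout itself are hypothetical. Nothing here is an existing KVM interface.

```c
#include <stdint.h>

/* Hypothetical feature bits; names are for illustration only. */
#define PPC_FEAT_64BIT   (1u << 0)
#define PPC_FEAT_BOOKE   (1u << 1)
#define PPC_FEAT_FPU     (1u << 2)
#define PPC_FEAT_ALTIVEC (1u << 3)
#define PPC_FEAT_SPE     (1u << 4)

#define PPC_DESC_PAD_WORDS 11

/* Hypothetical layout: everything user space knows, plus reserved space. */
struct ppc_guest_desc {
    uint32_t pvr;          /* processor version register value */
    uint32_t guest_arch;   /* e.g. decimal 201 for v2.01 */
    uint64_t features;     /* PPC_FEAT_* bits */
    uint32_t flags;
    uint32_t pad[PPC_DESC_PAD_WORDS];  /* reserved, must be zero */
};

/* Reject input that sets reserved words, so that they can be given a
 * meaning later without breaking old user space. */
static int desc_is_valid(const struct ppc_guest_desc *d)
{
    unsigned i;

    for (i = 0; i < PPC_DESC_PAD_WORDS; i++) {
        if (d->pad[i] != 0)
            return 0;
    }
    return 1;
}
```

The reserved words bring the structure to 64 bytes, and insisting that they are zero today is what makes them usable for extensions tomorrow.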

 
 +
 +/* Values for flags */
 +#define KVM_PPC_CROSS_ARCH 1   /* guest architecture != host */
 
 User space shouldn't have to worry about this one. It's up to the
 kernel to decide that it's cross.
 
 I put that in because we might want to force the use of book3s_pr, for
 example if we know we're going to want to do emulated MMIO or
 something else that isn't implemented in book3s_hv just yet.

Ah, I see. Well, we could just add a flag to the feature list saying MMIO. If 
that's impossible to satisfy (HV only), fail the call. Otherwise switch to _pr 
mode. Later when _hv might be able to support MMIO, we can use it without 
changing user space.
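
A sketch of that selection logic, with the decision kept on the kernel side. The flag, the capability structure and the function are all hypothetical names for illustration, not existing code.

```c
#include <errno.h>
#include <stdint.h>

#define FEAT_MMIO (1u << 0)   /* hypothetical: guest needs emulated MMIO */

enum backend { BACKEND_NONE, BACKEND_HV, BACKEND_PR };

/* What this host kernel can offer (hypothetical description). */
struct host_caps {
    int has_hv;         /* book3s_hv usable on this host */
    int has_pr;         /* book3s_pr usable on this host */
    uint32_t hv_feats;  /* features book3s_hv can satisfy */
    uint32_t pr_feats;  /* features book3s_pr can satisfy */
};

/* Prefer HV, fall back to PR, fail the call if neither can satisfy
 * the requested features. */
static int select_backend(const struct host_caps *host, uint32_t wanted,
                          enum backend *out)
{
    if (host->has_hv && (wanted & ~host->hv_feats) == 0) {
        *out = BACKEND_HV;
        return 0;
    }
    if (host->has_pr && (wanted & ~host->pr_feats) == 0) {
        *out = BACKEND_PR;
        return 0;
    }
    *out = BACKEND_NONE;
    return -EINVAL;
}
```

Once hv_feats grows the MMIO bit, the same request from user space ends up on the HV backend without any change to the caller, and the chosen backend can still be reported back through the out parameter.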

 Ultimately, yes, the kernel should be able to decide whether it's
 cross or not.  However, I don't think we should make it completely
 opaque to userspace as to whether the kernel is using _pr or _hv.
 If nothing else, userspace should be able to find out and tell the
 user so that performance expectations can be set correctly.

Hrm. Sure, but the decision should be made in kernel land based on all 
information required to actually make it. And the kernel has more information 
regarding the system it's running on, so that's the place to actually make the 
decision. Bubbling it up to user space again is certainly fine by me :).


Alex



Re: [PATCH v2 12/31] kvm tools: Add UDP support for uip

2011-07-01 Thread Ingo Molnar

* Asias He asias.he...@gmail.com wrote:

 +static void *uip_udp_socket_thread(void *p)
 +{
 + struct epoll_event events[UIP_UDP_MAX_EVENTS];
 + struct uip_udp_socket *sk;
 + struct uip_info *info;
 + struct uip_eth *eth2;
 + struct uip_udp *udp2;
 + struct uip_buf *buf;
 + struct uip_ip *ip2;
 + u8 *payload;
 + int nfds;
 + int ret;
 + int i;
 +
 + info = p;
 +
 + do {
 + payload = malloc(UIP_MAX_UDP_PAYLOAD);
 + } while (!payload);
 +
 + while (1) {
 + nfds = epoll_wait(info->udp_epollfd, events, UIP_UDP_MAX_EVENTS, -1);
 +
 + if (nfds == -1)
 + continue;
 +
 + for (i = 0; i < nfds; i++) {
 +
 + sk = events[i].data.ptr;
 + ret = recvfrom(sk->fd, payload, UIP_MAX_UDP_PAYLOAD, 0, NULL, NULL);
 + if (ret < 0)
 + continue;
 +
 + /*
 +  * Get free buffer to send data to guest
 +  */
 + buf = uip_buf_get_free(info);
 +
 + /*
 +  * Cook a ethernet frame
 +  */
 + udp2 = (struct uip_udp *)(buf->eth);
 + eth2 = (struct uip_eth *)buf->eth;
 + ip2 = (struct uip_ip *)(buf->eth);
 +
 + eth2->src   = info->host_mac;
 + eth2->dst   = info->guest_mac;
 + eth2->type  = htons(UIP_ETH_P_IP);
 +
 + ip2->vhl = UIP_IP_VER_4 | UIP_IP_HDR_LEN;
 + ip2->tos = 0;
 + ip2->id = 0;
 + ip2->flgfrag = 0;
 + ip2->ttl = UIP_IP_TTL;
 + ip2->proto  = UIP_IP_P_UDP;
 + ip2->csum   = 0;
 + ip2->sip = sk->dip;
 + ip2->dip = sk->sip;
 +
 + udp2->sport = sk->dport;
 + udp2->dport = sk->sport;
 + udp2->len   = htons(ret + uip_udp_hdrlen(udp2));
 + udp2->csum  = 0;
 +
 + memcpy(udp2->payload, payload, ret);
 +
 + ip2->len = udp2->len + htons(uip_ip_hdrlen(ip2));
 + ip2->csum   = uip_csum_ip(ip2);
 + udp2->csum  = uip_csum_udp(udp2);
 +
 + /*
 +  * virtio_net_hdr
 +  */
 + buf->vnet_len   = sizeof(struct virtio_net_hdr);
 + memset(buf->vnet, 0, buf->vnet_len);
 +
 + buf->eth_len = ntohs(ip2->len) + uip_eth_hdrlen(&ip2->eth);
 +
 + /*
 +  * Send data received from socket to guest
 +  */
 + uip_buf_set_used(info, buf);
 + }
 + }
 +
 + free(payload);
 + pthread_exit(NULL);
 + return NULL;
 +}

This function is way too large, please split out the meat of it into 
a separate helper inline.
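
As one illustration of what could move into such a helper, here is a sketch of the length bookkeeping on its own. It assumes a 20-byte IPv4 header without options and an 8-byte UDP header, and the names are hypothetical rather than taken from the uip code. The lengths are summed in host byte order and converted once, which sidesteps the carry problems that can arise when two network-order values are added directly.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

#define IP_HDR_LEN  20  /* IPv4 header without options (assumed) */
#define UDP_HDR_LEN 8

struct udp_frame_lens {
    uint16_t udp_len_be;  /* UDP length field, network byte order */
    uint16_t ip_len_be;   /* IP total length field, network byte order */
};

/* Fill in both length fields for a given payload size.
 * Returns 0 on success, -1 if the result would not fit in 16 bits. */
static int udp_compute_lens(size_t payload_len, struct udp_frame_lens *out)
{
    size_t udp_len, ip_len;

    if (payload_len > 0xffff - IP_HDR_LEN - UDP_HDR_LEN)
        return -1;

    udp_len = payload_len + UDP_HDR_LEN;
    ip_len = udp_len + IP_HDR_LEN;

    out->udp_len_be = htons((uint16_t)udp_len);
    out->ip_len_be = htons((uint16_t)ip_len);
    return 0;
}
```

A receive loop would then call this once per datagram and copy the two fields into the headers, leaving the loop body itself short.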

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/17] KVM: PPC: Deliver program interrupts right away instead of queueing them

2011-07-01 Thread Alexander Graf

On 29.06.2011, at 12:18, Paul Mackerras wrote:

 Doing so means that we don't have to save the flags anywhere and gets
 rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.
 
 Doing so is OK because a program interrupt won't be generated at the
 same time as any other synchronous interrupt.  If a program interrupt
 and an asynchronous interrupt (external or decrementer) are generated
 at the same time, the program interrupt will be delivered, which is
 correct because it has a higher priority, and then the asynchronous
 interrupt will be masked.
 
 We don't ever generate system reset or machine check interrupts to the
 guest, but if we did, then we would need to make sure they got delivered
 rather than the program interrupt.  The current code would be wrong in
 this situation anyway since it would deliver the program interrupt as
 well as the reset/machine check interrupt.
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 ---
 arch/powerpc/kvm/book3s.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
 index 163e3e1..f68a34d 100644
 --- a/arch/powerpc/kvm/book3s.c
 +++ b/arch/powerpc/kvm/book3s.c
 @@ -129,8 +129,8 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, 
 unsigned int vec)
 
 void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
 - to_book3s(vcpu)->prog_flags = flags;

Now that prog_flags is unused, please remove it from the headers.


Alex



Re: [PATCH v2 00/31] Implement user mode network for kvm tools

2011-07-01 Thread Ingo Molnar

* Asias He asias.he...@gmail.com wrote:

  Usermode TCP/IP can be quite cumbersome for users as things like 
  ping and ip6 won't work properly.
 
 Yes, usermode TCP/IP does have limits. But it's more cumbersome for 
 users to set up a bridge/NAT with privileged networking. The 
 network setup is a headache for some users.

That group of 'some users' includes me for example.

The thing is, when i test an existing distro image there's better 
things to do with my time than to figure out that year's preferred 
method of configuring the network and troubleshooting it if it goes 
wrong. So having zero-config networking (assuming we grow some DHCP 
capability as well) would be a real plus.

 This patchset implements things like 'qemu -net user' without the 
 slirp.
 
 I just took a look at the LOC in qemu and uip.
 
 qemu.git$ cat slirp/*.{c,h} net/slirp.{c,h}| wc -l
 11514
 
 kernel.git/tools/kvm$ cat uip/*.{c,h} include/kvm/uip.h | wc -l
 1312

That's pretty impressive (if it does not come at the expense of 
features that Qemu's slirp code has) - and the thing is that we don't 
actually have to implement the vast majority of TCP-IP features, 
because the transport between the guest and the host is obviously 
reliable.

This patch-set turned out to be a *lot* more simple than i first 
thought it would end up.

Simpler also means potentially faster and potentially more secure.

( The lack of ipv6 is not something we should worry about too much, 
  ipv4 should scale up to a couple of hundred thousand virtual
  machines per box, right? )

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer

2011-07-01 Thread Hannes Reinecke

On 07/01/2011 10:27 AM, Paolo Bonzini wrote:

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for driver handling several commands
in parallel.


This makes tracing a bit harder to follow. Perhaps you can keep the
transport tag (a uint64_t) in the SCSIRequest for debugging purposes?


Hmm. The transport tag wouldn't have any meaning outside scsi-bus.c.
And it's a 64-bit value. So why can't we use the hba_private pointer 
directly here?
After some I/O has been ongoing the linear 'tag' number becomes 
unreadable very quickly, so there's not much difference here ...


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH v2 00/31] Implement user mode network for kvm tools

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 13:53, Ingo Molnar wrote:

 
 * Asias He asias.he...@gmail.com wrote:
 
 Usermode TCP/IP can be quite cumbersome for users as things like 
 ping and ip6 won't work properly.
 
 Yes, usermode TCP/IP does have limits. But it's more cumbersome for 
 users to set up a bridge/NAT with privileged networking. The 
 network setup is a headache for some users.
 
 That group of 'some users' includes me for example.
 
 The thing is, when i test an existing distro image there's better 
 things to do with my time than to figure out that year's preferred 
 method of configuring the network and troubleshooting it if it goes 
 wrong. So having zero-config networking (assuming we grow some DHCP 
 capability as well) would be a real plus.
 
 This patchset implements things like 'qemu -net user' without the 
 slirp.
 
 I just took a look at the LOC in qemu and uip.
 
 qemu.git$ cat slirp/*.{c,h} net/slirp.{c,h}| wc -l
 11514
 
 kernel.git/tools/kvm$ cat uip/*.{c,h} include/kvm/uip.h | wc -l
 1312
 
 That's pretty impressive (if it does not come at the expense of 
 features that Qemu's slirp code has) - and the thing is that we don't 
 actually have to implement the vast majority of TCP-IP features, 
 because the transport between the guest and the host is obviously 
 reliable.

I don't see how it would. Once you overrun device buffers, you have to do 
something. Either you drop packets or you stall the guest. I'd usually prefer 
the former :).

 This patch-set turned out to be a *lot* more simple than i first 
 thought it would end up.
 
 Simpler also means potentially faster and potentially more secure.
 
 ( The lack of ipv6 is not something we should worry about too much, 
  ipv4 should scale up to a couple of hundred thousand virtual
  machines per box, right? )

Well, if the system you're trying to connect to supports ipv4, sure. If it 
doesn't, tough luck :).


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster

2011-07-01 Thread Dan Magenheimer
 From: Dan Carpenter [mailto:erro...@gmail.com]
 
 On Thu, Jun 30, 2011 at 04:28:14PM -0700, Dan Magenheimer wrote:
  Hi Dan --
 
  Thanks for the careful review.  You're right... some
  of this was leftover from debugging an off-by-one error,
  though the code as is still works.
 
  OTOH, there's a good chance that much of this sysfs
  code will disappear before zcache would get promoted
  out of staging, since it is to help those experimenting
  with zcache to get more insight into what the underlying
  compression/accept-reject algorithms are doing.
 
  So I hope you (and GregKH) are OK that another version is
  not necessary at this time to fix these.
 
 Off by one errors are kind of insidious.  People cut and paste them
 and they spread.  If someone adds a new list of chunks then there
 are now two examples that are correct and two which have an extra
 element, so it's 50/50 that he'll copy the right one.

True, but these are NOT off-by-one errors... they are
correct-but-slightly-ugly code snippets.  (To clarify, I said
the *ugliness* arose when debugging an off-by-one error.)

Patches always welcome, and I agree that these should be
fixed eventually, assuming the code doesn't go away completely
first. I'm simply stating the position
that going through another test/submit cycle to fix
correct-but-slightly-ugly code which is present only to
surface information for experiments is not high on my priority
list right now... unless GregKH says he won't accept the patch.
 
 Btw, looking at it again, this seems like maybe a similar issue in
 zbud_evict_zbpg():
 
516  for (i = 0; i < MAX_CHUNK; i++) {
517  retry_unbud_list_i:
 
 
 MAX_CHUNKS is NCHUNKS - 1.  Shouldn't that be i < NCHUNKS so that we
 reach the last element in the list?

No, the last element in that list is unused.  There is a comment
to that effect someplace in the code.  (These lists are keeping
track of pages with chunks of available space and the last
entry would have no available space so is always empty.)
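
To spell out the index range, here is a small sketch. The value of NCHUNKS is hypothetical and the function only counts iterations; it is not the zcache code.

```c
#define NCHUNKS   8              /* hypothetical value for illustration */
#define MAX_CHUNK (NCHUNKS - 1)

/* Count how many of the NCHUNKS lists a loop bounded by MAX_CHUNK
 * visits, and record the highest index it reaches. */
static int visit_lists(int *last_index)
{
    int i, visited = 0;

    *last_index = -1;
    for (i = 0; i < MAX_CHUNK; i++) {
        visited++;
        *last_index = i;
    }
    return visited;
}
```

The loop covers indices 0 through NCHUNKS - 2 and leaves the final list alone, which is the behaviour described above for a list that is always empty.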

Thanks again for your interest... are you using zcache?

Dan


Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 03:11 PM, Hannes Reinecke wrote:

On 07/01/2011 10:27 AM, Paolo Bonzini wrote:

On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for driver handling several commands
in parallel.


This makes tracing a bit harder to follow. Perhaps you can keep the
transport tag (a uint64_t) in the SCSIRequest for debugging purposes?


Hmm. The transport tag wouldn't have any meaning outside scsi-bus.c.


It depends, in vmw_pvscsi I take it from a field in the request block 
that is 0..255.  So either you have a small tag that is recycled but 
stays nice, or a large tag that is unwieldy but should not be recycled 
ever.  A pointer is unwieldy _and_ is recycled, so it gives the worst of 
both worlds.


But I'm not very attached to this, I may even do it myself if/when I 
find the need.  Won't ack yet because of the nit with ESP/USB, but even 
if you do not bother I will ack the next respin.


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH resend] compat_ioctl: fix warning caused by qemu

2011-07-01 Thread Arnd Bergmann
On Friday 01 July 2011, Johannes Stezenbach wrote:
 
 On Linux x86_64 host with 32bit userspace, running
 qemu or even just qemu-img create -f qcow2 some.img 1G
 causes a kernel warning:
 
 ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(5326){t:'S';sz:0} 
 arg(7fff) on some.img
 ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} 
 arg(fff77350) on some.img
 
 ioctl 5326 is CDROM_DRIVE_STATUS,
 ioctl 801c0204 is FDGETPRM.
 
 The warning appears because the Linux compat-ioctl handler for these
 ioctls only applies to block devices, while qemu also uses the ioctls on
 plain files.
 
 Signed-off-by: Johannes Stezenbach j...@sig21.net

Acked-by: Arnd Bergmann a...@arndb.de

 ---
 (resend with Cc: suggested by get_maintainer.pl)
 
 discussed in http://lkml.kernel.org/r/20110617090424.ga19...@sig21.net
 
 Arnd, is this what you had in mind, or did you mean to move
 all floppy compat definitions?  I decided to go with the
 minimal change.  Tested on both 2.6.39.2 and 3.0-rc5-63-g0d72c6f.

Yes, that should be fine, unless Jens would like to see a different
solution for the struct definitions, e.g. moving all of the floppy
compat ioctl numbers to fd.h. I'm fine with it either way.

Arnd
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 12/31] kvm tools: Add UDP support for uip

2011-07-01 Thread Asias He
On 07/01/2011 07:46 PM, Ingo Molnar wrote:
 
 * Asias He asias.he...@gmail.com wrote:
 
 This function is way too large, please split out the meat of it into 
 a separate helper inline.

Will do. Thanks.

-- 
Best Regards,
Asias He


[PATCH 0/3] [v5] Megasas HBA Emulation

2011-07-01 Thread Hannes Reinecke
Hi all,

after getting various feedback from Paolo, Stefan, and Alexander
I've respun the patches.

Changes since the previous version:
- iov: Update parameter usage in iov_(to|from)_buf()
  Updated description for the first patch and clarified the usage
  Renamed arguments for io_XXX for clarification
- scsi: Add 'hba_private' to SCSIRequest
  Kept 'tag' for tracing and just add 'hba_private' as an
  additional field as per request from Paolo
- megasas: checkpatch.pl fixes and update to work with the
  changed interface in scsi_req_new(). Also included the
  suggested fixes from Alex.
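
The idea behind keeping both fields can be sketched as follows. These are simplified stand-in types, not QEMU's actual SCSIRequest or the megasas command structure, and the function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a SCSI request. */
typedef struct Req {
    uint32_t tag;        /* kept so trace output stays readable */
    void *hba_private;   /* opaque pointer owned by the HBA emulation */
} Req;

/* Simplified stand-in for an HBA's per-command state. */
typedef struct HbaCmd {
    int index;
    int completed;
} HbaCmd;

static void req_init(Req *req, uint32_t tag, void *hba_private)
{
    req->tag = tag;
    req->hba_private = hba_private;
}

/* Completion path: the HBA gets its command back directly instead of
 * searching its in-flight commands for a matching tag. */
static HbaCmd *hba_complete(Req *req)
{
    HbaCmd *cmd = req->hba_private;

    if (cmd != NULL)
        cmd->completed = 1;
    return cmd;
}
```

The tag is still there for tracing, while the pointer saves the lookup when several commands are in flight at once.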

Hannes Reinecke (3):
  iov: Update parameter usage in iov_(to|from)_buf()
  scsi: Add 'hba_private' to SCSIRequest
  megasas: LSI Megaraid SAS emulation

 Makefile.objs   |1 +
 default-configs/pci.mak |1 +
 hw/esp.c|2 +-
 hw/lsi53c895a.c |   22 +-
 hw/megasas.c| 1920 +++
 hw/mfi.h| 1197 +
 hw/pci_ids.h|3 +-
 hw/scsi-bus.c   |9 +-
 hw/scsi-disk.c  |4 +-
 hw/scsi-generic.c   |5 +-
 hw/scsi.h   |   10 +-
 hw/spapr_vscsi.c|   29 +-
 hw/usb-msd.c|9 +-
 hw/virtio-net.c |2 +-
 hw/virtio-serial-bus.c  |2 +-
 iov.c   |   49 +-
 iov.h   |   10 +-
 17 files changed, 3192 insertions(+), 83 deletions(-)
 create mode 100644 hw/megasas.c
 create mode 100644 hw/mfi.h

-- 
1.7.3.4



[PATCH 1/3] iov: Update parameter usage in iov_(to|from)_buf()

2011-07-01 Thread Hannes Reinecke
iov_to_buf() has an 'offset' parameter, iov_from_buf() hasn't.
This patch adds the missing parameter to iov_from_buf().
It also renames the 'offset' parameter to 'iov_off' to
emphasize it's the offset into the iovec and not the buffer.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 hw/virtio-net.c|2 +-
 hw/virtio-serial-bus.c |2 +-
 iov.c  |   49 ++-
 iov.h  |   10 
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6997e02..a32cc01 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -657,7 +657,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 
 /* copy in packet.  ugh */
 len = iov_from_buf(sg, elem.in_num,
-   buf + offset, size - offset);
+   buf + offset, 0, size - offset);
 total += len;
 offset += len;
 /* If buffers can't be merged, at this point we
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 7f6db7b..53c58d0 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -103,7 +103,7 @@ static size_t write_to_port(VirtIOSerialPort *port,
 }
 
 len = iov_from_buf(elem.in_sg, elem.in_num,
-   buf + offset, size - offset);
+   buf + offset, 0, size - offset);
 offset += len;
 
 virtqueue_push(vq, &elem, len);
diff --git a/iov.c b/iov.c
index 588cd04..1e02791 100644
--- a/iov.c
+++ b/iov.c
@@ -14,56 +14,61 @@
 
 #include "iov.h"
 
-size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-const void *buf, size_t size)
+size_t iov_from_buf(struct iovec *iov, unsigned int iov_cnt,
+const void *buf, size_t iov_off, size_t size)
 {
-size_t offset;
+size_t iovec_off, buf_off;
 unsigned int i;
 
-offset = 0;
-for (i = 0; offset < size && i < iovcnt; i++) {
-size_t len;
+iovec_off = 0;
+buf_off = 0;
+for (i = 0; i < iov_cnt && size; i++) {
+if (iov_off < (iovec_off + iov[i].iov_len)) {
+size_t len = MIN((iovec_off + iov[i].iov_len) - iov_off, size);
 
-len = MIN(iov[i].iov_len, size - offset);
+memcpy(iov[i].iov_base + (iov_off - iovec_off), buf + buf_off, len);
 
-memcpy(iov[i].iov_base, buf + offset, len);
-offset += len;
+buf_off += len;
+iov_off += len;
+size -= len;
+}
+iovec_off += iov[i].iov_len;
 }
-return offset;
+return buf_off;
 }
 
-size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
-  void *buf, size_t offset, size_t size)
+size_t iov_to_buf(const struct iovec *iov, const unsigned int iov_cnt,
+  void *buf, size_t iov_off, size_t size)
 {
 uint8_t *ptr;
-size_t iov_off, buf_off;
+size_t iovec_off, buf_off;
 unsigned int i;
 
 ptr = buf;
-iov_off = 0;
+iovec_off = 0;
 buf_off = 0;
-for (i = 0; i  iovcnt  size; i++) {
-if (offset  (iov_off + iov[i].iov_len)) {
-size_t len = MIN((iov_off + iov[i].iov_len) - offset , size);
+for (i = 0; i  iov_cnt  size; i++) {
+if (iov_off  (iovec_off + iov[i].iov_len)) {
+size_t len = MIN((iovec_off + iov[i].iov_len) - iov_off , size);
 
-memcpy(ptr + buf_off, iov[i].iov_base + (offset - iov_off), len);
+memcpy(ptr + buf_off, iov[i].iov_base + (iov_off - iovec_off), 
len);
 
 buf_off += len;
-offset += len;
+iov_off += len;
 size -= len;
 }
-iov_off += iov[i].iov_len;
+iovec_off += iov[i].iov_len;
 }
 return buf_off;
 }
 
-size_t iov_size(const struct iovec *iov, const unsigned int iovcnt)
+size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt)
 {
 size_t len;
 unsigned int i;
 
 len = 0;
-for (i = 0; i  iovcnt; i++) {
+for (i = 0; i  iov_cnt; i++) {
 len += iov[i].iov_len;
 }
 return len;
diff --git a/iov.h b/iov.h
index 60a8547..110f67a 100644
--- a/iov.h
+++ b/iov.h
@@ -12,8 +12,8 @@
 
 #include "qemu-common.h"
 
-size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-const void *buf, size_t size);
-size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
-  void *buf, size_t offset, size_t size);
-size_t iov_size(const struct iovec *iov, const unsigned int iovcnt);
+size_t iov_from_buf(struct iovec *iov, unsigned int iov_cnt,
+const void *buf, size_t iov_off, size_t size);
+size_t iov_to_buf(const struct iovec *iov, const unsigned int iov_cnt,
+  void *buf, size_t iov_off, size_t size);
+size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt);
-- 
1.7.3.4


[PATCH 2/3] scsi: Add 'hba_private' to SCSIRequest

2011-07-01 Thread Hannes Reinecke
'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for driver handling several commands
in parallel.
'tag' is still being kept for tracing purposes.
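
An illustrative sketch of the idea, not taken from the patch: the request
carries an opaque pointer back to the HBA's own per-command state, so a
callback can recover that state directly instead of searching by tag. All
sketch_* names below are hypothetical stand-ins (for SCSIRequest and
lsi_request), and the flat array stands in for whatever queue a real HBA
emulation keeps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SCSIRequest. */
struct sketch_req {
    uint32_t tag;        /* kept only for tracing */
    void *hba_private;   /* opaque pointer owned by the HBA emulation */
};

/* Hypothetical stand-in for an HBA's per-command state (cf. lsi_request). */
struct sketch_hba_cmd {
    uint32_t tag;
    uint32_t pending;
    struct sketch_req req;
};

/* Set up a command and point its request back at it. */
void sketch_cmd_init(struct sketch_hba_cmd *cmd, uint32_t tag)
{
    cmd->tag = tag;
    cmd->pending = 0;
    cmd->req.tag = tag;
    cmd->req.hba_private = cmd;
}

/* Old style: walk every outstanding command to find the matching tag. */
struct sketch_hba_cmd *sketch_find_by_tag(struct sketch_hba_cmd *cmds,
                                          size_t n, uint32_t tag)
{
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].tag == tag) {
            return &cmds[i];
        }
    }
    return NULL;
}

/* New style: the request already knows its owner, no search needed. */
struct sketch_hba_cmd *sketch_find_by_private(struct sketch_req *req)
{
    return req->hba_private;
}
```

With several commands in flight, sketch_find_by_private() is a single
pointer load, while sketch_find_by_tag() grows with the queue length.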

Signed-off-by: Hannes Reinecke h...@suse.de
---
 hw/esp.c  |2 +-
 hw/lsi53c895a.c   |   22 --
 hw/scsi-bus.c |9 ++---
 hw/scsi-disk.c|4 ++--
 hw/scsi-generic.c |5 +++--
 hw/scsi.h |   10 +++---
 hw/spapr_vscsi.c  |   29 +
 hw/usb-msd.c  |9 +
 8 files changed, 37 insertions(+), 53 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..aa87197 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)
 
     DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
     lun = busid & 7;
-    s->current_req = scsi_req_new(s->current_dev, 0, lun);
+    s->current_req = scsi_req_new(s->current_dev, 0, lun, NULL);
     datalen = scsi_req_enqueue(s->current_req, buf);
     s->ti_size = datalen;
     if (datalen != 0) {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 940b43a..69eec1d 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -661,7 +661,7 @@ static lsi_request *lsi_find_by_tag(LSIState *s, uint32_t tag)
 static void lsi_request_cancelled(SCSIRequest *req)
 {
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
-    lsi_request *p;
+    lsi_request *p = req->hba_private;
 
     if (s->current && req == s->current->req) {
         scsi_req_unref(req);
@@ -670,7 +670,6 @@ static void lsi_request_cancelled(SCSIRequest *req)
         return;
     }
 
-    p = lsi_find_by_tag(s, req->tag);
     if (p) {
         QTAILQ_REMOVE(&s->queue, p, next);
         scsi_req_unref(req);
@@ -680,18 +679,12 @@ static void lsi_request_cancelled(SCSIRequest *req)
 
 /* Record that data is available for a queued command.  Returns zero if
    the device was reselected, nonzero if the IO is deferred.  */
-static int lsi_queue_tag(LSIState *s, uint32_t tag, uint32_t len)
+static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
 {
-    lsi_request *p;
-
-    p = lsi_find_by_tag(s, tag);
-    if (!p) {
-        BADF("IO with unknown tag %d\n", tag);
-        return 1;
-    }
+    lsi_request *p = req->hba_private;
 
     if (p->pending) {
-        BADF("Multiple IO pending for tag %d\n", tag);
+        BADF("Multiple IO pending for request %p\n", p);
     }
     p->pending = len;
     /* Reselect if waiting for it, or if reselection triggers an IRQ
@@ -743,9 +736,9 @@ static void lsi_transfer_data(SCSIRequest *req, uint32_t len)
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
     int out;
 
-    if (s->waiting == 1 || !s->current || req->tag != s->current->tag ||
+    if (s->waiting == 1 || !s->current || req->hba_private != s->current ||
         (lsi_irq_on_rsl(s) && !(s->scntl1 & LSI_SCNTL1_CON))) {
-        if (lsi_queue_tag(s, req->tag, len)) {
+        if (lsi_queue_req(s, req, len)) {
             return;
         }
     }
@@ -789,7 +782,8 @@ static void lsi_do_command(LSIState *s)
     assert(s->current == NULL);
     s->current = qemu_mallocz(sizeof(lsi_request));
     s->current->tag = s->select_tag;
-    s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun);
+    s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun,
+                                   s->current);
 
     n = scsi_req_enqueue(s->current->req, buf);
     if (n) {
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index ad6a730..8b1a412 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -131,7 +131,8 @@ int scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
     return res;
 }
 
-SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag,
+                            uint32_t lun, void *hba_private)
 {
     SCSIRequest *req;
 
@@ -141,14 +142,16 @@ SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t l
     req->dev = d;
     req->tag = tag;
     req->lun = lun;
+    req->hba_private = hba_private;
     req->status = -1;
     trace_scsi_req_alloc(req->dev->id, req->lun, req->tag);
     return req;
 }
 
-SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
+                          void *hba_private)
 {
-    return d->info->alloc_req(d, tag, lun);
+    return d->info->alloc_req(d, tag, lun, hba_private);
 }
 
 uint8_t *scsi_req_get_buf(SCSIRequest *req)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index a8c7372..c2a99fe 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -81,13 +81,13 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, 
int type);
 static int scsi_disk_emulate_command(SCSIDiskReq *r, uint8_t *outbuf);
 
 static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t 

Re: [PATCH 2/3] scsi: Add 'hba_private' to SCSIRequest

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 05:35 PM, Hannes Reinecke wrote:

'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for driver handling several commands
in parallel.
'tag' is still being kept for tracing purposes.

Signed-off-by: Hannes Reineckeh...@suse.de
---
  hw/esp.c  |2 +-
  hw/lsi53c895a.c   |   22 --
  hw/scsi-bus.c |9 ++---
  hw/scsi-disk.c|4 ++--
  hw/scsi-generic.c |5 +++--
  hw/scsi.h |   10 +++---
  hw/spapr_vscsi.c  |   29 +
  hw/usb-msd.c  |9 +
  8 files changed, 37 insertions(+), 53 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..aa87197 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t 
busid)

  DPRINTF(do_busid_cmd: busid 0x%x\n, busid);
  lun = busid  7;
-s-current_req = scsi_req_new(s-current_dev, 0, lun);
+s-current_req = scsi_req_new(s-current_dev, 0, lun, NULL);
  datalen = scsi_req_enqueue(s-current_req, buf);
  s-ti_size = datalen;
  if (datalen != 0) {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 940b43a..69eec1d 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -661,7 +661,7 @@ static lsi_request *lsi_find_by_tag(LSIState *s, uint32_t 
tag)
  static void lsi_request_cancelled(SCSIRequest *req)
  {
  LSIState *s = DO_UPCAST(LSIState, dev.qdev, req-bus-qbus.parent);
-lsi_request *p;
+lsi_request *p = req-hba_private;

  if (s-current  req == s-current-req) {
  scsi_req_unref(req);
@@ -670,7 +670,6 @@ static void lsi_request_cancelled(SCSIRequest *req)
  return;
  }

-p = lsi_find_by_tag(s, req-tag);
  if (p) {
  QTAILQ_REMOVE(s-queue, p, next);
  scsi_req_unref(req);
@@ -680,18 +679,12 @@ static void lsi_request_cancelled(SCSIRequest *req)

  /* Record that data is available for a queued command.  Returns zero if
 the device was reselected, nonzero if the IO is deferred.  */
-static int lsi_queue_tag(LSIState *s, uint32_t tag, uint32_t len)
+static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
  {
-lsi_request *p;
-
-p = lsi_find_by_tag(s, tag);
-if (!p) {
-BADF(IO with unknown tag %d\n, tag);
-return 1;
-}
+lsi_request *p = req-hba_private;

  if (p-pending) {
-BADF(Multiple IO pending for tag %d\n, tag);
+BADF(Multiple IO pending for request %p\n, p);
  }
  p-pending = len;
  /* Reselect if waiting for it, or if reselection triggers an IRQ
@@ -743,9 +736,9 @@ static void lsi_transfer_data(SCSIRequest *req, uint32_t 
len)
  LSIState *s = DO_UPCAST(LSIState, dev.qdev, req-bus-qbus.parent);
  int out;

-if (s-waiting == 1 || !s-current || req-tag != s-current-tag ||
+if (s-waiting == 1 || !s-current || req-hba_private != s-current ||
  (lsi_irq_on_rsl(s)  !(s-scntl1  LSI_SCNTL1_CON))) {
-if (lsi_queue_tag(s, req-tag, len)) {
+if (lsi_queue_req(s, req, len)) {
  return;
  }
  }
@@ -789,7 +782,8 @@ static void lsi_do_command(LSIState *s)
  assert(s-current == NULL);
  s-current = qemu_mallocz(sizeof(lsi_request));
  s-current-tag = s-select_tag;
-s-current-req = scsi_req_new(dev, s-current-tag, s-current_lun);
+s-current-req = scsi_req_new(dev, s-current-tag, s-current_lun,
+   s-current);

  n = scsi_req_enqueue(s-current-req, buf);
  if (n) {
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index ad6a730..8b1a412 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -131,7 +131,8 @@ int scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
  return res;
  }

-SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t 
lun)
+SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag,
+uint32_t lun, void *hba_private)
  {
  SCSIRequest *req;

@@ -141,14 +142,16 @@ SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, 
uint32_t tag, uint32_t l
  req-dev = d;
  req-tag = tag;
  req-lun = lun;
+req-hba_private = hba_private;
  req-status = -1;
  trace_scsi_req_alloc(req-dev-id, req-lun, req-tag);
  return req;
  }

-SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
+  void *hba_private)
  {
-return d-info-alloc_req(d, tag, lun);
+return d-info-alloc_req(d, tag, lun, hba_private);
  }

  uint8_t *scsi_req_get_buf(SCSIRequest *req)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index a8c7372..c2a99fe 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -81,13 +81,13 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, 
int type);
  static int 

[PATCH] virt: Add more flexible way to specify comm ports host - guest

2011-07-01 Thread Lucas Meneghel Rodrigues
When running the virt guest Windows tests using the (now default)
autotest private bridge, I noticed that some ports needed for host
and guest communication weren't specified. So, add a config file
knob to allow people to specify additional ports to be added to
the default firewall configuration. The config tracks some important
ports used in tests, such as the remote shell ports and remote
shell file transfer ports.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests_base.cfg.sample |3 ++
 client/virt/virt_test_setup.py |   47 +--
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 5313da1..1a86265 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -64,6 +64,9 @@ bridge = private
 # be a specific bridge
 # name, such as 'virbr0'
 #bridge = virbr0
+# If you need more ports to be available for comm between host and guest,
+# please add them here
+priv_bridge_ports = 53 67
 run_tcpdump = yes
 
 # Misc
diff --git a/client/virt/virt_test_setup.py b/client/virt/virt_test_setup.py
index 6e2d477..1539cac 100644
--- a/client/virt/virt_test_setup.py
+++ b/client/virt/virt_test_setup.py
@@ -308,21 +308,38 @@ class PrivateBridgeConfig(object):
 self.subnet = params.get(priv_subnet, '192.168.58')
 self.ip_version = params.get(bridge_ip_version, ipv4)
 self.dhcp_server_pid = None
-self.iptables_rules = [
-INPUT 1 -i %s -p udp -m udp --dport 53 -j ACCEPT % self.brname,
-INPUT 2 -i %s -p tcp -m tcp --dport 53 -j ACCEPT % self.brname,
-INPUT 3 -i %s -p udp -m udp --dport 67 -j ACCEPT % self.brname,
-INPUT 4 -i %s -p tcp -m tcp --dport 67 -j ACCEPT % self.brname,
-INPUT 5 -i %s -p tcp -m tcp --dport 12323 -j ACCEPT % 
self.brname,
-FORWARD 1 -m physdev --physdev-is-bridged -j ACCEPT,
-FORWARD 2 -d %s.0/24 -o %s -m state --state RELATED,ESTABLISHED 
--j ACCEPT % (self.subnet, self.brname),
-FORWARD 3 -s %s.0/24 -i %s -j ACCEPT % (self.subnet, 
self.brname),
-FORWARD 4 -i %s -o %s -j ACCEPT % (self.brname, self.brname),
-(FORWARD 5 -o %s -j REJECT --reject-with icmp-port-unreachable %
- self.brname),
-(FORWARD 6 -i %s -j REJECT --reject-with icmp-port-unreachable %
- self.brname)]
+ports = params.get(priv_bridge_ports, '53 67').split()
+s_port = params.get(guest_port_remote_shell, 10022)
+if s_port not in ports:
+ports.append(s_port)
+ft_port = params.get(guest_port_file_transfer, 10023)
+if ft_port not in ports:
+ports.append(ft_port)
+u_port = params.get(guest_port_unattended_install, 13323)
+if u_port not in ports:
+ports.append(u_port)
+self.iptables_rules = self._assemble_iptables_rules(ports)
+
+
+def _assemble_iptables_rules(self, port_list):
+rules = []
+index = 0
+for port in port_list:
+index += 1
+rules.append(INPUT %s -i %s -p tcp -m tcp --dport %s -j ACCEPT %
+ (index, self.brname, port))
+index += 1
+rules.append(INPUT %s -i %s -p udp -m udp --dport %s -j ACCEPT %
+ (index, self.brname, port))
+rules.append(FORWARD 1 -m physdev --physdev-is-bridged -j ACCEPT)
+rules.append(FORWARD 2 -d %s.0/24 -o %s -m state 
+ --state RELATED,ESTABLISHED -j ACCEPT %
+ (self.subnet, self.brname))
+rules.append(FORWARD 3 -s %s.0/24 -i %s -j ACCEPT %
+ (self.subnet, self.brname))
+rules.append(FORWARD 4 -i %s -o %s -j ACCEPT %
+ (self.brname, self.brname))
+return rules
 
 
 def _add_bridge(self):
-- 
1.7.5.4



Re: [PATCH 1/3] iov: Update parameter usage in iov_(to|from)_buf()

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 17:35, Hannes Reinecke wrote:

 iov_to_buf() has an 'offset' parameter, iov_from_buf() hasn't.
 This patch adds the missing parameter to iov_from_buf().
 It also renames the 'offset' parameter to 'iov_off' to
 emphasize it's the offset into the iovec and not the buffer.
 
 Signed-off-by: Hannes Reinecke h...@suse.de

Acked-by: Alexander Graf ag...@suse.de

Alex
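
For readers following along, here is a standalone sketch of what an
iov_from_buf() with an iovec offset does, written against plain POSIX
struct iovec rather than the QEMU tree. The name sketch_iov_from_buf is
hypothetical; the loop mirrors the one in the patch, but this is an
illustration under that assumption, not the committed code.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to `size` bytes from `buf` into the iovec array, starting
 * `iov_off` bytes into the logical byte stream the array describes.
 * Returns the number of bytes actually copied, which is less than
 * `size` if the iovec runs out of room.
 */
size_t sketch_iov_from_buf(struct iovec *iov, unsigned int iov_cnt,
                           const void *buf, size_t iov_off, size_t size)
{
    const unsigned char *src = buf;
    size_t iovec_off = 0;   /* where iov[i] starts in the logical stream */
    size_t buf_off = 0;     /* bytes consumed from buf so far */
    unsigned int i;

    for (i = 0; i < iov_cnt && size; i++) {
        size_t end = iovec_off + iov[i].iov_len;

        if (iov_off < end) {
            /* Room left in this element from iov_off onwards. */
            size_t len = end - iov_off;

            if (len > size) {
                len = size;
            }
            memcpy((unsigned char *)iov[i].iov_base + (iov_off - iovec_off),
                   src + buf_off, len);
            buf_off += len;
            iov_off += len;
            size -= len;
        }
        iovec_off = end;
    }
    return buf_off;
}
```

Passing iov_off = 0 reproduces the old behaviour, which is why the two
existing callers in the patch simply gain a 0 argument.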



Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 17:35, Hannes Reinecke wrote:

 This patch adds an emulation for the LSI Megaraid SAS 8708EM2 HBA.

Have you tried to execute the current version of megasas and actually do 
something with it? I just booted up openSUSE 11.4 rescue from DVD with a 
megasas adapter that contained a raw file backed by tmpfs. Creating a partition 
worked fine, but when running mkfs.ext3 and mounting afterwards, the mount 
fails saying there is no ext3 on the disk.

Sounds like data corruption to me :). I know that this used to work a while 
back, so it might be a recent regression?


Alex



Re: [PATCH v2 00/31] Implement user mode network for kvm tools

2011-07-01 Thread Stefan Hajnoczi
On Fri, Jul 1, 2011 at 12:38 AM, Asias He asias.he...@gmail.com wrote:
 On 06/30/2011 04:56 PM, Stefan Hajnoczi wrote:
 On Thu, Jun 30, 2011 at 9:40 AM, Asias He asias.he...@gmail.com wrote:
 uip stands for user mode {TCP,UDP}/IP. Currently, uip supports ARP, ICMP,
 IPV4, UDP, TCP. So any network protocols above UDP/TCP should work as well,
 e.g., HTTP, FTP, SSH, DNS.

 There is an existing uIP which might cause confusion, not sure if
 you've seen it.  First I thought you were using that :).

 I heard about uIP, but this patchset has nothing to do with uIP ;-)

 At first I was naming the user mode network UNET, which is User mode
 NETwork; however, I thought uip looked better because it is shorter.

 Anyway, if uip does cause confusion, I'd like to change the name.

It's up to you but now is the right time to do it.  Consider if
another program wants to reuse this code or if you ever want to make
it a library, it wouldn't help to have a confusing name.

Stefan


Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster

2011-07-01 Thread Dan Carpenter
On Fri, Jul 01, 2011 at 07:31:54AM -0700, Dan Magenheimer wrote:
  Off by one errors are kind of insidious.  People cut and paste them
  and they spread.  If someone adds a new list of chunks then there
  are now two examples that are correct and two which have an extra
  element, so it's 50/50 that he'll copy the right one.
 
 True, but these are NOT off-by-one errors... they are
 correct-but-slightly-ugly code snippets.  (To clarify, I said
 the *ugliness* arose when debugging an off-by-one error.)
 

What I meant was the new arrays are *one* element too large.

 Patches always welcome, and I agree that these should be
 fixed eventually, assuming the code doesn't go away completely
 first.. I'm simply stating the position
 that going through another test/submit cycling to fix
 correct-but-slightly-ugly code which is present only to
 surface information for experiments is not high on my priority
 list right now... unless GregKH says he won't accept the patch.
  
  Btw, looking at it again, this seems like maybe a similar issue in
  zbud_evict_zbpg():
  
 516  for (i = 0; i < MAX_CHUNK; i++) {
 517  retry_unbud_list_i:
  
  
  MAX_CHUNKS is NCHUNKS - 1.  Shouldn't that be i < NCHUNKS so that we
  reach the last element in the list?
 
 No, the last element in that list is unused.  There is a comment
 to that effect someplace in the code.  (These lists are keeping
 track of pages with chunks of available space and the last
 entry would have no available space so is always empty.)

The comment says that the first element isn't used.  Perhaps the
comment is out of date and now it's the last element that isn't
used.  To me, it makes sense to have an unused first element, but it
doesn't make sense to have an unused last element.  Why not just
make the array smaller?

Also if the last element of the original arrays isn't used, then
does that mean the last *two* elements of the new arrays aren't
used?

Getting array sizes wrong is not a correct-but-slightly-ugly
thing.  *grumble* *grumble* *grumble*.  But it doesn't crash the
system so I'm fine with it going in as is...
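
To make the bound question concrete, a toy sketch (not zcache code): an
array of SKETCH_NCHUNKS slots walked with the two loop bounds under
discussion. The value 8 and all sketch_* names are made up for
illustration; whether the last slot is meant to stay unused is exactly
the open point above, and the sketch takes no side on it.

```c
#include <stddef.h>

#define SKETCH_NCHUNKS   8                     /* illustrative value only */
#define SKETCH_MAX_CHUNK (SKETCH_NCHUNKS - 1)

/*
 * Mark which slots of an NCHUNKS-sized array a loop with the given
 * upper bound visits: seen[i] is 1 if visited, 0 otherwise.
 */
void sketch_visit(int seen[SKETCH_NCHUNKS], size_t bound)
{
    size_t i;

    for (i = 0; i < SKETCH_NCHUNKS; i++) {
        seen[i] = 0;
    }
    for (i = 0; i < bound && i < SKETCH_NCHUNKS; i++) {
        seen[i] = 1;
    }
}
```

With bound = SKETCH_MAX_CHUNK the final slot is never touched; with
bound = SKETCH_NCHUNKS every slot is. If the final slot is unused by
design, the array could simply be one element shorter.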

 
 Thanks again for your interest... are you using zcache?

No.  I was just on the driver-devel list reviewing patches at
random.

regards,
dan carpenter



Re: [PATCH 0/17] Hypervisor-mode KVM on POWER7 and PPC970

2011-07-01 Thread Alexander Graf

On 29.06.2011, at 12:15, Paul Mackerras wrote:

 The first patch of the following series is a pure bug-fix for 32-bit
 kernels.
 
 The remainder of the following series of patches enable KVM to exploit
 the hardware hypervisor mode on 64-bit Power ISA Book3S machines.  At
 present, POWER7 and PPC970 processors are supported.  (Note that the
 PPC970 processors in Apple G5 machines don't have a usable hypervisor
 mode and are not supported by these patches.)
 
 Running the KVM host in hypervisor mode means that the guest can use
 both supervisor mode and user mode.  That means that the guest can
 execute supervisor-privilege instructions and access supervisor-
 privilege registers.  In addition the hardware directs most exceptions
 to the guest.  Thus we don't need to emulate any instructions in the
 host.  Generally, the only times we need to exit the guest are when it
 does a hypercall or when an external interrupt or host timer
 (decrementer) interrupt occurs.
 
 The focus of this KVM implementation is to run guests that use the
 PAPR (Power Architecture Platform Requirements) paravirtualization
 interface, which is the interface supplied by PowerVM on IBM pSeries
 machines.  Currently the pseries machine type in qemu is only
 supported by book3s_hv KVM, and book3s_hv KVM only supports the
 pseries machine type.  That will hopefully change in future.
 
 These patches are against the master branch of the kvm tree.

Something seems to be broken with signals. When running without io-thread, I 
can't even do ctrl-c on -nographic while the guest is in sleep mode. But that 
might not be related to your patches.

I've applied 01-16 now. Sending them through some more testing and if they're 
good, sending a pull request.


Alex



Re: [PATCH 10/17] KVM: PPC: Add support for Book3S processors in hypervisor mode

2011-07-01 Thread Dave Hansen
On Wed, 2011-06-29 at 20:21 +1000, Paul Mackerras wrote: 
 +struct kvmppc_pginfo {
 + unsigned long pfn;
 + atomic_t refcnt;
 +};

I only see this refcnt inc'd in one spot and never decremented or read.
Is the refcnt just the number of hptes we have for this particular page
at the moment?  

 +long kvmppc_alloc_hpt(struct kvm *kvm)
 +{
 +	unsigned long hpt;
 +	unsigned long lpid;
 +
 +	hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
 +			       HPT_ORDER - PAGE_SHIFT);
 +	if (!hpt) {
 +		pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
 +		return -ENOMEM;
 +	}
 +	kvm->arch.hpt_virt = hpt;
 +
 +	do {
 +		lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS);
 +		if (lpid >= NR_LPIDS) {
 +			pr_err("kvm_alloc_hpt: No LPIDs free\n");
 +			free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
 +			return -ENOMEM;
 +		}
 +	} while (test_and_set_bit(lpid, lpid_inuse));
 +
 +	kvm->arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18);
 +	kvm->arch.lpid = lpid;
 +	kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
 +	kvm->arch.host_lpid = mfspr(SPRN_LPID);
 +	kvm->arch.host_lpcr = mfspr(SPRN_LPCR);
 +
 +	pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
 +	return 0;
 +}

 +static unsigned long user_page_size(unsigned long addr)
 +{
 +	struct vm_area_struct *vma;
 +	unsigned long size = PAGE_SIZE;
 +
 +	down_read(&current->mm->mmap_sem);
 +	vma = find_vma(current->mm, addr);
 +	if (vma)
 +		size = vma_kernel_pagesize(vma);
 +	up_read(&current->mm->mmap_sem);
 +	return size;
 +}

That one looks pretty arch-independent and like it could use some
consolidation with: virt/kvm/kvm_main.c::kvm_host_page_size()

 +void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
 +{
 +	unsigned long i;
 +	unsigned long npages = kvm->arch.ram_npages;
 +	unsigned long pfn;
 +	unsigned long *hpte;
 +	unsigned long hash;
 +	struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
 +
 +	if (!pginfo)
 +		return;
 +
 +	/* VRMA can't be > 1TB */
 +	if (npages > 1ul << (40 - kvm->arch.ram_porder))
 +		npages = 1ul << (40 - kvm->arch.ram_porder);

Is that because it can only be a single segment?  Does that mean that we
can't ever have guests larger than 1TB?  Or just that they have to live
with 1TB until they get their own page tables up?

 +	/* Can't use more than 1 HPTE per HPTEG */
 +	if (npages > HPT_NPTEG)
 +		npages = HPT_NPTEG;
 +
 +	for (i = 0; i < npages; ++i) {
 +		pfn = pginfo[i].pfn;
 +		/* can't use hpt_hash since va > 64 bits */
 +		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;

Is that because 'i' could potentially have a very large pfn?  Nish
thought it might have something to do with the hpte entries being larger
than 64-bits themselves with the vsid included, but we got thoroughly
confused. :)

-- Dave




Re: [PATCH 10/17] KVM: PPC: Add support for Book3S processors in hypervisor mode

2011-07-01 Thread Alexander Graf

On 01.07.2011, at 20:37, Dave Hansen wrote:

 On Wed, 2011-06-29 at 20:21 +1000, Paul Mackerras wrote: 
 +struct kvmppc_pginfo {
 +unsigned long pfn;
 +atomic_t refcnt;
 +};
 
 I only see this refcnt inc'd in one spot and never decremented or read.
 Is the refcnt just the number of hptes we have for this particular page
 at the moment?  
 
 +long kvmppc_alloc_hpt(struct kvm *kvm)
 +{
 +	unsigned long hpt;
 +	unsigned long lpid;
 +
 +	hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
 +			       HPT_ORDER - PAGE_SHIFT);
 +	if (!hpt) {
 +		pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
 +		return -ENOMEM;
 +	}
 +	kvm->arch.hpt_virt = hpt;
 +
 +	do {
 +		lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS);
 +		if (lpid >= NR_LPIDS) {
 +			pr_err("kvm_alloc_hpt: No LPIDs free\n");
 +			free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
 +			return -ENOMEM;
 +		}
 +	} while (test_and_set_bit(lpid, lpid_inuse));
 +
 +	kvm->arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18);
 +	kvm->arch.lpid = lpid;
 +	kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
 +	kvm->arch.host_lpid = mfspr(SPRN_LPID);
 +	kvm->arch.host_lpcr = mfspr(SPRN_LPCR);
 +
 +	pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
 +	return 0;
 +}
 
 +static unsigned long user_page_size(unsigned long addr)
 +{
 +	struct vm_area_struct *vma;
 +	unsigned long size = PAGE_SIZE;
 +
 +	down_read(&current->mm->mmap_sem);
 +	vma = find_vma(current->mm, addr);
 +	if (vma)
 +		size = vma_kernel_pagesize(vma);
 +	up_read(&current->mm->mmap_sem);
 +	return size;
 +}
 
 That one looks pretty arch-independent and like it could use some
 consolidation with: virt/kvm/kvm_main.c::kvm_host_page_size()

Yep, I'd deem that a cleanup for later though. Good point however! We have 
similar code in e500 kvm today.

 
 +void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
 +{
 +	unsigned long i;
 +	unsigned long npages = kvm->arch.ram_npages;
 +	unsigned long pfn;
 +	unsigned long *hpte;
 +	unsigned long hash;
 +	struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
 +
 +	if (!pginfo)
 +		return;
 +
 +	/* VRMA can't be > 1TB */
 +	if (npages > 1ul << (40 - kvm->arch.ram_porder))
 +		npages = 1ul << (40 - kvm->arch.ram_porder);
 
 Is that because it can only be a single segment?  Does that mean that we
 can't ever have guests larger than 1TB?  Or just that they have to live
 with 1TB until they get their own page tables up?

The VRMA is only important in real mode, so this part looks good. The RMA is 
usually a lot smaller than 1TB ;).
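
As a worked example of the clamp quoted above (a sketch, with a
hypothetical helper name): 1TB is 2^40 bytes, so with pages of
2^page_order bytes at most 2^(40 - page_order) of them fit. With 16MB
pages (order 24) that is 2^16 = 65536 pages.

```c
#include <stdint.h>

/*
 * Clamp a page count so the mapped region does not exceed 1 TiB
 * (2^40 bytes), given pages of 2^page_order bytes each.
 * Assumes page_order <= 40.
 */
uint64_t sketch_clamp_to_1tb(uint64_t npages, unsigned int page_order)
{
    uint64_t max_pages = UINT64_C(1) << (40 - page_order);

    return npages > max_pages ? max_pages : npages;
}
```

The clamp only limits how much of guest RAM the VRMA covers; it says
nothing by itself about the total size of the guest.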


Alex



Re: [PATCH resend] compat_ioctl: fix warning caused by qemu

2011-07-01 Thread Jens Axboe
On 2011-07-01 16:46, Arnd Bergmann wrote:
 Yes, that should be fine, unless Jens would like to see a different
 solution for the struct definitions, e.g. moving all of the floppy
 compat ioctl numbers to fd.h. I'm fine with it either way.

Looks OK to me, I've queued it up for 3.1 with your ack. Thanks
Johannes.

-- 
Jens Axboe



Re: [PATCH v2 00/31] Implement user mode network for kvm tools

2011-07-01 Thread Pekka Enberg
On Fri, Jul 1, 2011 at 7:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jul 1, 2011 at 12:38 AM, Asias He asias.he...@gmail.com wrote:
 On 06/30/2011 04:56 PM, Stefan Hajnoczi wrote:
 On Thu, Jun 30, 2011 at 9:40 AM, Asias He asias.he...@gmail.com wrote:
 uip stands for user mode {TCP,UDP}/IP. Currently, uip supports ARP, ICMP,
 IPV4, UDP, TCP. So any network protocols above UDP/TCP should work as well,
 e.g., HTTP, FTP, SSH, DNS.

 There is an existing uIP which might cause confusion, not sure if
 you've seen it.  First I thought you were using that :).

 I heard about uIP, but this patchset has nothing to do with uIP ;-)

 At first I was naming the user mode network UNET, which is User mode
 NETwork; however, I thought uip looked better because it is shorter.

 Anyway, if uip does cause confusion, I'd like to change the name.

 It's up to you but now is the right time to do it.  Consider if
 another program wants to reuse this code or if you ever want to make
 it a library, it wouldn't help to have a confusing name.

I don't care too much what we use as the namespace prefix but as a
directory name tools/kvm/uip is pretty meaningless. I'd just move the
code under tools/kvm/net to mirror what the kernel already has.


[PATCH v4 0/9] Steal time series again

2011-07-01 Thread Glauber Costa
Here follows the fourth version of the steal time series.
Hope it is acceptable for all involved parties now. The main differences
from v3 are:

* The Changelogs seem to have been written by an actual person now, not by a
monkey. Yet, I am the aforementioned person, so don't expect much.
* Forcing delayacct on the hypervisor side allows us to simplify the guest
code dramatically, since now we don't need to test for is_idle: if we're idle,
we won't have steal time, and that's the end of the story.

Hope you enjoy.

Glauber Costa (8):
  KVM-HDR Add constant to represent KVM MSRs enabled bit
  KVM-HDR: KVM Steal time implementation
  KVM-HV: KVM Steal time implementation
  KVM-GST: Add a pv_ops stub for steal time
  add jump labels for ia64 paravirt
  KVM-GST: KVM Steal time accounting
  KVM-GST: adjust scheduler cpu power
  KVM-GST: KVM Steal time registration

Gleb Natapov (1):
  introduce kvm_read_guest_cached

 Documentation/kernel-parameters.txt   |4 ++
 Documentation/virtual/kvm/msr.txt |   35 ++
 arch/ia64/include/asm/paravirt.h  |4 ++
 arch/ia64/kernel/paravirt.c   |2 +
 arch/x86/Kconfig  |   12 +
 arch/x86/include/asm/kvm_host.h   |8 +++
 arch/x86/include/asm/kvm_para.h   |   15 ++
 arch/x86/include/asm/paravirt.h   |9 
 arch/x86/include/asm/paravirt_types.h |1 +
 arch/x86/kernel/kvm.c |   73 ++
 arch/x86/kernel/kvmclock.c|2 +
 arch/x86/kernel/paravirt.c|9 
 arch/x86/kvm/Kconfig  |1 +
 arch/x86/kvm/x86.c|   56 ++-
 include/linux/kvm_host.h  |2 +
 kernel/sched.c|   80 
 kernel/sched_features.h   |4 +-
 virt/kvm/kvm_main.c   |   20 
 18 files changed, 322 insertions(+), 15 deletions(-)

-- 
1.7.3.4



[PATCH v4 4/9] KVM-HV: KVM Steal time implementation

2011-07-01 Thread Glauber Costa
To implement steal time, we need the hypervisor to pass the guest
information about how much time was spent running other processes
outside the VM, while the vcpu had meaningful work to do - halt
time does not count.

This information is acquired through the run_delay field of the
delayacct/schedstats infrastructure, which counts time spent in a
runqueue but not running.

Steal time is per-cpu information, so the traditional MSR-based
infrastructure is used. A new MSR, MSR_KVM_STEAL_TIME, holds the
address of the memory area containing information about steal time.

This patch contains the hypervisor part of the steal time infrastructure,
and can be backported independently of the guest portion.
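
As a rough illustration of the accounting described above, here is a minimal userspace sketch. The struct and function names are hypothetical stand-ins, not the kernel's; only the idea (fold the growth of run_delay into a running total and bump the version by two so it stays even) is taken from this patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the record the guest reads. */
struct steal_record {
    uint64_t steal;    /* accumulated stolen time, in ns */
    uint32_t version;  /* even whenever the record is stable */
};

/* Hypothetical per-vcpu host state. */
struct vcpu_steal_state {
    uint64_t last_run_delay;     /* run_delay sampled at the previous update */
    struct steal_record shared;  /* host copy of what the guest sees */
};

/* Fold the run_delay growth since the last update into the record. */
static void record_steal(struct vcpu_steal_state *st, uint64_t run_delay_now)
{
    uint64_t delta = run_delay_now - st->last_run_delay;

    st->last_run_delay = run_delay_now;
    st->shared.steal += delta;
    st->shared.version += 2;  /* stays even: no half-written state is exposed */
}
```

Calling record_steal() twice with run_delay values of 1000 and 1500 leaves steal at 1500 and version at 4.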

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/kvm_host.h |8 +
 arch/x86/include/asm/kvm_para.h |4 +++
 arch/x86/kvm/Kconfig|1 +
 arch/x86/kvm/x86.c  |   56 --
 4 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index da6bbee..9ba354d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,14 @@ struct kvm_vcpu_arch {
unsigned int hw_tsc_khz;
unsigned int time_offset;
struct page *time_page;
+
+   struct {
+   u64 msr_val;
+   u64 last_steal;
+   struct gfn_to_hva_cache stime;
+   struct kvm_steal_time steal;
+   } st;
+
u64 last_guest_tsc;
u64 last_kernel_ns;
u64 last_tsc_nsec;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 65f8bb9..c484ba8 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -45,6 +45,10 @@ struct kvm_steal_time {
__u32 pad[12];
 };
 
+#define KVM_STEAL_ALIGNMENT_BITS 5
+#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
+#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
+
 #define KVM_MAX_MMU_OP_BATCH   32
 
 #define KVM_ASYNC_PF_ENABLED   (1 << 0)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 50f6364..99c3f05 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_ASYNC_PF
select USER_RETURN_NOTIFIER
select KVM_MMIO
+   select TASK_DELAY_ACCT
---help---
  Support hosting fully virtualized guest machines using hardware
  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7167717..237bcdc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -808,12 +808,12 @@ EXPORT_SYMBOL_GPL(kvm_get_dr);
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN 8
+#define KVM_SAVE_MSRS_BEGIN 9
 static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
-   HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN,
+   HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1491,6 +1491,27 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
}
 }
 
+static void record_steal_time(struct kvm_vcpu *vcpu)
+{
+   u64 delta;
+
+   if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+   return;
+
+   if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+   &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+   return;
+
+   delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
+   vcpu->arch.st.last_steal = current->sched_info.run_delay;
+
+   vcpu->arch.st.steal.steal += delta;
+   vcpu->arch.st.steal.version += 2;
+
+   kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+   &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+}
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
switch (msr) {
@@ -1573,6 +1594,28 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
if (kvm_pv_enable_async_pf(vcpu, data))
return 1;
break;
+   case MSR_KVM_STEAL_TIME:
+   vcpu->arch.st.msr_val = data;
+
+   if (!(data & KVM_MSR_ENABLED)) {
+   break;
+   }
+
+   if (unlikely(!sched_info_on()))
+   break;
+
+   if (data & KVM_STEAL_RESERVED_MASK)

[PATCH v4 2/9] KVM-HDR Add constant to represent KVM MSRs enabled bit

2011-07-01 Thread Glauber Costa
This patch is simple, put in a different commit so it can be more easily
shared between guest and hypervisor. It just defines a named constant
to indicate the enable bit for KVM-specific MSRs.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/kvm_para.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index a427bf7..d6cd79b 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -30,6 +30,7 @@
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
+#define KVM_MSR_ENABLED 1
 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
-- 
1.7.3.4



[PATCH v4 8/9] KVM-GST: adjust scheduler cpu power

2011-07-01 Thread Glauber Costa
This patch makes update_rq_clock() aware of steal time.
The mechanism of operation is not different from irq_time,
and follows the same principles. This lives in a CONFIG
option itself, and can be compiled out independently of
the rest of steal time reporting. The effect of disabling it
is that the scheduler will still report steal time (that cannot be
disabled), but won't use this information for cpu power adjustments.

Every time update_rq_clock_task() is invoked, we query information
about how much time was stolen since the last call, and feed it into
sched_rt_avg_update().

Although steal time reporting in account_process_tick() keeps
track of the last time we read the steal clock, in prev_steal_time,
this patch does it independently using another field,
prev_steal_time_rq. This is because otherwise, information about time
accounted in account_process_tick() would never reach us in update_rq_clock().
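
To make the bookkeeping above concrete, here is a small standalone sketch. The type, the function name and the 1 ms TICK_NS value are assumptions made for illustration; only the sequence (take the steal delta against a dedicated prev field, clamp it to the elapsed delta, round it down to whole ticks, subtract it from task time) follows the patch.

```c
#include <stdint.h>

#define TICK_NS 1000000ULL  /* hypothetical tick length: 1 ms */

/* Hypothetical cut-down runqueue with only the fields used here. */
struct rq_sketch {
    uint64_t prev_steal_rq;  /* steal clock value already charged to this rq */
    uint64_t clock_task;     /* time actually available to tasks */
};

/* Returns the stolen time (whole ticks, in ns) charged for this update. */
static uint64_t rq_account_steal(struct rq_sketch *rq, uint64_t steal_clock,
                                 uint64_t delta)
{
    uint64_t steal = steal_clock - rq->prev_steal_rq;

    if (steal > delta)                    /* never charge more than elapsed */
        steal = delta;
    steal = (steal / TICK_NS) * TICK_NS;  /* whole ticks only */

    rq->prev_steal_rq += steal;           /* remainder is kept for later */
    rq->clock_task += delta - steal;
    return steal;
}
```

The returned value is what would be fed, together with irq_delta, into sched_rt_avg_update() in the real code.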

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/Kconfig|   12 
 kernel/sched.c  |   47 +--
 kernel/sched_features.h |4 ++--
 3 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index da34972..b26f312 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -512,6 +512,18 @@ menuconfig PARAVIRT_GUEST
 
 if PARAVIRT_GUEST
 
+config PARAVIRT_TIME_ACCOUNTING
+   bool "Paravirtual steal time accounting"
+   select PARAVIRT
+   default n
+   ---help---
+ Select this option to enable fine granularity task steal time 
+ accounting. Time spent executing other tasks in parallel with
+ the current vCPU is discounted from the vCPU power. To account for
+ that, there can be a small performance impact.
+
+ If in doubt, say N here.
+
 source "arch/x86/xen/Kconfig"
 
 config KVM_CLOCK
diff --git a/kernel/sched.c b/kernel/sched.c
index 247dd51..c40b118 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -532,6 +532,9 @@ struct rq {
 #ifdef CONFIG_PARAVIRT
u64 prev_steal_time;
 #endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+   u64 prev_steal_time_rq;
+#endif
 
/* calc_load related fields */
unsigned long calc_load_update;
@@ -1971,8 +1974,14 @@ static inline u64 steal_ticks(u64 steal)
 
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
-   s64 irq_delta;
-
+/*
+ * In theory, the compiler should just see 0 here, and optimize out the call
+ * to sched_rt_avg_update. But I don't trust it...
+ */
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+   s64 steal = 0, irq_delta = 0;
+#endif
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
 
/*
@@ -1995,12 +2004,35 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
rq->prev_irq_time += irq_delta;
delta -= irq_delta;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+   if (static_branch((&paravirt_steal_rq_enabled))) {
+   u64 st;
+
+   steal = paravirt_steal_clock(cpu_of(rq));
+   steal -= rq->prev_steal_time_rq;
+
+   if (unlikely(steal > delta))
+   steal = delta;
+
+   st = steal_ticks(steal);
+   steal = st * TICK_NSEC;
+
+   rq->prev_steal_time_rq += steal;
+
+   delta -= steal;
+   }
+#endif
+
rq->clock_task += delta;
 
-   if (irq_delta && sched_feat(NONIRQ_POWER))
-   sched_rt_avg_update(rq, irq_delta);
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+   if ((irq_delta + steal) && sched_feat(NONTASK_POWER))
+   sched_rt_avg_update(rq, irq_delta + steal);
+#endif
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
 static int irqtime_account_hi_update(void)
 {
struct cpu_usage_stat *cpustat = kstat_this_cpu.cpustat;
@@ -2035,12 +2067,7 @@ static int irqtime_account_si_update(void)
 
 #define sched_clock_irqtime (0)
 
-static void update_rq_clock_task(struct rq *rq, s64 delta)
-{
-   rq->clock_task += delta;
-}
-
-#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+#endif
 
 #include "sched_idletask.c"
 #include "sched_fair.c"
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index be40f73..ca3b025 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -61,9 +61,9 @@ SCHED_FEAT(LB_BIAS, 1)
 SCHED_FEAT(OWNER_SPIN, 1)
 
 /*
- * Decrement CPU power based on irq activity
+ * Decrement CPU power based on time not spent running tasks
  */
-SCHED_FEAT(NONIRQ_POWER, 1)
+SCHED_FEAT(NONTASK_POWER, 1)
 
 /*
  * Queue remote wakeups on the target CPU and process them
-- 
1.7.3.4


[PATCH v4 1/9] introduce kvm_read_guest_cached

2011-07-01 Thread Glauber Costa
From: Gleb Natapov g...@redhat.com

Introduce the kvm_read_guest_cached() function, in addition to the write
one we already have.
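
For readers unfamiliar with the cached accessors, the following userspace sketch shows the general pattern: revalidate the cached translation when the memslot generation changes, then copy. All types and names here are hypothetical simplifications; the real code uses struct gfn_to_hva_cache, kvm_memslots() and __copy_from_user().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, flattened view of guest memory: one slot plus a generation. */
struct slots_sketch {
    uint64_t generation;
    uint8_t *ram;
    size_t ram_size;
};

/* Hypothetical cache of a guest-physical to host-virtual translation. */
struct hva_cache_sketch {
    uint64_t generation;
    uint64_t gpa;
    uint8_t *hva;  /* NULL plays the role of an error hva */
};

static void cache_init(const struct slots_sketch *s,
                       struct hva_cache_sketch *c, uint64_t gpa)
{
    c->gpa = gpa;
    c->generation = s->generation;
    c->hva = gpa < s->ram_size ? s->ram + gpa : NULL;
}

static int read_cached(const struct slots_sketch *s,
                       struct hva_cache_sketch *c, void *data, size_t len)
{
    if (s->generation != c->generation)  /* memslots changed: redo lookup */
        cache_init(s, c, c->gpa);

    if (c->hva == NULL || len > s->ram_size - c->gpa)
        return -EFAULT;

    memcpy(data, c->hva, len);
    return 0;
}
```

The point of the cache is that the common case pays only one generation comparison before the copy.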

[ by glauber: export function signature in kvm header ]

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Glauber Costa glom...@redhat.com
---
 include/linux/kvm_host.h |2 ++
 virt/kvm/kvm_main.c  |   20 
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 31ebb59..f7df0a3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -381,6 +381,8 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
  unsigned long len);
 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
+int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+  void *data, unsigned long len);
 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
 int offset, int len);
 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 11d2783..d5ef9eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1418,6 +1418,26 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_cached);
 
+int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+  void *data, unsigned long len)
+{
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   int r;
+
+   if (slots->generation != ghc->generation)
+   kvm_gfn_to_hva_cache_init(kvm, ghc, ghc->gpa);
+
+   if (kvm_is_error_hva(ghc->hva))
+   return -EFAULT;
+
+   r = __copy_from_user(data, (void __user *)ghc->hva, len);
+   if (r)
+   return -EFAULT;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
return kvm_write_guest_page(kvm, gfn, (const void *) empty_zero_page,
-- 
1.7.3.4



[PATCH v4 5/9] KVM-GST: Add a pv_ops stub for steal time

2011-07-01 Thread Glauber Costa
This patch adds a function pointer in one of the many paravirt_ops
structs, to allow guests to register a steal time function. Besides
a steal time function, we also declare two jump_labels. They will be
used to allow the steal time code to be easily bypassed when not
in use.
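
The shape of such a stub can be sketched in plain C as follows. This is only an illustration: the names are hypothetical, and an ordinary int flag stands in for the jump label, which in the kernel patches the branch out of the instruction stream instead of testing a variable.

```c
#include <stdint.h>

/* Hypothetical ops table with a single hook. */
struct time_ops_sketch {
    uint64_t (*steal_clock)(int cpu);
};

/* Native default: bare metal never has stolen time. */
static uint64_t native_steal_clock_sketch(int cpu)
{
    (void)cpu;
    return 0;
}

static struct time_ops_sketch time_ops = {
    .steal_clock = native_steal_clock_sketch,
};

static int steal_enabled;  /* stand-in for the jump label key */

/* Hypothetical guest-provided implementation, for demonstration only. */
static uint64_t demo_guest_steal_clock(int cpu)
{
    return 1000u * (uint64_t)(cpu + 1);
}

static void register_steal_clock(uint64_t (*fn)(int cpu))
{
    time_ops.steal_clock = fn;
    steal_enabled = 1;
}

static uint64_t steal_clock(int cpu)
{
    if (!steal_enabled)  /* bypass the indirect call when not in use */
        return 0;
    return time_ops.steal_clock(cpu);
}
```

Until a guest registers its own function, callers see zero steal time and never reach the indirect call.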

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/paravirt.h   |9 +
 arch/x86/include/asm/paravirt_types.h |1 +
 arch/x86/kernel/paravirt.c|9 +
 3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index ebbc4d8..a7d2db9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -230,6 +230,15 @@ static inline unsigned long long paravirt_sched_clock(void)
return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
 }
 
+struct jump_label_key;
+extern struct jump_label_key paravirt_steal_enabled;
+extern struct jump_label_key paravirt_steal_rq_enabled;
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+}
+
 static inline unsigned long long paravirt_read_pmc(int counter)
 {
return PVOP_CALL1(u64, pv_cpu_ops.read_pmc, counter);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8288509..2c76521 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -89,6 +89,7 @@ struct pv_lazy_ops {
 
 struct pv_time_ops {
unsigned long long (*sched_clock)(void);
+   unsigned long long (*steal_clock)(int cpu);
unsigned long (*get_tsc_khz)(void);
 };
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 869e1ae..613a793 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -202,6 +202,14 @@ static void native_flush_tlb_single(unsigned long addr)
__native_flush_tlb_single(addr);
 }
 
+struct jump_label_key paravirt_steal_enabled;
+struct jump_label_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
 /* These are in entry.S */
 extern void native_iret(void);
 extern void native_irq_enable_sysexit(void);
@@ -307,6 +315,7 @@ struct pv_init_ops pv_init_ops = {
 
 struct pv_time_ops pv_time_ops = {
.sched_clock = native_sched_clock,
+   .steal_clock = native_steal_clock,
 };
 
 struct pv_irq_ops pv_irq_ops = {
-- 
1.7.3.4



[PATCH v4 6/9] add jump labels for ia64 paravirt

2011-07-01 Thread Glauber Costa
Since in a later patch I intend to call jump labels inside
CONFIG_PARAVIRT, IA64 would fail to compile if they are not
provided. This patch provides those jump labels for the IA64
architecture.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Isaku Yamahata yamah...@valinux.co.jp
CC: Eddie Dong eddie.d...@intel.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/ia64/include/asm/paravirt.h |4 
 arch/ia64/kernel/paravirt.c  |2 ++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/paravirt.h b/arch/ia64/include/asm/paravirt.h
index 2eb0a98..32551d3 100644
--- a/arch/ia64/include/asm/paravirt.h
+++ b/arch/ia64/include/asm/paravirt.h
@@ -281,6 +281,10 @@ paravirt_init_missing_ticks_accounting(int cpu)
pv_time_ops.init_missing_ticks_accounting(cpu);
 }
 
+struct jump_label_key;
+extern struct jump_label_key paravirt_steal_enabled;
+extern struct jump_label_key paravirt_steal_rq_enabled;
+
 static inline int
 paravirt_do_steal_accounting(unsigned long *new_itm)
 {
diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c
index a21d7bb..1008682 100644
--- a/arch/ia64/kernel/paravirt.c
+++ b/arch/ia64/kernel/paravirt.c
@@ -634,6 +634,8 @@ struct pv_irq_ops pv_irq_ops = {
  * pv_time_ops
  * time operations
  */
+struct jump_label_key paravirt_steal_enabled;
+struct jump_label_key paravirt_steal_rq_enabled;
 
 static int
 ia64_native_do_steal_accounting(unsigned long *new_itm)
-- 
1.7.3.4



[PATCH v4 7/9] KVM-GST: KVM Steal time accounting

2011-07-01 Thread Glauber Costa
This patch accounts steal time in account_process_tick.
If one or more ticks are considered stolen in the current
accounting cycle, user/system accounting is skipped. Idle is fine,
since the hypervisor does not report steal time if the guest
is halted.

Accounting steal time from the core scheduler gives us the
advantage of direct access to the runqueue data. Later on,
it can be used to tweak cpu power and make
the scheduler aware of the time it lost.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 kernel/sched.c |   33 +
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3f2e502..247dd51 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -75,6 +75,7 @@
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
 #include <asm/mutex.h>
+#include <asm/paravirt.h>
 
 #include "sched_cpupri.h"
 #include "workqueue_sched.h"
@@ -528,6 +529,9 @@ struct rq {
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
u64 prev_irq_time;
 #endif
+#ifdef CONFIG_PARAVIRT
+   u64 prev_steal_time;
+#endif
 
/* calc_load related fields */
unsigned long calc_load_update;
@@ -1953,6 +1957,18 @@ void account_system_vtime(struct task_struct *curr)
 }
 EXPORT_SYMBOL_GPL(account_system_vtime);
 
+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+#ifdef CONFIG_PARAVIRT
+static inline u64 steal_ticks(u64 steal)
+{
+   if (unlikely(steal > NSEC_PER_SEC))
+   return div_u64(steal, TICK_NSEC);
+
+   return __iter_div_u64_rem(steal, TICK_NSEC, &steal);
+}
+#endif
+
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
s64 irq_delta;
@@ -3929,6 +3945,23 @@ void account_process_tick(struct task_struct *p, int user_tick)
return;
}
 
+#ifdef CONFIG_PARAVIRT
+   if (static_branch(&paravirt_steal_enabled)) {
+   u64 steal, st = 0;
+
+   steal = paravirt_steal_clock(smp_processor_id());
+   steal -= this_rq()->prev_steal_time;
+
+   st = steal_ticks(steal);
+   this_rq()->prev_steal_time += st * TICK_NSEC;
+
+   if (st) {
+   account_steal_time(st);
+   return;
+   }
+   }
+#endif
+
if (user_tick)
account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
-- 
1.7.3.4



[PATCH v4 9/9] KVM-GST: KVM Steal time registration

2011-07-01 Thread Glauber Costa
This patch implements the kvm bits of the steal time infrastructure.
The most important part of it is the steal time clock. It is a
continuous clock that shows the accumulated amount of steal time
since vcpu creation. It is supposed to survive cpu offlining/onlining.
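
Reading such a clock safely relies on the version field. A minimal userspace sketch of the retry loop follows, assuming a hypothetical struct and using C11 fences in place of the kernel's rmb(); it is not the guest code itself.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical view of the shared record; the hypervisor is the writer. */
struct steal_shared {
    volatile uint64_t steal;
    volatile uint32_t version;
};

/* Retry until an even, unchanged version brackets the read of steal. */
static uint64_t read_steal_clock(const struct steal_shared *src)
{
    uint64_t steal;
    uint32_t version;

    do {
        version = src->version;
        atomic_thread_fence(memory_order_acquire);  /* stands in for rmb() */
        steal = src->steal;
        atomic_thread_fence(memory_order_acquire);
    } while ((version & 1) || version != src->version);

    return steal;
}
```

An odd version means an update is in progress, and a changed version means the value read may be torn, so both cases loop again.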

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 Documentation/kernel-parameters.txt |4 ++
 arch/x86/include/asm/kvm_para.h |1 +
 arch/x86/kernel/kvm.c   |   73 +++
 arch/x86/kernel/kvmclock.c  |2 +
 4 files changed, 80 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fd248a31..a722574 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1737,6 +1737,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
no-kvmapf   [X86,KVM] Disable paravirtualized asynchronous page
fault handling.
 
+   no-steal-acc[X86,KVM] Disable paravirtualized steal time accounting.
+   steal time is computed, but won't influence scheduler
+   behaviour
+
nolapic [X86-32,APIC] Do not enable or use the local APIC.
 
nolapic_timer   [X86-32,APIC] Do not use the local APIC timer.
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index c484ba8..35d732d 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -94,6 +94,7 @@ struct kvm_vcpu_pv_apf_data {
 
 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
+extern void kvm_disable_steal_time(void);
 
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 33c07b0..58331c2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -51,6 +51,15 @@ static int parse_no_kvmapf(char *arg)
 
 early_param("no-kvmapf", parse_no_kvmapf);
 
+static int steal_acc = 1;
+static int parse_no_stealacc(char *arg)
+{
+steal_acc = 0;
+return 0;
+}
+
+early_param("no-steal-acc", parse_no_stealacc);
+
 struct kvm_para_state {
u8 mmu_queue[MMU_QUEUE_SIZE];
int mmu_queue_len;
@@ -58,6 +67,8 @@ struct kvm_para_state {
 
 static DEFINE_PER_CPU(struct kvm_para_state, para_state);
 static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
+static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static int has_steal_clock = 0;
 
 static struct kvm_para_state *kvm_para_state(void)
 {
@@ -441,6 +452,21 @@ static void __init paravirt_ops_setup(void)
 #endif
 }
 
+static void kvm_register_steal_time(void)
+{
+   int cpu = smp_processor_id();
+   struct kvm_steal_time *st = &per_cpu(steal_time, cpu);
+
+   if (!has_steal_clock)
+   return;
+
+   memset(st, 0, sizeof(*st));
+
+   wrmsrl(MSR_KVM_STEAL_TIME, (__pa(st) | KVM_MSR_ENABLED));
+   printk(KERN_INFO "kvm-stealtime: cpu %d, msr %lx\n",
+   cpu, __pa(st));
+}
+
 void __cpuinit kvm_guest_cpu_init(void)
 {
if (!kvm_para_available())
@@ -457,6 +483,9 @@ void __cpuinit kvm_guest_cpu_init(void)
printk(KERN_INFO"KVM setup async PF for cpu %d\n",
   smp_processor_id());
}
+
+   if (has_steal_clock)
+   kvm_register_steal_time();
 }
 
 static void kvm_pv_disable_apf(void *unused)
@@ -483,6 +512,31 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
+static u64 kvm_steal_clock(int cpu)
+{
+   u64 steal;
+   struct kvm_steal_time *src;
+   int version;
+
+   src = &per_cpu(steal_time, cpu);
+   do {
+   version = src->version;
+   rmb();
+   steal = src->steal;
+   rmb();
+   } while ((version & 1) || (version != src->version));
+
+   return steal;
+}
+
+void kvm_disable_steal_time(void)
+{
+   if (!has_steal_clock)
+   return;
+
+   wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -500,6 +554,7 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
 
 static void kvm_guest_cpu_offline(void *dummy)
 {
+   kvm_disable_steal_time();
kvm_pv_disable_apf(NULL);
apf_task_wake_all();
 }
@@ -548,6 +603,11 @@ void __init kvm_guest_init(void)
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
x86_init.irqs.trap_init = kvm_apf_trap_init;
 
+   if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
+   has_steal_clock = 1;
+   pv_time_ops.steal_clock = 

[PATCH v4 3/9] KVM-HDR: KVM Steal time implementation

2011-07-01 Thread Glauber Costa
To implement steal time, we need the hypervisor to pass the guest information
about how much time was spent running other processes outside the VM.
This is per-vcpu, and using the kvmclock structure for that is an abuse
we decided not to make.

In this patchset, I am introducing a new MSR, MSR_KVM_STEAL_TIME, that
holds the address of the memory area containing information about steal time.

This patch contains the headers for it. I am keeping it separate to facilitate
backports for people who want to backport the kernel part but not the
hypervisor, or the other way around.
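
As an illustration of how such an MSR value is put together, here is a small sketch. The helper names and constants are hypothetical; the 64-byte alignment follows the msr.txt text added by this patch, and bit 0 is the enable bit. The hypervisor-side reserved-bit check in the series is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define STEAL_ENABLE_BIT 1ULL   /* bit 0, KVM_MSR_ENABLED in this series */
#define STEAL_ALIGN      64ULL  /* alignment stated in msr.txt */

/* Build the value the guest writes; fails if the address is misaligned. */
static bool steal_msr_encode(uint64_t gpa, uint64_t *msr_val)
{
    if (gpa & (STEAL_ALIGN - 1))
        return false;
    *msr_val = gpa | STEAL_ENABLE_BIT;
    return true;
}

/* Recover the pieces from a written value. */
static uint64_t steal_msr_addr(uint64_t msr_val)
{
    return msr_val & ~(STEAL_ALIGN - 1);
}

static bool steal_msr_enabled(uint64_t msr_val)
{
    return (msr_val & STEAL_ENABLE_BIT) != 0;
}
```

Writing a value with bit 0 clear (for example 0) is how the guest turns the feature off again.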

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 Documentation/virtual/kvm/msr.txt |   35 +++
 arch/x86/include/asm/kvm_para.h   |9 +
 2 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index d079aed..38db3f8 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -185,3 +185,38 @@ MSR_KVM_ASYNC_PF_EN: 0x4b564d02
 
Currently type 2 APF will be always delivered on the same vcpu as
type 1 was, but guest should not rely on that.
+
+MSR_KVM_STEAL_TIME: 0x4b564d03
+
+   data: 64-byte aligned physical address of a memory area which must be
+   in guest RAM, plus an enable bit in bit 0. This memory is expected to
+   hold a copy of the following structure:
+
+   struct kvm_steal_time {
+   __u64 steal;
+   __u32 version;
+   __u32 flags;
+   __u32 pad[12];
+   }
+
+   whose data will be filled in by the hypervisor periodically. Only one
+   write, or registration, is needed for each VCPU. The interval between
+   updates of this structure is arbitrary and implementation-dependent.
+   The hypervisor may update this structure at any time it sees fit until
+   anything with bit0 == 0 is written to it. Guest is required to make sure
+   this structure is initialized to zero.
+
+   Fields have the following meanings:
+
+   version: a sequence counter. In other words, guest has to check
+   this field before and after grabbing time information and make 
+   sure they are both equal and even. An odd version indicates an
+   in-progress update.
+
+   flags: At this point, always zero. May be used to indicate
+   changes in this structure in the future.
+
+   steal: the amount of time in which this vCPU did not run, in
+   nanoseconds. Time during which the vcpu is idle will not be
+   reported as steal time.
+
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index d6cd79b..65f8bb9 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,7 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE2 3
 #define KVM_FEATURE_ASYNC_PF   4
+#define KVM_FEATURE_STEAL_TIME 5
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -35,6 +36,14 @@
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
+#define MSR_KVM_STEAL_TIME  0x4b564d03
+
+struct kvm_steal_time {
+   __u64 steal;
+   __u32 version;
+   __u32 flags;
+   __u32 pad[12];
+};
 
 #define KVM_MAX_MMU_OP_BATCH   32
 
-- 
1.7.3.4



[PATCH v2 1/8] kvm tools: Don't dynamically allocate threadpool jobs

2011-07-01 Thread Sasha Levin
To allow efficient use of shorter-term threadpool jobs, don't
allocate them dynamically upon creation. Instead, store them
within 'job' structures.

This avoids the overhead of creating and destroying jobs which live
for a short time.
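
The idea can be sketched in a few lines of standalone C. The names below are hypothetical simplifications of the threadpool API (no mutex, queue or kvm pointer); the point is only that the job lives inside its owner and is initialised in place rather than returned from calloc().

```c
#include <stddef.h>

typedef void (*job_fn)(void *data);

/* Hypothetical cut-down job: embedded in its owner, never heap-allocated. */
struct job_sketch {
    job_fn callback;
    void *data;
};

/* Hypothetical device that owns one job by value. */
struct device_sketch {
    int value;
    struct job_sketch job;
};

static void job_init(struct job_sketch *job, job_fn callback, void *data)
{
    *job = (struct job_sketch) {
        .callback = callback,
        .data     = data,
    };
}

/* A job that was never initialised has a NULL callback and is skipped. */
static void job_run(struct job_sketch *job)
{
    if (job == NULL || job->callback == NULL)
        return;
    job->callback(job->data);
}

/* Demonstration callback: bumps the owning device's counter. */
static void bump(void *data)
{
    struct device_sketch *dev = data;

    dev->value++;
}
```

Because the storage belongs to the owner, there is nothing to free when the job is done, which is what makes short-lived jobs cheap.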

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/include/kvm/threadpool.h |   29 ++---
 tools/kvm/include/kvm/virtio-9p.h  |3 ++-
 tools/kvm/threadpool.c |   30 ++
 tools/kvm/virtio/9p.c  |7 +++
 tools/kvm/virtio/blk.c |8 
 tools/kvm/virtio/console.c |   10 +-
 tools/kvm/virtio/rng.c |   16 
 7 files changed, 50 insertions(+), 53 deletions(-)

diff --git a/tools/kvm/include/kvm/threadpool.h b/tools/kvm/include/kvm/threadpool.h
index 62826a6..768239f 100644
--- a/tools/kvm/include/kvm/threadpool.h
+++ b/tools/kvm/include/kvm/threadpool.h
@@ -1,14 +1,37 @@
 #ifndef KVM__THREADPOOL_H
 #define KVM__THREADPOOL_H
 
+#include "kvm/mutex.h"
+
+#include <linux/list.h>
+
 struct kvm;
 
 typedef void (*kvm_thread_callback_fn_t)(struct kvm *kvm, void *data);
 
-int thread_pool__init(unsigned long thread_count);
+struct thread_pool__job {
+   kvm_thread_callback_fn_t callback;
+   struct kvm  *kvm;
+   void *data;
+
+   int signalcount;
+   pthread_mutex_t mutex;
 
-void *thread_pool__add_job(struct kvm *kvm, kvm_thread_callback_fn_t callback, void *data);
+   struct list_head queue;
+};
+
+static inline void thread_pool__init_job(struct thread_pool__job *job, struct kvm *kvm, kvm_thread_callback_fn_t callback, void *data)
+{
+   *job = (struct thread_pool__job) {
+   .kvm= kvm,
+   .callback   = callback,
+   .data   = data,
+   .mutex  = PTHREAD_MUTEX_INITIALIZER,
+   };
+}
+
+int thread_pool__init(unsigned long thread_count);
 
-void thread_pool__do_job(void *job);
+void thread_pool__do_job(struct thread_pool__job *job);
 
 #endif
diff --git a/tools/kvm/include/kvm/virtio-9p.h b/tools/kvm/include/kvm/virtio-9p.h
index eb546bb..b9c10de 100644
--- a/tools/kvm/include/kvm/virtio-9p.h
+++ b/tools/kvm/include/kvm/virtio-9p.h
@@ -2,6 +2,7 @@
 #define KVM__VIRTIO_9P_H
 #include "kvm/virtio.h"
 #include "kvm/pci.h"
+#include "kvm/threadpool.h"
 
 #include <sys/types.h>
 #include <dirent.h>
@@ -34,7 +35,7 @@ struct p9_fid {
 struct p9_dev_job {
struct virt_queue   *vq;
struct p9_dev   *p9dev;
-   void *job_id;
+   struct thread_pool__job job_id;
 };
 
 struct p9_dev {
diff --git a/tools/kvm/threadpool.c b/tools/kvm/threadpool.c
index 2db02184..fdc5fa7 100644
--- a/tools/kvm/threadpool.c
+++ b/tools/kvm/threadpool.c
@@ -6,17 +6,6 @@
 #include <pthread.h>
 #include <stdbool.h>
 
-struct thread_pool__job {
-   kvm_thread_callback_fn_t callback;
-   struct kvm  *kvm;
-   void *data;
-
-   int signalcount;
-   pthread_mutex_t mutex;
-
-   struct list_head queue;
-};
-
 static pthread_mutex_t job_mutex   = PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t thread_mutex= PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t  job_cond= PTHREAD_COND_INITIALIZER;
@@ -139,26 +128,11 @@ int thread_pool__init(unsigned long thread_count)
return i;
 }
 
-void *thread_pool__add_job(struct kvm *kvm,
-  kvm_thread_callback_fn_t callback, void *data)
-{
-   struct thread_pool__job *job = calloc(1, sizeof(*job));
-
-   *job = (struct thread_pool__job) {
-   .kvm= kvm,
-   .data   = data,
-   .callback   = callback,
-   .mutex  = PTHREAD_MUTEX_INITIALIZER
-   };
-
-   return job;
-}
-
-void thread_pool__do_job(void *job)
+void thread_pool__do_job(struct thread_pool__job *job)
 {
 	struct thread_pool__job *jobinfo = job;
 
-	if (jobinfo == NULL)
+	if (jobinfo == NULL || jobinfo->callback == NULL)
 		return;
 
 	mutex_lock(&jobinfo->mutex);
diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index 69e534f..d927688 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -18,7 +18,6 @@
 #include <linux/virtio_9p.h>
 #include <net/9p/9p.h>
 
-
 /* Warning: Immediately use value returned from this function */
 static const char *rel_to_abs(struct p9_dev *p9dev,
  const char *path, char *abs_path)
@@ -659,7 +658,7 @@ static void ioevent_callback(struct kvm *kvm, void *param)
 {
 	struct p9_dev_job *job = param;
 
-	thread_pool__do_job(job->job_id);
+	thread_pool__do_job(&job->job_id);
 }
 
 static bool 

[PATCH v2 2/8] kvm tools: Process virtio-blk requests in parallel

2011-07-01 Thread Sasha Levin
Process multiple requests within a virtio-blk device's vring
in parallel.

Doing so may improve performance when a request that can be served
from cache is queued behind a request for un-cached data.

bonnie++ benchmarks have shown a 6% improvement in reads and a 2%
improvement in writes.

Suggested-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/virtio/blk.c |   74 ---
 1 files changed, 38 insertions(+), 36 deletions(-)
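
For readers skimming the diff below, the slot selection can be looked at
in isolation. This is only a rough sketch under stated assumptions:
QUEUE_SIZE, struct job, struct dev and next_job() are stand-in names, not
kvm tools symbols, and 128 is an assumed ring size. The idea is that every
request popped from the vring gets its own job slot, picked round-robin
with a wrapping 16-bit counter, so each request carries its own iovec
state while other requests run in parallel.

```c
#include <stdint.h>

#define QUEUE_SIZE 128	/* assumed ring size; a power of two, so u16 wraparound stays aligned */

/* Hypothetical per-request state, standing in for struct blk_dev_job. */
struct job {
	uint16_t head;
};

/* Hypothetical device, standing in for struct blk_dev. */
struct dev {
	struct job jobs[QUEUE_SIZE];
	uint16_t   job_idx;
};

/* Hand out the next slot round-robin; the counter is allowed to wrap. */
static struct job *next_job(struct dev *d)
{
	return &d->jobs[d->job_idx++ % QUEUE_SIZE];
}
```

Note that round-robin reuse assumes the previous user of a slot has
finished by the time the counter comes back around; the sketch does not
check for that.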

diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index 1fdfc1e..f2a728c 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -31,6 +31,8 @@
 struct blk_dev_job {
 	struct virt_queue		*vq;
 	struct blk_dev			*bdev;
+	struct iovec			iov[VIRTIO_BLK_QUEUE_SIZE];
+	u16				out, in, head;
 	struct thread_pool__job		job_id;
 };
 
@@ -51,7 +53,8 @@ struct blk_dev {
 	u16				queue_selector;
 
 	struct virt_queue		vqs[NUM_VIRT_QUEUES];
-	struct blk_dev_job		jobs[NUM_VIRT_QUEUES];
+	struct blk_dev_job		jobs[VIRTIO_BLK_QUEUE_SIZE];
+	u16				job_idx;
 	struct pci_device_header	pci_hdr;
 };
 
@@ -118,20 +121,26 @@ static bool virtio_blk_pci_io_in(struct ioport *ioport, 
struct kvm *kvm, u16 por
return ret;
 }
 
-static bool virtio_blk_do_io_request(struct kvm *kvm,
-					struct blk_dev *bdev,
-					struct virt_queue *queue)
+static void virtio_blk_do_io_request(struct kvm *kvm, void *param)
 {
-	struct iovec iov[VIRTIO_BLK_QUEUE_SIZE];
 	struct virtio_blk_outhdr *req;
-	ssize_t block_cnt = -1;
-	u16 out, in, head;
 	u8 *status;
+	ssize_t block_cnt;
+	struct blk_dev_job *job;
+	struct blk_dev *bdev;
+	struct virt_queue *queue;
+	struct iovec *iov;
+	u16 out, in, head;
 
-	head			= virt_queue__get_iov(queue, iov, &out, &in, kvm);
-
-	/* head */
-	req			= iov[0].iov_base;
+	block_cnt	= -1;
+	job		= param;
+	bdev		= job->bdev;
+	queue		= job->vq;
+	iov		= job->iov;
+	out		= job->out;
+	in		= job->in;
+	head		= job->head;
+	req		= iov[0].iov_base;
 
 	switch (req->type) {
case VIRTIO_BLK_T_IN:
@@ -153,24 +162,27 @@ static bool virtio_blk_do_io_request(struct kvm *kvm,
 	status			= iov[out + in - 1].iov_base;
 	*status			= (block_cnt < 0) ? VIRTIO_BLK_S_IOERR : VIRTIO_BLK_S_OK;
 
+	mutex_lock(&bdev->mutex);
 	virt_queue__set_used_elem(queue, head, block_cnt);
+	mutex_unlock(&bdev->mutex);
 
-	return true;
+	virt_queue__trigger_irq(queue, bdev->pci_hdr.irq_line, &bdev->isr, kvm);
 }
 
-static void virtio_blk_do_io(struct kvm *kvm, void *param)
+static void virtio_blk_do_io(struct kvm *kvm, struct virt_queue *vq, struct blk_dev *bdev)
 {
-	struct blk_dev_job *job = param;
-	struct virt_queue *vq;
-	struct blk_dev *bdev;
+	while (virt_queue__available(vq)) {
+		struct blk_dev_job *job = &bdev->jobs[bdev->job_idx++ % VIRTIO_BLK_QUEUE_SIZE];
 
-	vq			= job->vq;
-	bdev			= job->bdev;
-
-	while (virt_queue__available(vq))
-		virtio_blk_do_io_request(kvm, bdev, vq);
+		*job			= (struct blk_dev_job) {
+			.vq		= vq,
+			.bdev		= bdev,
+		};
+		job->head = virt_queue__get_iov(vq, job->iov, &job->out, &job->in, kvm);
 
-	virt_queue__trigger_irq(vq, bdev->pci_hdr.irq_line, &bdev->isr, kvm);
+		thread_pool__init_job(&job->job_id, kvm, virtio_blk_do_io_request, job);
+		thread_pool__do_job(&job->job_id);
+	}
 }
 
 static bool virtio_blk_pci_io_out(struct ioport *ioport, struct kvm *kvm, u16 
port, void *data, int size, u32 count)
@@ -190,24 +202,14 @@ static bool virtio_blk_pci_io_out(struct ioport *ioport, 
struct kvm *kvm, u16 po
break;
case VIRTIO_PCI_QUEUE_PFN: {
 		struct virt_queue *queue;
-		struct blk_dev_job *job;
 		void *p;
 
-		job = &bdev->jobs[bdev->queue_selector];
-
 		queue			= &bdev->vqs[bdev->queue_selector];
 		queue->pfn		= ioport__read32(data);
 		p			= guest_pfn_to_host(kvm, queue->pfn);
 
 		vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, VIRTIO_PCI_VRING_ALIGN);
 
-

[PATCH v2 3/8] kvm tools: Allow giving instance names

2011-07-01 Thread Sasha Levin
This will allow tracking instance names and sending commands
to specific instances if multiple instances are running.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/include/kvm/kvm.h |5 +++-
 tools/kvm/kvm-run.c |5 +++-
 tools/kvm/kvm.c |   56 ++-
 tools/kvm/term.c|3 ++
 4 files changed, 66 insertions(+), 3 deletions(-)
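
As an aside for anyone reusing the PID-file idea, a slightly more
defensive lookup could look like the sketch below. It is not the code in
this patch: read_pid_file() is a hypothetical helper that takes a full
path, always closes the descriptor, NUL-terminates what it read and
rejects non-positive values.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: parse a PID from a pid file, or return -1. */
static int read_pid_file(const char *path)
{
	char buf[16];
	ssize_t len;
	long pid;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Leave room for the terminating NUL that read() does not add. */
	len = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (len <= 0)
		return -1;
	buf[len] = '\0';

	pid = strtol(buf, NULL, 10);
	if (pid <= 0)
		return -1;

	return (int)pid;
}
```

A caller would then pass the result to kill() only when it is not -1.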

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 7d90d35..5ad3236 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -41,9 +41,11 @@ struct kvm {
const char  *vmlinux;
struct disk_image   **disks;
int nr_disks;
+
+   const char  *name;
 };
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size);
+struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name);
 int kvm__max_cpus(struct kvm *kvm);
 void kvm__init_ram(struct kvm *kvm);
 void kvm__delete(struct kvm *kvm);
@@ -61,6 +63,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
 void kvm__pause(void);
 void kvm__continue(void);
 void kvm__notify_paused(void);
+int kvm__get_pid_by_instance(const char *name);
 
 /*
  * Debugging
diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
index efae3c0..56c39ab 100644
--- a/tools/kvm/kvm-run.c
+++ b/tools/kvm/kvm-run.c
@@ -69,6 +69,7 @@ static const char *network;
 static const char *host_ip_addr;
 static const char *guest_mac;
 static const char *script;
+static const char *guest_name;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
 static bool vnc;
@@ -132,6 +133,8 @@ static int virtio_9p_rootdir_parser(const struct option 
*opt, const char *arg, i
 
 static const struct option options[] = {
 	OPT_GROUP("Basic options:"),
+	OPT_STRING('\0', "name", &guest_name, "guest name",
+			"A name for the guest"),
 	OPT_INTEGER('c', "cpus", &nrcpus, "Number of CPUs"),
 	OPT_U64('m', "mem", &ram_size, "Virtual machine memory size in MiB."),
 	OPT_CALLBACK('d', "disk", NULL, "image", "Disk image", img_name_parser),
@@ -546,7 +549,7 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
term_init();
 
-   kvm = kvm__init(kvm_dev, ram_size);
+   kvm = kvm__init(kvm_dev, ram_size, guest_name);
 
ioeventfd__init();
 
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index c400c70..23d31a3 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -31,6 +31,7 @@
 #include <asm/unistd.h>
 
 #define DEFINE_KVM_EXIT_REASON(reason) [reason] = #reason
+#define KVM_PID_FILE_PATH	"~/.kvm-tools/"
 
 const char *kvm_exit_reasons[] = {
DEFINE_KVM_EXIT_REASON(KVM_EXIT_UNKNOWN),
@@ -113,11 +114,60 @@ static struct kvm *kvm__new(void)
return kvm;
 }
 
+static void kvm__create_pidfile(struct kvm *kvm)
+{
+	int fd;
+	char full_name[PATH_MAX], pid[10];
+
+	if (!kvm->name)
+		return;
+
+	mkdir(KVM_PID_FILE_PATH, 0777);
+	sprintf(full_name, "%s/%s.pid", KVM_PID_FILE_PATH, kvm->name);
+	fd = open(full_name, O_CREAT | O_WRONLY, 0666);
+	sprintf(pid, "%u\n", getpid());
+	if (write(fd, pid, strlen(pid)) <= 0)
+		die("Failed creating PID file");
+	close(fd);
+}
+
+static void kvm__remove_pidfile(struct kvm *kvm)
+{
+	char full_name[PATH_MAX];
+
+	if (!kvm->name)
+		return;
+
+	sprintf(full_name, "%s/%s.pid", KVM_PID_FILE_PATH, kvm->name);
+	unlink(full_name);
+}
+
+int kvm__get_pid_by_instance(const char *name)
+{
+	int fd, pid;
+	char pid_str[10], pid_file[PATH_MAX];
+
+	sprintf(pid_file, "%s/%s.pid", KVM_PID_FILE_PATH, name);
+	fd = open(pid_file, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	if (read(fd, pid_str, 10) == 0)
+		return -1;
+
+	pid = atoi(pid_str);
+	if (pid < 0)
+		return -1;
+
+	return pid;
+}
+
 void kvm__delete(struct kvm *kvm)
 {
 	kvm__stop_timer(kvm);
 
 	munmap(kvm->ram_start, kvm->ram_size);
+	kvm__remove_pidfile(kvm);
 	free(kvm);
 }
 
@@ -237,7 +287,7 @@ int kvm__max_cpus(struct kvm *kvm)
return ret;
 }
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size)
+struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 {
struct kvm_pit_config pit_config = { .flags = 0, };
struct kvm *kvm;
@@ -300,6 +350,10 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size)
 	if (ret < 0)
 		die_perror("KVM_CREATE_IRQCHIP ioctl");
 
+	kvm->name = name;
+
+	kvm__create_pidfile(kvm);
+
return kvm;
 }
 
diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index 9947223..a0cb03f 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -9,7 +9,9 @@
 #include kvm/read-write.h
 #include kvm/term.h
 #include kvm/util.h
+#include kvm/kvm.h
 
+extern struct kvm *kvm;
 static struct termios  

[PATCH v2 5/8] kvm tools: Provide instance name when running 'kvm pause'

2011-07-01 Thread Sasha Levin
Instead of sending a signal to the first instance found, send it
to a specific instance.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/kvm-pause.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/kvm-pause.c b/tools/kvm/kvm-pause.c
index fdf8714..0cb6f29 100644
--- a/tools/kvm/kvm-pause.c
+++ b/tools/kvm/kvm-pause.c
@@ -5,9 +5,18 @@
 #include <kvm/util.h>
 #include <kvm/kvm-cmd.h>
 #include <kvm/kvm-pause.h>
+#include <kvm/kvm.h>
 
 int kvm_cmd_pause(int argc, const char **argv, const char *prefix)
 {
-	signal(SIGUSR2, SIG_IGN);
-	return system("kill -USR2 $(pidof kvm)");
+	int pid;
+
+	if (argc != 1)
+		die("Usage: kvm pause [instance name]\n");
+
+	pid = kvm__get_pid_by_instance(argv[0]);
+	if (pid < 0)
+		die("Failed locating instance name");
+
+	return kill(pid, SIGUSR2);
 }
-- 
1.7.6



[PATCH v2 4/8] kvm tools: Provide instance name when running 'kvm debug'

2011-07-01 Thread Sasha Levin
Instead of sending a signal to the first instance found, send it
to a specific instance.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/kvm-debug.c |   19 +++
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/kvm-debug.c b/tools/kvm/kvm-debug.c
index 58782dd..432ae84 100644
--- a/tools/kvm/kvm-debug.c
+++ b/tools/kvm/kvm-debug.c
@@ -1,11 +1,22 @@
-#include <stdio.h>
-#include <string.h>
-
 #include <kvm/util.h>
 #include <kvm/kvm-cmd.h>
 #include <kvm/kvm-debug.h>
+#include <kvm/kvm.h>
+
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
 
 int kvm_cmd_debug(int argc, const char **argv, const char *prefix)
 {
-	return system("kill -3 $(pidof kvm)");
+	int pid;
+
+	if (argc != 1)
+		die("Usage: kvm debug [instance name]\n");
+
+	pid = kvm__get_pid_by_instance(argv[0]);
+	if (pid < 0)
+		die("Failed locating instance name");
+
+	return kill(pid, SIGQUIT);
 }
-- 
1.7.6



[PATCH v2 6/8] kvm tools: Add virtio-balloon device

2011-07-01 Thread Sasha Levin
From the virtio spec:

The virtio memory balloon device is a primitive device for managing guest
memory: the device asks for a certain amount of memory, and the guest supplies
it (or withdraws it, if the device has more than it asks for). This allows the
guest to adapt to changes in allowance of underlying physical memory.

To activate the virtio-balloon device run kvm tools with the '--balloon'
command line parameter.

Current implementation listens for two signals:

 - SIGKVMADDMEM: Adds 1M to the balloon driver (inflate). This will decrease
available memory within the guest.
 - SIGKVMDELMEM: Removes 1M from the balloon driver (deflate). This will
increase available memory within the guest.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/Makefile |1 +
 tools/kvm/include/kvm/kvm.h|3 +
 tools/kvm/include/kvm/virtio-balloon.h |8 +
 tools/kvm/include/kvm/virtio-pci-dev.h |1 +
 tools/kvm/kvm-run.c|6 +
 tools/kvm/virtio/balloon.c |  265 
 6 files changed, 284 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/include/kvm/virtio-balloon.h
 create mode 100644 tools/kvm/virtio/balloon.c
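
To make the "1M per signal" behaviour above concrete, here is a rough,
standalone sketch of the bookkeeping. It is an illustration, not the code
in balloon.c: struct balloon_target and both helpers are hypothetical
names, and the only facts relied on are that the balloon target is
counted in 4 KiB pages and that each signal moves it by 1M. Clamping at
zero on deflate is a choice made for the sketch.

```c
#include <stdint.h>

#define BALLOON_PAGE_SIZE	4096u	/* virtio balloon counts in 4 KiB pages */
#define PAGES_PER_MB		((1024u * 1024u) / BALLOON_PAGE_SIZE)

/* Hypothetical stand-in for the num_pages field of the balloon config. */
struct balloon_target {
	uint32_t num_pages;
};

/* What an inflate request does to the target: ask the guest for 1M more. */
static void balloon_inflate_1mb(struct balloon_target *t)
{
	t->num_pages += PAGES_PER_MB;
}

/* What a deflate request does: hand 1M back, never going below zero. */
static void balloon_deflate_1mb(struct balloon_target *t)
{
	if (t->num_pages >= PAGES_PER_MB)
		t->num_pages -= PAGES_PER_MB;
	else
		t->num_pages = 0;
}
```

The guest driver then notices the changed target and inflates or deflates
until it matches.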

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 6d6a0a4..1ec75da 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -40,6 +40,7 @@ OBJS  += virtio/console.o
 OBJS   += virtio/core.o
 OBJS   += virtio/net.o
 OBJS   += virtio/rng.o
+OBJS+= virtio/balloon.o
 OBJS   += disk/blk.o
 OBJS   += disk/qcow.o
 OBJS   += disk/raw.o
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 5ad3236..1fdfcf7 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -6,6 +6,7 @@
 #include <stdbool.h>
 #include <linux/types.h>
 #include <time.h>
+#include <signal.h>
 
 #define KVM_NR_CPUS		(255)
 
@@ -17,6 +18,8 @@
 
 #define SIGKVMEXIT		(SIGRTMIN + 0)
 #define SIGKVMPAUSE		(SIGRTMIN + 1)
+#define SIGKVMADDMEM		(SIGRTMIN + 2)
+#define SIGKVMDELMEM		(SIGRTMIN + 3)
 
 struct kvm {
 	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
diff --git a/tools/kvm/include/kvm/virtio-balloon.h b/tools/kvm/include/kvm/virtio-balloon.h
new file mode 100644
index 0000000..eb49fd4
--- /dev/null
+++ b/tools/kvm/include/kvm/virtio-balloon.h
@@ -0,0 +1,8 @@
+#ifndef KVM__BLN_VIRTIO_H
+#define KVM__BLN_VIRTIO_H
+
+struct kvm;
+
+void virtio_bln__init(struct kvm *kvm);
+
+#endif /* KVM__BLN_VIRTIO_H */
diff --git a/tools/kvm/include/kvm/virtio-pci-dev.h 
b/tools/kvm/include/kvm/virtio-pci-dev.h
index ca373df..4eee831 100644
--- a/tools/kvm/include/kvm/virtio-pci-dev.h
+++ b/tools/kvm/include/kvm/virtio-pci-dev.h
@@ -12,6 +12,7 @@
 #define PCI_DEVICE_ID_VIRTIO_BLK   0x1001
 #define PCI_DEVICE_ID_VIRTIO_CONSOLE   0x1003
 #define PCI_DEVICE_ID_VIRTIO_RNG   0x1004
+#define PCI_DEVICE_ID_VIRTIO_BLN   0x1005
 #define PCI_DEVICE_ID_VIRTIO_P9		0x1009
 #define PCI_DEVICE_ID_VESA 0x2000
 
diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
index 56c39ab..a7f010c 100644
--- a/tools/kvm/kvm-run.c
+++ b/tools/kvm/kvm-run.c
@@ -18,6 +18,7 @@
 #include kvm/virtio-net.h
 #include kvm/virtio-console.h
 #include kvm/virtio-rng.h
+#include kvm/virtio-balloon.h
 #include kvm/disk-image.h
 #include kvm/util.h
 #include kvm/pci.h
@@ -74,6 +75,7 @@ static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
 static bool vnc;
 static bool sdl;
+static bool balloon;
 extern bool ioport_debug;
 extern int  active_console;
 extern int  debug_iodelay;
@@ -145,6 +147,7 @@ static const struct option options[] = {
 	OPT_STRING('\0', "kvm-dev", &kvm_dev, "kvm-dev", "KVM device file"),
 	OPT_CALLBACK('\0', "virtio-9p", NULL, "dirname,tag_name",
 		     "Enable 9p over virtio", virtio_9p_rootdir_parser),
+	OPT_BOOLEAN('\0', "balloon", &balloon, "Enable virtio balloon"),
 	OPT_BOOLEAN('\0', "vnc", &vnc, "Enable VNC framebuffer"),
 	OPT_BOOLEAN('\0', "sdl", &sdl, "Enable SDL framebuffer"),
 
@@ -629,6 +632,9 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
while (virtio_rng--)
virtio_rng__init(kvm);
 
+   if (balloon)
+   virtio_bln__init(kvm);
+
if (!network)
network = DEFAULT_NETWORK;
 
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
new file mode 100644
index 0000000..ab9ccb7
--- /dev/null
+++ b/tools/kvm/virtio/balloon.c
@@ -0,0 +1,265 @@
+#include "kvm/virtio-balloon.h"
+
+#include "kvm/virtio-pci-dev.h"
+
+#include "kvm/disk-image.h"
+#include "kvm/virtio.h"
+#include "kvm/ioport.h"
+#include "kvm/util.h"
+#include "kvm/kvm.h"
+#include "kvm/pci.h"
+#include "kvm/threadpool.h"
+#include "kvm/irq.h"
+#include "kvm/ioeventfd.h"
+
+#include <linux/virtio_ring.h>
+#include <linux/virtio_balloon.h>
+
+#include 

[PATCH v2 8/8] kvm tools: Add 'kvm balloon' command

2011-07-01 Thread Sasha Levin
Add a command to easily inflate or deflate the balloon driver in running
instances.

Usage:
kvm balloon [command] [instance name] [size]

command is either inflate or deflate, and size is represented in MB.
Target instance must be named (started with '--name').

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/Makefile  |1 +
 tools/kvm/include/kvm/kvm-balloon.h |6 ++
 tools/kvm/kvm-balloon.c |   34 ++
 tools/kvm/kvm-cmd.c |   12 +++-
 tools/kvm/virtio/balloon.c  |8 
 5 files changed, 52 insertions(+), 9 deletions(-)
 create mode 100644 tools/kvm/include/kvm/kvm-balloon.h
 create mode 100644 tools/kvm/kvm-balloon.c
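
As a side note on the argument handling described above, a stricter parse
of the command and size could be sketched as follows. This is not what
the patch does (it uses strcmp() plus atoi()); parse_balloon_args() is a
hypothetical helper that rejects unknown commands, trailing garbage and
non-positive sizes.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 0 on success, -1 on malformed input. */
static int parse_balloon_args(const char *cmd, const char *size,
			      int *inflate, long *mb)
{
	char *end;

	if (strcmp(cmd, "inflate") == 0)
		*inflate = 1;
	else if (strcmp(cmd, "deflate") == 0)
		*inflate = 0;
	else
		return -1;

	/* strtol() lets us detect empty input and trailing characters. */
	errno = 0;
	*mb = strtol(size, &end, 10);
	if (errno || end == size || *end != '\0' || *mb <= 0)
		return -1;

	return 0;
}
```

On success, *mb is the number of 1M steps to request, matching the
"size is represented in MB" wording above.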

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 1ec75da..90ad708 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -58,6 +58,7 @@ OBJS  += kvm-cmd.o
 OBJS   += kvm-debug.o
 OBJS   += kvm-help.o
 OBJS+= kvm-pause.o
+OBJS+= kvm-balloon.o
 OBJS   += kvm-run.o
 OBJS   += mptable.o
 OBJS   += rbtree.o
diff --git a/tools/kvm/include/kvm/kvm-balloon.h b/tools/kvm/include/kvm/kvm-balloon.h
new file mode 100644
index 0000000..f5f92b9
--- /dev/null
+++ b/tools/kvm/include/kvm/kvm-balloon.h
@@ -0,0 +1,6 @@
+#ifndef KVM__BALLOON_H
+#define KVM__BALLOON_H
+
+int kvm_cmd_balloon(int argc, const char **argv, const char *prefix);
+
+#endif
diff --git a/tools/kvm/kvm-balloon.c b/tools/kvm/kvm-balloon.c
new file mode 100644
index 000..277cada
--- /dev/null
+++ b/tools/kvm/kvm-balloon.c
@@ -0,0 +1,34 @@
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
+
+#include <kvm/util.h>
+#include <kvm/kvm-cmd.h>
+#include <kvm/kvm-balloon.h>
+#include <kvm/kvm.h>
+
+int kvm_cmd_balloon(int argc, const char **argv, const char *prefix)
+{
+	int pid;
+	int amount, i;
+	int inflate = 0;
+
+	if (argc != 3)
+		die("Usage: kvm balloon [command] [instance name] [amount]\n");
+
+	pid = kvm__get_pid_by_instance(argv[1]);
+	if (pid < 0)
+		die("Failed locating instance name");
+
+	if (strcmp(argv[0], "inflate") == 0)
+		inflate = 1;
+	else if (strcmp(argv[0], "deflate"))
+		die("command can be either 'inflate' or 'deflate'");
+
+	amount = atoi(argv[2]);
+
+	for (i = 0; i < amount; i++)
+		kill(pid, inflate ? SIGKVMADDMEM : SIGKVMDELMEM);
+
+	return 0;
+}
diff --git a/tools/kvm/kvm-cmd.c b/tools/kvm/kvm-cmd.c
index ffbc4ff..1598781 100644
--- a/tools/kvm/kvm-cmd.c
+++ b/tools/kvm/kvm-cmd.c
@@ -7,16 +7,18 @@
 /* user defined header files */
 #include kvm/kvm-debug.h
 #include kvm/kvm-pause.h
+#include kvm/kvm-balloon.h
 #include kvm/kvm-help.h
 #include kvm/kvm-cmd.h
 #include kvm/kvm-run.h
 
 struct cmd_struct kvm_commands[] = {
-	{ "pause", kvm_cmd_pause, NULL,         0 },
-	{ "debug", kvm_cmd_debug, NULL,         0 },
-	{ "help",  kvm_cmd_help,  NULL,         0 },
-	{ "run",   kvm_cmd_run,   kvm_run_help, 0 },
-	{ NULL,    NULL,          NULL,         0 },
+	{ "pause",	kvm_cmd_pause,		NULL,		0 },
+	{ "debug",	kvm_cmd_debug,		NULL,		0 },
+	{ "balloon",	kvm_cmd_balloon,	NULL,		0 },
+	{ "help",	kvm_cmd_help,		NULL,		0 },
+	{ "run",	kvm_cmd_run,		kvm_run_help,	0 },
+	{ NULL,		NULL,			NULL,		0 },
 };
 
 /*
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index ab9ccb7..854d04b 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -39,7 +39,7 @@ struct bln_dev {
/* virtio queue */
 	u16			queue_selector;
 	struct virt_queue	vqs[NUM_VIRT_QUEUES];
-	void			*jobs[NUM_VIRT_QUEUES];
+	struct thread_pool__job	jobs[NUM_VIRT_QUEUES];
 
struct virtio_balloon_config config;
 };
@@ -174,13 +174,13 @@ static bool virtio_bln_pci_io_out(struct ioport *ioport, 
struct kvm *kvm, u16 po
 
 		vring_init(&queue->vring, VIRTIO_BLN_QUEUE_SIZE, p, VIRTIO_PCI_VRING_ALIGN);
 
-		bdev.jobs[bdev.queue_selector] = thread_pool__add_job(kvm, virtio_bln_do_io, queue);
+		thread_pool__init_job(&bdev.jobs[bdev.queue_selector], kvm, virtio_bln_do_io, queue);
 
 		ioevent = (struct ioevent) {
 			.io_addr		= bdev.base_addr + VIRTIO_PCI_QUEUE_NOTIFY,
 			.io_len			= sizeof(u16),
 			.fn			= ioevent_callback,
-			.fn_ptr			= bdev.jobs[bdev.queue_selector],
+			.fn_ptr			= &bdev.jobs[bdev.queue_selector],
 			.datamatch		= bdev.queue_selector,
 			.fn_kvm			= kvm,
.fd = 

[PATCH 9/9] kvm tools: Stop VCPUs before freeing struct kvm

2011-07-01 Thread Sasha Levin
Not stopping VCPUs before freeing struct kvm leads to segfaults and other
errors due to missing synchronization between threads.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/term.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index a0cb03f..2a3e1f0 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -10,6 +10,7 @@
 #include kvm/term.h
 #include kvm/util.h
 #include kvm/kvm.h
+#include kvm/kvm-cpu.h
 
 extern struct kvm *kvm;
 static struct termios  orig_term;
@@ -34,6 +35,7 @@ int term_getc(int who)
 	if (term_got_escape) {
 		term_got_escape = false;
 		if (c == 'x') {
+			kvm_cpu__reboot();
 			kvm__delete(kvm);
 			printf("\n  # KVM session terminated.\n");
 			exit(1);
-- 
1.7.6



[PATCH v2 7/8] kvm tools: Advise memory allocated for guest RAM as KSM mergeable

2011-07-01 Thread Sasha Levin
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/kvm.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 23d31a3..2f5d633 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -346,6 +346,8 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 	if (kvm->ram_start == MAP_FAILED)
 		die("out of memory");
 
+	madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE);
+
 	ret = ioctl(kvm->vm_fd, KVM_CREATE_IRQCHIP);
 	if (ret < 0)
 		die_perror("KVM_CREATE_IRQCHIP ioctl");
-- 
1.7.6



[PATCH] virtio_balloon: Notify guest only after deflating the balloon

2011-07-01 Thread Sasha Levin
Unless the host requires that requested pages not be used until it
is notified (VIRTIO_BALLOON_F_MUST_TELL_HOST), only notify it after
deflating the balloon.

This will avoid having to take an exit before actually using the pages.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualizat...@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 drivers/virtio/virtio_balloon.c |   16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)
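
The ordering rule described above can be shown without any of the driver
machinery. The sketch below is illustrative only: deflate_order() and the
two stubs are hypothetical, and the stubs merely record the order in
which "tell host" (T) and "release pages" (R) would happen. The feature
bit number 0 mirrors VIRTIO_BALLOON_F_MUST_TELL_HOST.

```c
#define F_MUST_TELL_HOST 0	/* feature bit number, as for VIRTIO_BALLOON_F_MUST_TELL_HOST */

static char trace[2];		/* 'T' = tell host, 'R' = release pages */
static int trace_len;

/* Stubs standing in for tell_host() and release_pages_by_pfn(). */
static void tell_host_stub(void)     { trace[trace_len++] = 'T'; }
static void release_pages_stub(void) { trace[trace_len++] = 'R'; }

/* Pick the deflate ordering from the negotiated feature bits. */
static void deflate_order(unsigned long features)
{
	trace_len = 0;

	if (features & (1UL << F_MUST_TELL_HOST)) {
		/* Host insists on hearing first: notify, then reuse pages. */
		tell_host_stub();
		release_pages_stub();
	} else {
		/* Otherwise free the pages first and notify afterwards. */
		release_pages_stub();
		tell_host_stub();
	}
}
```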

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index e058ace..055f95d 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -148,14 +148,18 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 		vb->num_pages--;
 	}
 
-
 	/*
-	 * Note that if
-	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
-	 * is true, we *have* to do it in this order
+	 * If the host doesn't require us to notify him before using
+	 * pages which belong to the balloon, update him only after
+	 * freeing those pages for guest use.
 	 */
-	tell_host(vb, vb->deflate_vq);
-	release_pages_by_pfn(vb->pfns, vb->num_pfns);
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST)) {
+		tell_host(vb, vb->deflate_vq);
+		release_pages_by_pfn(vb->pfns, vb->num_pfns);
+	} else {
+		release_pages_by_pfn(vb->pfns, vb->num_pfns);
+		tell_host(vb, vb->deflate_vq);
+	}
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.7.6



Re: [PATCH v2 00/31] Implement user mode network for kvm tools

2011-07-01 Thread Asias He
On 07/02/2011 04:36 AM, Pekka Enberg wrote:
> On Fri, Jul 1, 2011 at 7:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
>> On Fri, Jul 1, 2011 at 12:38 AM, Asias He asias.he...@gmail.com wrote:
>>> On 06/30/2011 04:56 PM, Stefan Hajnoczi wrote:
>>>> On Thu, Jun 30, 2011 at 9:40 AM, Asias He asias.he...@gmail.com wrote:
>>>>> uip stands for user mode {TCP,UDP}/IP. Currently, uip supports ARP, ICMP,
>>>>> IPV4, UDP, TCP. So any network protocols above UDP/TCP should work as well,
>>>>> e.g., HTTP, FTP, SSH, DNS.
>>>>
>>>> There is an existing uIP which might cause confusion, not sure if
>>>> you've seen it.  First I thought you were using that :).
>>>
>>> I heard about uIP, but this patchset have nothing to do with uIP ;-)
>>>
>>> At first I was naming the user mode network as UNET which is User mode
>>> NETwork, however, I though uip looks better because it is shorter.
>>>
>>> Anyway, if uip do cause confusion. I'd like to change this naming.
>>
>> It's up to you but now is the right time to do it.  Consider if
>> another program wants to reuse this code or if you ever want to make
>> it a library, it wouldn't help to have a confusing name.
>
> I don't care too much what we use as the namespace prefix but as a
> directory name tools/kvm/uip is pretty meaningless. I'd just move the
> code under tools/kvm/net to mirror what the kernel already has.
>

I have thought about putting user mode net code in tools/kvm/net.
However, we have net code in tools/kvm/virtio as well. Is this a problem
in terms of clean code organization?

And I think splitting the tap code in virtio/net.c into tools/kvm/net is
a good idea. Further, we can put macvtap related code into tools/kvm/net
as well.

-- 
Best Regards,
Asias He


Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate

2011-07-01 Thread Paul Mackerras
On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
> On 06/29/2011 12:41 PM, Paul Mackerras wrote:
> > +struct kvm_ppc_set_platform {
> > +	__u16 platform;		/* defines the OS/hypervisor ABI */
> > +	__u16 guest_arch;	/* e.g. decimal 206 for v2.06 */
> > +	__u32 flags;
>
> Please add some padding so we can extend it later if necessary.
>
> > +};
> > +
> > +/* Values for platform */
> > +#define KVM_PPC_PV_NONE		0	/* bare-metal, non-paravirtualized */
> > +#define KVM_PPC_PV_KVM		1	/* as defined in kvm_para.h */
> > +#define KVM_PPC_PV_SPAPR	2	/* IBM Server PAPR (a la PowerVM) */
>
> We also support BookE which would be useful to also include in the list.
> Furthermore, KVM is more of a feature flag than a platform. We can
> easily support KVM extensions on an SPAPR platform, no?

Yes, I guess so.  The hypercall sequence will have to be different,
since ordinary system call interrupts go straight to the guest.  But I
guess you've allowed for that with the hypercall sequence property in
the device tree.

> This whole interface also could deprecate the PVR setting one, so we
> can simply include PVR as well and not require kernel space to jump
> through hoops to figure out its capabilities.

I debated about whether to include a PVR value in this structure.

The thing is that POWER7 has the Processor Compatibility Register
(PCR), which has a bit which makes the processor behave in user mode
as if it were a POWER6.  So, we could run a book3s_hv guest in POWER6
mode by setting this bit (which we might want to do to run older
distros).  However, this bit doesn't affect the PVR value that the
guest sees.  That's why I went for an architecture level rather than a
specific PVR value.

We could go with a PVR value and use the logical PVR values defined
in PAPR to represent architecture levels, e.g. 0x0f000002 for
architecture v2.05 (POWER6).
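
To make the two encodings concrete, here is a rough sketch (not part of
the patch) of how a decimal architecture level such as 206 could be
mapped to a logical PVR. The helper logical_pvr_for_arch() is
hypothetical, and the constants are assumed to match the PVR_ARCH_*
values used in the Linux powerpc headers; treat them as an assumption to
check against PAPR.

```c
#include <stdint.h>

/* Logical PVR values, assumed to match the PVR_ARCH_* constants in Linux. */
#define LOGICAL_PVR_ARCH_204	0x0f000001u
#define LOGICAL_PVR_ARCH_205	0x0f000002u
#define LOGICAL_PVR_ARCH_206	0x0f000003u

/* Hypothetical helper: decimal architecture level -> logical PVR, 0 if unknown. */
static uint32_t logical_pvr_for_arch(unsigned int guest_arch)
{
	switch (guest_arch) {
	case 204: return LOGICAL_PVR_ARCH_204;
	case 205: return LOGICAL_PVR_ARCH_205;
	case 206: return LOGICAL_PVR_ARCH_206;
	default:  return 0;
	}
}
```

Either form carries the same information for these levels; a PVR field
would additionally leave room for real processor PVRs.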

> And we need to identify 32-bit BookS processors, so we can go into
> 32-bit mode when necessary. That should also be a different
> guest_arch, right?

Right.  If we go with a PVR value then we just use the PVR value for a
suitable 32-bit processor.

> > +
> > +/* Values for flags */
> > +#define KVM_PPC_CROSS_ARCH	1	/* guest architecture != host */
>
> User space shouldn't have to worry about this one. It's up to the
> kernel to decide that it's cross.

I put that in because we might want to force the use of book3s_pr, for
example if we know we're going to want to do emulated MMIO or
something else that isn't implemented in book3s_hv just yet.

Ultimately, yes, the kernel should be able to decide whether it's
cross or not.  However, I don't think we should make it completely
opaque to userspace as to whether the kernel is using _pr or _hv.
If nothing else, userspace should be able to find out and tell the
user so that performance expectations can be set correctly.

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/17] KVM: PPC: Deliver program interrupts right away instead of queueing them

2011-07-01 Thread Alexander Graf

On 29.06.2011, at 12:18, Paul Mackerras wrote:

> Doing so means that we don't have to save the flags anywhere and gets
> rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.
>
> Doing so is OK because a program interrupt won't be generated at the
> same time as any other synchronous interrupt.  If a program interrupt
> and an asynchronous interrupt (external or decrementer) are generated
> at the same time, the program interrupt will be delivered, which is
> correct because it has a higher priority, and then the asynchronous
> interrupt will be masked.
>
> We don't ever generate system reset or machine check interrupts to the
> guest, but if we did, then we would need to make sure they got delivered
> rather than the program interrupt.  The current code would be wrong in
> this situation anyway since it would deliver the program interrupt as
> well as the reset/machine check interrupt.
>
> Signed-off-by: Paul Mackerras pau...@samba.org
> ---
> arch/powerpc/kvm/book3s.c |    8 +++-
> 1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 163e3e1..f68a34d 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -129,8 +129,8 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
>
> void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
> {
> -	to_book3s(vcpu)->prog_flags = flags;

Now that prog_flags is unused, please remove it from the headers.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/17] Hypervisor-mode KVM on POWER7 and PPC970

2011-07-01 Thread Alexander Graf

On 29.06.2011, at 12:15, Paul Mackerras wrote:

> The first patch of the following series is a pure bug-fix for 32-bit
> kernels.
>
> The remainder of the following series of patches enable KVM to exploit
> the hardware hypervisor mode on 64-bit Power ISA Book3S machines.  At
> present, POWER7 and PPC970 processors are supported.  (Note that the
> PPC970 processors in Apple G5 machines don't have a usable hypervisor
> mode and are not supported by these patches.)
>
> Running the KVM host in hypervisor mode means that the guest can use
> both supervisor mode and user mode.  That means that the guest can
> execute supervisor-privilege instructions and access supervisor-
> privilege registers.  In addition the hardware directs most exceptions
> to the guest.  Thus we don't need to emulate any instructions in the
> host.  Generally, the only times we need to exit the guest are when it
> does a hypercall or when an external interrupt or host timer
> (decrementer) interrupt occurs.
>
> The focus of this KVM implementation is to run guests that use the
> PAPR (Power Architecture Platform Requirements) paravirtualization
> interface, which is the interface supplied by PowerVM on IBM pSeries
> machines.  Currently the pseries machine type in qemu is only
> supported by book3s_hv KVM, and book3s_hv KVM only supports the
> pseries machine type.  That will hopefully change in future.
>
> These patches are against the master branch of the kvm tree.

Something seems to be broken with signals. When running without io-thread, I 
can't even do ctrl-c on -nographic while the guest is in sleep mode. But that 
might not be related to your patches.

I've applied 01-16 now. Sending them through some more testing and if they're 
good, sending a pull request.


Alex
