Re: virtio scsi host draft specification, v3
On 06/29/2011 11:39 AM, Stefan Hajnoczi wrote:
> Of course, when doing so we would lose the ability to freely remap
> LUNs. But then remapping LUNs doesn't gain you much imho. Plus you
> could always use the qemu block backend here if you want to hide the
> details.
>
> And you could always use the QEMU block backend with scsi-generic if
> you want to remap LUNs, instead of true passthrough via the kernel
> target.
>
> IIUC the in-kernel target always does remapping. It passes through
> individual LUNs rather than entire targets and you pick LU Numbers to
> map to the backing storage (which may or may not be a SCSI
> pass-through device). Nicholas Bellinger can confirm whether this is
> correct.

But then I don't understand. If you pick LU numbers both with the
in-kernel target and with QEMU, you do not need to use e.g. WWPNs with
Fibre Channel, because we are not passing through the details of the
transport protocol (one day we might have virtio-fc, but more likely
not). So the LUNs you use might as well be represented by hierarchical
LUNs.

Using NPIV with KVM would be done by mapping the same virtual N_Port ID
in the host(s) to the same LU number in the guest. You might already do
this now with virtio-blk, in fact.

Put another way: the virtio-scsi device is itself a SCSI target, so
yes, there is a single target port identifier in virtio-scsi. But this
SCSI target just passes requests down to multiple real targets, and so
will let you do ALUA and all that.

Of course if I am dead wrong please correct me.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] tools/kvm: Use kernel header version of net/9p/9p.h
Use the kernel's include/net/9p/9p.h instead of keeping a copy of the
header under tools/kvm.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
NOTE: This patch is against -tip.

 include/net/9p/9p.h           |    2 +-
 tools/kvm/include/net/9p/9p.h |  734 -
 2 files changed, 1 insertions(+), 735 deletions(-)
 delete mode 100644 tools/kvm/include/net/9p/9p.h

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index 008711e..b7d83e9 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -561,7 +561,7 @@ struct p9_rauth {
 
 struct p9_rerror {
 	struct p9_str error;
-	u32 errno;		/* 9p2000.u extension */
+	u32 p9_errno;		/* 9p2000.u extension */
 };
 
 struct p9_tflush {
diff --git a/tools/kvm/include/net/9p/9p.h b/tools/kvm/include/net/9p/9p.h
deleted file mode 100644
index 61ecff3..000
--- a/tools/kvm/include/net/9p/9p.h
+++ /dev/null
@@ -1,734 +0,0 @@
-/*
- * include/net/9p/9p.h
- *
- * 9P protocol definitions.
- *
- *  Copyright (C) 2005 by Latchesar Ionkov <lu...@ionkov.net>
- *  Copyright (C) 2004 by Eric Van Hensbergen <eri...@gmail.com>
- *  Copyright (C) 2002 by Ron Minnich <rminn...@lanl.gov>
- *
- *  This program is free software; you can redistribute it and/or modify
- *  it under the terms of the GNU General Public License version 2
- *  as published by the Free Software Foundation.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; if not, write to:
- *  Free Software Foundation
- *  51 Franklin Street, Fifth Floor
- *  Boston, MA  02111-1301  USA
- *
- */
-
-#ifndef NET_9P_H
-#define NET_9P_H
-
-#pragma pack(1)
-
-/**
- * enum p9_debug_flags - bits for mount time debug parameter
- * @P9_DEBUG_ERROR: more verbose error messages including original error string
- * @P9_DEBUG_9P: 9P protocol tracing
- * @P9_DEBUG_VFS: VFS API tracing
- * @P9_DEBUG_CONV: protocol conversion tracing
- * @P9_DEBUG_MUX: trace management of concurrent transactions
- * @P9_DEBUG_TRANS: transport tracing
- * @P9_DEBUG_SLABS: memory management tracing
- * @P9_DEBUG_FCALL: verbose dump of protocol messages
- * @P9_DEBUG_FID: fid allocation/deallocation tracking
- * @P9_DEBUG_PKT: packet marshalling/unmarshalling
- * @P9_DEBUG_FSC: FS-cache tracing
- *
- * These flags are passed at mount time to turn on various levels of
- * verbosity and tracing which will be output to the system logs.
- */
-
-enum p9_debug_flags {
-	P9_DEBUG_ERROR =	(1<<0),
-	P9_DEBUG_9P =		(1<<2),
-	P9_DEBUG_VFS =		(1<<3),
-	P9_DEBUG_CONV =		(1<<4),
-	P9_DEBUG_MUX =		(1<<5),
-	P9_DEBUG_TRANS =	(1<<6),
-	P9_DEBUG_SLABS =	(1<<7),
-	P9_DEBUG_FCALL =	(1<<8),
-	P9_DEBUG_FID =		(1<<9),
-	P9_DEBUG_PKT =		(1<<10),
-	P9_DEBUG_FSC =		(1<<11),
-};
-
-#ifdef CONFIG_NET_9P_DEBUG
-extern unsigned int p9_debug_level;
-
-#define P9_DPRINTK(level, format, arg...) \
-do {  \
-	if ((p9_debug_level & level) == level) {\
-		if (level == P9_DEBUG_9P) \
-			printk(KERN_NOTICE "(%8.8d) " \
-			format , task_pid_nr(current) , ## arg); \
-		else \
-			printk(KERN_NOTICE "-- %s (%d): " \
-			format , __func__, task_pid_nr(current) , ## arg); \
-	} \
-} while (0)
-
-#else
-#define P9_DPRINTK(level, format, arg...)  do { } while (0)
-#endif
-
-#define P9_EPRINTK(level, format, arg...) \
-do { \
-	printk(level "9p: %s (%d): " \
-		format , __func__, task_pid_nr(current), ## arg); \
-} while (0)
-
-/**
- * enum p9_msg_t - 9P message types
- * @P9_TLERROR: not used
- * @P9_RLERROR: response for any failed request for 9P2000.L
- * @P9_TSTATFS: file system status request
- * @P9_RSTATFS: file system status response
- * @P9_TSYMLINK: make symlink request
- * @P9_RSYMLINK: make symlink response
- * @P9_TMKNOD: create a special file object request
- * @P9_RMKNOD: create a special file object response
- * @P9_TLCREATE: prepare a handle for I/O on an new file for 9P2000.L
- * @P9_RLCREATE: response with file access information for 9P2000.L
- * @P9_TRENAME: rename request
- * @P9_RRENAME: rename response
- * @P9_TMKDIR: create a directory request
- * @P9_RMKDIR: create a directory response
- * @P9_TVERSION: version handshake request
- * @P9_RVERSION: version handshake response
- * @P9_TAUTH: request to establish authentication channel
- * @P9_RAUTH: response with authentication information
- * @P9_TATTACH: establish user access to file service
- * @P9_RATTACH: response with top level handle to file hierarchy
- * @P9_TERROR: not used
- * @P9_RERROR: response for any failed request
-
Re: virtio scsi host draft specification, v3
On 07/01/2011 08:41 AM, Paolo Bonzini wrote:
> On 06/29/2011 11:39 AM, Stefan Hajnoczi wrote:
>> Of course, when doing so we would lose the ability to freely remap
>> LUNs. But then remapping LUNs doesn't gain you much imho. Plus you
>> could always use the qemu block backend here if you want to hide the
>> details.
>>
>> And you could always use the QEMU block backend with scsi-generic if
>> you want to remap LUNs, instead of true passthrough via the kernel
>> target.
>>
>> IIUC the in-kernel target always does remapping. It passes through
>> individual LUNs rather than entire targets and you pick LU Numbers to
>> map to the backing storage (which may or may not be a SCSI
>> pass-through device). Nicholas Bellinger can confirm whether this is
>> correct.
>
> But then I don't understand. If you pick LU numbers both with the
> in-kernel target and with QEMU, you do not need to use e.g. WWPNs with
> Fibre Channel, because we are not passing through the details of the
> transport protocol (one day we might have virtio-fc, but more likely
> not). So the LUNs you use might as well be represented by hierarchical
> LUNs.

Actually, the kernel does _not_ do a LUN remapping. It just so happens
that most storage arrays will present the LUNs starting with 0, so
normally you wouldn't notice. However, some arrays have an array-wide
LUN range, so you start seeing LUNs at odd places:

[3:0:5:0]    disk    LSI      INF-01-00        0750  /dev/sdw
[3:0:5:7]    disk    LSI      Universal Xport  0750  /dev/sdx

> Using NPIV with KVM would be done by mapping the same virtual N_Port
> ID in the host(s) to the same LU number in the guest. You might
> already do this now with virtio-blk, in fact.

The point here is not the mapping. The point is rescanning. You can map
existing NPIV devices already. But you _cannot_ rescan the host/device
whatever _from the guest_ to detect if new devices are present. That is
the problem I'm trying to describe here.

To be more explicit: currently you have to map existing devices
directly as individual block or scsi devices to the guest. And a rescan
within the guest can only be sent to that device, so the only
information you will be able to gather is whether the device itself is
still present. You are unable to detect if there are other devices
attached to your guest which you should connect to. So we have to have
an enclosing instance (i.e. the equivalent of a SCSI target), which is
capable of telling us exactly this.

> Put another way: the virtio-scsi device is itself a SCSI target, so
> yes, there is a single target port identifier in virtio-scsi. But this
> SCSI target just passes requests down to multiple real targets, and so
> will let you do ALUA and all that.

Argl. No way. The virtio-scsi device has to map to a single LUN.

I thought I mentioned this already, but I'd better clarify this again:
the SCSI spec itself only deals with LUNs, so anything you'll read in
there obviously will only handle the interaction between the initiator
(read: host) and the LUN itself. However, the actual command is sent
via an intermediate target, hence you'll always see the reference to
the ITL (initiator-target-lun) nexus.

The SCSI spec details discovery of the individual LUNs presented by a
given target; it does _NOT_ detail the discovery of the targets
themselves. That is being delegated to the underlying transport, in
most cases SAS or Fibre Channel. For the same reason the SCSI spec can
afford to disdain any reference to path failure, device hot-plugging
etc; all of these things are being delegated to the transport.

In our context the virtio-scsi device should map to the LUN, and the
virtio-scsi _host_ backend should map to the target. The virtio-scsi
_guest_ driver will then map to the initiator. So we should be able to
attach more than one device to the backend, which then will be
presented to the initiator.

In the case of NPIV it would make sense to map the virtual SCSI host to
the backend, so that all devices presented to the virtual SCSI host
will be presented to the backend, too. However, when doing so these
devices will normally be referenced by their original LUN, as these
will be presented to the guest via e.g. 'REPORT LUNS'.

The above thread now tries to figure out whether we should remap those
LUN numbers or just expose them as they are. If we decide on remapping,
we have to emulate _all_ commands referring explicitly to those LUN
numbers (persistent reservations, anyone?). If we don't, we would
expose some hardware detail to the guest, but would save us _a lot_ of
processing.

I'm all for the latter.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
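The trade-off between remapping and exposing LUNs unchanged can be illustrated with a small sketch. Nothing in it comes from the draft specification: `lun_map`, `lun_map_entry` and `guest_to_host_lun` are hypothetical names invented for illustration. With remapping, the backend needs a translation table and has to apply it to every command that carries a LUN; without remapping the translation is the identity and the guest simply sees the array's own numbering (such as LUN 7 in the listing quoted in this message).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation entry: LUN as seen by the guest -> LUN on the host. */
struct lun_map_entry {
    uint32_t guest_lun;
    uint32_t host_lun;
};

/* Hypothetical per-backend table; count == 0 means "expose LUNs as they are". */
struct lun_map {
    const struct lun_map_entry *entries;
    size_t count;
};

/*
 * Translate a guest LUN to a host LUN.
 * Returns 0 and stores the result in *host_lun on success,
 * or -1 if remapping is active and the guest LUN is not mapped.
 */
static int guest_to_host_lun(const struct lun_map *map, uint32_t guest_lun,
                             uint32_t *host_lun)
{
    size_t i;

    if (map->count == 0) {
        /* No remapping: identity translation, nothing to emulate. */
        *host_lun = guest_lun;
        return 0;
    }
    for (i = 0; i < map->count; i++) {
        if (map->entries[i].guest_lun == guest_lun) {
            *host_lun = map->entries[i].host_lun;
            return 0;
        }
    }
    return -1;
}
```

In the remapping case a lookup like this would have to sit in front of every command or response that names a LUN, which is the extra processing the message argues against.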
Biweekly KVM Test report, kernel 2e0d8e28... qemu d5893103...
Hi All,

This is the KVM test result against kvm.git
2e0d8e289ef23d0e56923d778e9bea0601a0edb4 based on kernel 3.0.0-rc5+, and
qemu-kvm.git d58931037dbb4fbc2fbb33858629d3fabfd1b0d4.

We found a make error in qemu-kvm.git. This issue was already reported
in qemu's bugzilla by someone else. I commented out 2 lines in
/usr/include/pngconf.h to work around it in our kvm build system. The
issue also occurred at
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/888/steps/compile/logs/stdio

New issue:
1. qemu-kvm.git make error when ‘CC ui/vnc-enc-tight.o’
   https://bugs.launchpad.net/qemu/+bug/802588

Old issues:
1. ltp diotest running time is 2.54 times longer than before
   https://sourceforge.net/tracker/?func=detail&aid=2723366&group_id=180599&atid=893831
2. perfctr wrmsr warning when booting 64bit RHEL5.3
   https://sourceforge.net/tracker/?func=detail&aid=2721640&group_id=180599&atid=893831
3. [vt-d] NIC assignment order in command line makes some NICs not work
   https://bugs.launchpad.net/qemu/+bug/799036

Test environment:
==================================================
Platform       Westmere-EP    SandyBridge-EP
CPU Cores      24             32
Memory size    10G            32G

Report summary of IA32E on Westmere-EP:

Summary Test Report of Last Session
=====================================================================
                            Total   Pass    Fail    NoResult   Crash
=====================================================================
control_panel_ept_vpid      12      12      0       0          0
control_panel_ept           4       4       0       0          0
control_panel_vpid          3       3       0       0          0
control_panel               3       3       0       0          0
gtest_vpid                  1       1       0       0          0
gtest_ept                   1       1       0       0          0
gtest                       3       3       0       0          0
vtd_ept_vpid                3       2       1       0          0
gtest_ept_vpid              12      12      0       0          0
sriov_ept_vpid              6       6       0       0          0
=====================================================================
control_panel_ept_vpid      12      12      0       0          0
 :KVM_LM_Continuity_64_g3   1       1       0       0          0
 :KVM_four_dguest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
 :KVM_SR_SMP_64_g32e        1       1       0       0          0
 :KVM_LM_SMP_64_g32e        1       1       0       0          0
 :KVM_linux_win_64_g32e     1       1       0       0          0
 :KVM_two_winxp_64_g32e     1       1       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_256M_guest_64_gPAE    1       1       0       0          0
 :KVM_SR_Continuity_64_g3   1       1       0       0          0
 :KVM_256M_guest_64_g32e    1       1       0       0          0
 :KVM_four_sguest_64_g32e   1       1       0       0          0
control_panel_ept           4       4       0       0          0
 :KVM_linux_win_64_g32e     1       1       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
 :KVM_LM_SMP_64_g32e        1       1       0       0          0
control_panel_vpid          3       3       0       0          0
 :KVM_linux_win_64_g32e     1       1       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
control_panel               3       3       0       0          0
 :KVM_1500M_guest_64_g32e   1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
 :KVM_LM_SMP_64_g32e        1       1       0       0          0
gtest_vpid                  1       1       0       0          0
 :boot_smp_win7_ent_64_g3   1       1       0       0          0
gtest_ept                   1       1       0       0          0
 :boot_smp_win7_ent_64_g3   1       1       0       0          0
gtest                       3       3       0       0          0
 :boot_smp_win2008_64_g32   1       1       0       0          0
 :boot_smp_win7_ent_64_gP   1       1       0       0          0
 :boot_smp_vista_64_g32e    1       1       0       0          0
vtd_ept_vpid                3       2       1       0          0
 :one_pcie_smp_xp_64_g32e   1       1       0       0          0
 :one_pcie_smp_64_g32e      1       1       0       0          0
 :two_dev_smp_64_g32e       1       0       1       0          0
gtest_ept_vpid              12      12      0       0          0
 :boot_up_acpi_64_g32e      1       1       0       0          0
[PATCH 0/3] [v4] Megasas HBA emulation
Hi all,

thanks to Paolo and Stefan most of the SCSI patches are now in, so
I've made the next attempt at submitting my Megaraid SAS HBA emulation.

To do so, I've done two additional patches; both should be valid
cleanups.

- Replace 'tag' by 'hba_private'
  The SCSIRequest structure has a 'tag', which is being used by the
  drivers to match the SCSIRequest to the internal request structure.
  The only driver to actually benefit from this is the lsi53c895a
  driver; everyone else either leaves it blank or uses some internal
  numbering here.
  So this patch converts the 'tag' to a 'hba_private' pointer, which
  allows the driver to store a pointer to the internal structure
  directly within the SCSIRequest. This saves the lookup and an
  additional field in the driver internal request structure.
- Add an 'offset' parameter to iov_from_buf()
  iov_to_buf() has it, but iov_from_buf() does not. But we'll be
  needing it if the iovec is larger than the buffer. So there.

And, of course, the megasas driver itself, which has been modified to
work with the new interface; otherwise there have been no changes from
the previous submission.

Hannes Reinecke (3):
  iov: Add 'offset' parameter to iov_to_buf()
  scsi: replace 'tag' with 'hba_private' pointer
  megasas: LSI Megaraid SAS emulation

 Makefile.objs           |    1 +
 default-configs/pci.mak |    1 +
 hw/esp.c                |    2 +-
 hw/lsi53c895a.c         |   17 +-
 hw/megasas.c            | 1923 +++
 hw/mfi.h                | 1197 +
 hw/pci_ids.h            |    3 +-
 hw/scsi-bus.c           |   22 +-
 hw/scsi-disk.c          |    5 +-
 hw/scsi-generic.c       |    4 +-
 hw/scsi.h               |    8 +-
 hw/spapr_vscsi.c        |   41 +-
 hw/usb-msd.c            |   10 +-
 hw/virtio-net.c         |    2 +-
 hw/virtio-serial-bus.c  |    2 +-
 iov.c                   |   23 +-
 iov.h                   |    2 +-
 trace-events            |   14 +-
 18 files changed, 3193 insertions(+), 84 deletions(-)
 create mode 100644 hw/megasas.c
 create mode 100644 hw/mfi.h
[PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()
Occasionally, the buffer needs to be placed at an offset within the
iovec when copying the buffer to the iovec.

Signed-off-by: Hannes Reinecke <h...@suse.de>
---
 hw/virtio-net.c        |    2 +-
 hw/virtio-serial-bus.c |    2 +-
 iov.c                  |   23 ++-
 iov.h                  |    2 +-
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6997e02..a32cc01 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -657,7 +657,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* copy in packet.  ugh */
         len = iov_from_buf(sg, elem.in_num,
-                           buf + offset, size - offset);
+                           buf + offset, 0, size - offset);
         total += len;
         offset += len;
         /* If buffers can't be merged, at this point we
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 7f6db7b..53c58d0 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -103,7 +103,7 @@ static size_t write_to_port(VirtIOSerialPort *port,
         }
 
         len = iov_from_buf(elem.in_sg, elem.in_num,
-                           buf + offset, size - offset);
+                           buf + offset, 0, size - offset);
         offset += len;
 
         virtqueue_push(vq, &elem, len);
diff --git a/iov.c b/iov.c
index 588cd04..9ead6ee 100644
--- a/iov.c
+++ b/iov.c
@@ -15,21 +15,26 @@
 #include "iov.h"
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-                    const void *buf, size_t size)
+                    const void *buf, size_t offset, size_t size)
 {
-    size_t offset;
+    size_t iov_off, buf_off;
     unsigned int i;
 
-    offset = 0;
-    for (i = 0; offset < size && i < iovcnt; i++) {
-        size_t len;
+    iov_off = 0;
+    buf_off = 0;
+    for (i = 0; i < iovcnt && size; i++) {
+        if (offset < (iov_off + iov[i].iov_len)) {
+            size_t len = MIN((iov_off + iov[i].iov_len) - offset, size);
 
-        len = MIN(iov[i].iov_len, size - offset);
+            memcpy(iov[i].iov_base + (offset - iov_off), buf + buf_off, len);
 
-        memcpy(iov[i].iov_base, buf + offset, len);
-        offset += len;
+            buf_off += len;
+            offset += len;
+            size -= len;
+        }
+        iov_off += iov[i].iov_len;
     }
-    return offset;
+    return buf_off;
 }
 
 size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
diff --git a/iov.h b/iov.h
index 60a8547..2677527 100644
--- a/iov.h
+++ b/iov.h
@@ -13,7 +13,7 @@
 #include "qemu-common.h"
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-                    const void *buf, size_t size);
+                    const void *buf, size_t offset, size_t size);
 size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
                   void *buf, size_t offset, size_t size);
 size_t iov_size(const struct iovec *iov, const unsigned int iovcnt);
--
1.6.0.2
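To make the meaning of the new parameter concrete, here is a minimal standalone sketch of the patched copy loop, written against the plain POSIX `struct iovec` rather than the QEMU tree. The names `sketch_iov_from_buf` and `MIN_SZ` are made up for illustration; only the loop logic follows the iov.c hunk of this patch. Note that `offset` counts bytes from the start of the scatter list, not from the start of `buf`.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

#define MIN_SZ(a, b) ((a) < (b) ? (a) : (b))

/*
 * Copy up to 'size' bytes from 'buf' into the scatter list 'iov',
 * starting 'offset' bytes into the list. Returns the number of bytes
 * actually copied, which is smaller than 'size' if the list is too short.
 */
static size_t sketch_iov_from_buf(struct iovec *iov, unsigned int iovcnt,
                                  const void *buf, size_t offset, size_t size)
{
    size_t iov_off = 0;   /* bytes of the list covered by earlier elements */
    size_t buf_off = 0;   /* bytes consumed from buf so far */
    unsigned int i;

    for (i = 0; i < iovcnt && size; i++) {
        /* Skip elements that end before the requested offset. */
        if (offset < iov_off + iov[i].iov_len) {
            size_t len = MIN_SZ(iov_off + iov[i].iov_len - offset, size);

            memcpy((char *)iov[i].iov_base + (offset - iov_off),
                   (const char *)buf + buf_off, len);
            buf_off += len;
            offset += len;
            size -= len;
        }
        iov_off += iov[i].iov_len;
    }
    return buf_off;
}
```

With two 4-byte elements, copying 5 bytes at offset 2 fills the last two bytes of the first element and the first three of the second. Passing 0 as the offset, as the virtio-net and virtio-serial callers in the patch do, keeps the previous behaviour.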
[PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer
'tag' is just an abstraction to identify the command from the driver.
So we should make that explicit by replacing 'tag' with a
driver-defined pointer 'hba_private'.
This saves the lookup for drivers handling several commands in
parallel.

Signed-off-by: Hannes Reinecke <h...@suse.de>
---
 hw/esp.c          |    2 +-
 hw/lsi53c895a.c   |   17 -
 hw/scsi-bus.c     |   22 +++---
 hw/scsi-disk.c    |    5 ++---
 hw/scsi-generic.c |    4 ++--
 hw/scsi.h         |    8 
 hw/spapr_vscsi.c  |   41 -
 hw/usb-msd.c      |   10 +-
 trace-events      |   14 +++---
 9 files changed, 52 insertions(+), 71 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..912ff89 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)
 
     DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
     lun = busid & 7;
-    s->current_req = scsi_req_new(s->current_dev, 0, lun);
+    s->current_req = scsi_req_new(s->current_dev, lun, s);
     datalen = scsi_req_enqueue(s->current_req, buf);
     s->ti_size = datalen;
     if (datalen != 0) {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 940b43a..272e919 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -670,7 +670,7 @@ static void lsi_request_cancelled(SCSIRequest *req)
         return;
     }
 
-    p = lsi_find_by_tag(s, req->tag);
+    p = req->hba_private;
     if (p) {
         QTAILQ_REMOVE(&s->queue, p, next);
         scsi_req_unref(req);
@@ -680,18 +680,17 @@ static void lsi_request_cancelled(SCSIRequest *req)
 
 /* Record that data is available for a queued command.  Returns zero if
    the device was reselected, nonzero if the IO is deferred.  */
-static int lsi_queue_tag(LSIState *s, uint32_t tag, uint32_t len)
+static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
 {
-    lsi_request *p;
+    lsi_request *p = req->hba_private;
 
-    p = lsi_find_by_tag(s, tag);
     if (!p) {
-        BADF("IO with unknown tag %d\n", tag);
+        BADF("IO with unknown reference %p\n", req->hba_private);
         return 1;
     }
 
     if (p->pending) {
-        BADF("Multiple IO pending for tag %d\n", tag);
+        BADF("Multiple IO pending for request %p\n", p);
     }
     p->pending = len;
     /* Reselect if waiting for it, or if reselection triggers an IRQ
@@ -743,9 +742,9 @@ static void lsi_transfer_data(SCSIRequest *req, uint32_t len)
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
     int out;
 
-    if (s->waiting == 1 || !s->current || req->tag != s->current->tag ||
+    if (s->waiting == 1 || !s->current || req->hba_private != s->current ||
         (lsi_irq_on_rsl(s) && !(s->scntl1 & LSI_SCNTL1_CON))) {
-        if (lsi_queue_tag(s, req->tag, len)) {
+        if (lsi_queue_req(s, req, len)) {
             return;
         }
     }
@@ -789,7 +788,7 @@ static void lsi_do_command(LSIState *s)
     assert(s->current == NULL);
     s->current = qemu_mallocz(sizeof(lsi_request));
     s->current->tag = s->select_tag;
-    s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun);
+    s->current->req = scsi_req_new(dev, s->current_lun, s->current);
 
     n = scsi_req_enqueue(s->current->req, buf);
     if (n) {
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index ad6a730..d1fc481 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -131,7 +131,7 @@ int scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
     return res;
 }
 
-SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t lun, void *hba_private)
 {
     SCSIRequest *req;
 
@@ -139,16 +139,16 @@ SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t l
     req->refcount = 1;
     req->bus = scsi_bus_from_device(d);
     req->dev = d;
-    req->tag = tag;
     req->lun = lun;
+    req->hba_private = hba_private;
     req->status = -1;
-    trace_scsi_req_alloc(req->dev->id, req->lun, req->tag);
+    trace_scsi_req_alloc(req->dev->id, req->lun, req->hba_private);
     return req;
 }
 
-SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t lun, void *hba_private)
 {
-    return d->info->alloc_req(d, tag, lun);
+    return d->info->alloc_req(d, lun, hba_private);
 }
 
 uint8_t *scsi_req_get_buf(SCSIRequest *req)
@@ -182,7 +182,7 @@ int32_t scsi_req_enqueue(SCSIRequest *req, uint8_t *buf)
 
 static void scsi_req_dequeue(SCSIRequest *req)
 {
-    trace_scsi_req_dequeue(req->dev->id, req->lun, req->tag);
+    trace_scsi_req_dequeue(req->dev->id, req->lun, req->hba_private);
     if (req->enqueued) {
         QTAILQ_REMOVE(&req->dev->requests, req, next);
         req->enqueued = false;
@@ -214,7 +214,7 @@ static int scsi_req_length(SCSIRequest *req, uint8_t *cmd)
         req->cmd.len = 12;
         break;
     default:
-        trace_scsi_req_parse_bad(req->dev->id, req->lun, req->tag, cmd[0]);
+
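The difference between the two schemes can be sketched in a few lines of standalone C. All type and function names here (`demo_hba_req`, `demo_req`, `find_by_tag`, `find_by_private`) are hypothetical stand-ins and not QEMU's actual structures; the sketch only shows why an opaque back-pointer removes the per-completion lookup that a tag requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical HBA-internal request state, kept on a singly linked queue. */
struct demo_hba_req {
    uint32_t tag;
    uint32_t pending;
    struct demo_hba_req *next;
};

/* Hypothetical generic request as seen by the SCSI layer. */
struct demo_req {
    uint32_t lun;
    void *hba_private;   /* opaque to the SCSI layer, owned by the HBA */
};

/* Tag scheme: walk the HBA's queue until the tag matches. */
static struct demo_hba_req *find_by_tag(struct demo_hba_req *head, uint32_t tag)
{
    for (; head; head = head->next) {
        if (head->tag == tag) {
            return head;
        }
    }
    return NULL;
}

/* Pointer scheme: the request already carries the answer. */
static struct demo_hba_req *find_by_private(const struct demo_req *req)
{
    return req->hba_private;
}
```

The pointer variant is constant time and needs no tag field for matching, at the cost that the HBA must keep the pointed-to structure alive for as long as the request exists.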
[PATCH 0/3] KVM test: Windows install fixes
These 3 patches fix problems found when performing a full round of
Windows installs.

Lucas Meneghel Rodrigues (3):
  KVM test: Render unattended files more properly
  KVM test: Update Win2003 CD info to match MSDN registers
  KVM test: Reformat sample windows ini style unattended files

 client/tests/kvm/tests/unattended_install.py |  191 +-
 client/tests/kvm/tests_base.cfg.sample       |   44 +--
 client/tests/kvm/unattended/win2000-32.sif   |   95 +++--
 client/tests/kvm/unattended/win2003-32.sif   |   78 ++--
 client/tests/kvm/unattended/win2003-64.sif   |   78 ++--
 client/tests/kvm/unattended/winxp32.sif      |   99 +++---
 client/tests/kvm/unattended/winxp64.sif      |   99 +++---
 7 files changed, 386 insertions(+), 298 deletions(-)

--
1.7.5.4
[PATCH 2/3] KVM test: Update Win2003 CD info to match MSDN registers
Signed-off-by: Lucas Meneghel Rodrigues <l...@redhat.com>
---
 client/tests/kvm/tests_base.cfg.sample |   44 +++
 1 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index bdc9b6c..5313da1 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -2317,17 +2317,27 @@ variants:
     - 32:
         image_name += -32
         install:
-            cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
-            md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
-            md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
+            cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_with_sp2_cd1_x13-05460.iso
+            md5sum_cd1 = 7c3bc891d20c7e6a110c4f1ad82952ba
+            md5sum_1m_cd1 = b1671ecf47a270e49e04982bf1474ff9
+            sha1sum_cd1 = ee11cc735c695501874d2fa123f7d78449b3de7c
+            sha1sum_1m_cd1 = e2d49dc3fbe17a6b2ba1812543f2cc08ef9565c4
+            #cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
+            #md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
+            #md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
             user = user
             steps = Win2003-32.steps
         setup:
             steps = Win2003-32-rss.steps
         unattended_install.cdrom, whql.support_vm_install:
-            cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
-            md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
-            md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
+            cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_with_sp2_cd1_x13-05460.iso
+            md5sum_cd1 = 7c3bc891d20c7e6a110c4f1ad82952ba
+            md5sum_1m_cd1 = b1671ecf47a270e49e04982bf1474ff9
+            sha1sum_cd1 = ee11cc735c695501874d2fa123f7d78449b3de7c
+            sha1sum_1m_cd1 = e2d49dc3fbe17a6b2ba1812543f2cc08ef9565c4
+            #cdrom_cd1 = isos/windows/Windows2003_r2_VLK.iso
+            #md5sum_cd1 = 03e921e9b4214773c21a39f5c3f42ef7
+            #md5sum_1m_cd1 = 37c2fdec15ac4ec16aa10fdfdb338aa3
             unattended_file = unattended/win2003-32.sif
             floppy = images/win2003-32/answer.vfd
             # Uncomment virtio_network_installer_path line if
@@ -2349,17 +2359,27 @@ variants:
     - 64:
         image_name += -64
         install:
-            cdrom_cd1 = isos/windows/Windows2003-x64.iso
-            md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
-            md5sum_1m_cd1 = 439393c384116aa09e08a0ad047dcea8
+            cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_x64_with_sp2_cd1_x13-06188.iso
+            md5sum_cd1 = 09f4cb31796e9802dcc477e397868c9a
+            md5sum_1m_cd1 = c11ebcf6c128d94c83fe623566eb29d7
+            sha1sum_cd1 = d04c8f304047397be486c38a6b769f16993d4b39
+            sha1sum_1m_cd1 = 3daf6fafda8ba48779df65e4713a3cdbd6c9d136
+            #cdrom_cd1 = isos/windows/Windows2003-x64.iso
+            #md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
+            #md5sum_1m_cd1 = 439393c384116aa09e08a0ad047dcea8
             user = user
             steps = Win2003-64.steps
         setup:
             steps = Win2003-64-rss.steps
         unattended_install.cdrom, whql.support_vm_install:
-            cdrom_cd1 = isos/windows/Windows2003-x64.iso
-            md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
-            md5sum_1m_cd1 = 439393c384116aa09e08a0ad047dcea8
+            cdrom_cd1 = isos/windows/en_win_srv_2003_r2_enterprise_x64_with_sp2_cd1_x13-06188.iso
+            md5sum_cd1 = 09f4cb31796e9802dcc477e397868c9a
+            md5sum_1m_cd1 = c11ebcf6c128d94c83fe623566eb29d7
+            sha1sum_cd1 = d04c8f304047397be486c38a6b769f16993d4b39
+            sha1sum_1m_cd1 = 3daf6fafda8ba48779df65e4713a3cdbd6c9d136
+            #cdrom_cd1 = isos/windows/Windows2003-x64.iso
+            #md5sum_cd1 = 5703f87c9fd77d28c05ffadd3354dbbd
+            #md5sum_1m_cd1 =
[PATCH 3/3] KVM test: Reformat sample windows ini style unattended files
If we prepend spaces on the key=value lines, ConfigParser will fail
to parse the file. So let's reformat the files in a way that we
won't have this problem again.

Signed-off-by: Lucas Meneghel Rodrigues <l...@redhat.com>
---
 client/tests/kvm/unattended/win2000-32.sif |   95 ++-
 client/tests/kvm/unattended/win2003-32.sif |   78 +++---
 client/tests/kvm/unattended/win2003-64.sif |   78 +++---
 client/tests/kvm/unattended/winxp32.sif    |   99 ++--
 client/tests/kvm/unattended/winxp64.sif    |   99 ++--
 5 files changed, 225 insertions(+), 224 deletions(-)

diff --git a/client/tests/kvm/unattended/win2000-32.sif b/client/tests/kvm/unattended/win2000-32.sif
index 8720851..6aa1848 100644
--- a/client/tests/kvm/unattended/win2000-32.sif
+++ b/client/tests/kvm/unattended/win2000-32.sif
@@ -1,73 +1,76 @@
-;SetupMgrTag
 [Data]
-AutoPartition=1
-MsDosInitiated=0
-UnattendedInstall=Yes
+AutoPartition = 1
+MsDosInitiated = 0
+UnattendedInstall = Yes
 
 [Unattended]
-Repartition=Yes
-UnattendMode=FullUnattended
-OemSkipEula=Yes
-OemPreinstall=No
-TargetPath=\WINDOWS
-UnattendSwitch=Yes
-CrashDumpSetting=1
-DriverSigningPolicy=ignore
-WaitForReboot=no
+Repartition = Yes
+UnattendMode = FullUnattended
+OemSkipEula = Yes
+OemPreinstall = No
+TargetPath = \WINDOWS
+UnattendSwitch = Yes
+CrashDumpSetting = 1
+DriverSigningPolicy = ignore
+OemPnPDriversPath = KVM_TEST_NETWORK_DRIVER_PATH
+WaitForReboot = no
 
 [GuiUnattended]
-AdminPassword=1q2w3eP
-EncryptedAdminPassword=NO
-TimeZone=85
-OemSkipWelcome=1
-AutoLogon=Yes
-AutoLogonCount=1000
-OEMSkipRegional=1
+AdminPassword = 1q2w3eP
+EncryptedAdminPassword = NO
+TimeZone = 85
+OemSkipWelcome = 1
+AutoLogon = Yes
+AutoLogonCount = 1000
+OEMSkipRegional = 1
 
 [UserData]
-ProductKey=KVM_TEST_CDKEY
-FullName=Autotest Mindless Drone
-OrgName=Autotest
-ComputerName=*
+ProductKey = KVM_TEST_CDKEY
+FullName = Autotest Mindless Drone
+OrgName = Autotest
+ComputerName = *
 
 [Identification]
-JoinWorkgroup=WORKGROUP
+JoinWorkgroup = WORKGROUP
 
 [Networking]
-InstallDefaultComponents=Yes
+InstallDefaultComponents = Yes
 
 [Proxy]
-Proxy_Enable=0
-Use_Same_Proxy=0
+Proxy_Enable = 0
+Use_Same_Proxy = 0
 
 [Components]
-dialer=off
-media_clips=off
-media_utopia=off
-msnexplr=off
-netoc=off
-OEAccess=off
-templates=off
-WMAccess=off
-zonegames=off
+dialer = off
+media_clips = off
+media_utopia = off
+msnexplr = off
+netoc = off
+OEAccess = off
+templates = off
+WMAccess = off
+zonegames = off
 
 [TerminalServices]
-AllowConnections=1
+AllowConnections = 1
 
 [WindowsFirewall]
-Profiles=WindowsFirewall.TurnOffFirewall
+Profiles = WindowsFirewall.TurnOffFirewall
 
 [WindowsFirewall.TurnOffFirewall]
-Mode=0
+Mode = 0
 
 [Branding]
-BrandIEUsingUnattended=Yes
+BrandIEUsingUnattended = Yes
 
 [Display]
-Xresolution=1024
-YResolution=768
+Xresolution = 1024
+YResolution = 768
 
 [GuiRunOnce]
- Command0=cmd /c E:\setuprss.bat
- Command1=cmd /c netsh interface ip set address local dhcp
- Command2=cmd /c A:\finish.exe
+Command0 = cmd /c KVM_TEST_VIRTIO_NETWORK_INSTALLER
+Command1 = cmd /c E:\setuprss.bat
+Command2 = cmd /c netsh interface ip set address local dhcp
+Command3 = cmd /c sc config tlntsvr start= auto
+Command4 = cmd /c net start telnet
+Command5 = cmd /c A:\finish.exe
diff --git a/client/tests/kvm/unattended/win2003-32.sif b/client/tests/kvm/unattended/win2003-32.sif
index 207cd2b..6e69b5e 100644
--- a/client/tests/kvm/unattended/win2003-32.sif
+++ b/client/tests/kvm/unattended/win2003-32.sif
@@ -1,66 +1,66 @@
 [Data]
-AutoPartition = 1
-MsDosInitiated = 0
-UnattendedInstall = Yes
+AutoPartition = 1
+MsDosInitiated = 0
+UnattendedInstall = Yes
 
 [Unattended]
-UnattendMode = FullUnattended
-OemSkipEula = Yes
-OemPreinstall = No
-UnattendSwitch = Yes
-CrashDumpSetting = 1
-DriverSigningPolicy = ignore
-OemPnPDriversPath=KVM_TEST_NETWORK_DRIVER_PATH
-WaitForReboot = no
-Repartition = yes
+UnattendMode = FullUnattended
+OemSkipEula = Yes
+OemPreinstall = No
+UnattendSwitch = Yes
+CrashDumpSetting = 1
+DriverSigningPolicy = ignore
+OemPnPDriversPath = KVM_TEST_NETWORK_DRIVER_PATH
+WaitForReboot = no
+Repartition = yes
 
 [GuiUnattended]
-AdminPassword = 1q2w3eP
-AutoLogon = Yes
-AutoLogonCount = 1000
-OEMSkipRegional = 1
-TimeZone = 85
-OemSkipWelcome = 1
+AdminPassword = 1q2w3eP
+AutoLogon = Yes
+AutoLogonCount = 1000
+OEMSkipRegional = 1
+TimeZone = 85
+OemSkipWelcome = 1
 
 [UserData]
-ProductKey=KVM_TEST_CDKEY
-FullName=Autotest Mindless Drone
-OrgName=Autotest
-ComputerName=*
+ProductKey = KVM_TEST_CDKEY
+FullName = Autotest Mindless Drone
+OrgName = Autotest
+ComputerName = *
 
 [LicenseFilePrintData]
-
Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()
On 01.07.2011, at 09:42, Hannes Reinecke wrote:

> Occasionally, the buffer needs to be placed at an offset within the
> iovec when copying the buffer to the iovec.

So this is a buffer into the iovec, right? Wouldn't it make sense to also modify iov_to_buf accordingly then, so the API stays similar?

Also, it'd be nice to give the parameter a more obvious name, so potential users can easily recognize what it offsets.

Alex

> Signed-off-by: Hannes Reinecke h...@suse.de
> ---
>  hw/virtio-net.c        |    2 +-
>  hw/virtio-serial-bus.c |    2 +-
>  iov.c                  |   23 ++++++++++++++---------
>  iov.h                  |    2 +-
>  4 files changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 6997e02..a32cc01 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -657,7 +657,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
>          /* copy in packet.  ugh */
>          len = iov_from_buf(sg, elem.in_num,
> -                           buf + offset, size - offset);
> +                           buf + offset, 0, size - offset);
>          total += len;
>          offset += len;
>          /* If buffers can't be merged, at this point we
> diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> index 7f6db7b..53c58d0 100644
> --- a/hw/virtio-serial-bus.c
> +++ b/hw/virtio-serial-bus.c
> @@ -103,7 +103,7 @@ static size_t write_to_port(VirtIOSerialPort *port,
>          }
>
>          len = iov_from_buf(elem.in_sg, elem.in_num,
> -                           buf + offset, size - offset);
> +                           buf + offset, 0, size - offset);
>          offset += len;
>
>          virtqueue_push(vq, &elem, len);
> diff --git a/iov.c b/iov.c
> index 588cd04..9ead6ee 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -15,21 +15,26 @@
>  #include "iov.h"
>
>  size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
> -                    const void *buf, size_t size)
> +                    const void *buf, size_t offset, size_t size)
>  {
> -    size_t offset;
> +    size_t iov_off, buf_off;
>      unsigned int i;
>
> -    offset = 0;
> -    for (i = 0; offset < size && i < iovcnt; i++) {
> -        size_t len;
> +    iov_off = 0;
> +    buf_off = 0;
> +    for (i = 0; i < iovcnt && size; i++) {
> +        if (offset < (iov_off + iov[i].iov_len)) {
> +            size_t len = MIN((iov_off + iov[i].iov_len) - offset, size);
>
> -        len = MIN(iov[i].iov_len, size - offset);
> +            memcpy(iov[i].iov_base + (offset - iov_off), buf + buf_off, len);
>
> -        memcpy(iov[i].iov_base, buf + offset, len);
> -        offset += len;
> +            buf_off += len;
> +            offset += len;
> +            size -= len;
> +        }
> +        iov_off += iov[i].iov_len;
>      }
> -    return offset;
> +    return buf_off;
>  }
>
>  size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
> diff --git a/iov.h b/iov.h
> index 60a8547..2677527 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -13,7 +13,7 @@
>  #include "qemu-common.h"
>
>  size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
> -                    const void *buf, size_t size);
> +                    const void *buf, size_t offset, size_t size);
>  size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
>                    void *buf, size_t offset, size_t size);
>  size_t iov_size(const struct iovec *iov, const unsigned int iovcnt);
> --
> 1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
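For readers following the thread, the behaviour the patch is after (copy a flat buffer into a scatter list, starting some number of bytes into it) can be sketched standalone. This is only an illustration under assumptions: the function name iov_from_buf_at and the parameter name iov_off are made up here (the latter borrows the name floated later in the thread), and the loop is a simplified equivalent, not the code from the patch.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: copy up to `size` bytes from `buf` into the
 * iovec array, starting `iov_off` bytes into the logical concatenation
 * of all elements. Returns the number of bytes actually copied.
 */
static size_t iov_from_buf_at(struct iovec *iov, unsigned int iov_cnt,
                              const void *buf, size_t iov_off, size_t size)
{
    const unsigned char *src = buf;
    size_t done = 0;
    unsigned int i;

    for (i = 0; i < iov_cnt && done < size; i++) {
        size_t len;

        if (iov_off >= iov[i].iov_len) {
            /* The start offset lies beyond this element: skip it. */
            iov_off -= iov[i].iov_len;
            continue;
        }
        /* Room left in this element after the offset, capped by what
         * is still to be copied. */
        len = iov[i].iov_len - iov_off;
        if (len > size - done) {
            len = size - done;
        }
        memcpy((unsigned char *)iov[i].iov_base + iov_off, src + done, len);
        done += len;
        iov_off = 0;    /* later elements are filled from their start */
    }
    return done;
}
```

With two 4-byte elements, writing 3 bytes at offset 3 puts one byte at the end of the first element and two at the start of the second. An offset at or past the total length copies nothing and returns 0.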
Re: [PATCH kvm-unit-tests v2] access: check SMEP on prefetch pte path
On 06/29/2011 06:24 PM, Yang, Wei wrote:

> +
> +    /*
> +     * Here we write the ro user page when
> +     * cr0.wp=0, then we execute it and SMEP
> +     * fault should happen.
> +     */
> +    err_prepare_notwp = ac_test_do_access(&at1);
> +    if (!err_prepare_notwp) {
> +        printf("%s: SMEP prepare fail\n", __FUNCTION__);
> +        goto clean_up;
> +    }
> +
> +    at1.flags[AC_ACCESS_WRITE] = 0;
> +    at1.flags[AC_ACCESS_FETCH] = 1;
> +    ac_set_expected_status(&at1);
> +    err_smep_notwp = ac_test_do_access(&at1);
> +

The address is accessed in the first test; is it really prefetched in the second test?

>  int ac_test_run(void)
> @@ -669,16 +765,22 @@ int ac_test_run(void)
>      ac_test_t at;
>      ac_pool_t pool;
>      int i, tests, successes;
> +    extern u64 ptl2[];
>
>      printf("run\n");
>      tests = successes = 0;
>      ac_env_int(&pool);
>      ac_test_init(&at, (void *)(0x1234 + 16 * smp_id()));
>      do {
> +        if (at.flags[AC_CPU_CR4_SMEP] && (ptl2[2] & 0x4))
> +            ptl2[2] -= 0x4;

It seems you just remove the U/S bit but forget to restore it; this can break the test if AC_ACCESS_USER && !SMEP.
Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()
On 07/01/2011 09:42 AM, Hannes Reinecke wrote:

>  size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
> -                    const void *buf, size_t size)
> +                    const void *buf, size_t offset, size_t size)

Wrong commit subject, it seems. :)

Paolo
Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()
On 07/01/2011 10:03 AM, Paolo Bonzini wrote:

> On 07/01/2011 09:42 AM, Hannes Reinecke wrote:
>>  size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
>> -                    const void *buf, size_t size)
>> +                    const void *buf, size_t offset, size_t size)
>
> Wrong commit subject, it seems. :)

Bummer.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries Storage
h...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()
On 07/01/2011 10:02 AM, Alexander Graf wrote:

> On 01.07.2011, at 09:42, Hannes Reinecke wrote:
>> Occasionally, the buffer needs to be placed at an offset within the
>> iovec when copying the buffer to the iovec.
>
> So this is a buffer into the iovec, right? Wouldn't it make sense to
> also modify iov_to_buf accordingly then, so the API stays similar?

Ahem. That's exactly what the patch does, except for the mixed-up subject. iov_to_buf() has an offset parameter; iov_from_buf() does not, for no obvious reason.

> Also, it'd be nice to give the parameter a more obvious name, so
> potential users can easily recognize what it offsets.

Yes, that sounds reasonable. What about 'iov_off'? (And possibly rename 'iovcnt' to 'iov_cnt' for consistency?)

Cheers,

Hannes
[PATCH 1/3] KVM test: Render unattended files more properly
Windows2008 install program does not behave well when we specify dummy paths to alternate install drivers, unlike Windows Vista and Windows 7. This is enough motivation to rewrite the unattended install file rendering code, now: 1) XML files will be properly modified using an XML API 2) ini files will be properly modified using ConfigParser 3) kickstart files use a simplified version of the old logic (re.sub). Tested with the guest OS that motivated the patch and of course, other linux and windows guests, everything looks good. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/tests/unattended_install.py | 191 +- 1 files changed, 129 insertions(+), 62 deletions(-) diff --git a/client/tests/kvm/tests/unattended_install.py b/client/tests/kvm/tests/unattended_install.py index d1c700d..6d6ee07 100644 --- a/client/tests/kvm/tests/unattended_install.py +++ b/client/tests/kvm/tests/unattended_install.py @@ -1,4 +1,5 @@ import logging, time, socket, re, os, shutil, tempfile, glob, ConfigParser +import xml.dom.minidom from autotest_lib.client.common_lib import error from autotest_lib.client.bin import utils from autotest_lib.client.virt import virt_vm, virt_utils @@ -47,8 +48,8 @@ class Disk(object): self.path = None -def setup_answer_file(self, filename, contents): -utils.open_write_close(os.path.join(self.mount, filename), contents) +def get_answer_file_path(self, filename): +return os.path.join(self.mount, filename) def copy_to(self, src): @@ -258,8 +259,7 @@ class UnattendedInstallConfig(object): self.image_path = os.path.dirname(self.kernel) -@error.context_aware -def render_answer_file(self): +def answer_kickstart(self, answer_path): Replace KVM_TEST_CDKEY (in the unattended file) with the cdkey provided for this test and replace the KVM_TEST_MEDIUM with @@ -267,17 +267,12 @@ class UnattendedInstallConfig(object): @return: Answer file contents -error.base_context('Rendering final answer file') -error.context('Reading answer file %s' % 
self.unattended_file) -unattended_contents = open(self.unattended_file).read() +contents = open(self.unattended_file).read() + dummy_cdkey_re = r'\bKVM_TEST_CDKEY\b' -if re.search(dummy_cdkey_re, unattended_contents): +if re.search(dummy_cdkey_re, contents): if self.cdkey: -unattended_contents = re.sub(dummy_cdkey_re, self.cdkey, - unattended_contents) -else: -print (WARNING: 'cdkey' required but not specified for - this unattended installation) +contents = re.sub(dummy_cdkey_re, self.cdkey, contents) dummy_medium_re = r'\bKVM_TEST_MEDIUM\b' if self.medium == cdrom: @@ -290,67 +285,135 @@ class UnattendedInstallConfig(object): else: raise ValueError(Unexpected installation medium %s % self.url) -unattended_contents = re.sub(dummy_medium_re, content, - unattended_contents) +contents = re.sub(dummy_medium_re, content, contents) -def replace_virtio_key(contents, dummy_re, attribute_name): - -Replace a virtio dummy string with contents. +logging.debug(Unattended install contents:) +for line in contents.splitlines(): +logging.debug(line) -If install_virtio is not set, replace it with a dummy string. +utils.open_write_close(answer_path, contents) -@param contents: Contents of the unattended file -@param dummy_re: Regular expression used to search on the. -unattended file contents. -@param env: Name of the environment variable. - -dummy_path = C: -driver = getattr(self, attribute_name, '') -if re.search(dummy_re, contents): -if self.install_virtio == yes: -if driver.endswith(msi): -driver = 'msiexec /passive /package ' + driver -else: -try: -# Let's escape windows style paths properly -drive, path = driver.split(:) -driver = drive + : + re.escape(path) -except: -pass -contents = re.sub(dummy_re, driver, contents) -else: -contents = re.sub(dummy_re, dummy_path, contents) -return contents - -vdict = {r'\bKVM_TEST_STORAGE_DRIVER_PATH\b': - 'virtio_storage_path', - r'\bKVM_TEST_NETWORK_DRIVER_PATH\b': - 'virtio_network_path', -
Re: [PATCH 1/3] iov: Add 'offset' parameter to iov_to_buf()
On 01.07.2011, at 10:07, Hannes Reinecke wrote:

> On 07/01/2011 10:02 AM, Alexander Graf wrote:
>> On 01.07.2011, at 09:42, Hannes Reinecke wrote:
>>> Occasionally, the buffer needs to be placed at an offset within the
>>> iovec when copying the buffer to the iovec.
>>
>> So this is a buffer into the iovec, right? Wouldn't it make sense to
>> also modify iov_to_buf accordingly then, so the API stays similar?
>
> Ahem. That's exactly what the patch does, except for the mixed-up
> subject. iov_to_buf() has an offset parameter; iov_from_buf() does
> not, for no obvious reason.

Ah, I see. Please state this in your patch description :). It makes it a lot easier to understand the rationale: you're merely moving the "from" API towards the same parameters as the "to" one.

>> Also, it'd be nice to give the parameter a more obvious name, so
>> potential users can easily recognize what it offsets.
>
> Yes, that sounds reasonable. What about 'iov_off'? (And possibly
> rename 'iovcnt' to 'iov_cnt' for consistency?)

Yup, that'd be a lot more readable :)

Alex
Re: [PATCH v3 7/9] KVM-GST: KVM Steal time accounting
On Thu, 2011-06-30 at 23:50 -0300, Glauber Costa wrote:

> I was under the impression that the proper use of jump labels
> required each label to be tied to a single location. If we make it
> inline, the same key would point to multiple locations, and we would
> have trouble altering all of the locations. I might be wrong, of
> course. Isn't it the case?

Nope, you can have as many patch sites per key as you want.
Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer
On 07/01/2011 09:42 AM, Hannes Reinecke wrote: 'tag' is just an abstraction to identify the command from the driver. So we should make that explicit by replacing 'tag' with a driver-defined pointer 'hba_private'. This saves the lookup for driver handling several commands in parallel. This makes tracing a bit harder to follow. Perhaps you can keep the transport tag (a uint64_t) in the SCSIRequest for debugging purposes? Signed-off-by: Hannes Reineckeh...@suse.de --- hw/esp.c |2 +- hw/lsi53c895a.c | 17 - hw/scsi-bus.c | 22 +++--- hw/scsi-disk.c|5 ++--- hw/scsi-generic.c |4 ++-- hw/scsi.h |8 hw/spapr_vscsi.c | 41 - hw/usb-msd.c | 10 +- trace-events | 14 +++--- 9 files changed, 52 insertions(+), 71 deletions(-) diff --git a/hw/esp.c b/hw/esp.c index 6d3f5d2..912ff89 100644 --- a/hw/esp.c +++ b/hw/esp.c @@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid) DPRINTF(do_busid_cmd: busid 0x%x\n, busid); lun = busid 7; -s-current_req = scsi_req_new(s-current_dev, 0, lun); +s-current_req = scsi_req_new(s-current_dev, lun, s); Might as well pass NULL here. The hba_private value is basically unnecessary when the adapter doesn't support tagged command queuing. diff --git a/hw/usb-msd.c b/hw/usb-msd.c index 86582cc..4e2ea03 100644 --- a/hw/usb-msd.c +++ b/hw/usb-msd.c @@ -216,8 +216,8 @@ static void usb_msd_transfer_data(SCSIRequest *req, uint32_t len) MSDState *s = DO_UPCAST(MSDState, dev.qdev, req-bus-qbus.parent); USBPacket *p = s-packet; -if (req-tag != s-tag) { -fprintf(stderr, usb-msd: Unexpected SCSI Tag 0x%x\n, req-tag); +if (req-hba_private != s) { +fprintf(stderr, usb-msd: Unexpected SCSI command 0x%p\n, req); } Same here, just pass NULL and remove these ifs. Otherwise looks like a very good idea. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 7/9] KVM-GST: KVM Steal time accounting
On Thu, 2011-06-30 at 23:53 -0300, Glauber Costa wrote:

> On 06/30/2011 06:54 PM, Peter Zijlstra wrote:
>> On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
>>> +    if (static_branch(&paravirt_steal_enabled)) {
>>
>> How is that going to compile on !CONFIG_PARAVIRT or !x86 in general?
>> Only x86-PARAVIRT will provide that variable.
>
> Good point. I'd wrap it into CONFIG_PARAVIRT.
>
> To be clear, the reason I did not put it inside
> CONFIG_PARAVIRT_TIME_ACCOUNTING is that I wanted to have the mere
> display of steal time separated from the rest, unless, of course, you
> object to this idea. Using CONFIG_PARAVIRT achieves this goal well.

ia64 seems to also have CONFIG_PARAVIRT.
Re: virtio scsi host draft specification, v3
On 07/01/2011 09:14 AM, Hannes Reinecke wrote:

> Actually, the kernel does _not_ do a LUN remapping.

Not the kernel, the in-kernel target. The in-kernel target can and will map hardware LUNs (target_lun in drivers/target/*) to arbitrary LUNs (mapped_lun).

>> Put in another way: the virtio-scsi device is itself a SCSI target,
>
> Argl. No way. The virtio-scsi device has to map to a single LUN.

I think we are talking about different things. By "virtio-scsi device" I meant the virtio-scsi HBA. When I referred to a LUN as seen by the guest, I was calling it a "virtual SCSI device". So yes, we were calling things by different names. Perhaps from now on we can call them virtio-scsi {initiator,target,LUN} and have no ambiguity? I'll also modify the spec in this sense.

> The SCSI spec itself only deals with LUNs, so anything you'll read in
> there obviously will only handle the interaction between the
> initiator (read: host) and the LUN itself. However, the actual
> command is sent via an intermediate target, hence you'll always see
> the reference to the ITL (initiator-target-lun) nexus.

Yes, this I understand.

> The SCSI spec details discovery of the individual LUNs presented by a
> given target, it does _NOT_ detail the discovery of the targets
> themselves. That is being delegated to the underlying transport

And in fact I have this in virtio-scsi too, since virtio-scsi _is_ a transport:

    When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
    is sent for LUN 0, the driver should ask the initiator to rescan
    the target, in order to detect the case when an entire target has
    appeared or disappeared.

    [If the device fails] to report an event due to missing buffers,
    [...] the driver should poll the logical units for unit attention
    conditions, and/or do whatever form of bus scan is appropriate for
    the guest operating system.

> In the case of NPIV it would make sense to map the virtual SCSI host
> to the backend, so that all devices presented to the virtual SCSI
> host will be presented to the backend, too. However, when doing so
> these devices will normally be referenced by their original LUN, as
> these will be presented to the guest via e.g. 'REPORT LUNS'.

Right.

> The above thread now tries to figure out if we should remap those LUN
> numbers or just expose them as they are. If we decide on remapping,
> we have to emulate _all_ commands referring explicitly to those LUN
> numbers (persistent reservations, anyone?).

But it seems to me that commands referring explicitly to LUN numbers most likely have to be reimplemented anyway for virtualization. I'm thinking exactly of persistent reservations. If two guests on the same host try a persistent reservation, they should conflict with each other. If reservation commands were just passed through, they would be seen as coming from the same initiator (the HBA driver or iSCSI initiator in the host OS), etc.

> If we don't, we would expose some hardware detail to the guest, but
> would save us _a lot_ of processing.

But can we afford it? And would the architecture allow that at all?

Paolo
Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster
On Thu, Jun 30, 2011 at 04:28:14PM -0700, Dan Magenheimer wrote:

> Hi Dan --
>
> Thanks for the careful review. You're right... some of this was
> leftover from debugging an off-by-one error, though the code as is
> still works.
>
> OTOH, there's a good chance that much of this sysfs code will
> disappear before zcache would get promoted out of staging, since it
> is to help those experimenting with zcache to get more insight into
> what the underlying compression/accept-reject algorithms are doing.
>
> So I hope you (and GregKH) are OK that another version is not
> necessary at this time to fix these.

Off-by-one errors are kind of insidious. People cut and paste them and they spread. If someone adds a new list of chunks, then there are now two examples that are correct and two that have an extra element, so it's 50/50 whether he'll copy the right one.

Btw, looking at it again, this seems like maybe a similar issue in zbud_evict_zbpg():

    /* now try freeing unbuddied pages, starting with least space avail */
    for (i = 0; i < MAX_CHUNK; i++) {
    retry_unbud_list_i:

MAX_CHUNK is NCHUNKS - 1. Shouldn't that be i < NCHUNKS so that we reach the last element in the list?

regards,
dan carpenter
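The loop-bound question can be shown with a toy model. The sketch below is only an illustration: the value of NCHUNKS and the helper count_entries() are made up for the example, and the array merely stands in for a per-chunk-size list table. It shows that an exclusive bound of MAX_CHUNK (defined as NCHUNKS - 1) never visits the last slot.

```c
#include <stddef.h>

#define NCHUNKS   8                /* illustrative value, not zcache's */
#define MAX_CHUNK (NCHUNKS - 1)    /* highest valid index */

/*
 * Hypothetical helper: sum the entries of lists[0 .. bound-1].
 * `bound` is exclusive, as in "for (i = 0; i < bound; i++)".
 */
static unsigned int count_entries(const unsigned int *lists, size_t bound)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < bound; i++) {
        total += lists[i];
    }
    return total;
}
```

If only the last slot is populated, count_entries(lists, MAX_CHUNK) sees nothing while count_entries(lists, NCHUNKS) sees everything. Whether that matters in zbud_evict_zbpg() depends on whether the last list can ever be non-empty there, which the thread leaves open.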
VT-d2 support inside Linux kernel
Hi,

Is there any information I can find about VT-d2 support inside the Linux kernel? It was marked as WIP on the KVM todo list.

In the current top of tree, I am seeing some ATS support for intel-iommu. Does that mean the ATS part is already finished?

git log -p -1 93a23a72

commit 93a23a7271dfb811b3adb72779054c3a24433112
Author: Yu Zhao yu.z...@intel.com
Date:   Mon May 18 13:51:37 2009 +0800

    VT-d: support the device IOTLB

    Enable the device IOTLB (i.e. ATS) for both the bare metal and KVM
    environments.

    Signed-off-by: Yu Zhao yu.z...@intel.com
    Signed-off-by: David Woodhouse david.woodho...@intel.com

Thanks,
CJ
Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer
On 07/01/2011 10:27 AM, Paolo Bonzini wrote: On 07/01/2011 09:42 AM, Hannes Reinecke wrote: 'tag' is just an abstraction to identify the command from the driver. So we should make that explicit by replacing 'tag' with a driver-defined pointer 'hba_private'. This saves the lookup for driver handling several commands in parallel. This makes tracing a bit harder to follow. Perhaps you can keep the transport tag (a uint64_t) in the SCSIRequest for debugging purposes? Sure. Anything to get the patches accepted :-) Signed-off-by: Hannes Reineckeh...@suse.de --- hw/esp.c | 2 +- hw/lsi53c895a.c | 17 - hw/scsi-bus.c | 22 +++--- hw/scsi-disk.c | 5 ++--- hw/scsi-generic.c | 4 ++-- hw/scsi.h | 8 hw/spapr_vscsi.c | 41 - hw/usb-msd.c | 10 +- trace-events | 14 +++--- 9 files changed, 52 insertions(+), 71 deletions(-) diff --git a/hw/esp.c b/hw/esp.c index 6d3f5d2..912ff89 100644 --- a/hw/esp.c +++ b/hw/esp.c @@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid) DPRINTF(do_busid_cmd: busid 0x%x\n, busid); lun = busid 7; - s-current_req = scsi_req_new(s-current_dev, 0, lun); + s-current_req = scsi_req_new(s-current_dev, lun, s); Might as well pass NULL here. The hba_private value is basically unnecessary when the adapter doesn't support tagged command queuing. diff --git a/hw/usb-msd.c b/hw/usb-msd.c index 86582cc..4e2ea03 100644 --- a/hw/usb-msd.c +++ b/hw/usb-msd.c @@ -216,8 +216,8 @@ static void usb_msd_transfer_data(SCSIRequest *req, uint32_t len) MSDState *s = DO_UPCAST(MSDState, dev.qdev, req-bus-qbus.parent); USBPacket *p = s-packet; - if (req-tag != s-tag) { - fprintf(stderr, usb-msd: Unexpected SCSI Tag 0x%x\n, req-tag); + if (req-hba_private != s) { + fprintf(stderr, usb-msd: Unexpected SCSI command 0x%p\n, req); } Same here, just pass NULL and remove these ifs. Otherwise looks like a very good idea. Ok, I'll be resending both. Cheers, Hannes -- Dr. 
Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation
On 01.07.2011, at 09:42, Hannes Reinecke wrote: This patch adds an emulation for the LSI Megaraid SAS 8708EM2 HBA. Signed-off-by: Hannes Reinecke h...@suse.de --- Makefile.objs |1 + default-configs/pci.mak |1 + hw/megasas.c| 1923 +++ hw/mfi.h| 1197 + hw/pci_ids.h|3 +- 5 files changed, 3124 insertions(+), 1 deletions(-) create mode 100644 hw/megasas.c create mode 100644 hw/mfi.h diff --git a/Makefile.objs b/Makefile.objs index cea15e4..6f5d113 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -258,6 +258,7 @@ hw-obj-$(CONFIG_AHCI) += ide/ich.o # SCSI layer hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o +hw-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o hw-obj-$(CONFIG_ESP) += esp.o hw-obj-y += dma-helpers.o sysbus.o isa-bus.o diff --git a/default-configs/pci.mak b/default-configs/pci.mak index 22bd350..fabb56c 100644 --- a/default-configs/pci.mak +++ b/default-configs/pci.mak @@ -9,6 +9,7 @@ CONFIG_EEPRO100_PCI=y CONFIG_PCNET_PCI=y CONFIG_PCNET_COMMON=y CONFIG_LSI_SCSI_PCI=y +CONFIG_MEGASAS_SCSI_PCI=y CONFIG_RTL8139_PCI=y CONFIG_E1000_PCI=y CONFIG_IDE_CORE=y diff --git a/hw/megasas.c b/hw/megasas.c new file mode 100644 index 000..75f9be3 --- /dev/null +++ b/hw/megasas.c @@ -0,0 +1,1923 @@ +/* + * QEMU MegaRAID SAS 8708EM2 Host Bus Adapter emulation + * + * Copyright (c) 2009-2011 Hannes Reinecke, SUSE Labs + * + * This code is licenced under the LGPL. Please take a look at the license header of other LGPL code and just copy it :). + */ + +#include time.h +#include assert.h Are you sure you need to manually include those? + +#include hw.h +#include pci.h +#include dma.h +#include iov.h +#include scsi.h +#include scsi-defs.h +#include block_int.h +#ifdef __linux__ +# include scsi/sg.h Is this really necessary? Device code shouldn't be host dependent IMHO. I also haven't found any user of this in the actual code, so it might be as easy as merely removing the include :). 
+#endif + +#include mfi.h + +#define DEBUG_MEGASAS +#undef DEBUG_MEGASAS_REG +#undef DEBUG_MEGASAS_QUEUE +#undef DEBUG_MEGASAS_MFI +#undef DEBUG_MEGASAS_IO +#undef DEBUG_MEGASAS_DCMD + +#ifdef DEBUG_MEGASAS +#define DPRINTF(fmt, ...) \ +do { printf(megasas: fmt , ## __VA_ARGS__); } while (0) +#define BADF(fmt, ...) \ +do { fprintf(stderr, megasas: error: fmt , ## __VA_ARGS__); exit(1);} while (0) +#ifdef DEBUG_MEGASAS_REG +#define DPRINTF_REG DPRINTF +#else +#define DPRINTF_REG(fmt, ...) do {} while(0) +#endif +#ifdef DEBUG_MEGASAS_QUEUE +#define DPRINTF_QUEUE DPRINTF +#else +#define DPRINTF_QUEUE(fmt, ...) do {} while(0) +#endif +#ifdef DEBUG_MEGASAS_MFI +#define DPRINTF_MFI DPRINTF +#else +#define DPRINTF_MFI(fmt, ...) do {} while(0) +#endif +#ifdef DEBUG_MEGASAS_IO +#define DPRINTF_IO DPRINTF +#else +#define DPRINTF_IO(fmt, ...) do {} while(0) +#endif +#ifdef DEBUG_MEGASAS_DCMD +#define DPRINTF_DCMD DPRINTF +#else +#define DPRINTF_DCMD(fmt, ...) do {} while(0) +#endif +#else +#define DPRINTF(fmt, ...) do {} while(0) +#define DPRINTF_REG DPRINTF +#define DPRINTF_QUEUE DPRINTF +#define DPRINTF_MFI DPRINTF +#define DPRINTF_IO DPRINTF +#define DPRINTF_DCMD DPRINTF +#define BADF(fmt, ...) 
\ +do { fprintf(stderr, megasas: error: fmt , ## __VA_ARGS__);} while (0) +#endif + +/* Static definitions */ +#define MEGASAS_VERSION 1.20 +#define MEGASAS_MAX_FRAMES 2048 /* Firmware limit at 65535 */ +#define MEGASAS_DEFAULT_FRAMES 1000 /* Windows requires this */ +#define MEGASAS_MAX_SGE 256 /* Firmware limit */ +#define MEGASAS_DEFAULT_SGE 80 +#define MEGASAS_MAX_SECTORS 0x /* No real limit */ +#define MEGASAS_MAX_ARRAYS 128 + +const char *mfi_frame_desc[] = { +MFI init, LD Read, LD Write, LD SCSI, PD SCSI, +MFI Doorbell, MFI Abort, MFI SMP, MFI Stop}; + +struct megasas_cmd_t { +int index; +int context; +int count; + +target_phys_addr_t pa; +target_phys_addr_t pa_size; +union mfi_frame *frame; +SCSIRequest *req; +struct iovec *iov; +void *iov_buf; +long iov_cnt; +long iov_size; +long iov_offset; Why would anything be a long? It's either target_ulong or uintXX_t for device code usually :). +SCSIDevice *sdev; +struct megasas_state_t *state; +}; + +typedef struct megasas_state_t { +PCIDevice dev; +int mmio_io_addr; +int io_addr; +int queue_addr; +uint32_t frame_hi; + +int fw_state; +uint32_t fw_sge; +uint32_t fw_cmds; +int fw_luns; +int intr_mask; +int doorbell; +int busy; +char *raid_mode_str; +int is_jbod; + +int event_count; +int shutdown_event; +int boot_event; + +uint64_t reply_queue_pa; +void *reply_queue; +int reply_queue_len; +int reply_queue_index; +
[PATCH resend] compat_ioctl: fix warning caused by qemu
On Linux x86_64 host with 32bit userspace, running qemu or even just qemu-img create -f qcow2 some.img 1G causes a kernel warning: ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(5326){t:'S';sz:0} arg(7fff) on some.img ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img ioctl 5326 is CDROM_DRIVE_STATUS, ioctl 801c0204 is FDGETPRM. The warning appears because the Linux compat-ioctl handler for these ioctls only applies to block devices, while qemu also uses the ioctls on plain files. Signed-off-by: Johannes Stezenbach j...@sig21.net --- (resend with Cc: suggested by get_maintainer.pl) discussed in http://lkml.kernel.org/r/20110617090424.ga19...@sig21.net Arnd, is this what you had in mind, or did you mean to move all floppy compat definitions? I decided to go with the minimal change. Tested on both 2.6.39.2 and 3.0-rc5-63-g0d72c6f. diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c index cc3eb78..7b72502 100644 --- a/block/compat_ioctl.c +++ b/block/compat_ioctl.c @@ -208,19 +208,6 @@ static int compat_blkpg_ioctl(struct block_device *bdev, fmode_t mode, #define BLKBSZSET_32 _IOW(0x12, 113, int) #define BLKGETSIZE64_32_IOR(0x12, 114, int) -struct compat_floppy_struct { - compat_uint_t size; - compat_uint_t sect; - compat_uint_t head; - compat_uint_t track; - compat_uint_t stretch; - unsigned char gap; - unsigned char rate; - unsigned char spec1; - unsigned char fmt_gap; - const compat_caddr_t name; -}; - struct compat_floppy_drive_params { charcmos; compat_ulong_t max_dtr; @@ -288,7 +275,6 @@ struct compat_floppy_write_errors { #define FDSETPRM32 _IOW(2, 0x42, struct compat_floppy_struct) #define FDDEFPRM32 _IOW(2, 0x43, struct compat_floppy_struct) -#define FDGETPRM32 _IOR(2, 0x04, struct compat_floppy_struct) #define FDSETDRVPRM32 _IOW(2, 0x90, struct compat_floppy_drive_params) #define FDGETDRVPRM32 _IOR(2, 0x11, struct compat_floppy_drive_params) #define FDGETDRVSTAT32 _IOR(2, 0x12, struct compat_floppy_drive_struct) 
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c index 61abb63..8be086e 100644 --- a/fs/compat_ioctl.c +++ b/fs/compat_ioctl.c @@ -68,6 +68,8 @@ #ifdef CONFIG_BLOCK #include linux/loop.h +#include linux/cdrom.h +#include linux/fd.h #include scsi/scsi.h #include scsi/scsi_ioctl.h #include scsi/sg.h @@ -944,6 +946,9 @@ COMPATIBLE_IOCTL(FIOQSIZE) IGNORE_IOCTL(LOOP_CLR_FD) /* md calls this on random blockdevs */ IGNORE_IOCTL(RAID_VERSION) +/* qemu/qemu-img might call these two on plain files for probing */ +IGNORE_IOCTL(CDROM_DRIVE_STATUS) +IGNORE_IOCTL(FDGETPRM32) /* SG stuff */ COMPATIBLE_IOCTL(SG_SET_TIMEOUT) COMPATIBLE_IOCTL(SG_GET_TIMEOUT) diff --git a/include/linux/fd.h b/include/linux/fd.h index f5d194a..c6a68d0 100644 --- a/include/linux/fd.h +++ b/include/linux/fd.h @@ -3,6 +3,7 @@ #include linux/ioctl.h #include linux/compiler.h +#include linux/compat.h /* New file layout: Now the ioctl definitions immediately follow the * definitions of the structures that they use */ @@ -377,4 +378,21 @@ struct floppy_raw_cmd { #define FDEJECT _IO(2, 0x5a) /* eject the disk */ +#ifdef CONFIG_COMPAT +struct compat_floppy_struct { + compat_uint_t size; + compat_uint_t sect; + compat_uint_t head; + compat_uint_t track; + compat_uint_t stretch; + unsigned char gap; + unsigned char rate; + unsigned char spec1; + unsigned char fmt_gap; + const compat_caddr_t name; +}; + +#define FDGETPRM32 _IOR(2, 0x04, struct compat_floppy_struct) +#endif + #endif -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
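The two command numbers in the warning can be decoded by hand, which is a quick way to check the claim that they are CDROM_DRIVE_STATUS and FDGETPRM. The sketch below assumes the generic Linux ioctl number layout as used on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). Some architectures use a different split. The struct and function names are made up for this example.

```c
#include <stdint.h>

/* Fields of an ioctl command number (generic layout, as on x86). */
struct ioctl_fields {
    unsigned int dir;   /* 0 = none, 1 = write, 2 = read, 3 = both */
    unsigned int size;  /* size of the argument structure in bytes */
    unsigned int type;  /* "magic" type byte, e.g. 'S' or 2 */
    unsigned int nr;    /* command number within the type */
};

/* Hypothetical helper: split a command number into its fields. */
static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;

    f.nr   = cmd & 0xff;             /* bits 0-7 */
    f.type = (cmd >> 8) & 0xff;      /* bits 8-15 */
    f.size = (cmd >> 16) & 0x3fff;   /* bits 16-29 */
    f.dir  = (cmd >> 30) & 0x3;      /* bits 30-31 */
    return f;
}
```

Decoding 0x5326 gives type 'S' (0x53), number 0x26, size 0, matching the {t:'S';sz:0} in the log. Decoding 0x801c0204 gives a read ioctl of type 2, number 4, size 28, matching {t:02;sz:28}. 28 bytes is also the size of the compat_floppy_struct in the patch (five 4-byte fields, four chars and a 4-byte compat pointer).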
Re: [PATCH 01/17] KVM: PPC: Fix machine checks on 32-bit Book3S
On 29.06.2011, at 12:16, Paul Mackerras wrote:

> Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through
> function exports") resulted in vcpu->arch.trampoline_lowmem and
> vcpu->arch.trampoline_enter ending up with kernel virtual addresses
> rather than physical addresses. This is OK on 64-bit Book3S machines,
> which ignore the top 4 bits of the effective address in real mode,
> but on 32-bit Book3S machines, accessing these addresses in real mode
> causes machine check interrupts, as the hardware uses the whole
> effective address as the physical address in real mode.
>
> This fixes the problem by using __pa() to convert these addresses to
> physical addresses.

Ouch. Thanks for the catch! I really need to include book3s_32 in my automated testing :(.

Alex
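The difference Paul describes can be modelled with plain integer arithmetic. The sketch below is an illustration only. The PAGE_OFFSET values are assumed typical linear-map bases rather than values taken from the patch, and the helpers are made-up stand-ins for what the hardware does in real mode and for what __pa() computes on a linear mapping.

```c
#include <stdint.h>

/* Assumed linear-map bases, for illustration only. */
#define PAGE_OFFSET_64 UINT64_C(0xc000000000000000)
#define PAGE_OFFSET_32 UINT32_C(0xc0000000)

/* 64-bit Book3S real mode, as described above: the top 4 bits of the
 * effective address are ignored. */
static uint64_t real_mode_ea_64(uint64_t ea)
{
    return ea & ~(UINT64_C(0xf) << 60);
}

/* 32-bit Book3S real mode: the whole effective address is used. */
static uint32_t real_mode_ea_32(uint32_t ea)
{
    return ea;
}

/* Stand-ins for __pa() on a linear mapping: subtract the base. */
static uint64_t pa_64(uint64_t va) { return va - PAGE_OFFSET_64; }
static uint32_t pa_32(uint32_t va) { return va - PAGE_OFFSET_32; }
```

Under these assumptions, a 64-bit kernel virtual address and its physical address land on the same location in real mode, so the bug stays hidden. On 32-bit the unconverted virtual address is used as is and points somewhere other than the intended physical address, hence the need for the explicit conversion.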
Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate
On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
> On 06/29/2011 12:41 PM, Paul Mackerras wrote:
> > +struct kvm_ppc_set_platform {
> > +    __u16 platform;    /* defines the OS/hypervisor ABI */
> > +    __u16 guest_arch;  /* e.g. decimal 206 for v2.06 */
> > +    __u32 flags;
>
> Please add some padding so we can extend it later if necessary.
>
> > +};
> > +
> > +/* Values for platform */
> > +#define KVM_PPC_PV_NONE   0  /* bare-metal, non-paravirtualized */
> > +#define KVM_PPC_PV_KVM    1  /* as defined in kvm_para.h */
> > +#define KVM_PPC_PV_SPAPR  2  /* IBM Server PAPR (a la PowerVM) */
>
> We also support BookE which would be useful to also include in the
> list. Furthermore, KVM is more of a feature flag than a platform. We
> can easily support KVM extensions on an SPAPR platform, no?

Yes, I guess so. The hypercall sequence will have to be different,
since ordinary system call interrupts go straight to the guest. But I
guess you've allowed for that with the hypercall sequence property in
the device tree.

> This whole interface also could deprecate the PVR setting one, so we
> can simply include PVR as well and not require kernel space to jump
> through hoops to figure out its capabilities.

I debated about whether to include a PVR value in this structure. The
thing is that POWER7 has the Processor Compatibility Register (PCR),
which has a bit which makes the processor behave in user mode as if it
were a POWER6. So, we could run a book3s_hv guest in POWER6 mode by
setting this bit (which we might want to do to run older distros).
However, this bit doesn't affect the PVR value that the guest sees.
That's why I went for an architecture level rather than a specific PVR
value. We could go with a PVR value and use the logical PVR values
defined in PAPR to represent architecture levels, e.g. 0x0f000002 for
architecture v2.05 (POWER6).

> And we need to identify 32-bit BookS processors, so we can go into
> 32-bit mode when necessary. That should also be a different
> guest_arch, right?

Right. If we go with a PVR value then we just use the PVR value for a
suitable 32-bit processor.

> > +
> > +/* Values for flags */
> > +#define KVM_PPC_CROSS_ARCH  1  /* guest architecture != host */
>
> User space shouldn't have to worry about this one. It's up to the
> kernel to decide that it's cross.

I put that in because we might want to force the use of book3s_pr, for
example if we know we're going to want to do emulated MMIO or something
else that isn't implemented in book3s_hv just yet.

Ultimately, yes, the kernel should be able to decide whether it's cross
or not. However, I don't think we should make it completely opaque to
userspace as to whether the kernel is using _pr or _hv. If nothing
else, userspace should be able to find out and tell the user so that
performance expectations can be set correctly.

Paul.
Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate
On 01.07.2011, at 12:09, Paul Mackerras wrote:
> On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
> > On 06/29/2011 12:41 PM, Paul Mackerras wrote:
> > > +struct kvm_ppc_set_platform {
> > > +    __u16 platform;    /* defines the OS/hypervisor ABI */
> > > +    __u16 guest_arch;  /* e.g. decimal 206 for v2.06 */
> > > +    __u32 flags;
> >
> > Please add some padding so we can extend it later if necessary.
> >
> > > +};
> > > +
> > > +/* Values for platform */
> > > +#define KVM_PPC_PV_NONE   0  /* bare-metal, non-paravirtualized */
> > > +#define KVM_PPC_PV_KVM    1  /* as defined in kvm_para.h */
> > > +#define KVM_PPC_PV_SPAPR  2  /* IBM Server PAPR (a la PowerVM) */
> >
> > We also support BookE which would be useful to also include in the
> > list. Furthermore, KVM is more of a feature flag than a platform. We
> > can easily support KVM extensions on an SPAPR platform, no?
>
> Yes, I guess so. The hypercall sequence will have to be different,
> since ordinary system call interrupts go straight to the guest. But I
> guess you've allowed for that with the hypercall sequence property in
> the device tree.
>
> > This whole interface also could deprecate the PVR setting one, so we
> > can simply include PVR as well and not require kernel space to jump
> > through hoops to figure out its capabilities.
>
> I debated about whether to include a PVR value in this structure. The
> thing is that POWER7 has the Processor Compatibility Register (PCR),
> which has a bit which makes the processor behave in user mode as if it
> were a POWER6. So, we could run a book3s_hv guest in POWER6 mode by
> setting this bit (which we might want to do to run older distros).
> However, this bit doesn't affect the PVR value that the guest sees.
> That's why I went for an architecture level rather than a specific PVR
> value. We could go with a PVR value and use the logical PVR values
> defined in PAPR to represent architecture levels, e.g. 0x0f000002 for
> architecture v2.05 (POWER6).

IIUC the PVR values are somewhat standardized to contain major and
minor revision numbers. Can't we just mask out the minor ones and match
for known good systems?

> > And we need to identify 32-bit BookS processors, so we can go into
> > 32-bit mode when necessary. That should also be a different
> > guest_arch, right?
>
> Right. If we go with a PVR value then we just use the PVR value for a
> suitable 32-bit processor.

Well, we need to have some way of mapping PVR to arch then. KVM easily
supports -cpu G3 and G4. We might also want to have some information on
feature flags, such as Altivec or SPE mode available. Or paired
singles :). I'm not sure I want to have all that mapping information
inside the kernel.

So what we could do is we just provide as much information as we can
from user space, including PVR, architecture (2.01 for example),
features (32/64-bit, booke/books, fpu, altivec, spe, ...).

> > > +
> > > +/* Values for flags */
> > > +#define KVM_PPC_CROSS_ARCH  1  /* guest architecture != host */
> >
> > User space shouldn't have to worry about this one. It's up to the
> > kernel to decide that it's cross.
>
> I put that in because we might want to force the use of book3s_pr, for
> example if we know we're going to want to do emulated MMIO or something
> else that isn't implemented in book3s_hv just yet.

Ah, I see. Well, we could just add a flag to the feature list saying
MMIO. If that's impossible to satisfy (HV only), fail the call.
Otherwise switch to _pr mode. Later when _hv might be able to support
MMIO, we can use it without changing user space.

> Ultimately, yes, the kernel should be able to decide whether it's cross
> or not. However, I don't think we should make it completely opaque to
> userspace as to whether the kernel is using _pr or _hv. If nothing
> else, userspace should be able to find out and tell the user so that
> performance expectations can be set correctly.

Hrm. Sure, but the decision should be done in kernel land based on all
information required to actually make it. And the kernel has more
information regarding the system it's running on, so that's the place
to actually do the decision. Bubbling it up to user space again is
certainly fine by me :).

Alex
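The two ideas in this exchange, reserving padding in the ioctl argument and matching a PVR on its major part only, can be sketched roughly as follows. This is an illustration under assumptions, not the interface from the patch: the struct name `kvm_ppc_platform_sketch`, its extra `pvr`, `features` and `reserved` fields, and the helper names are all hypothetical. The only outside fact relied on is the conventional PVR layout, with the processor version in the upper 16 bits and the revision in the lower 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical argument layout (not the struct from the patch): the
 * reserved words give the ABI room to grow without changing its size.
 */
struct kvm_ppc_platform_sketch {
    uint16_t platform;      /* OS/hypervisor ABI selector */
    uint16_t guest_arch;    /* e.g. decimal 206 for v2.06 */
    uint32_t flags;
    uint32_t pvr;           /* assumed extra field: requested PVR */
    uint32_t features;      /* assumed extra field: feature bits */
    uint32_t reserved[12];  /* must be zero until assigned a meaning */
};

/* Upper 16 bits of a PVR: processor version ("major" part). */
static uint16_t pvr_version(uint32_t pvr)
{
    return (uint16_t)(pvr >> 16);
}

/* Lower 16 bits of a PVR: revision ("minor" part). */
static uint16_t pvr_revision(uint32_t pvr)
{
    return (uint16_t)(pvr & 0xffff);
}

/* Match two PVRs while masking out the revision. */
static int pvr_same_family(uint32_t a, uint32_t b)
{
    return pvr_version(a) == pvr_version(b);
}

/* A receiver would typically reject non-zero reserved words. */
static int reserved_is_zero(const struct kvm_ppc_platform_sketch *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < sizeof(p->reserved) / sizeof(p->reserved[0]); i++)
        if (p->reserved[i] != 0)
            return 0;
    return 1;
}
```

Requiring the reserved words to be zero is what makes later extension safe: old userspace keeps working, and new fields can be told apart from stale garbage.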
Re: [PATCH v2 12/31] kvm tools: Add UDP support for uip
* Asias He <asias.he...@gmail.com> wrote:

> +static void *uip_udp_socket_thread(void *p)
> +{
> +    struct epoll_event events[UIP_UDP_MAX_EVENTS];
> +    struct uip_udp_socket *sk;
> +    struct uip_info *info;
> +    struct uip_eth *eth2;
> +    struct uip_udp *udp2;
> +    struct uip_buf *buf;
> +    struct uip_ip *ip2;
> +    u8 *payload;
> +    int nfds;
> +    int ret;
> +    int i;
> +
> +    info = p;
> +
> +    do {
> +        payload = malloc(UIP_MAX_UDP_PAYLOAD);
> +    } while (!payload);
> +
> +    while (1) {
> +        nfds = epoll_wait(info->udp_epollfd, events, UIP_UDP_MAX_EVENTS, -1);
> +
> +        if (nfds == -1)
> +            continue;
> +
> +        for (i = 0; i < nfds; i++) {
> +
> +            sk = events[i].data.ptr;
> +            ret = recvfrom(sk->fd, payload, UIP_MAX_UDP_PAYLOAD, 0, NULL, NULL);
> +            if (ret < 0)
> +                continue;
> +
> +            /*
> +             * Get free buffer to send data to guest
> +             */
> +            buf = uip_buf_get_free(info);
> +
> +            /*
> +             * Cook a ethernet frame
> +             */
> +            udp2 = (struct uip_udp *)(buf->eth);
> +            eth2 = (struct uip_eth *)buf->eth;
> +            ip2  = (struct uip_ip *)(buf->eth);
> +
> +            eth2->src  = info->host_mac;
> +            eth2->dst  = info->guest_mac;
> +            eth2->type = htons(UIP_ETH_P_IP);
> +
> +            ip2->vhl     = UIP_IP_VER_4 | UIP_IP_HDR_LEN;
> +            ip2->tos     = 0;
> +            ip2->id      = 0;
> +            ip2->flgfrag = 0;
> +            ip2->ttl     = UIP_IP_TTL;
> +            ip2->proto   = UIP_IP_P_UDP;
> +            ip2->csum    = 0;
> +            ip2->sip     = sk->dip;
> +            ip2->dip     = sk->sip;
> +
> +            udp2->sport = sk->dport;
> +            udp2->dport = sk->sport;
> +            udp2->len   = htons(ret + uip_udp_hdrlen(udp2));
> +            udp2->csum  = 0;
> +
> +            memcpy(udp2->payload, payload, ret);
> +
> +            ip2->len   = udp2->len + htons(uip_ip_hdrlen(ip2));
> +            ip2->csum  = uip_csum_ip(ip2);
> +            udp2->csum = uip_csum_udp(udp2);
> +
> +            /*
> +             * virtio_net_hdr
> +             */
> +            buf->vnet_len = sizeof(struct virtio_net_hdr);
> +            memset(buf->vnet, 0, buf->vnet_len);
> +
> +            buf->eth_len = ntohs(ip2->len) + uip_eth_hdrlen(&ip2->eth);
> +
> +            /*
> +             * Send data received from socket to guest
> +             */
> +            uip_buf_set_used(info, buf);
> +        }
> +    }
> +
> +    free(payload);
> +    pthread_exit(NULL);
> +    return NULL;
> +}

This function is way too large, please split out the meat of it into a
separate helper inline.

Thanks,

    Ingo
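One possible shape for the helper Ingo asks for is sketched below. It is not the uip code: the `sketch_*` structs are simplified stand-ins for the uip headers, the Ethernet header and checksums are left out, and the function name and signature are hypothetical. The point is only that the event loop shrinks to "receive, get buffer, call helper, mark used" once the header filling lives in one function. As a design choice of this sketch, lengths are computed in host byte order and converted once with htons(), rather than adding two already-converted values.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the uip headers; the real layouts differ. */
struct sketch_ip_hdr {
    uint8_t  vhl, tos;
    uint16_t len, id, flgfrag;
    uint8_t  ttl, proto;
    uint16_t csum;
    uint32_t sip, dip;
};

struct sketch_udp_hdr {
    uint16_t sport, dport, len, csum;
};

struct sketch_frame {
    struct sketch_ip_hdr  ip;
    struct sketch_udp_hdr udp;
    uint8_t payload[1472];
};

/*
 * Fill one IPv4/UDP frame from a received datagram. Addresses and ports
 * are passed in network byte order and stored unchanged. Checksums stay
 * zero in this sketch. Returns the IP total length in host byte order,
 * or 0 if the payload does not fit.
 */
static size_t sketch_udp_make_frame(struct sketch_frame *f,
                                    uint32_t sip, uint32_t dip,
                                    uint16_t sport, uint16_t dport,
                                    const void *data, size_t len)
{
    size_t udp_len, ip_len;

    assert(f != NULL && data != NULL);
    if (len > sizeof(f->payload))
        return 0;

    udp_len = sizeof(f->udp) + len;
    ip_len  = sizeof(f->ip) + udp_len;

    memset(&f->ip, 0, sizeof(f->ip));
    memset(&f->udp, 0, sizeof(f->udp));

    f->ip.vhl   = 0x45;                     /* IPv4, five 32-bit words */
    f->ip.ttl   = 64;
    f->ip.proto = 17;                       /* UDP */
    f->ip.sip   = sip;
    f->ip.dip   = dip;
    f->ip.len   = htons((uint16_t)ip_len);  /* convert once, at the end */

    f->udp.sport = sport;
    f->udp.dport = dport;
    f->udp.len   = htons((uint16_t)udp_len);

    memcpy(f->payload, data, len);
    return ip_len;
}
```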
Re: [PATCH 05/17] KVM: PPC: Deliver program interrupts right away instead of queueing them
On 29.06.2011, at 12:18, Paul Mackerras wrote:
> Doing so means that we don't have to save the flags anywhere and gets
> rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.
>
> Doing so is OK because a program interrupt won't be generated at the
> same time as any other synchronous interrupt. If a program interrupt
> and an asynchronous interrupt (external or decrementer) are generated
> at the same time, the program interrupt will be delivered, which is
> correct because it has a higher priority, and then the asynchronous
> interrupt will be masked.
>
> We don't ever generate system reset or machine check interrupts to the
> guest, but if we did, then we would need to make sure they got delivered
> rather than the program interrupt. The current code would be wrong in
> this situation anyway since it would deliver the program interrupt as
> well as the reset/machine check interrupt.
>
> Signed-off-by: Paul Mackerras <pau...@samba.org>
> ---
>  arch/powerpc/kvm/book3s.c |    8 +++-----
>  1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 163e3e1..f68a34d 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -129,8 +129,8 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
>
>  void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
>  {
> -    to_book3s(vcpu)->prog_flags = flags;

Now that prog_flags is unused, please remove it from the headers.

Alex
Re: [PATCH v2 00/31] Implement user mode network for kvm tools
* Asias He <asias.he...@gmail.com> wrote:

> > Usermode TCP/IP can be quite cumbersome for users as things like
> > ping and ip6 won't work properly.
>
> Yes, usermode TCP/IP does have limits. But it's more cumbersome for
> the user to set up a bridge/NAT thing with privileged networking. The
> network setup is a headache for some users.

That group of 'some users' includes me for example. The thing is, when
i test an existing distro image there's better things to do with my
time than to figure out that year's preferred method of configuring the
network and troubleshooting it if it goes wrong.

So having zero-config networking (assuming we grow some DHCP capability
as well) would be a real plus.

> This patchset implements things like 'qemu -net user' without the
> slirp. I just took a look at the LOC in qemu and uip.
>
> qemu.git$ cat slirp/*.{c,h} net/slirp.{c,h}| wc -l
> 11514
>
> kernel.git/tools/kvm$ cat uip/*.{c,h} include/kvm/uip.h | wc -l
> 1312

That's pretty impressive (if it does not come at the expense of
features that Qemu's slirp code has) - and the thing is that we don't
actually have to implement the vast majority of TCP-IP features,
because the transport between the guest and the host is obviously
reliable.

This patch-set turned out to be a *lot* more simple than i first
thought it would end up. Simpler also means potentially faster and
potentially more secure.

( The lack of ipv6 is not something we should worry about too much,
  ipv4 should scale up to a couple of hundred thousand virtual machines
  per box, right? )

Thanks,

    Ingo
Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer
On 07/01/2011 10:27 AM, Paolo Bonzini wrote:
> On 07/01/2011 09:42 AM, Hannes Reinecke wrote:
> > 'tag' is just an abstraction to identify the command
> > from the driver. So we should make that explicit by
> > replacing 'tag' with a driver-defined pointer 'hba_private'.
> > This saves the lookup for driver handling several commands
> > in parallel.
>
> This makes tracing a bit harder to follow. Perhaps you can keep the
> transport tag (a uint64_t) in the SCSIRequest for debugging purposes?

Hmm. The transport tag wouldn't have any meaning outside scsi-bus.c.
And it's a 64-bit value. So why can't we use the hba_private pointer
directly here? After some I/O has been ongoing the linear 'tag' number
becomes unreadable very quickly, so there's not much difference here ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: [PATCH v2 00/31] Implement user mode network for kvm tools
On 01.07.2011, at 13:53, Ingo Molnar wrote:
> * Asias He <asias.he...@gmail.com> wrote:
>
> > > Usermode TCP/IP can be quite cumbersome for users as things like
> > > ping and ip6 won't work properly.
> >
> > Yes, usermode TCP/IP does have limits. But it's more cumbersome for
> > the user to set up a bridge/NAT thing with privileged networking. The
> > network setup is a headache for some users.
>
> That group of 'some users' includes me for example. The thing is, when
> i test an existing distro image there's better things to do with my
> time than to figure out that year's preferred method of configuring the
> network and troubleshooting it if it goes wrong.
>
> So having zero-config networking (assuming we grow some DHCP capability
> as well) would be a real plus.
>
> > This patchset implements things like 'qemu -net user' without the
> > slirp. I just took a look at the LOC in qemu and uip.
> >
> > qemu.git$ cat slirp/*.{c,h} net/slirp.{c,h}| wc -l
> > 11514
> >
> > kernel.git/tools/kvm$ cat uip/*.{c,h} include/kvm/uip.h | wc -l
> > 1312
>
> That's pretty impressive (if it does not come at the expense of
> features that Qemu's slirp code has) - and the thing is that we don't
> actually have to implement the vast majority of TCP-IP features,
> because the transport between the guest and the host is obviously
> reliable.

I don't see how it would. Once you overrun device buffers, you have to
do something. Either you drop packets or you stall the guest. I'd
usually prefer the former :).

> This patch-set turned out to be a *lot* more simple than i first
> thought it would end up. Simpler also means potentially faster and
> potentially more secure.
>
> ( The lack of ipv6 is not something we should worry about too much,
>   ipv4 should scale up to a couple of hundred thousand virtual machines
>   per box, right? )

Well, if the system you're trying to connect to supports ipv4, sure. If
it doesn't, tough luck :).

Alex
RE: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster
> From: Dan Carpenter [mailto:erro...@gmail.com]
>
> On Thu, Jun 30, 2011 at 04:28:14PM -0700, Dan Magenheimer wrote:
> > Hi Dan --
> >
> > Thanks for the careful review. You're right... some of this was
> > leftover from debugging an off-by-one error, though the code as is
> > still works.
> >
> > OTOH, there's a good chance that much of this sysfs code will
> > disappear before zcache would get promoted out of staging, since it
> > is to help those experimenting with zcache to get more insight into
> > what the underlying compression/accept-reject algorithms are doing.
> >
> > So I hope you (and GregKH) are OK that another version is not
> > necessary at this time to fix these.
>
> Off by one errors are kind of insidious. People cut and paste them
> and they spread. If someone adds a new list of chunks then there are
> now two examples that are correct and two which have an extra element,
> so it's 50/50 that he'll copy the right one.

True, but these are NOT off-by-one errors... they are
correct-but-slightly-ugly code snippets. (To clarify, I said the
*ugliness* arose when debugging an off-by-one error.)

Patches always welcome, and I agree that these should be fixed
eventually, assuming the code doesn't go away completely first.. I'm
simply stating the position that going through another test/submit
cycling to fix correct-but-slightly-ugly code which is present only to
surface information for experiments is not high on my priority list
right now... unless GregKH says he won't accept the patch.

> Btw, looking at it again, this seems like maybe a similar issue in
> zbud_evict_zbpg():
>
>    516          for (i = 0; i < MAX_CHUNK; i++) {
>    517  retry_unbud_list_i:
>
> MAX_CHUNKS is NCHUNKS - 1. Shouldn't that be i < NCHUNKS so that we
> reach the last element in the list?

No, the last element in that list is unused. There is a comment to that
effect someplace in the code. (These lists are keeping track of pages
with chunks of available space and the last entry would have no
available space so is always empty.)

Thanks again for your interest... are you using zcache?

Dan
Re: [PATCH 2/3] scsi: replace 'tag' with 'hba_private' pointer
On 07/01/2011 03:11 PM, Hannes Reinecke wrote:
> On 07/01/2011 10:27 AM, Paolo Bonzini wrote:
> > On 07/01/2011 09:42 AM, Hannes Reinecke wrote:
> > > 'tag' is just an abstraction to identify the command
> > > from the driver. So we should make that explicit by
> > > replacing 'tag' with a driver-defined pointer 'hba_private'.
> > > This saves the lookup for driver handling several commands
> > > in parallel.
> >
> > This makes tracing a bit harder to follow. Perhaps you can keep the
> > transport tag (a uint64_t) in the SCSIRequest for debugging purposes?
>
> Hmm. The transport tag wouldn't have any meaning outside scsi-bus.c.

It depends, in vmw_pvscsi I take it from a field in the request block
that is 0..255. So either you have a small tag that is recycled but
stays nice, or a large tag that is unwieldy but should not be recycled
ever. A pointer is unwieldy _and_ is recycled, so it gives the worst of
both worlds.

But I'm not very attached to this, I may even do it myself if/when I
find the need. Won't ack yet because of the nit with ESP/USB, but even
if you do not bother I will ack the next respin.

Paolo
Re: [PATCH resend] compat_ioctl: fix warning caused by qemu
On Friday 01 July 2011, Johannes Stezenbach wrote:
> On Linux x86_64 host with 32bit userspace, running
> qemu or even just qemu-img create -f qcow2 some.img 1G
> causes a kernel warning:
>
> ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(5326){t:'S';sz:0} arg(7fff) on some.img
> ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img
>
> ioctl 5326 is CDROM_DRIVE_STATUS,
> ioctl 801c0204 is FDGETPRM.
>
> The warning appears because the Linux compat-ioctl handler for these
> ioctls only applies to block devices, while qemu also uses the ioctls
> on plain files.
>
> Signed-off-by: Johannes Stezenbach <j...@sig21.net>

Acked-by: Arnd Bergmann <a...@arndb.de>

> ---
> (resend with Cc: suggested by get_maintainer.pl)
>
> discussed in http://lkml.kernel.org/r/20110617090424.ga19...@sig21.net
>
> Arnd, is this what you had in mind, or did you mean to move
> all floppy compat definitions? I decided to go with the minimal change.
>
> Tested on both 2.6.39.2 and 3.0-rc5-63-g0d72c6f.

Yes, that should be fine, unless Jens would like to see a different
solution for the struct definitions, e.g. moving all of the floppy
compat ioctl numbers to fd.h. I'm fine with it either way.

Arnd
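The `{t:...;sz:...}` annotations in the warning follow from how a Linux ioctl number is packed. The sketch below decodes the two commands from the log, assuming the generic encoding used on x86 (8 bits of command number, 8 bits of type, 14 bits of argument size, 2 bits of direction). The `sk_` names are made up for this example and are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of an ioctl command word (generic Linux layout). */
struct sk_ioctl {
    unsigned nr;    /* bits 0..7:   command number within the type */
    unsigned type;  /* bits 8..15:  "magic" type, often an ASCII letter */
    unsigned size;  /* bits 16..29: size of the argument structure */
    unsigned dir;   /* bits 30..31: 0 = none, 1 = write, 2 = read */
};

static struct sk_ioctl sk_ioctl_decode(uint32_t cmd)
{
    struct sk_ioctl d;

    d.nr   = cmd & 0xff;
    d.type = (cmd >> 8) & 0xff;
    d.size = (cmd >> 16) & 0x3fff;
    d.dir  = (cmd >> 30) & 0x3;
    return d;
}

/* Inverse of sk_ioctl_decode(), handy for checking the arithmetic. */
static uint32_t sk_ioctl_encode(unsigned dir, unsigned type,
                                unsigned nr, unsigned size)
{
    assert(dir < 4 && type < 256 && nr < 256 && size < (1u << 14));
    return ((uint32_t)dir << 30) | ((uint32_t)size << 16) |
           ((uint32_t)type << 8) | (uint32_t)nr;
}
```

Decoding 0x5326 gives type 'S' with size 0, and decoding 0x801c0204 gives type 0x02 with a 28-byte argument read from the kernel, which matches the `t:` and `sz:` fields printed in the warning.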
Re: [PATCH v2 12/31] kvm tools: Add UDP support for uip
On 07/01/2011 07:46 PM, Ingo Molnar wrote:
>
> * Asias He <asias.he...@gmail.com> wrote:
>
> > +static void *uip_udp_socket_thread(void *p)
> > +{
> > +    struct epoll_event events[UIP_UDP_MAX_EVENTS];
> > +    struct uip_udp_socket *sk;
> > +    struct uip_info *info;
> > +    struct uip_eth *eth2;
> > +    struct uip_udp *udp2;
> > +    struct uip_buf *buf;
> > +    struct uip_ip *ip2;
> > +    u8 *payload;
> > +    int nfds;
> > +    int ret;
> > +    int i;
> > +
> > +    info = p;
> > +
> > +    do {
> > +        payload = malloc(UIP_MAX_UDP_PAYLOAD);
> > +    } while (!payload);
> > +
> > +    while (1) {
> > +        nfds = epoll_wait(info->udp_epollfd, events, UIP_UDP_MAX_EVENTS, -1);
> > +
> > +        if (nfds == -1)
> > +            continue;
> > +
> > +        for (i = 0; i < nfds; i++) {
> > +
> > +            sk = events[i].data.ptr;
> > +            ret = recvfrom(sk->fd, payload, UIP_MAX_UDP_PAYLOAD, 0, NULL, NULL);
> > +            if (ret < 0)
> > +                continue;
> > +
> > +            /*
> > +             * Get free buffer to send data to guest
> > +             */
> > +            buf = uip_buf_get_free(info);
> > +
> > +            /*
> > +             * Cook a ethernet frame
> > +             */
> > +            udp2 = (struct uip_udp *)(buf->eth);
> > +            eth2 = (struct uip_eth *)buf->eth;
> > +            ip2  = (struct uip_ip *)(buf->eth);
> > +
> > +            eth2->src  = info->host_mac;
> > +            eth2->dst  = info->guest_mac;
> > +            eth2->type = htons(UIP_ETH_P_IP);
> > +
> > +            ip2->vhl     = UIP_IP_VER_4 | UIP_IP_HDR_LEN;
> > +            ip2->tos     = 0;
> > +            ip2->id      = 0;
> > +            ip2->flgfrag = 0;
> > +            ip2->ttl     = UIP_IP_TTL;
> > +            ip2->proto   = UIP_IP_P_UDP;
> > +            ip2->csum    = 0;
> > +            ip2->sip     = sk->dip;
> > +            ip2->dip     = sk->sip;
> > +
> > +            udp2->sport = sk->dport;
> > +            udp2->dport = sk->sport;
> > +            udp2->len   = htons(ret + uip_udp_hdrlen(udp2));
> > +            udp2->csum  = 0;
> > +
> > +            memcpy(udp2->payload, payload, ret);
> > +
> > +            ip2->len   = udp2->len + htons(uip_ip_hdrlen(ip2));
> > +            ip2->csum  = uip_csum_ip(ip2);
> > +            udp2->csum = uip_csum_udp(udp2);
> > +
> > +            /*
> > +             * virtio_net_hdr
> > +             */
> > +            buf->vnet_len = sizeof(struct virtio_net_hdr);
> > +            memset(buf->vnet, 0, buf->vnet_len);
> > +
> > +            buf->eth_len = ntohs(ip2->len) + uip_eth_hdrlen(&ip2->eth);
> > +
> > +            /*
> > +             * Send data received from socket to guest
> > +             */
> > +            uip_buf_set_used(info, buf);
> > +        }
> > +    }
> > +
> > +    free(payload);
> > +    pthread_exit(NULL);
> > +    return NULL;
> > +}
>
> This function is way too large, please split out the meat of it into a
> separate helper inline.

Will do. Thanks.

-- 
Best Regards,
Asias He
[PATCH 0/3] [v5] Megasas HBA Emulation
Hi all,

after getting various feedback from Paolo, Stefan, and Alexander
I've respun the patches.

Changes since the previous version:
- iov: Update parameter usage in iov_(to|from)_buf()
  Updated description for the first patch and clarified the usage
  Renamed arguments for iov_XXX for clarification
- scsi: Add 'hba_private' to SCSIRequest
  Kept 'tag' for tracing and just add 'hba_private' as an additional
  field as per request from Paolo
- megasas: checkpatch.pl fixes and update to work with the
  changed interface in scsi_req_new(). Also included the
  suggested fixes from Alex.

Hannes Reinecke (3):
  iov: Update parameter usage in iov_(to|from)_buf()
  scsi: Add 'hba_private' to SCSIRequest
  megasas: LSI Megaraid SAS emulation

 Makefile.objs           |    1 +
 default-configs/pci.mak |    1 +
 hw/esp.c                |    2 +-
 hw/lsi53c895a.c         |   22 +-
 hw/megasas.c            | 1920 +++
 hw/mfi.h                | 1197 +
 hw/pci_ids.h            |    3 +-
 hw/scsi-bus.c           |    9 +-
 hw/scsi-disk.c          |    4 +-
 hw/scsi-generic.c       |    5 +-
 hw/scsi.h               |   10 +-
 hw/spapr_vscsi.c        |   29 +-
 hw/usb-msd.c            |    9 +-
 hw/virtio-net.c         |    2 +-
 hw/virtio-serial-bus.c  |    2 +-
 iov.c                   |   49 +-
 iov.h                   |   10 +-
 17 files changed, 3192 insertions(+), 83 deletions(-)
 create mode 100644 hw/megasas.c
 create mode 100644 hw/mfi.h

-- 
1.7.3.4
[PATCH 1/3] iov: Update parameter usage in iov_(to|from)_buf()
iov_to_buf() has an 'offset' parameter, iov_from_buf() hasn't.
This patch adds the missing parameter to iov_from_buf().
It also renames the 'offset' parameter to 'iov_off' to
emphasize it's the offset into the iovec and not the buffer.

Signed-off-by: Hannes Reinecke <h...@suse.de>
---
 hw/virtio-net.c        |    2 +-
 hw/virtio-serial-bus.c |    2 +-
 iov.c                  |   49 ++-
 iov.h                  |   10
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6997e02..a32cc01 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -657,7 +657,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_

         /* copy in packet.  ugh */
         len = iov_from_buf(sg, elem.in_num,
-                           buf + offset, size - offset);
+                           buf + offset, 0, size - offset);
         total += len;
         offset += len;
         /* If buffers can't be merged, at this point we
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 7f6db7b..53c58d0 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -103,7 +103,7 @@ static size_t write_to_port(VirtIOSerialPort *port,
         }

         len = iov_from_buf(elem.in_sg, elem.in_num,
-                           buf + offset, size - offset);
+                           buf + offset, 0, size - offset);
         offset += len;

         virtqueue_push(vq, &elem, len);
diff --git a/iov.c b/iov.c
index 588cd04..1e02791 100644
--- a/iov.c
+++ b/iov.c
@@ -14,56 +14,61 @@

 #include "iov.h"

-size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-                    const void *buf, size_t size)
+size_t iov_from_buf(struct iovec *iov, unsigned int iov_cnt,
+                    const void *buf, size_t iov_off, size_t size)
 {
-    size_t offset;
+    size_t iovec_off, buf_off;
     unsigned int i;

-    offset = 0;
-    for (i = 0; offset < size && i < iovcnt; i++) {
-        size_t len;
+    iovec_off = 0;
+    buf_off = 0;
+    for (i = 0; i < iov_cnt && size; i++) {
+        if (iov_off < (iovec_off + iov[i].iov_len)) {
+            size_t len = MIN((iovec_off + iov[i].iov_len) - iov_off, size);

-        len = MIN(iov[i].iov_len, size - offset);
+            memcpy(iov[i].iov_base + (iov_off - iovec_off), buf + buf_off, len);

-        memcpy(iov[i].iov_base, buf + offset, len);
-        offset += len;
+            buf_off += len;
+            iov_off += len;
+            size -= len;
+        }
+        iovec_off += iov[i].iov_len;
     }
-    return offset;
+    return buf_off;
 }

-size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
-                  void *buf, size_t offset, size_t size)
+size_t iov_to_buf(const struct iovec *iov, const unsigned int iov_cnt,
+                  void *buf, size_t iov_off, size_t size)
 {
     uint8_t *ptr;
-    size_t iov_off, buf_off;
+    size_t iovec_off, buf_off;
     unsigned int i;

     ptr = buf;
-    iov_off = 0;
+    iovec_off = 0;
     buf_off = 0;
-    for (i = 0; i < iovcnt && size; i++) {
-        if (offset < (iov_off + iov[i].iov_len)) {
-            size_t len = MIN((iov_off + iov[i].iov_len) - offset , size);
+    for (i = 0; i < iov_cnt && size; i++) {
+        if (iov_off < (iovec_off + iov[i].iov_len)) {
+            size_t len = MIN((iovec_off + iov[i].iov_len) - iov_off , size);

-            memcpy(ptr + buf_off, iov[i].iov_base + (offset - iov_off), len);
+            memcpy(ptr + buf_off, iov[i].iov_base + (iov_off - iovec_off), len);

             buf_off += len;
-            offset += len;
+            iov_off += len;
             size -= len;
         }
-        iov_off += iov[i].iov_len;
+        iovec_off += iov[i].iov_len;
     }
     return buf_off;
 }

-size_t iov_size(const struct iovec *iov, const unsigned int iovcnt)
+size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt)
 {
     size_t len;
     unsigned int i;

     len = 0;
-    for (i = 0; i < iovcnt; i++) {
+    for (i = 0; i < iov_cnt; i++) {
         len += iov[i].iov_len;
     }
     return len;
diff --git a/iov.h b/iov.h
index 60a8547..110f67a 100644
--- a/iov.h
+++ b/iov.h
@@ -12,8 +12,8 @@

 #include "qemu-common.h"

-size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
-                    const void *buf, size_t size);
-size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
-                  void *buf, size_t offset, size_t size);
-size_t iov_size(const struct iovec *iov, const unsigned int iovcnt);
+size_t iov_from_buf(struct iovec *iov, unsigned int iov_cnt,
+                    const void *buf, size_t iov_off, size_t size);
+size_t iov_to_buf(const struct iovec *iov, const unsigned int iov_cnt,
+                  void *buf, size_t iov_off, size_t size);
+size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt);
-- 
1.7.3.4
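To see what the new 'iov_off' argument does, here is a standalone sketch that mirrors the loop of the patched iov_from_buf() outside of QEMU. The function is renamed `sketch_iov_from_buf` and the `SK_MIN` macro is local to this example, so that nothing here is mistaken for the QEMU source. Pointer arithmetic goes through `char *` casts instead of relying on `void *` arithmetic.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

#define SK_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Copy 'size' bytes from 'buf' into the scatter list 'iov', starting
 * 'iov_off' bytes into the logical concatenation of all iovec elements.
 * Returns the number of bytes actually copied.
 */
static size_t sketch_iov_from_buf(struct iovec *iov, unsigned int iov_cnt,
                                  const void *buf, size_t iov_off, size_t size)
{
    size_t iovec_off = 0;  /* logical offset where element i starts */
    size_t buf_off = 0;    /* bytes consumed from 'buf' so far */
    unsigned int i;

    assert(iov != NULL || iov_cnt == 0);

    for (i = 0; i < iov_cnt && size; i++) {
        if (iov_off < iovec_off + iov[i].iov_len) {
            size_t len = SK_MIN(iovec_off + iov[i].iov_len - iov_off, size);

            memcpy((char *)iov[i].iov_base + (iov_off - iovec_off),
                   (const char *)buf + buf_off, len);
            buf_off += len;
            iov_off += len;
            size -= len;
        }
        iovec_off += iov[i].iov_len;
    }
    return buf_off;
}
```

With two 4-byte elements and `iov_off` set to 3, a 3-byte write lands one byte at the end of the first element and two bytes at the start of the second. Callers that want the old behaviour pass 0, as the virtio-net and virtio-serial hunks do.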
[PATCH 2/3] scsi: Add 'hba_private' to SCSIRequest
'tag' is just an abstraction to identify the command
from the driver. So we should make that explicit by
replacing 'tag' with a driver-defined pointer 'hba_private'.
This saves the lookup for driver handling several commands
in parallel.
'tag' is still being kept for tracing purposes.

Signed-off-by: Hannes Reinecke <h...@suse.de>
---
 hw/esp.c          |    2 +-
 hw/lsi53c895a.c   |   22 --
 hw/scsi-bus.c     |    9 ++---
 hw/scsi-disk.c    |    4 ++--
 hw/scsi-generic.c |    5 +++--
 hw/scsi.h         |   10 +++---
 hw/spapr_vscsi.c  |   29 +
 hw/usb-msd.c      |    9 +
 8 files changed, 37 insertions(+), 53 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 6d3f5d2..aa87197 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)

     DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
     lun = busid & 7;
-    s->current_req = scsi_req_new(s->current_dev, 0, lun);
+    s->current_req = scsi_req_new(s->current_dev, 0, lun, NULL);
     datalen = scsi_req_enqueue(s->current_req, buf);
     s->ti_size = datalen;
     if (datalen != 0) {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 940b43a..69eec1d 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -661,7 +661,7 @@ static lsi_request *lsi_find_by_tag(LSIState *s, uint32_t tag)
 static void lsi_request_cancelled(SCSIRequest *req)
 {
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
-    lsi_request *p;
+    lsi_request *p = req->hba_private;

     if (s->current && req == s->current->req) {
         scsi_req_unref(req);
@@ -670,7 +670,6 @@ static void lsi_request_cancelled(SCSIRequest *req)
         return;
     }

-    p = lsi_find_by_tag(s, req->tag);
     if (p) {
         QTAILQ_REMOVE(&s->queue, p, next);
         scsi_req_unref(req);
@@ -680,18 +679,12 @@ static void lsi_request_cancelled(SCSIRequest *req)

 /* Record that data is available for a queued command.  Returns zero if
    the device was reselected, nonzero if the IO is deferred.  */
-static int lsi_queue_tag(LSIState *s, uint32_t tag, uint32_t len)
+static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
 {
-    lsi_request *p;
-
-    p = lsi_find_by_tag(s, tag);
-    if (!p) {
-        BADF("IO with unknown tag %d\n", tag);
-        return 1;
-    }
+    lsi_request *p = req->hba_private;

     if (p->pending) {
-        BADF("Multiple IO pending for tag %d\n", tag);
+        BADF("Multiple IO pending for request %p\n", p);
     }
     p->pending = len;
     /* Reselect if waiting for it, or if reselection triggers an IRQ
@@ -743,9 +736,9 @@ static void lsi_transfer_data(SCSIRequest *req, uint32_t len)
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
     int out;

-    if (s->waiting == 1 || !s->current || req->tag != s->current->tag ||
+    if (s->waiting == 1 || !s->current || req->hba_private != s->current ||
         (lsi_irq_on_rsl(s) && !(s->scntl1 & LSI_SCNTL1_CON))) {
-        if (lsi_queue_tag(s, req->tag, len)) {
+        if (lsi_queue_req(s, req, len)) {
             return;
         }
     }
@@ -789,7 +782,8 @@ static void lsi_do_command(LSIState *s)
     assert(s->current == NULL);
     s->current = qemu_mallocz(sizeof(lsi_request));
     s->current->tag = s->select_tag;
-    s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun);
+    s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun,
+                                   s->current);

     n = scsi_req_enqueue(s->current->req, buf);
     if (n) {
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index ad6a730..8b1a412 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -131,7 +131,8 @@ int scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
     return res;
 }

-SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag,
+                            uint32_t lun, void *hba_private)
 {
     SCSIRequest *req;

@@ -141,14 +142,16 @@ SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t l
     req->dev = d;
     req->tag = tag;
     req->lun = lun;
+    req->hba_private = hba_private;
     req->status = -1;
     trace_scsi_req_alloc(req->dev->id, req->lun, req->tag);
     return req;
 }

-SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun)
+SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
+                          void *hba_private)
 {
-    return d->info->alloc_req(d, tag, lun);
+    return d->info->alloc_req(d, tag, lun, hba_private);
 }

 uint8_t *scsi_req_get_buf(SCSIRequest *req)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index a8c7372..c2a99fe 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -81,13 +81,13 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, int type);
 static int scsi_disk_emulate_command(SCSIDiskReq *r, uint8_t *outbuf);

 static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t
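The lookup this patch removes can be illustrated with a cut-down model. The types below are hypothetical stand-ins rather than QEMU's SCSIRequest or lsi_request: before the change the HBA has to search its command table by tag in every callback, and afterwards the request simply hands back the pointer the HBA supplied when it created the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-command state owned by an HBA emulation. */
struct sk_hba_cmd {
    uint32_t tag;
    uint32_t pending;
};

/* Cut-down stand-in for a SCSI request: 'tag' is kept for tracing only. */
struct sk_scsi_req {
    uint32_t tag;
    void *hba_private;  /* opaque to the SCSI layer, owned by the HBA */
};

/* Before: search the HBA's command table by tag on each callback. */
static struct sk_hba_cmd *sk_find_by_tag(struct sk_hba_cmd *cmds, size_t n,
                                         uint32_t tag)
{
    for (size_t i = 0; i < n; i++)
        if (cmds[i].tag == tag)
            return &cmds[i];
    return NULL;
}

/* After: the request carries the pointer set at creation time. */
static struct sk_hba_cmd *sk_cmd_of(const struct sk_scsi_req *req)
{
    assert(req != NULL);
    return req->hba_private;
}
```

An HBA that has no per-command state can pass NULL, as the esp.c hunk does. The callback then has to cope with a NULL `hba_private`, which is why lsi_request_cancelled() still checks `p` before using it.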
Re: [PATCH 2/3] scsi: Add 'hba_private' to SCSIRequest
On 07/01/2011 05:35 PM, Hannes Reinecke wrote:
> 'tag' is just an abstraction to identify the command
> from the driver. So we should make that explicit by
> replacing 'tag' with a driver-defined pointer 'hba_private'.
> This saves the lookup for driver handling several commands
> in parallel.
> 'tag' is still being kept for tracing purposes.
>
> Signed-off-by: Hannes Reinecke <h...@suse.de>
> ---
>  hw/esp.c          |    2 +-
>  hw/lsi53c895a.c   |   22 --
>  hw/scsi-bus.c     |    9 ++---
>  hw/scsi-disk.c    |    4 ++--
>  hw/scsi-generic.c |    5 +++--
>  hw/scsi.h         |   10 +++---
>  hw/spapr_vscsi.c  |   29 +
>  hw/usb-msd.c      |    9 +
>  8 files changed, 37 insertions(+), 53 deletions(-)
>
> diff --git a/hw/esp.c b/hw/esp.c
> index 6d3f5d2..aa87197 100644
> --- a/hw/esp.c
> +++ b/hw/esp.c
> @@ -244,7 +244,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)
>
>      DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
>      lun = busid & 7;
> -    s->current_req = scsi_req_new(s->current_dev, 0, lun);
> +    s->current_req = scsi_req_new(s->current_dev, 0, lun, NULL);
>      datalen = scsi_req_enqueue(s->current_req, buf);
>      s->ti_size = datalen;
>      if (datalen != 0) {
> diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
> index 940b43a..69eec1d 100644
> --- a/hw/lsi53c895a.c
> +++ b/hw/lsi53c895a.c
> @@ -661,7 +661,7 @@ static lsi_request *lsi_find_by_tag(LSIState *s, uint32_t tag)
>  static void lsi_request_cancelled(SCSIRequest *req)
>  {
>      LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
> -    lsi_request *p;
> +    lsi_request *p = req->hba_private;
>
>      if (s->current && req == s->current->req) {
>          scsi_req_unref(req);
> @@ -670,7 +670,6 @@ static void lsi_request_cancelled(SCSIRequest *req)
>          return;
>      }
>
> -    p = lsi_find_by_tag(s, req->tag);
>      if (p) {
>          QTAILQ_REMOVE(&s->queue, p, next);
>          scsi_req_unref(req);
> @@ -680,18 +679,12 @@ static void lsi_request_cancelled(SCSIRequest *req)
>
>  /* Record that data is available for a queued command.  Returns zero if
>     the device was reselected, nonzero if the IO is deferred.  */
> -static int lsi_queue_tag(LSIState *s, uint32_t tag, uint32_t len)
> +static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
>  {
> -    lsi_request *p;
> -
> -    p = lsi_find_by_tag(s, tag);
> -    if (!p) {
> -        BADF("IO with unknown tag %d\n", tag);
> -        return 1;
> -    }
> +    lsi_request *p = req->hba_private;
>
>      if (p->pending) {
> -        BADF("Multiple IO pending for tag %d\n", tag);
> +        BADF("Multiple IO pending for request %p\n", p);
>      }
>      p->pending = len;
>      /* Reselect if waiting for it, or if reselection triggers an IRQ
> @@ -743,9 +736,9 @@ static void lsi_transfer_data(SCSIRequest *req, uint32_t len)
>      LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
>      int out;
>
> -    if (s->waiting == 1 || !s->current || req->tag != s->current->tag ||
> +    if (s->waiting == 1 || !s->current || req->hba_private != s->current ||
>          (lsi_irq_on_rsl(s) && !(s->scntl1 & LSI_SCNTL1_CON))) {
> -        if (lsi_queue_tag(s, req->tag, len)) {
> +        if (lsi_queue_req(s, req, len)) {
>              return;
>          }
>      }
> @@ -789,7 +782,8 @@ static void lsi_do_command(LSIState *s)
>      assert(s->current == NULL);
>      s->current = qemu_mallocz(sizeof(lsi_request));
>      s->current->tag = s->select_tag;
> -    s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun);
> +    s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun,
> +                                   s->current);
>
>      n = scsi_req_enqueue(s->current->req, buf);
>      if (n) {
> diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
> index ad6a730..8b1a412 100644
> --- a/hw/scsi-bus.c
> +++ b/hw/scsi-bus.c
> @@ -131,7 +131,8 @@ int scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
>      return res;
>  }
>
> -SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun)
> +SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag,
> +                            uint32_t lun, void *hba_private)
>  {
>      SCSIRequest *req;
>
> @@ -141,14 +142,16 @@ SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t l
>      req->dev = d;
>      req->tag = tag;
>      req->lun = lun;
> +    req->hba_private = hba_private;
>      req->status = -1;
>      trace_scsi_req_alloc(req->dev->id, req->lun, req->tag);
>      return req;
>  }
>
> -SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun)
> +SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
> +                          void *hba_private)
>  {
> -    return d->info->alloc_req(d, tag, lun);
> +    return d->info->alloc_req(d, tag, lun, hba_private);
>  }
>
>  uint8_t *scsi_req_get_buf(SCSIRequest *req)
> diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
> index a8c7372..c2a99fe 100644
> --- a/hw/scsi-disk.c
> +++ b/hw/scsi-disk.c
> @@ -81,13 +81,13 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, int type);
>  static int
[PATCH] virt: Add more flexible way to specify comm ports host - guest
When running the virt guest windows tests using the (now default) autotest private bridge, noticed that some ports needed for host and guest communication weren't specified. So, add a config file knob to allow people to specify additional ports to be added to the default firewall configuration. The config tracks some important ports used on tests, such as the remote shell ports and remote shell file transfer ports. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/tests_base.cfg.sample |3 ++ client/virt/virt_test_setup.py | 47 +-- 2 files changed, 35 insertions(+), 15 deletions(-) diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index 5313da1..1a86265 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -64,6 +64,9 @@ bridge = private # be a specific bridge # name, such as 'virbr0' #bridge = virbr0 +# If you need more ports to be available for comm between host and guest, +# please add them here +priv_bridge_ports = 53 67 run_tcpdump = yes # Misc diff --git a/client/virt/virt_test_setup.py b/client/virt/virt_test_setup.py index 6e2d477..1539cac 100644 --- a/client/virt/virt_test_setup.py +++ b/client/virt/virt_test_setup.py @@ -308,21 +308,38 @@ class PrivateBridgeConfig(object): self.subnet = params.get(priv_subnet, '192.168.58') self.ip_version = params.get(bridge_ip_version, ipv4) self.dhcp_server_pid = None -self.iptables_rules = [ -INPUT 1 -i %s -p udp -m udp --dport 53 -j ACCEPT % self.brname, -INPUT 2 -i %s -p tcp -m tcp --dport 53 -j ACCEPT % self.brname, -INPUT 3 -i %s -p udp -m udp --dport 67 -j ACCEPT % self.brname, -INPUT 4 -i %s -p tcp -m tcp --dport 67 -j ACCEPT % self.brname, -INPUT 5 -i %s -p tcp -m tcp --dport 12323 -j ACCEPT % self.brname, -FORWARD 1 -m physdev --physdev-is-bridged -j ACCEPT, -FORWARD 2 -d %s.0/24 -o %s -m state --state RELATED,ESTABLISHED --j ACCEPT % (self.subnet, self.brname), -FORWARD 3 -s %s.0/24 -i %s -j 
ACCEPT % (self.subnet, self.brname), -FORWARD 4 -i %s -o %s -j ACCEPT % (self.brname, self.brname), -(FORWARD 5 -o %s -j REJECT --reject-with icmp-port-unreachable % - self.brname), -(FORWARD 6 -i %s -j REJECT --reject-with icmp-port-unreachable % - self.brname)] +ports = params.get(priv_bridge_ports, '53 67').split() +s_port = params.get(guest_port_remote_shell, 10022) +if s_port not in ports: +ports.append(s_port) +ft_port = params.get(guest_port_file_transfer, 10023) +if ft_port not in ports: +ports.append(ft_port) +u_port = params.get(guest_port_unattended_install, 13323) +if u_port not in ports: +ports.append(u_port) +self.iptables_rules = self._assemble_iptables_rules(ports) + + +def _assemble_iptables_rules(self, port_list): +rules = [] +index = 0 +for port in port_list: +index += 1 +rules.append(INPUT %s -i %s -p tcp -m tcp --dport %s -j ACCEPT % + (index, self.brname, port)) +index += 1 +rules.append(INPUT %s -i %s -p udp -m udp --dport %s -j ACCEPT % + (index, self.brname, port)) +rules.append(FORWARD 1 -m physdev --physdev-is-bridged -j ACCEPT) +rules.append(FORWARD 2 -d %s.0/24 -o %s -m state + --state RELATED,ESTABLISHED -j ACCEPT % + (self.subnet, self.brname)) +rules.append(FORWARD 3 -s %s.0/24 -i %s -j ACCEPT % + (self.subnet, self.brname)) +rules.append(FORWARD 4 -i %s -o %s -j ACCEPT % + (self.brname, self.brname)) +return rules def _add_bridge(self): -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] iov: Update parameter usage in iov_(to|from)_buf()
On 01.07.2011, at 17:35, Hannes Reinecke wrote:

iov_to_buf() has an 'offset' parameter, iov_from_buf() hasn't. This patch adds the missing parameter to iov_from_buf(). It also renames the 'offset' parameter to 'iov_off' to emphasize it's the offset into the iovec and not the buffer.

Signed-off-by: Hannes Reinecke h...@suse.de

Acked-by: Alexander Graf ag...@suse.de

Alex
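For readers following the thread, here is a minimal sketch of what an 'iov_off' parameter means for a copy into an iovec. It is not the QEMU implementation: the name sketch_iov_from_buf() and its exact signature are assumptions made for illustration. The point it tries to show is that the offset counts bytes into the scattered region described by the iovec, not bytes into the source buffer.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, not QEMU's code: copy up to `size` bytes from `buf`
 * into the iovec array, starting `iov_off` bytes into the region that the
 * iovec elements describe. Returns the number of bytes actually copied.
 */
static size_t sketch_iov_from_buf(struct iovec *iov, unsigned int iov_cnt,
                                  const void *buf, size_t iov_off, size_t size)
{
    const unsigned char *src = buf;
    size_t done = 0;
    unsigned int i;

    for (i = 0; i < iov_cnt && done < size; i++) {
        size_t len = iov[i].iov_len;
        size_t n;

        if (iov_off >= len) {
            /* This element lies entirely before the requested offset. */
            iov_off -= len;
            continue;
        }
        n = len - iov_off;
        if (n > size - done) {
            n = size - done;
        }
        memcpy((unsigned char *)iov[i].iov_base + iov_off, src + done, n);
        done += n;
        iov_off = 0; /* later elements are filled from their start */
    }
    return done;
}
```

With two 4-byte elements and iov_off = 3, a 3-byte copy would put one byte at the end of the first element and two at the start of the second.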
Re: [PATCH 3/3] megasas: LSI Megaraid SAS emulation
On 01.07.2011, at 17:35, Hannes Reinecke wrote:

This patch adds an emulation for the LSI Megaraid SAS 8708EM2 HBA.

Have you tried to execute the current version of megasas and actually do something with it? I just booted up openSUSE 11.4 rescue from DVD with a megasas adapter that contained a raw file backed by tmpfs. Creating a partition worked fine, but when running mkfs.ext3 and mounting afterwards, the mount fails saying there is no ext3 on the disk. Sounds like data corruption to me :).

I know that this used to work a while back, so it might be a recent regression?

Alex
Re: [PATCH v2 00/31] Implement user mode network for kvm tools
On Fri, Jul 1, 2011 at 12:38 AM, Asias He asias.he...@gmail.com wrote:

On 06/30/2011 04:56 PM, Stefan Hajnoczi wrote:

On Thu, Jun 30, 2011 at 9:40 AM, Asias He asias.he...@gmail.com wrote:

uip stands for user mode {TCP,UDP}/IP. Currently, uip supports ARP, ICMP, IPV4, UDP, TCP. So any network protocols above UDP/TCP should work as well, e.g., HTTP, FTP, SSH, DNS.

There is an existing uIP which might cause confusion, not sure if you've seen it. First I thought you were using that :).

I heard about uIP, but this patchset has nothing to do with uIP ;-) At first I was naming the user mode network UNET, which is User mode NETwork; however, I thought uip looked better because it is shorter. Anyway, if uip does cause confusion, I'd like to change this naming.

It's up to you but now is the right time to do it. Consider if another program wants to reuse this code or if you ever want to make it a library, it wouldn't help to have a confusing name.

Stefan
Re: [PATCH v2] staging: zcache: support multiple clients, prep for KVM and RAMster
On Fri, Jul 01, 2011 at 07:31:54AM -0700, Dan Magenheimer wrote:

Off by one errors are kind of insidious. People cut and paste them and they spread. If someone adds a new list of chunks then there are now two examples that are correct and two which have an extra element, so it's 50/50 that he'll copy the right one.

True, but these are NOT off-by-one errors... they are correct-but-slightly-ugly code snippets. (To clarify, I said the *ugliness* arose when debugging an off-by-one error.)

What I meant was the new arrays are *one* element too large.

Patches always welcome, and I agree that these should be fixed eventually, assuming the code doesn't go away completely first.. I'm simply stating the position that going through another test/submit cycle to fix correct-but-slightly-ugly code which is present only to surface information for experiments is not high on my priority list right now... unless GregKH says he won't accept the patch.

Btw, looking at it again, this seems like maybe a similar issue in zbud_evict_zbpg():

516 for (i = 0; i < MAX_CHUNK; i++) {
517 retry_unbud_list_i:

MAX_CHUNK is NCHUNKS - 1. Shouldn't that be i < NCHUNKS so that we reach the last element in the list?

No, the last element in that list is unused. There is a comment to that effect someplace in the code. (These lists are keeping track of pages with chunks of available space and the last entry would have no available space so is always empty.)

The comment says that the first element isn't used. Perhaps the comment is out of date and now it's the last element that isn't used. To me, it makes sense to have an unused first element, but it doesn't make sense to have an unused last element. Why not just make the array smaller? Also if the last element of the original arrays isn't used, then does that mean the last *two* elements of the new arrays aren't used?

Getting array sizes wrong is not a correct-but-slightly-ugly thing. *grumble* *grumble* *grumble*. But it doesn't crash the system so I'm fine with it going in as is...

Thanks again for your interest... are you using zcache?

No. I was just on the driver-devel list reviewing patches at random.

regards,
dan carpenter
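The loop bound under discussion can be illustrated with a small stand-alone sketch. The value chosen for NCHUNKS and the helper slots_visited() are assumptions for illustration only and are not taken from zcache. The sketch only shows that a loop bounded by MAX_CHUNK (that is, NCHUNKS - 1) never touches the last slot of an NCHUNKS-sized array. Whether that last slot is meant to be unused is exactly what the thread leaves open.

```c
#include <stddef.h>

#define NCHUNKS   8               /* illustrative value, not zcache's */
#define MAX_CHUNK (NCHUNKS - 1)

/*
 * Hypothetical helper: report how many of the NCHUNKS slots are visited
 * by a loop of the form `for (i = 0; i < bound; i++)`.
 */
static size_t slots_visited(size_t bound)
{
    int seen[NCHUNKS] = { 0 };
    size_t i, count = 0;

    for (i = 0; i < bound && i < NCHUNKS; i++) {
        seen[i] = 1;
    }
    for (i = 0; i < NCHUNKS; i++) {
        count += (size_t)seen[i];
    }
    return count;
}
```

Calling slots_visited(MAX_CHUNK) leaves one slot unvisited, while slots_visited(NCHUNKS) covers all of them.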
Re: [PATCH 0/17] Hypervisor-mode KVM on POWER7 and PPC970
On 29.06.2011, at 12:15, Paul Mackerras wrote:

The first patch of the following series is a pure bug-fix for 32-bit kernels.

The remainder of the following series of patches enable KVM to exploit the hardware hypervisor mode on 64-bit Power ISA Book3S machines. At present, POWER7 and PPC970 processors are supported. (Note that the PPC970 processors in Apple G5 machines don't have a usable hypervisor mode and are not supported by these patches.)

Running the KVM host in hypervisor mode means that the guest can use both supervisor mode and user mode. That means that the guest can execute supervisor-privilege instructions and access supervisor-privilege registers. In addition the hardware directs most exceptions to the guest. Thus we don't need to emulate any instructions in the host. Generally, the only times we need to exit the guest are when it does a hypercall or when an external interrupt or host timer (decrementer) interrupt occurs.

The focus of this KVM implementation is to run guests that use the PAPR (Power Architecture Platform Requirements) paravirtualization interface, which is the interface supplied by PowerVM on IBM pSeries machines. Currently the pseries machine type in qemu is only supported by book3s_hv KVM, and book3s_hv KVM only supports the pseries machine type. That will hopefully change in future.

These patches are against the master branch of the kvm tree.

Something seems to be broken with signals. When running without io-thread, I can't even do ctrl-c on -nographic while the guest is in sleep mode. But that might not be related to your patches.

I've applied 01-16 now. Sending them through some more testing and if they're good, sending a pull request.

Alex
Re: [PATCH 10/17] KVM: PPC: Add support for Book3S processors in hypervisor mode
On Wed, 2011-06-29 at 20:21 +1000, Paul Mackerras wrote: +struct kvmppc_pginfo { + unsigned long pfn; + atomic_t refcnt; +}; I only see this refcnt inc'd in one spot and never decremented or read. Is the refcnt just the number of hptes we have for this particular page at the moment? +long kvmppc_alloc_hpt(struct kvm *kvm) +{ + unsigned long hpt; + unsigned long lpid; + + hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN, +HPT_ORDER - PAGE_SHIFT); + if (!hpt) { + pr_err(kvm_alloc_hpt: Couldn't alloc HPT\n); + return -ENOMEM; + } + kvm-arch.hpt_virt = hpt; + + do { + lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS); + if (lpid = NR_LPIDS) { + pr_err(kvm_alloc_hpt: No LPIDs free\n); + free_pages(hpt, HPT_ORDER - PAGE_SHIFT); + return -ENOMEM; + } + } while (test_and_set_bit(lpid, lpid_inuse)); + + kvm-arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18); + kvm-arch.lpid = lpid; + kvm-arch.host_sdr1 = mfspr(SPRN_SDR1); + kvm-arch.host_lpid = mfspr(SPRN_LPID); + kvm-arch.host_lpcr = mfspr(SPRN_LPCR); + + pr_info(KVM guest htab at %lx, LPID %lx\n, hpt, lpid); + return 0; +} +static unsigned long user_page_size(unsigned long addr) +{ + struct vm_area_struct *vma; + unsigned long size = PAGE_SIZE; + + down_read(current-mm-mmap_sem); + vma = find_vma(current-mm, addr); + if (vma) + size = vma_kernel_pagesize(vma); + up_read(current-mm-mmap_sem); + return size; +} That one looks pretty arch-independent and like it could use some consolidation with: virt/kvm/kvm_main.c::kvm_host_page_size() +void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) +{ + unsigned long i; + unsigned long npages = kvm-arch.ram_npages; + unsigned long pfn; + unsigned long *hpte; + unsigned long hash; + struct kvmppc_pginfo *pginfo = kvm-arch.ram_pginfo; + + if (!pginfo) + return; + + /* VRMA can't be 1TB */ + if (npages 1ul (40 - kvm-arch.ram_porder)) + npages = 1ul (40 - kvm-arch.ram_porder); Is that because it can only be a single segment? 
Does that mean that we can't ever have guests larger than 1TB? Or just that they have to live with 1TB until they get their own page tables up? + /* Can't use more than 1 HPTE per HPTEG */ + if (npages HPT_NPTEG) + npages = HPT_NPTEG; + + for (i = 0; i npages; ++i) { + pfn = pginfo[i].pfn; + /* can't use hpt_hash since va 64 bits */ + hash = (i ^ (VRMA_VSID ^ (VRMA_VSID 25))) HPT_HASH_MASK; Is that because 'i' could potentially have a very large pfn? Nish thought it might have something to do with the hpte entries being larger than 64-bits themselves with the vsid included, but we got thoroughly confused. :) -- Dave -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/17] KVM: PPC: Add support for Book3S processors in hypervisor mode
On 01.07.2011, at 20:37, Dave Hansen wrote: On Wed, 2011-06-29 at 20:21 +1000, Paul Mackerras wrote: +struct kvmppc_pginfo { +unsigned long pfn; +atomic_t refcnt; +}; I only see this refcnt inc'd in one spot and never decremented or read. Is the refcnt just the number of hptes we have for this particular page at the moment? +long kvmppc_alloc_hpt(struct kvm *kvm) +{ +unsigned long hpt; +unsigned long lpid; + +hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN, + HPT_ORDER - PAGE_SHIFT); +if (!hpt) { +pr_err(kvm_alloc_hpt: Couldn't alloc HPT\n); +return -ENOMEM; +} +kvm-arch.hpt_virt = hpt; + +do { +lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS); +if (lpid = NR_LPIDS) { +pr_err(kvm_alloc_hpt: No LPIDs free\n); +free_pages(hpt, HPT_ORDER - PAGE_SHIFT); +return -ENOMEM; +} +} while (test_and_set_bit(lpid, lpid_inuse)); + +kvm-arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18); +kvm-arch.lpid = lpid; +kvm-arch.host_sdr1 = mfspr(SPRN_SDR1); +kvm-arch.host_lpid = mfspr(SPRN_LPID); +kvm-arch.host_lpcr = mfspr(SPRN_LPCR); + +pr_info(KVM guest htab at %lx, LPID %lx\n, hpt, lpid); +return 0; +} +static unsigned long user_page_size(unsigned long addr) +{ +struct vm_area_struct *vma; +unsigned long size = PAGE_SIZE; + +down_read(current-mm-mmap_sem); +vma = find_vma(current-mm, addr); +if (vma) +size = vma_kernel_pagesize(vma); +up_read(current-mm-mmap_sem); +return size; +} That one looks pretty arch-independent and like it could use some consolidation with: virt/kvm/kvm_main.c::kvm_host_page_size() Yep, I'd deem that a cleanup for later though. Good point however! We have similar code in e500 kvm today. 
+void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem) +{ +unsigned long i; +unsigned long npages = kvm-arch.ram_npages; +unsigned long pfn; +unsigned long *hpte; +unsigned long hash; +struct kvmppc_pginfo *pginfo = kvm-arch.ram_pginfo; + +if (!pginfo) +return; + +/* VRMA can't be 1TB */ +if (npages 1ul (40 - kvm-arch.ram_porder)) +npages = 1ul (40 - kvm-arch.ram_porder); Is that because it can only be a single segment? Does that mean that we can't ever have guests larger than 1TB? Or just that they have to live with 1TB until they get their own page tables up? The VRMA is only important in real mode, so this part looks good. The RMA is usually a lot smaller than 1TB ;). Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
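The 1TB limit in the quoted code can be sketched as a stand-alone function. This is an illustration under stated assumptions: the archive text has lost the comparison and shift operators, so reading the check as `npages > 1ul << (40 - porder)` is a reconstruction, and the name clamp_to_1tb() is hypothetical. The sketch also assumes porder is at most 40.

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamp a page count so that npages pages of
 * 2^porder bytes each never cover more than 1 TB (2^40 bytes).
 * Assumes porder <= 40.
 */
static uint64_t clamp_to_1tb(uint64_t npages, unsigned int porder)
{
    uint64_t max_pages = UINT64_C(1) << (40 - porder);

    if (npages > max_pages) {
        npages = max_pages;
    }
    return npages;
}
```

With 16 MB pages (porder 24) the cap would be 2^16 pages, which matches the point made above that only the real-mode area is bounded this way, not the guest as a whole.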
Re: [PATCH resend] compat_ioctl: fix warning caused by qemu
On 2011-07-01 16:46, Arnd Bergmann wrote:

Yes, that should be fine, unless Jens would like to see a different solution for the struct definitions, e.g. moving all of the floppy compat ioctl numbers to fd.h. I'm fine with it either way.

Looks OK to me, I've queued it up for 3.1 with your ack. Thanks Johannes.

--
Jens Axboe
Re: [PATCH v2 00/31] Implement user mode network for kvm tools
On Fri, Jul 1, 2011 at 7:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote:

On Fri, Jul 1, 2011 at 12:38 AM, Asias He asias.he...@gmail.com wrote:

On 06/30/2011 04:56 PM, Stefan Hajnoczi wrote:

On Thu, Jun 30, 2011 at 9:40 AM, Asias He asias.he...@gmail.com wrote:

uip stands for user mode {TCP,UDP}/IP. Currently, uip supports ARP, ICMP, IPV4, UDP, TCP. So any network protocols above UDP/TCP should work as well, e.g., HTTP, FTP, SSH, DNS.

There is an existing uIP which might cause confusion, not sure if you've seen it. First I thought you were using that :).

I heard about uIP, but this patchset has nothing to do with uIP ;-) At first I was naming the user mode network UNET, which is User mode NETwork; however, I thought uip looked better because it is shorter. Anyway, if uip does cause confusion, I'd like to change this naming.

It's up to you but now is the right time to do it. Consider if another program wants to reuse this code or if you ever want to make it a library, it wouldn't help to have a confusing name.

I don't care too much what we use as the namespace prefix but as a directory name tools/kvm/uip is pretty meaningless. I'd just move the code under tools/kvm/net to mirror what the kernel already has.
[PATCH v4 0/9] Steal time series again
Here follows the fourth version of the steal time series. Hope it is acceptable for all involved parties now.

The main differences from v3 are:

* The Changelogs seem to have been written by an actual person now, not by a monkey. Yet, I am the aforementioned person, so don't expect much.
* Forcing delayacct on the hypervisor side allows us to simplify the guest code dramatically, since now we don't need to test for is_idle: if we're idle, we won't have steal time and end of story.

Hope you enjoy.

Glauber Costa (8):
  KVM-HDR Add constant to represent KVM MSRs enabled bit
  KVM-HDR: KVM Steal time implementation
  KVM-HV: KVM Steal time implementation
  KVM-GST: Add a pv_ops stub for steal time
  add jump labels for ia64 paravirt
  KVM-GST: KVM Steal time accounting
  KVM-GST: adjust scheduler cpu power
  KVM-GST: KVM Steal time registration

Gleb Natapov (1):
  introduce kvm_read_guest_cached

 Documentation/kernel-parameters.txt   |  4 ++
 Documentation/virtual/kvm/msr.txt     | 35 ++
 arch/ia64/include/asm/paravirt.h      |  4 ++
 arch/ia64/kernel/paravirt.c           |  2 +
 arch/x86/Kconfig                      | 12 +
 arch/x86/include/asm/kvm_host.h       |  8 +++
 arch/x86/include/asm/kvm_para.h       | 15 ++
 arch/x86/include/asm/paravirt.h       |  9
 arch/x86/include/asm/paravirt_types.h |  1 +
 arch/x86/kernel/kvm.c                 | 73 ++
 arch/x86/kernel/kvmclock.c            |  2 +
 arch/x86/kernel/paravirt.c            |  9
 arch/x86/kvm/Kconfig                  |  1 +
 arch/x86/kvm/x86.c                    | 56 ++-
 include/linux/kvm_host.h              |  2 +
 kernel/sched.c                        | 80
 kernel/sched_features.h               |  4 +-
 virt/kvm/kvm_main.c                   | 20
 18 files changed, 322 insertions(+), 15 deletions(-)

--
1.7.3.4
[PATCH v4 4/9] KVM-HV: KVM Steal time implementation
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count.

This information is acquired through the run_delay field of the delayacct/schedstats infrastructure, which counts time spent in a runqueue but not running.

Steal time is per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, MSR_KVM_STEAL_TIME, holds the memory area address containing information about steal time.

This patch contains the hypervisor part of the steal time infrastructure, and can be backported independently of the guest portion.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/kvm_host.h |  8 +
 arch/x86/include/asm/kvm_para.h |  4 +++
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/x86.c              | 56 --
 4 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index da6bbee..9ba354d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,14 @@ struct kvm_vcpu_arch {
 	unsigned int hw_tsc_khz;
 	unsigned int time_offset;
 	struct page *time_page;
+
+	struct {
+		u64 msr_val;
+		u64 last_steal;
+		struct gfn_to_hva_cache stime;
+		struct kvm_steal_time steal;
+	} st;
+
 	u64 last_guest_tsc;
 	u64 last_kernel_ns;
 	u64 last_tsc_nsec;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 65f8bb9..c484ba8 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -45,6 +45,10 @@ struct kvm_steal_time {
 	__u32 pad[12];
 };

+#define KVM_STEAL_ALIGNMENT_BITS 5
+#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
+#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
+
 #define KVM_MAX_MMU_OP_BATCH 32

 #define KVM_ASYNC_PF_ENABLED (1 << 0)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 50f6364..99c3f05 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
 	select KVM_ASYNC_PF
 	select USER_RETURN_NOTIFIER
 	select KVM_MMIO
+	select TASK_DELAY_ACCT
 	---help---
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions. You will need a fairly recent
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7167717..237bcdc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -808,12 +808,12 @@ EXPORT_SYMBOL_GPL(kvm_get_dr);
  * kvm-specific. Those are put in the beginning of the list.
  */

-#define KVM_SAVE_MSRS_BEGIN	8
+#define KVM_SAVE_MSRS_BEGIN	9
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
-	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN,
+	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1491,6 +1491,27 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	}
 }

+static void record_steal_time(struct kvm_vcpu *vcpu)
+{
+	u64 delta;
+
+	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+		return;
+
+	if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+		return;
+
+	delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
+	vcpu->arch.st.last_steal = current->sched_info.run_delay;
+
+	vcpu->arch.st.steal.steal += delta;
+	vcpu->arch.st.steal.version += 2;
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+}
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	switch (msr) {
@@ -1573,6 +1594,28 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		if (kvm_pv_enable_async_pf(vcpu, data))
 			return 1;
 		break;
+	case MSR_KVM_STEAL_TIME:
+		vcpu->arch.st.msr_val = data;
+
+		if (!(data & KVM_MSR_ENABLED)) {
+			break;
+		}
+
+		if (unlikely(!sched_info_on()))
+			break;
+
+		if (data & KVM_STEAL_RESERVED_MASK)
+
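As a reading aid for the patch above, the following user-space sketch shows how the MSR value appears to be split into an enable bit, a reserved field and an area address, and how the run_delay delta is accumulated. It is a sketch under assumptions, not the kernel code. The shift operators in the macros are reconstructed because the archive dropped them. The sketch_ helpers and the trimmed struct are hypothetical. Using KVM_STEAL_VALID_BITS to extract the address is a guess, since the excerpt is cut off before the address is used.

```c
#include <stdint.h>

/* Values as given in the patch text; the shift operators are reconstructed. */
#define KVM_MSR_ENABLED          1
#define KVM_STEAL_ALIGNMENT_BITS 5
#define KVM_STEAL_VALID_BITS     (~UINT64_C(0) << (KVM_STEAL_ALIGNMENT_BITS + 1))
#define KVM_STEAL_RESERVED_MASK  (((UINT64_C(1) << KVM_STEAL_ALIGNMENT_BITS) - 1) << 1)

/* Hypothetical, trimmed-down stand-in for struct kvm_steal_time. */
struct sketch_steal_time {
    uint64_t steal;
    uint32_t version;
};

/* Bit 0 of the MSR value switches the feature on. */
static int sketch_msr_enabled(uint64_t data)
{
    return (data & KVM_MSR_ENABLED) != 0;
}

/* Bits 1..5 are reserved; what the patch does when they are set is cut off. */
static int sketch_msr_has_reserved_bits(uint64_t data)
{
    return (data & KVM_STEAL_RESERVED_MASK) != 0;
}

/* Assumption: the remaining high bits carry the 64-byte aligned area address. */
static uint64_t sketch_msr_area_address(uint64_t data)
{
    return data & KVM_STEAL_VALID_BITS;
}

/* Mirrors the delta accounting in record_steal_time() from the patch. */
static void sketch_record_steal(struct sketch_steal_time *st,
                                uint64_t *last_steal, uint64_t run_delay)
{
    uint64_t delta = run_delay - *last_steal;

    *last_steal = run_delay;
    st->steal += delta;
    st->version += 2;
}
```

Note that the version advances by two on every update even when the delta is zero, which is what the patch text does as well. The excerpt does not explain the step of two, so the sketch simply copies it.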
[PATCH v4 2/9] KVM-HDR Add constant to represent KVM MSRs enabled bit
This patch is simple, put in a different commit so it can be more easily shared between guest and hypervisor. It just defines a named constant to indicate the enable bit for KVM-specific MSRs.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/kvm_para.h |  1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index a427bf7..d6cd79b 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -30,6 +30,7 @@
 #define MSR_KVM_WALL_CLOCK 0x11
 #define MSR_KVM_SYSTEM_TIME 0x12

+#define KVM_MSR_ENABLED 1
 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
 #define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
--
1.7.3.4
[PATCH v4 8/9] KVM-GST: adjust scheduler cpu power
This patch makes update_rq_clock() aware of steal time. The mechanism of operation is not different from irq_time, and follows the same principles. This lives in a CONFIG option itself, and can be compiled out independently of the rest of steal time reporting. The effect of disabling it is that the scheduler will still report steal time (that cannot be disabled), but won't use this information for cpu power adjustments.

Every time update_rq_clock_task() is invoked, we query information about how much time was stolen since last call, and feed it into sched_rt_avg_update().

Although steal time reporting in account_process_tick() keeps track of the last time we read the steal clock, in prev_steal_time, this patch does it independently using another field, prev_steal_time_rq. This is because otherwise, information about time accounted in account_process_tick() would never reach us in update_rq_clock().

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/Kconfig        | 12
 kernel/sched.c          | 47 +--
 kernel/sched_features.h |  4 ++--
 3 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index da34972..b26f312 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -512,6 +512,18 @@ menuconfig PARAVIRT_GUEST

 if PARAVIRT_GUEST

+config PARAVIRT_TIME_ACCOUNTING
+	bool "Paravirtual steal time accounting"
+	select PARAVIRT
+	default n
+	---help---
+	  Select this option to enable fine granularity task steal time
+	  accounting. Time spent executing other tasks in parallel with
+	  the current vCPU is discounted from the vCPU power. To account for
+	  that, there can be a small performance impact.
+
+	  If in doubt, say N here.
+ source arch/x86/xen/Kconfig config KVM_CLOCK diff --git a/kernel/sched.c b/kernel/sched.c index 247dd51..c40b118 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -532,6 +532,9 @@ struct rq { #ifdef CONFIG_PARAVIRT u64 prev_steal_time; #endif +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING + u64 prev_steal_time_rq; +#endif /* calc_load related fields */ unsigned long calc_load_update; @@ -1971,8 +1974,14 @@ static inline u64 steal_ticks(u64 steal) static void update_rq_clock_task(struct rq *rq, s64 delta) { - s64 irq_delta; - +/* + * In theory, the compile should just see 0 here, and optimize out the call + * to sched_rt_avg_update. But I don't trust it... + */ +#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING) + s64 steal = 0, irq_delta = 0; +#endif +#ifdef CONFIG_IRQ_TIME_ACCOUNTING irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time; /* @@ -1995,12 +2004,35 @@ static void update_rq_clock_task(struct rq *rq, s64 delta) rq-prev_irq_time += irq_delta; delta -= irq_delta; +#endif +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING + if (static_branch((paravirt_steal_rq_enabled))) { + u64 st; + + steal = paravirt_steal_clock(cpu_of(rq)); + steal -= rq-prev_steal_time_rq; + + if (unlikely(steal delta)) + steal = delta; + + st = steal_ticks(steal); + steal = st * TICK_NSEC; + + rq-prev_steal_time_rq += steal; + + delta -= steal; + } +#endif + rq-clock_task += delta; - if (irq_delta sched_feat(NONIRQ_POWER)) - sched_rt_avg_update(rq, irq_delta); +#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING) + if ((irq_delta + steal) sched_feat(NONTASK_POWER)) + sched_rt_avg_update(rq, irq_delta + steal); +#endif } +#ifdef CONFIG_IRQ_TIME_ACCOUNTING static int irqtime_account_hi_update(void) { struct cpu_usage_stat *cpustat = kstat_this_cpu.cpustat; @@ -2035,12 +2067,7 @@ static int irqtime_account_si_update(void) #define sched_clock_irqtime(0) -static void update_rq_clock_task(struct rq *rq, s64 delta) -{ - rq-clock_task += 
delta; -} - -#endif /* CONFIG_IRQ_TIME_ACCOUNTING */ +#endif #include sched_idletask.c #include sched_fair.c diff --git a/kernel/sched_features.h b/kernel/sched_features.h index be40f73..ca3b025 100644 --- a/kernel/sched_features.h +++ b/kernel/sched_features.h @@ -61,9 +61,9 @@ SCHED_FEAT(LB_BIAS, 1) SCHED_FEAT(OWNER_SPIN, 1) /* - * Decrement CPU power based on irq activity + * Decrement CPU power based on time not spent running tasks */ -SCHED_FEAT(NONIRQ_POWER, 1) +SCHED_FEAT(NONTASK_POWER, 1) /* * Queue remote wakeups on the target CPU and process them -- 1.7.3.4 -- To unsubscribe from
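The steal-time step added to update_rq_clock_task() can be sketched in user space as follows. Several things here are assumptions: the comparison in the patch is read as `steal > delta` because the archive dropped the operator, steal_ticks() is approximated by an integer division, the tick length of 1 ms is illustrative, and the struct and function names are hypothetical. The sketch tries to show why only whole ticks are charged and how the remainder is left for a later call.

```c
#include <stdint.h>

#define SKETCH_TICK_NSEC UINT64_C(1000000) /* assumed 1 ms tick, illustrative */

/* Hypothetical stand-in for the two struct rq fields the patch touches. */
struct sketch_rq {
    uint64_t prev_steal_time_rq;
    uint64_t clock_task;
};

/*
 * Charge stolen time against `delta` nanoseconds of elapsed clock.
 * `steal_clock` is the cumulative steal time reported for this cpu.
 * Returns the amount of steal time charged on this call.
 */
static uint64_t sketch_account_steal(struct sketch_rq *rq,
                                     uint64_t steal_clock, uint64_t delta)
{
    uint64_t steal = steal_clock - rq->prev_steal_time_rq;

    if (steal > delta) {
        steal = delta; /* never charge more than the elapsed time */
    }
    /* Round down to whole ticks; the remainder is picked up later. */
    steal = (steal / SKETCH_TICK_NSEC) * SKETCH_TICK_NSEC;

    rq->prev_steal_time_rq += steal;
    rq->clock_task += delta - steal;
    return steal;
}
```

Because prev_steal_time_rq only advances by the amount actually charged, steal time below one tick is not lost; it shows up in the difference computed on the next call.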
[PATCH v4 1/9] introduce kvm_read_guest_cached
From: Gleb Natapov g...@redhat.com

Introduce kvm_read_guest_cached() function in addition to write one we
already have.

[ by glauber: export function signature in kvm header ]

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Glauber Costa glom...@redhat.com
---
 include/linux/kvm_host.h |    2 ++
 virt/kvm/kvm_main.c      |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 31ebb59..f7df0a3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -381,6 +381,8 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
 			  unsigned long len);
 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
+int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+			   void *data, unsigned long len);
 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
 			 int offset, int len);
 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 11d2783..d5ef9eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1418,6 +1418,26 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_cached);

+int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+			   void *data, unsigned long len)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	int r;
+
+	if (slots->generation != ghc->generation)
+		kvm_gfn_to_hva_cache_init(kvm, ghc, ghc->gpa);
+
+	if (kvm_is_error_hva(ghc->hva))
+		return -EFAULT;
+
+	r = __copy_from_user(data, (void __user *)ghc->hva, len);
+	if (r)
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
 	return kvm_write_guest_page(kvm, gfn, (const void *) empty_zero_page,
--
1.7.3.4
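For readers who want to play with the idea outside the kernel, here is a rough userspace sketch of the generation check that kvm_read_guest_cached() relies on. Everything named demo_* is made up for illustration: the structs only stand in for kvm_memslots and gfn_to_hva_cache, and memcpy() stands in for __copy_from_user(). It is not the kernel code, just the same pattern under those assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct kvm_memslots and struct gfn_to_hva_cache. */
struct demo_slots {
	unsigned long generation;	/* bumped whenever the layout changes */
	unsigned char *base;		/* backing memory, NULL if unmapped */
	size_t size;
};

struct demo_cache {
	unsigned long generation;	/* generation the translation was made for */
	size_t gpa;			/* "guest physical" offset being cached */
	unsigned char *hva;		/* cached host pointer, NULL on error */
};

/* Recompute the cached translation for gpa against the current layout. */
static void demo_cache_init(const struct demo_slots *slots,
			    struct demo_cache *c, size_t gpa)
{
	c->gpa = gpa;
	c->generation = slots->generation;
	c->hva = (slots->base && gpa < slots->size) ? slots->base + gpa : NULL;
}

/* Read len bytes through the cache, revalidating it if the layout moved on. */
static int demo_read_cached(const struct demo_slots *slots,
			    struct demo_cache *c, void *data, size_t len)
{
	if (slots->generation != c->generation)
		demo_cache_init(slots, c, c->gpa);

	if (!c->hva || len > slots->size - c->gpa)
		return -EFAULT;

	memcpy(data, c->hva, len);
	return 0;
}
```

The point of the generation counter is that the common case costs one comparison; the translation is only redone after the layout has actually changed.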
[PATCH v4 5/9] KVM-GST: Add a pv_ops stub for steal time
This patch adds a function pointer in one of the many paravirt_ops
structs, to allow guests to register a steal time function. Besides
a steal time function, we also declare two jump_labels. They will be
used to allow the steal time code to be easily bypassed when not
in use.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/paravirt.h       |    9 +++++++++
 arch/x86/include/asm/paravirt_types.h |    1 +
 arch/x86/kernel/paravirt.c            |    9 +++++++++
 3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index ebbc4d8..a7d2db9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -230,6 +230,15 @@ static inline unsigned long long paravirt_sched_clock(void)
 	return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
 }

+struct jump_label_key;
+extern struct jump_label_key paravirt_steal_enabled;
+extern struct jump_label_key paravirt_steal_rq_enabled;
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+	return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+}
+
 static inline unsigned long long paravirt_read_pmc(int counter)
 {
 	return PVOP_CALL1(u64, pv_cpu_ops.read_pmc, counter);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8288509..2c76521 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -89,6 +89,7 @@ struct pv_lazy_ops {

 struct pv_time_ops {
 	unsigned long long (*sched_clock)(void);
+	unsigned long long (*steal_clock)(int cpu);
 	unsigned long (*get_tsc_khz)(void);
 };

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 869e1ae..613a793 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -202,6 +202,14 @@ static void native_flush_tlb_single(unsigned long addr)
 	__native_flush_tlb_single(addr);
 }

+struct jump_label_key paravirt_steal_enabled;
+struct jump_label_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+	return 0;
+}
+
 /* These are in entry.S */
 extern void native_iret(void);
 extern void native_irq_enable_sysexit(void);
@@ -307,6 +315,7 @@ struct pv_init_ops pv_init_ops = {

 struct pv_time_ops pv_time_ops = {
 	.sched_clock = native_sched_clock,
+	.steal_clock = native_steal_clock,
 };

 struct pv_irq_ops pv_irq_ops = {
--
1.7.3.4
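As an illustration of the hook described above, the following is a minimal plain C sketch of an ops table with a native stub that a guest can override. All demo_* names and the returned values are invented for the example; the real code goes through PVOP_CALL1 and jump labels, which a plain int flag only approximates here.

```c
#include <stdint.h>

/* Hypothetical miniature of pv_time_ops: one overridable hook. */
struct demo_time_ops {
	uint64_t (*steal_clock)(int cpu);
};

/* Bare-metal default: nothing can steal time from us. */
static uint64_t demo_native_steal_clock(int cpu)
{
	(void)cpu;
	return 0;
}

static struct demo_time_ops demo_time_ops = {
	.steal_clock = demo_native_steal_clock,
};

/* Stand-in for the jump label: false until a guest registers a hook. */
static int demo_steal_enabled;

/* Hypothetical hypervisor-backed clock; the value is made up. */
static uint64_t demo_guest_steal_clock(int cpu)
{
	return 1000u * (uint64_t)(cpu + 1);
}

/* What a guest would do at init time once it detects the feature. */
static void demo_register_guest_clock(void)
{
	demo_time_ops.steal_clock = demo_guest_steal_clock;
	demo_steal_enabled = 1;
}

/* Callers always go through the table, never to a backend directly. */
static uint64_t demo_steal_clock(int cpu)
{
	return demo_time_ops.steal_clock(cpu);
}
```

With the native stub installed by default, callers never have to test the pointer for NULL; the enable flag exists only so the accounting path can be skipped entirely when no hypervisor is present.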
[PATCH v4 6/9] add jump labels for ia64 paravirt
Since in a later patch I intend to call jump labels inside
CONFIG_PARAVIRT, IA64 would fail to compile if they are not
provided. This patch provides those jump labels for the IA64
architecture.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Isaku Yamahata yamah...@valinux.co.jp
CC: Eddie Dong eddie.d...@intel.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 arch/ia64/include/asm/paravirt.h |    4 ++++
 arch/ia64/kernel/paravirt.c      |    2 ++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/paravirt.h b/arch/ia64/include/asm/paravirt.h
index 2eb0a98..32551d3 100644
--- a/arch/ia64/include/asm/paravirt.h
+++ b/arch/ia64/include/asm/paravirt.h
@@ -281,6 +281,10 @@ paravirt_init_missing_ticks_accounting(int cpu)
 		pv_time_ops.init_missing_ticks_accounting(cpu);
 }

+struct jump_label_key;
+extern struct jump_label_key paravirt_steal_enabled;
+extern struct jump_label_key paravirt_steal_rq_enabled;
+
 static inline int
 paravirt_do_steal_accounting(unsigned long *new_itm)
 {
diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c
index a21d7bb..1008682 100644
--- a/arch/ia64/kernel/paravirt.c
+++ b/arch/ia64/kernel/paravirt.c
@@ -634,6 +634,8 @@ struct pv_irq_ops pv_irq_ops = {
  * pv_time_ops
  * time operations
  */
+struct jump_label_key paravirt_steal_enabled;
+struct jump_label_key paravirt_steal_rq_enabled;

 static int
 ia64_native_do_steal_accounting(unsigned long *new_itm)
--
1.7.3.4
[PATCH v4 7/9] KVM-GST: KVM Steal time accounting
This patch accounts steal time in account_process_tick.
If one or more ticks are considered stolen in the current
accounting cycle, user/system accounting is skipped. Idle is fine,
since the hypervisor does not report steal time if the guest
is halted.

Accounting steal time from the core scheduler gives us the
advantage of direct access to the runqueue data. In a later
opportunity, it can be used to tweak cpu power and make
the scheduler aware of the time it lost.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 kernel/sched.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3f2e502..247dd51 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -75,6 +75,7 @@
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
 #include <asm/mutex.h>
+#include <asm/paravirt.h>

 #include "sched_cpupri.h"
 #include "workqueue_sched.h"
@@ -528,6 +529,9 @@ struct rq {
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 	u64 prev_irq_time;
 #endif
+#ifdef CONFIG_PARAVIRT
+	u64 prev_steal_time;
+#endif

 	/* calc_load related fields */
 	unsigned long calc_load_update;
@@ -1953,6 +1957,18 @@ void account_system_vtime(struct task_struct *curr)
 }
 EXPORT_SYMBOL_GPL(account_system_vtime);

+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+#ifdef CONFIG_PARAVIRT
+static inline u64 steal_ticks(u64 steal)
+{
+	if (unlikely(steal > NSEC_PER_SEC))
+		return div_u64(steal, TICK_NSEC);
+
+	return __iter_div_u64_rem(steal, TICK_NSEC, &steal);
+}
+#endif
+
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
 	s64 irq_delta;
@@ -3929,6 +3945,23 @@ void account_process_tick(struct task_struct *p, int user_tick)
 		return;
 	}

+#ifdef CONFIG_PARAVIRT
+	if (static_branch(&paravirt_steal_enabled)) {
+		u64 steal, st = 0;
+
+		steal = paravirt_steal_clock(smp_processor_id());
+		steal -= this_rq()->prev_steal_time;
+
+		st = steal_ticks(steal);
+		this_rq()->prev_steal_time += st * TICK_NSEC;
+
+		if (st) {
+			account_steal_time(st);
+			return;
+		}
+	}
+#endif
+
 	if (user_tick)
 		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
 	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
--
1.7.3.4
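The tick arithmetic in account_process_tick can be tried out in isolation. Below is a small sketch of it in plain C, assuming a 1 ms tick (DEMO_TICK_NSEC is an assumption, not the kernel's TICK_NSEC) and using an ordinary division in place of div_u64()/__iter_div_u64_rem(). The demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_TICK_NSEC 1000000ULL	/* assumed 1 ms tick (HZ=1000) */

struct demo_rq {
	uint64_t prev_steal_time;	/* steal time already accounted, in ns */
};

/*
 * Account the whole ticks stolen since the last call and return how many
 * there were. Any remainder below one tick stays unaccounted and is picked
 * up by a later call, because prev_steal_time only advances by whole ticks.
 */
static uint64_t demo_account_steal(struct demo_rq *rq, uint64_t steal_clock_ns)
{
	uint64_t steal = steal_clock_ns - rq->prev_steal_time;
	uint64_t ticks = steal / DEMO_TICK_NSEC;

	rq->prev_steal_time += ticks * DEMO_TICK_NSEC;
	return ticks;
}
```

For example, with the steal clock at 2.5 ms the first call accounts two ticks and leaves 0.5 ms pending; a later reading of 3.1 ms then yields one more tick rather than losing the fraction.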
[PATCH v4 9/9] KVM-GST: KVM Steal time registration
This patch implements the kvm bits of the steal time infrastructure.
The most important part of it is the steal time clock. It is a
continuous clock that shows the accumulated amount of steal time
since vcpu creation. It is supposed to survive cpu offlining/onlining.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 Documentation/kernel-parameters.txt |    4 ++
 arch/x86/include/asm/kvm_para.h     |    1 +
 arch/x86/kernel/kvm.c               |   73 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/kvmclock.c          |    2 +
 4 files changed, 80 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fd248a31..a722574 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1737,6 +1737,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	no-kvmapf	[X86,KVM] Disable paravirtualized asynchronous page
 			fault handling.

+	no-steal-acc    [X86,KVM] Disable paravirtualized steal time accounting.
+			steal time is computed, but won't influence scheduler
+			behaviour
+
 	nolapic		[X86-32,APIC] Do not enable or use the local APIC.

 	nolapic_timer	[X86-32,APIC] Do not use the local APIC timer.
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index c484ba8..35d732d 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -94,6 +94,7 @@ struct kvm_vcpu_pv_apf_data {

 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
+extern void kvm_disable_steal_time(void);


 /* This instruction is vmcall. On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 33c07b0..58331c2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -51,6 +51,15 @@ static int parse_no_kvmapf(char *arg)

 early_param("no-kvmapf", parse_no_kvmapf);

+static int steal_acc = 1;
+static int parse_no_stealacc(char *arg)
+{
+        steal_acc = 0;
+        return 0;
+}
+
+early_param("no-steal-acc", parse_no_stealacc);
+
 struct kvm_para_state {
 	u8 mmu_queue[MMU_QUEUE_SIZE];
 	int mmu_queue_len;
@@ -58,6 +67,8 @@ struct kvm_para_state {

 static DEFINE_PER_CPU(struct kvm_para_state, para_state);
 static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
+static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static int has_steal_clock = 0;

 static struct kvm_para_state *kvm_para_state(void)
 {
@@ -441,6 +452,21 @@ static void __init paravirt_ops_setup(void)
 #endif
 }

+static void kvm_register_steal_time(void)
+{
+	int cpu = smp_processor_id();
+	struct kvm_steal_time *st = &per_cpu(steal_time, cpu);
+
+	if (!has_steal_clock)
+		return;
+
+	memset(st, 0, sizeof(*st));
+
+	wrmsrl(MSR_KVM_STEAL_TIME, (__pa(st) | KVM_MSR_ENABLED));
+	printk(KERN_INFO "kvm-stealtime: cpu %d, msr %lx\n",
+		cpu, __pa(st));
+}
+
 void __cpuinit kvm_guest_cpu_init(void)
 {
 	if (!kvm_para_available())
@@ -457,6 +483,9 @@ void __cpuinit kvm_guest_cpu_init(void)
 		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
 		       smp_processor_id());
 	}
+
+	if (has_steal_clock)
+		kvm_register_steal_time();
 }

 static void kvm_pv_disable_apf(void *unused)
@@ -483,6 +512,31 @@ static struct notifier_block kvm_pv_reboot_nb = {
 	.notifier_call = kvm_pv_reboot_notify,
 };

+static u64 kvm_steal_clock(int cpu)
+{
+	u64 steal;
+	struct kvm_steal_time *src;
+	int version;
+
+	src = &per_cpu(steal_time, cpu);
+	do {
+		version = src->version;
+		rmb();
+		steal = src->steal;
+		rmb();
+	} while ((version & 1) || (version != src->version));
+
+	return steal;
+}
+
+void kvm_disable_steal_time(void)
+{
+	if (!has_steal_clock)
+		return;
+
+	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -500,6 +554,7 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)

 static void kvm_guest_cpu_offline(void *dummy)
 {
+	kvm_disable_steal_time();
 	kvm_pv_disable_apf(NULL);
 	apf_task_wake_all();
 }
@@ -548,6 +603,11 @@ void __init kvm_guest_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
 		x86_init.irqs.trap_init = kvm_apf_trap_init;

+	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
+		has_steal_clock = 1;
+		pv_time_ops.steal_clock =
[PATCH v4 3/9] KVM-HDR: KVM Steal time implementation
To implement steal time, we need the hypervisor to pass the guest information
about how much time was spent running other processes outside the VM.
This is per-vcpu, and using the kvmclock structure for that is an abuse
we decided not to make.

In this patchset, I am introducing a new msr, MSR_KVM_STEAL_TIME, that
holds the memory area address containing information about steal time.

This patch contains the headers for it. I am keeping it separate to facilitate
backports to people who want to backport the kernel part but not the
hypervisor, or the other way around.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Rik van Riel r...@redhat.com
CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
CC: Peter Zijlstra pet...@infradead.org
CC: Avi Kivity a...@redhat.com
CC: Anthony Liguori aligu...@us.ibm.com
CC: Eric B Munson emun...@mgebm.net
---
 Documentation/virtual/kvm/msr.txt |   35 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/kvm_para.h   |    9 +++++++++
 2 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index d079aed..38db3f8 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -185,3 +185,38 @@ MSR_KVM_ASYNC_PF_EN: 0x4b564d02

 	Currently type 2 APF will be always delivered on the same vcpu as
 	type 1 was, but guest should not rely on that.
+
+MSR_KVM_STEAL_TIME: 0x4b564d03
+
+	data: 64-byte alignment physical address of a memory area which must be
+	in guest RAM, plus an enable bit in bit 0. This memory is expected to
+	hold a copy of the following structure:
+
+	struct kvm_steal_time {
+		__u64 steal;
+		__u32 version;
+		__u32 flags;
+		__u32 pad[12];
+	}
+
+	whose data will be filled in by the hypervisor periodically. Only one
+	write, or registration, is needed for each VCPU. The interval between
+	updates of this structure is arbitrary and implementation-dependent.
+	The hypervisor may update this structure at any time it sees fit until
+	anything with bit0 == 0 is written to it. Guest is required to make sure
+	this structure is initialized to zero.
+
+	Fields have the following meanings:
+
+		version: a sequence counter. In other words, guest has to check
+		this field before and after grabbing time information and make
+		sure they are both equal and even. An odd version indicates an
+		in-progress update.
+
+		flags: At this point, always zero. May be used to indicate
+		changes in this structure in the future.
+
+		steal: the amount of time in which this vCPU did not run, in
+		nanoseconds. Time during which the vcpu is idle, will not be
+		reported as steal time.
+
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index d6cd79b..65f8bb9 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,7 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
+#define KVM_FEATURE_STEAL_TIME		5

 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -35,6 +36,14 @@
 #define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
+#define MSR_KVM_STEAL_TIME 0x4b564d03
+
+struct kvm_steal_time {
+	__u64 steal;
+	__u32 version;
+	__u32 flags;
+	__u32 pad[12];
+};

 #define KVM_MAX_MMU_OP_BATCH 32

--
1.7.3.4
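The version field described in msr.txt is a sequence counter, and the protocol can be sketched in userspace with C11 atomics. This is only an approximation under stated assumptions: demo_steal_time mirrors the layout of kvm_steal_time but uses _Atomic fields, demo_publish() plays the hypervisor's role, and the fences stand in for the rmb() calls a guest would use. None of the demo_* names exist in the kernel.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for struct kvm_steal_time, same field order. */
struct demo_steal_time {
	_Atomic uint64_t steal;
	_Atomic uint32_t version;
	uint32_t flags;
	uint32_t pad[12];
};

/* Writer side (the hypervisor's role): odd version while updating. */
static void demo_publish(struct demo_steal_time *st, uint64_t steal)
{
	uint32_t v = atomic_load_explicit(&st->version, memory_order_relaxed);

	atomic_store_explicit(&st->version, v + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&st->steal, steal, memory_order_relaxed);
	atomic_store_explicit(&st->version, v + 2, memory_order_release);
}

/* Reader side (the guest's role): retry until the version is even and stable. */
static uint64_t demo_read(struct demo_steal_time *st)
{
	uint32_t before, after;
	uint64_t steal;

	do {
		before = atomic_load_explicit(&st->version, memory_order_acquire);
		steal = atomic_load_explicit(&st->steal, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&st->version, memory_order_relaxed);
	} while ((before & 1) || before != after);

	return steal;
}
```

The reader never blocks the writer; it simply throws away any snapshot taken while the version was odd or changed underneath it, which is why the structure must start out zeroed (an even version).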
[PATCH v2 1/8] kvm tools: Don't dynamically allocate threadpool jobs
To allow efficient use of shorter-term threadpool jobs, don't allocate them dynamically upon creation. Instead, store them within 'job' structures. This will prevent some overhead creating/destroying jobs which live for a short time. Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/include/kvm/threadpool.h | 29 ++--- tools/kvm/include/kvm/virtio-9p.h |3 ++- tools/kvm/threadpool.c | 30 ++ tools/kvm/virtio/9p.c |7 +++ tools/kvm/virtio/blk.c |8 tools/kvm/virtio/console.c | 10 +- tools/kvm/virtio/rng.c | 16 7 files changed, 50 insertions(+), 53 deletions(-) diff --git a/tools/kvm/include/kvm/threadpool.h b/tools/kvm/include/kvm/threadpool.h index 62826a6..768239f 100644 --- a/tools/kvm/include/kvm/threadpool.h +++ b/tools/kvm/include/kvm/threadpool.h @@ -1,14 +1,37 @@ #ifndef KVM__THREADPOOL_H #define KVM__THREADPOOL_H +#include kvm/mutex.h + +#include linux/list.h + struct kvm; typedef void (*kvm_thread_callback_fn_t)(struct kvm *kvm, void *data); -int thread_pool__init(unsigned long thread_count); +struct thread_pool__job { + kvm_thread_callback_fn_tcallback; + struct kvm *kvm; + void*data; + + int signalcount; + pthread_mutex_t mutex; -void *thread_pool__add_job(struct kvm *kvm, kvm_thread_callback_fn_t callback, void *data); + struct list_headqueue; +}; + +static inline void thread_pool__init_job(struct thread_pool__job *job, struct kvm *kvm, kvm_thread_callback_fn_t callback, void *data) +{ + *job = (struct thread_pool__job) { + .kvm= kvm, + .callback = callback, + .data = data, + .mutex = PTHREAD_MUTEX_INITIALIZER, + }; +} + +int thread_pool__init(unsigned long thread_count); -void thread_pool__do_job(void *job); +void thread_pool__do_job(struct thread_pool__job *job); #endif diff --git a/tools/kvm/include/kvm/virtio-9p.h b/tools/kvm/include/kvm/virtio-9p.h index eb546bb..b9c10de 100644 --- a/tools/kvm/include/kvm/virtio-9p.h +++ b/tools/kvm/include/kvm/virtio-9p.h @@ -2,6 +2,7 @@ #define KVM__VIRTIO_9P_H #include kvm/virtio.h #include kvm/pci.h 
+#include kvm/threadpool.h #include sys/types.h #include dirent.h @@ -34,7 +35,7 @@ struct p9_fid { struct p9_dev_job { struct virt_queue *vq; struct p9_dev *p9dev; - void*job_id; + struct thread_pool__job job_id; }; struct p9_dev { diff --git a/tools/kvm/threadpool.c b/tools/kvm/threadpool.c index 2db02184..fdc5fa7 100644 --- a/tools/kvm/threadpool.c +++ b/tools/kvm/threadpool.c @@ -6,17 +6,6 @@ #include pthread.h #include stdbool.h -struct thread_pool__job { - kvm_thread_callback_fn_tcallback; - struct kvm *kvm; - void*data; - - int signalcount; - pthread_mutex_t mutex; - - struct list_headqueue; -}; - static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER; static pthread_mutex_t thread_mutex= PTHREAD_MUTEX_INITIALIZER; static pthread_cond_t job_cond= PTHREAD_COND_INITIALIZER; @@ -139,26 +128,11 @@ int thread_pool__init(unsigned long thread_count) return i; } -void *thread_pool__add_job(struct kvm *kvm, - kvm_thread_callback_fn_t callback, void *data) -{ - struct thread_pool__job *job = calloc(1, sizeof(*job)); - - *job = (struct thread_pool__job) { - .kvm= kvm, - .data = data, - .callback = callback, - .mutex = PTHREAD_MUTEX_INITIALIZER - }; - - return job; -} - -void thread_pool__do_job(void *job) +void thread_pool__do_job(struct thread_pool__job *job) { struct thread_pool__job *jobinfo = job; - if (jobinfo == NULL) + if (jobinfo == NULL || jobinfo-callback == NULL) return; mutex_lock(jobinfo-mutex); diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c index 69e534f..d927688 100644 --- a/tools/kvm/virtio/9p.c +++ b/tools/kvm/virtio/9p.c @@ -18,7 +18,6 @@ #include linux/virtio_9p.h #include net/9p/9p.h - /* Warning: Immediately use value returned from this function */ static const char *rel_to_abs(struct p9_dev *p9dev, const char *path, char *abs_path) @@ -659,7 +658,7 @@ static void ioevent_callback(struct kvm *kvm, void *param) { struct p9_dev_job *job = param; - thread_pool__do_job(job-job_id); + thread_pool__do_job(job-job_id); } static bool
[PATCH v2 2/8] kvm tools: Process virtio-blk requests in parallel
Process multiple requests within a virtio-blk device's vring in parallel. Doing so may improve performance in cases when a request which can be completed using data which is present in a cache is queued after a request with un-cached data. bonnie++ benchmarks have shown a 6% improvement with reads, and 2% improvement in writes. Suggested-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/virtio/blk.c | 74 --- 1 files changed, 38 insertions(+), 36 deletions(-) diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c index 1fdfc1e..f2a728c 100644 --- a/tools/kvm/virtio/blk.c +++ b/tools/kvm/virtio/blk.c @@ -31,6 +31,8 @@ struct blk_dev_job { struct virt_queue *vq; struct blk_dev *bdev; + struct ioveciov[VIRTIO_BLK_QUEUE_SIZE]; + u16 out, in, head; struct thread_pool__job job_id; }; @@ -51,7 +53,8 @@ struct blk_dev { u16 queue_selector; struct virt_queue vqs[NUM_VIRT_QUEUES]; - struct blk_dev_job jobs[NUM_VIRT_QUEUES]; + struct blk_dev_job jobs[VIRTIO_BLK_QUEUE_SIZE]; + u16 job_idx; struct pci_device_headerpci_hdr; }; @@ -118,20 +121,26 @@ static bool virtio_blk_pci_io_in(struct ioport *ioport, struct kvm *kvm, u16 por return ret; } -static bool virtio_blk_do_io_request(struct kvm *kvm, - struct blk_dev *bdev, - struct virt_queue *queue) +static void virtio_blk_do_io_request(struct kvm *kvm, void *param) { - struct iovec iov[VIRTIO_BLK_QUEUE_SIZE]; struct virtio_blk_outhdr *req; - ssize_t block_cnt = -1; - u16 out, in, head; u8 *status; + ssize_t block_cnt; + struct blk_dev_job *job; + struct blk_dev *bdev; + struct virt_queue *queue; + struct iovec *iov; + u16 out, in, head; - head= virt_queue__get_iov(queue, iov, out, in, kvm); - - /* head */ - req = iov[0].iov_base; + block_cnt = -1; + job = param; + bdev= job-bdev; + queue = job-vq; + iov = job-iov; + out = job-out; + in = job-in; + head= job-head; + req = iov[0].iov_base; switch (req-type) { case VIRTIO_BLK_T_IN: @@ -153,24 +162,27 @@ static bool 
virtio_blk_do_io_request(struct kvm *kvm, status = iov[out + in - 1].iov_base; *status = (block_cnt 0) ? VIRTIO_BLK_S_IOERR : VIRTIO_BLK_S_OK; + mutex_lock(bdev-mutex); virt_queue__set_used_elem(queue, head, block_cnt); + mutex_unlock(bdev-mutex); - return true; + virt_queue__trigger_irq(queue, bdev-pci_hdr.irq_line, bdev-isr, kvm); } -static void virtio_blk_do_io(struct kvm *kvm, void *param) +static void virtio_blk_do_io(struct kvm *kvm, struct virt_queue *vq, struct blk_dev *bdev) { - struct blk_dev_job *job = param; - struct virt_queue *vq; - struct blk_dev *bdev; + while (virt_queue__available(vq)) { + struct blk_dev_job *job = bdev-jobs[bdev-job_idx++ % VIRTIO_BLK_QUEUE_SIZE]; - vq = job-vq; - bdev= job-bdev; - - while (virt_queue__available(vq)) - virtio_blk_do_io_request(kvm, bdev, vq); + *job= (struct blk_dev_job) { + .vq = vq, + .bdev = bdev, + }; + job-head = virt_queue__get_iov(vq, job-iov, job-out, job-in, kvm); - virt_queue__trigger_irq(vq, bdev-pci_hdr.irq_line, bdev-isr, kvm); + thread_pool__init_job(job-job_id, kvm, virtio_blk_do_io_request, job); + thread_pool__do_job(job-job_id); + } } static bool virtio_blk_pci_io_out(struct ioport *ioport, struct kvm *kvm, u16 port, void *data, int size, u32 count) @@ -190,24 +202,14 @@ static bool virtio_blk_pci_io_out(struct ioport *ioport, struct kvm *kvm, u16 po break; case VIRTIO_PCI_QUEUE_PFN: { struct virt_queue *queue; - struct blk_dev_job *job; void *p; - job = bdev-jobs[bdev-queue_selector]; - queue = bdev-vqs[bdev-queue_selector]; queue-pfn = ioport__read32(data); p = guest_pfn_to_host(kvm, queue-pfn); vring_init(queue-vring, VIRTIO_BLK_QUEUE_SIZE, p, VIRTIO_PCI_VRING_ALIGN); -
[PATCH v2 3/8] kvm tools: Allow giving instance names
This will allow tracking instance names and sending commands to specific instances if multiple instances are running. Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/include/kvm/kvm.h |5 +++- tools/kvm/kvm-run.c |5 +++- tools/kvm/kvm.c | 56 ++- tools/kvm/term.c|3 ++ 4 files changed, 66 insertions(+), 3 deletions(-) diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h index 7d90d35..5ad3236 100644 --- a/tools/kvm/include/kvm/kvm.h +++ b/tools/kvm/include/kvm/kvm.h @@ -41,9 +41,11 @@ struct kvm { const char *vmlinux; struct disk_image **disks; int nr_disks; + + const char *name; }; -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size); +struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name); int kvm__max_cpus(struct kvm *kvm); void kvm__init_ram(struct kvm *kvm); void kvm__delete(struct kvm *kvm); @@ -61,6 +63,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr); void kvm__pause(void); void kvm__continue(void); void kvm__notify_paused(void); +int kvm__get_pid_by_instance(const char *name); /* * Debugging diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c index efae3c0..56c39ab 100644 --- a/tools/kvm/kvm-run.c +++ b/tools/kvm/kvm-run.c @@ -69,6 +69,7 @@ static const char *network; static const char *host_ip_addr; static const char *guest_mac; static const char *script; +static const char *guest_name; static bool single_step; static bool readonly_image[MAX_DISK_IMAGES]; static bool vnc; @@ -132,6 +133,8 @@ static int virtio_9p_rootdir_parser(const struct option *opt, const char *arg, i static const struct option options[] = { OPT_GROUP(Basic options:), + OPT_STRING('\0', name, guest_name, guest name, + A name for the guest), OPT_INTEGER('c', cpus, nrcpus, Number of CPUs), OPT_U64('m', mem, ram_size, Virtual machine memory size in MiB.), OPT_CALLBACK('d', disk, NULL, image, Disk image, img_name_parser), @@ -546,7 +549,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) 
term_init(); - kvm = kvm__init(kvm_dev, ram_size); + kvm = kvm__init(kvm_dev, ram_size, guest_name); ioeventfd__init(); diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c index c400c70..23d31a3 100644 --- a/tools/kvm/kvm.c +++ b/tools/kvm/kvm.c @@ -31,6 +31,7 @@ #include asm/unistd.h #define DEFINE_KVM_EXIT_REASON(reason) [reason] = #reason +#define KVM_PID_FILE_PATH ~/.kvm-tools/ const char *kvm_exit_reasons[] = { DEFINE_KVM_EXIT_REASON(KVM_EXIT_UNKNOWN), @@ -113,11 +114,60 @@ static struct kvm *kvm__new(void) return kvm; } +static void kvm__create_pidfile(struct kvm *kvm) +{ + int fd; + char full_name[PATH_MAX], pid[10]; + + if (!kvm-name) + return; + + mkdir(KVM_PID_FILE_PATH, 0777); + sprintf(full_name, %s/%s.pid, KVM_PID_FILE_PATH, kvm-name); + fd = open(full_name, O_CREAT | O_WRONLY, 0666); + sprintf(pid, %u\n, getpid()); + if (write(fd, pid, strlen(pid)) = 0) + die(Failed creating PID file); + close(fd); +} + +static void kvm__remove_pidfile(struct kvm *kvm) +{ + char full_name[PATH_MAX]; + + if (!kvm-name) + return; + + sprintf(full_name, %s/%s.pid, KVM_PID_FILE_PATH, kvm-name); + unlink(full_name); +} + +int kvm__get_pid_by_instance(const char *name) +{ + int fd, pid; + char pid_str[10], pid_file[PATH_MAX]; + + sprintf(pid_file, %s/%s.pid, KVM_PID_FILE_PATH, name); + fd = open(pid_file, O_RDONLY); + if (fd 0) + return -1; + + if (read(fd, pid_str, 10) == 0) + return -1; + + pid = atoi(pid_str); + if (pid 0) + return -1; + + return pid; +} + void kvm__delete(struct kvm *kvm) { kvm__stop_timer(kvm); munmap(kvm-ram_start, kvm-ram_size); + kvm__remove_pidfile(kvm); free(kvm); } @@ -237,7 +287,7 @@ int kvm__max_cpus(struct kvm *kvm) return ret; } -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size) +struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name) { struct kvm_pit_config pit_config = { .flags = 0, }; struct kvm *kvm; @@ -300,6 +350,10 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size) if (ret 0) die_perror(KVM_CREATE_IRQCHIP 
ioctl); + kvm-name = name; + + kvm__create_pidfile(kvm); + return kvm; } diff --git a/tools/kvm/term.c b/tools/kvm/term.c index 9947223..a0cb03f 100644 --- a/tools/kvm/term.c +++ b/tools/kvm/term.c @@ -9,7 +9,9 @@ #include kvm/read-write.h #include kvm/term.h #include kvm/util.h +#include kvm/kvm.h +extern struct kvm *kvm; static struct termios
[PATCH v2 5/8] kvm tools: Provide instance name when running 'kvm pause'
Instead of sending a signal to the first instance found, send it to a
specific instance.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/kvm-pause.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/kvm-pause.c b/tools/kvm/kvm-pause.c
index fdf8714..0cb6f29 100644
--- a/tools/kvm/kvm-pause.c
+++ b/tools/kvm/kvm-pause.c
@@ -5,9 +5,18 @@
 #include <kvm/util.h>
 #include <kvm/kvm-cmd.h>
 #include <kvm/kvm-pause.h>
+#include <kvm/kvm.h>

 int kvm_cmd_pause(int argc, const char **argv, const char *prefix)
 {
-	signal(SIGUSR2, SIG_IGN);
-	return system("kill -USR2 $(pidof kvm)");
+	int pid;
+
+	if (argc != 1)
+		die("Usage: kvm debug [instance name]\n");
+
+	pid = kvm__get_pid_by_instance(argv[0]);
+	if (pid < 0)
+		die("Failed locating instance name");
+
+	return kill(pid, SIGUSR2);
 }
--
1.7.6
[PATCH v2 4/8] kvm tools: Provide instance name when running 'kvm debug'
Instead of sending a signal to the first instance found, send it to a
specific instance.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/kvm-debug.c |   19 +++++++++++++++----
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/kvm-debug.c b/tools/kvm/kvm-debug.c
index 58782dd..432ae84 100644
--- a/tools/kvm/kvm-debug.c
+++ b/tools/kvm/kvm-debug.c
@@ -1,11 +1,22 @@
-#include <stdio.h>
-#include <string.h>
-
 #include <kvm/util.h>
 #include <kvm/kvm-cmd.h>
 #include <kvm/kvm-debug.h>
+#include <kvm/kvm.h>
+
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>

 int kvm_cmd_debug(int argc, const char **argv, const char *prefix)
 {
-	return system("kill -3 $(pidof kvm)");
+	int pid;
+
+	if (argc != 1)
+		die("Usage: kvm debug [instance name]\n");
+
+	pid = kvm__get_pid_by_instance(argv[0]);
+	if (pid < 0)
+		die("Failed locating instance name");
+
+	return kill(pid, SIGQUIT);
 }
--
1.7.6
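The lookup-then-kill flow used by 'kvm debug' and 'kvm pause' can be sketched standalone as follows. This is not the kvm tools implementation: demo_pid_by_instance() and demo_signal_instance() are hypothetical helpers, the directory is passed in explicitly (open() and fopen() do not expand "~", so the caller is assumed to supply a real path), and the PID is parsed with strtol() so that an empty or non-numeric file is rejected.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Read a PID from "<dir>/<name>.pid". Returns the PID, or -1 if the file is
 * missing, empty, or does not hold a positive number. The directory layout
 * mirrors the patch series but is otherwise an assumption of this sketch.
 */
static pid_t demo_pid_by_instance(const char *dir, const char *name)
{
	char path[4096], buf[32], *end;
	FILE *f;
	long pid;

	if (snprintf(path, sizeof(path), "%s/%s.pid", dir, name) >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	pid = strtol(buf, &end, 10);
	if (errno || end == buf || pid <= 0 || pid > INT_MAX)
		return -1;

	return (pid_t)pid;
}

/* Send sig to the named instance; returns 0 on success, -1 on failure. */
static int demo_signal_instance(const char *dir, const char *name, int sig)
{
	pid_t pid = demo_pid_by_instance(dir, name);

	if (pid < 0)
		return -1;
	return kill(pid, sig);
}
```

Passing signal 0 to demo_signal_instance() is a convenient way to check that the recorded PID still refers to a live process without actually delivering anything.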
[PATCH v2 6/8] kvm tools: Add virtio-balloon device
From the virtio spec:

The virtio memory balloon device is a primitive device for managing guest
memory: the device asks for a certain amount of memory, and the guest
supplies it (or withdraws it, if the device has more than it asks for).
This allows the guest to adapt to changes in allowance of underlying
physical memory.

To activate the virtio-balloon device, run kvm tools with the '--balloon'
command line parameter.

The current implementation listens for two signals:

 - SIGKVMADDMEM: Adds 1M to the balloon driver (inflate). This will
   decrease available memory within the guest.
 - SIGKVMDELMEM: Removes 1M from the balloon driver (deflate). This will
   increase available memory within the guest.

Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/Makefile                     |    1 +
 tools/kvm/include/kvm/kvm.h            |    3 +
 tools/kvm/include/kvm/virtio-balloon.h |    8 +
 tools/kvm/include/kvm/virtio-pci-dev.h |    1 +
 tools/kvm/kvm-run.c                    |    6 +
 tools/kvm/virtio/balloon.c             |  265
 6 files changed, 284 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/include/kvm/virtio-balloon.h
 create mode 100644 tools/kvm/virtio/balloon.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 6d6a0a4..1ec75da 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -40,6 +40,7 @@ OBJS	+= virtio/console.o
 OBJS	+= virtio/core.o
 OBJS	+= virtio/net.o
 OBJS	+= virtio/rng.o
+OBJS	+= virtio/balloon.o
 OBJS	+= disk/blk.o
 OBJS	+= disk/qcow.o
 OBJS	+= disk/raw.o
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 5ad3236..1fdfcf7 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -6,6 +6,7 @@
 #include <stdbool.h>
 #include <linux/types.h>
 #include <time.h>
+#include <signal.h>
 
 #define KVM_NR_CPUS		(255)
 
@@ -17,6 +18,8 @@
 
 #define SIGKVMEXIT		(SIGRTMIN + 0)
 #define SIGKVMPAUSE		(SIGRTMIN + 1)
+#define SIGKVMADDMEM		(SIGRTMIN + 2)
+#define SIGKVMDELMEM		(SIGRTMIN + 3)
 
 struct kvm {
 	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
diff --git a/tools/kvm/include/kvm/virtio-balloon.h b/tools/kvm/include/kvm/virtio-balloon.h
new file mode 100644
index 000..eb49fd4
--- /dev/null
+++ b/tools/kvm/include/kvm/virtio-balloon.h
@@ -0,0 +1,8 @@
+#ifndef KVM__BLN_VIRTIO_H
+#define KVM__BLN_VIRTIO_H
+
+struct kvm;
+
+void virtio_bln__init(struct kvm *kvm);
+
+#endif /* KVM__BLN_VIRTIO_H */
diff --git a/tools/kvm/include/kvm/virtio-pci-dev.h b/tools/kvm/include/kvm/virtio-pci-dev.h
index ca373df..4eee831 100644
--- a/tools/kvm/include/kvm/virtio-pci-dev.h
+++ b/tools/kvm/include/kvm/virtio-pci-dev.h
@@ -12,6 +12,7 @@
 #define PCI_DEVICE_ID_VIRTIO_BLK		0x1001
 #define PCI_DEVICE_ID_VIRTIO_CONSOLE		0x1003
 #define PCI_DEVICE_ID_VIRTIO_RNG		0x1004
+#define PCI_DEVICE_ID_VIRTIO_BLN		0x1005
 #define PCI_DEVICE_ID_VIRTIO_P9			0x1009
 
 #define PCI_DEVICE_ID_VESA			0x2000
diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
index 56c39ab..a7f010c 100644
--- a/tools/kvm/kvm-run.c
+++ b/tools/kvm/kvm-run.c
@@ -18,6 +18,7 @@
 #include <kvm/virtio-net.h>
 #include <kvm/virtio-console.h>
 #include <kvm/virtio-rng.h>
+#include <kvm/virtio-balloon.h>
 #include <kvm/disk-image.h>
 #include <kvm/util.h>
 #include <kvm/pci.h>
@@ -74,6 +75,7 @@ static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
 static bool vnc;
 static bool sdl;
+static bool balloon;
 extern bool ioport_debug;
 extern int  active_console;
 extern int  debug_iodelay;
@@ -145,6 +147,7 @@ static const struct option options[] = {
 	OPT_STRING('\0', "kvm-dev", &kvm_dev, "kvm-dev", "KVM device file"),
 	OPT_CALLBACK('\0', "virtio-9p", NULL, "dirname,tag_name",
 		     "Enable 9p over virtio", virtio_9p_rootdir_parser),
+	OPT_BOOLEAN('\0', "balloon", &balloon, "Enable virtio balloon"),
 	OPT_BOOLEAN('\0', "vnc", &vnc, "Enable VNC framebuffer"),
 	OPT_BOOLEAN('\0', "sdl", &sdl, "Enable SDL framebuffer"),
 
@@ -629,6 +632,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 	while (virtio_rng--)
 		virtio_rng__init(kvm);
 
+	if (balloon)
+		virtio_bln__init(kvm);
+
 	if (!network)
 		network = DEFAULT_NETWORK;
 
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
new file mode 100644
index 000..ab9ccb7
--- /dev/null
+++ b/tools/kvm/virtio/balloon.c
@@ -0,0 +1,265 @@
+#include "kvm/virtio-balloon.h"
+
+#include "kvm/virtio-pci-dev.h"
+
+#include "kvm/disk-image.h"
+#include "kvm/virtio.h"
+#include "kvm/ioport.h"
+#include "kvm/util.h"
+#include "kvm/kvm.h"
+#include "kvm/pci.h"
+#include "kvm/threadpool.h"
+#include "kvm/irq.h"
+#include "kvm/ioeventfd.h"
+
+#include <linux/virtio_ring.h>
+#include <linux/virtio_balloon.h>
+
+#include
[PATCH v2 8/8] kvm tools: Add 'kvm balloon' command
Add a command to easily inflate or deflate the balloon driver in running
instances.

Usage:
	kvm balloon [command] [instance name] [size]

command is either 'inflate' or 'deflate', and size is given in MB.
The target instance must be named (started with '--name').

Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/Makefile                  |    1 +
 tools/kvm/include/kvm/kvm-balloon.h |    6 ++
 tools/kvm/kvm-balloon.c             |   34 ++
 tools/kvm/kvm-cmd.c                 |   12 +++-
 tools/kvm/virtio/balloon.c          |    8
 5 files changed, 52 insertions(+), 9 deletions(-)
 create mode 100644 tools/kvm/include/kvm/kvm-balloon.h
 create mode 100644 tools/kvm/kvm-balloon.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 1ec75da..90ad708 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -58,6 +58,7 @@ OBJS	+= kvm-cmd.o
 OBJS	+= kvm-debug.o
 OBJS	+= kvm-help.o
 OBJS	+= kvm-pause.o
+OBJS	+= kvm-balloon.o
 OBJS	+= kvm-run.o
 OBJS	+= mptable.o
 OBJS	+= rbtree.o
diff --git a/tools/kvm/include/kvm/kvm-balloon.h b/tools/kvm/include/kvm/kvm-balloon.h
new file mode 100644
index 000..f5f92b9
--- /dev/null
+++ b/tools/kvm/include/kvm/kvm-balloon.h
@@ -0,0 +1,6 @@
+#ifndef KVM__BALLOON_H
+#define KVM__BALLOON_H
+
+int kvm_cmd_balloon(int argc, const char **argv, const char *prefix);
+
+#endif
diff --git a/tools/kvm/kvm-balloon.c b/tools/kvm/kvm-balloon.c
new file mode 100644
index 000..277cada
--- /dev/null
+++ b/tools/kvm/kvm-balloon.c
@@ -0,0 +1,34 @@
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
+
+#include <kvm/util.h>
+#include <kvm/kvm-cmd.h>
+#include <kvm/kvm-balloon.h>
+#include <kvm/kvm.h>
+
+int kvm_cmd_balloon(int argc, const char **argv, const char *prefix)
+{
+	int pid;
+	int amount, i;
+	int inflate = 0;
+
+	if (argc != 3)
+		die("Usage: kvm balloon [command] [instance name] [amount]\n");
+
+	pid = kvm__get_pid_by_instance(argv[1]);
+	if (pid < 0)
+		die("Failed locating instance name");
+
+	if (strcmp(argv[0], "inflate") == 0)
+		inflate = 1;
+	else if (strcmp(argv[0], "deflate"))
+		die("command can be either 'inflate' or 'deflate'");
+
+	amount = atoi(argv[2]);
+
+	for (i = 0; i < amount; i++)
+		kill(pid, inflate ? SIGKVMADDMEM : SIGKVMDELMEM);
+
+	return 0;
+}
diff --git a/tools/kvm/kvm-cmd.c b/tools/kvm/kvm-cmd.c
index ffbc4ff..1598781 100644
--- a/tools/kvm/kvm-cmd.c
+++ b/tools/kvm/kvm-cmd.c
@@ -7,16 +7,18 @@
 /* user defined header files */
 #include <kvm/kvm-debug.h>
 #include <kvm/kvm-pause.h>
+#include <kvm/kvm-balloon.h>
 #include <kvm/kvm-help.h>
 #include <kvm/kvm-cmd.h>
 #include <kvm/kvm-run.h>
 
 struct cmd_struct kvm_commands[] = {
-	{ "pause",	kvm_cmd_pause,	NULL,		0 },
-	{ "debug",	kvm_cmd_debug,	NULL,		0 },
-	{ "help",	kvm_cmd_help,	NULL,		0 },
-	{ "run",	kvm_cmd_run,	kvm_run_help,	0 },
-	{ NULL,		NULL,		NULL,		0 },
+	{ "pause",	kvm_cmd_pause,		NULL,		0 },
+	{ "debug",	kvm_cmd_debug,		NULL,		0 },
+	{ "balloon",	kvm_cmd_balloon,	NULL,		0 },
+	{ "help",	kvm_cmd_help,		NULL,		0 },
+	{ "run",	kvm_cmd_run,		kvm_run_help,	0 },
+	{ NULL,		NULL,			NULL,		0 },
 };
 
 /*
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index ab9ccb7..854d04b 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -39,7 +39,7 @@ struct bln_dev {
 	/* virtio queue */
 	u16			queue_selector;
 	struct virt_queue	vqs[NUM_VIRT_QUEUES];
-	void			*jobs[NUM_VIRT_QUEUES];
+	struct thread_pool__job	jobs[NUM_VIRT_QUEUES];
 
 	struct virtio_balloon_config config;
 };
@@ -174,13 +174,13 @@ static bool virtio_bln_pci_io_out(struct ioport *ioport, struct kvm *kvm, u16 po
 
 		vring_init(&queue->vring, VIRTIO_BLN_QUEUE_SIZE, p, VIRTIO_PCI_VRING_ALIGN);
 
-		bdev.jobs[bdev.queue_selector] = thread_pool__add_job(kvm, virtio_bln_do_io, queue);
+		thread_pool__init_job(&bdev.jobs[bdev.queue_selector], kvm, virtio_bln_do_io, queue);
 
 		ioevent = (struct ioevent) {
 			.io_addr		= bdev.base_addr + VIRTIO_PCI_QUEUE_NOTIFY,
 			.io_len			= sizeof(u16),
 			.fn			= ioevent_callback,
-			.fn_ptr			= bdev.jobs[bdev.queue_selector],
+			.fn_ptr			= &bdev.jobs[bdev.queue_selector],
 			.datamatch		= bdev.queue_selector,
 			.fn_kvm			= kvm,
 			.fd			=
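One property the signal-per-megabyte loop above appears to rely on is that real-time signals (SIGRTMIN and above) are queued rather than coalesced on Linux, so N kill() calls result in N pending signals as long as the sender's queue limit is not exceeded. The standalone sketch below checks that behaviour by sending blocked signals to itself and draining them with sigtimedwait(). SIG_ADDMEM and count_queued_signals() are hypothetical names used only for this illustration; they are not part of tools/kvm.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for sigtimedwait() under strict -std= modes */
#endif
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in mirroring SIGKVMADDMEM (SIGRTMIN + 2). */
#define SIG_ADDMEM	(SIGRTMIN + 2)

/*
 * Block SIG_ADDMEM, send it to ourselves `amount` times, then dequeue
 * every pending instance without waiting. Returns the number of signals
 * that were actually queued, or -1 if the signal mask could not be set.
 */
static int count_queued_signals(int amount)
{
	struct timespec zero = { 0, 0 };
	sigset_t set, old;
	int i, seen = 0;

	assert(amount >= 0);

	sigemptyset(&set);
	sigaddset(&set, SIG_ADDMEM);
	if (sigprocmask(SIG_BLOCK, &set, &old) < 0)
		return -1;

	for (i = 0; i < amount; i++)
		if (kill(getpid(), SIG_ADDMEM) < 0)
			break;	/* e.g. EAGAIN once the queue limit is hit */

	/* A zero timeout makes sigtimedwait() poll: it fails when nothing is pending. */
	while (sigtimedwait(&set, NULL, &zero) == SIG_ADDMEM)
		seen++;

	sigprocmask(SIG_SETMASK, &old, NULL);
	return seen;
}
```

A standard signal such as SIGUSR1 would not behave this way: multiple instances sent while it is blocked collapse into one, which is presumably why the balloon signals were placed in the real-time range.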
[PATCH 9/9] kvm tools: Stop VCPUs before freeing struct kvm
Freeing struct kvm while the VCPU threads are still running leads to
segfaults and other errors caused by the lack of synchronization between
threads. Stop the VCPUs first.

Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/term.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index a0cb03f..2a3e1f0 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -10,6 +10,7 @@
 #include "kvm/term.h"
 #include "kvm/util.h"
 #include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
 
 extern struct kvm *kvm;
 static struct termios orig_term;
@@ -34,6 +35,7 @@ int term_getc(int who)
 	if (term_got_escape) {
 		term_got_escape = false;
 		if (c == 'x') {
+			kvm_cpu__reboot();
 			kvm__delete(kvm);
 			printf("\n  # KVM session terminated.\n");
 			exit(1);
-- 
1.7.6
[PATCH v2 7/8] kvm tools: Advise memory allocated for guest RAM as KSM mergable
Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/kvm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 23d31a3..2f5d633 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -346,6 +346,8 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 	if (kvm->ram_start == MAP_FAILED)
 		die("out of memory");
 
+	madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE);
+
 	ret = ioctl(kvm->vm_fd, KVM_CREATE_IRQCHIP);
 	if (ret < 0)
 		die_perror("KVM_CREATE_IRQCHIP ioctl");
-- 
1.7.6
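For readers unfamiliar with the call, here is a minimal standalone sketch of what the hunk above does: map anonymous memory (standing in for guest RAM) and advise the kernel that KSM may merge its pages. The helper name alloc_mergeable() and its error reporting are assumptions made for illustration; the patch itself ignores the madvise() return value.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes MADV_MERGEABLE and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not part of tools/kvm. Maps `size` bytes of
 * anonymous memory and hints that KSM may merge identical pages in it.
 * Returns the mapping, or NULL if mmap() fails. A failing madvise() is
 * reported through *advise_err (0 on success) but is not treated as
 * fatal, since the call is only a hint.
 */
static void *alloc_mergeable(size_t size, int *advise_err)
{
	void *p;

	assert(size > 0 && advise_err != NULL);

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*advise_err = 0;
#ifdef MADV_MERGEABLE
	if (madvise(p, size, MADV_MERGEABLE) < 0)
		*advise_err = errno;	/* e.g. EINVAL without CONFIG_KSM */
#else
	*advise_err = ENOSYS;		/* headers too old to know the flag */
#endif
	return p;
}
```

Note that marking a region mergeable has no effect until the ksmd scanner is enabled on the host, so the hint is harmless on systems that do not use KSM.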
[PATCH] virtio_balloon: Notify guest only after deflating the balloon
Unless the host requires that pages not be used until it has been
notified (VIRTIO_BALLOON_F_MUST_TELL_HOST), notify it only after
deflating the balloon.

This avoids having to take an exit before the guest can actually use
the pages.

Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: virtualizat...@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 drivers/virtio/virtio_balloon.c |   16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index e058ace..055f95d 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -148,14 +148,18 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 		vb->num_pages--;
 	}
 
-
 	/*
-	 * Note that if
-	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
-	 * is true, we *have* to do it in this order
+	 * If the host doesn't require us to notify him before using
+	 * pages which belong to the balloon, update him only after
+	 * freeing those pages for guest use.
 	 */
-	tell_host(vb, vb->deflate_vq);
-	release_pages_by_pfn(vb->pfns, vb->num_pfns);
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST)) {
+		tell_host(vb, vb->deflate_vq);
+		release_pages_by_pfn(vb->pfns, vb->num_pfns);
+	} else {
+		release_pages_by_pfn(vb->pfns, vb->num_pfns);
+		tell_host(vb, vb->deflate_vq);
+	}
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.7.6
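The ordering rule introduced by this patch can be modeled in a few lines of plain C, which may help when reasoning about it outside the driver. This is only a sketch: deflate_order() and enum bln_step are hypothetical names, not kernel symbols, and the real code calls tell_host() and release_pages_by_pfn() directly as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two steps at the end of leak_balloon(). */
enum bln_step { BLN_TELL_HOST, BLN_RELEASE_PAGES };

/*
 * Fill order[0..1] with the sequence of steps for a deflate operation.
 * must_tell_host models VIRTIO_BALLOON_F_MUST_TELL_HOST having been
 * negotiated: the host has to hear about the pages before the guest may
 * reuse them. Without the feature, the pages are released first and the
 * host is told afterwards.
 */
static void deflate_order(bool must_tell_host, enum bln_step order[2])
{
	assert(order != NULL);

	if (must_tell_host) {
		order[0] = BLN_TELL_HOST;
		order[1] = BLN_RELEASE_PAGES;
	} else {
		order[0] = BLN_RELEASE_PAGES;
		order[1] = BLN_TELL_HOST;
	}
}
```

In both branches the host is still told about every deflated page; only the moment of the notification relative to the release changes.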
Re: [PATCH v2 00/31] Implement user mode network for kvm tools
On 07/02/2011 04:36 AM, Pekka Enberg wrote:
> On Fri, Jul 1, 2011 at 7:50 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Fri, Jul 1, 2011 at 12:38 AM, Asias He <asias.he...@gmail.com> wrote:
>>> On 06/30/2011 04:56 PM, Stefan Hajnoczi wrote:
>>>> On Thu, Jun 30, 2011 at 9:40 AM, Asias He <asias.he...@gmail.com> wrote:
>>>>> uip stands for user mode {TCP,UDP}/IP. Currently, uip supports ARP,
>>>>> ICMP, IPV4, UDP, TCP. So any network protocols above UDP/TCP should
>>>>> work as well, e.g., HTTP, FTP, SSH, DNS.
>>>>
>>>> There is an existing uIP which might cause confusion, not sure if
>>>> you've seen it. First I thought you were using that :).
>>>
>>> I heard about uIP, but this patchset has nothing to do with uIP ;-)
>>>
>>> At first I was naming the user mode network UNET, which stands for
>>> User mode NETwork; however, I thought uip looked better because it is
>>> shorter. Anyway, if uip does cause confusion, I'd like to change the
>>> name.
>>
>> It's up to you but now is the right time to do it. Consider if
>> another program wants to reuse this code or if you ever want to make
>> it a library, it wouldn't help to have a confusing name.
>
> I don't care too much what we use as the namespace prefix but as a
> directory name tools/kvm/uip is pretty meaningless. I'd just move the
> code under tools/kvm/net to mirror what the kernel already has.

I have thought about putting the user mode net code in tools/kvm/net.
However, we have net code in tools/kvm/virtio as well. Is this a problem
in terms of clean code organization?

I also think splitting the tap code in virtio/net.c out into
tools/kvm/net is a good idea. Further, we can put the macvtap related
code into tools/kvm/net as well.

-- 
Best Regards,
Asias He
Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate
On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
> On 06/29/2011 12:41 PM, Paul Mackerras wrote:
> > +struct kvm_ppc_set_platform {
> > +	__u16 platform;		/* defines the OS/hypervisor ABI */
> > +	__u16 guest_arch;	/* e.g. decimal 206 for v2.06 */
> > +	__u32 flags;
>
> Please add some padding so we can extend it later if necessary.
>
> > +};
> > +
> > +/* Values for platform */
> > +#define KVM_PPC_PV_NONE	0	/* bare-metal, non-paravirtualized */
> > +#define KVM_PPC_PV_KVM		1	/* as defined in kvm_para.h */
> > +#define KVM_PPC_PV_SPAPR	2	/* IBM Server PAPR (a la PowerVM) */
>
> We also support BookE which would be useful to also include in the
> list. Furthermore, KVM is more of a feature flag than a platform. We
> can easily support KVM extensions on an SPAPR platform, no?

Yes, I guess so. The hypercall sequence will have to be different,
since ordinary system call interrupts go straight to the guest. But I
guess you've allowed for that with the hypercall sequence property in
the device tree.

> This whole interface also could deprecate the PVR setting one, so we
> can simply include PVR as well and not require kernel space to jump
> through hoops to figure out its capabilities.

I debated about whether to include a PVR value in this structure. The
thing is that POWER7 has the Processor Compatibility Register (PCR),
which has a bit which makes the processor behave in user mode as if it
were a POWER6. So, we could run a book3s_hv guest in POWER6 mode by
setting this bit (which we might want to do to run older distros).
However, this bit doesn't affect the PVR value that the guest sees.
That's why I went for an architecture level rather than a specific PVR
value.

We could go with a PVR value and use the logical PVR values defined in
PAPR to represent architecture levels, e.g. 0x0f02 for architecture
v2.05 (POWER6).

> And we need to identify 32-bit BookS processors, so we can go into
> 32-bit mode when necessary. That should also be a different
> guest_arch, right?

Right. If we go with a PVR value then we just use the PVR value for a
suitable 32-bit processor.

> > +
> > +/* Values for flags */
> > +#define KVM_PPC_CROSS_ARCH	1	/* guest architecture != host */
>
> User space shouldn't have to worry about this one. It's up to the
> kernel to decide that it's cross.

I put that in because we might want to force the use of book3s_pr, for
example if we know we're going to want to do emulated MMIO or something
else that isn't implemented in book3s_hv just yet.

Ultimately, yes, the kernel should be able to decide whether it's cross
or not. However, I don't think we should make it completely opaque to
userspace as to whether the kernel is using _pr or _hv. If nothing
else, userspace should be able to find out and tell the user so that
performance expectations can be set correctly.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
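As a rough illustration of the padding Alexander asks for in the thread above, the structure could reserve space that userspace must zero and that the receiving side checks. The layout below is an assumption for discussion only: the struct name, the size of reserved[] and the platform_args_valid() helper are hypothetical and do not come from the RFC or from any merged kernel header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout: the three fields from the RFC plus reserved
 * space, so that fields can be added later without changing the
 * structure size. The choice of six words is arbitrary.
 */
struct ppc_set_platform_sketch {
	uint16_t platform;	/* defines the OS/hypervisor ABI */
	uint16_t guest_arch;	/* e.g. decimal 206 for v2.06 */
	uint32_t flags;
	uint32_t reserved[6];	/* must be zero until given a meaning */
};

/*
 * Returns nonzero if the reserved area is all zeroes. Rejecting nonzero
 * padding up front is what allows those bytes to be assigned a meaning
 * later without breaking existing callers.
 */
static int platform_args_valid(const struct ppc_set_platform_sketch *p)
{
	static const uint32_t zero[6];

	assert(p != NULL);
	return memcmp(p->reserved, zero, sizeof(zero)) == 0;
}
```

With fixed-width members and no implicit padding, the sketch is 32 bytes on common ABIs, which keeps the layout identical for 32-bit and 64-bit userspace.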
Re: [PATCH 05/17] KVM: PPC: Deliver program interrupts right away instead of queueing them
On 29.06.2011, at 12:18, Paul Mackerras wrote:

> Doing so means that we don't have to save the flags anywhere and gets
> rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.
>
> Doing so is OK because a program interrupt won't be generated at the
> same time as any other synchronous interrupt. If a program interrupt
> and an asynchronous interrupt (external or decrementer) are generated
> at the same time, the program interrupt will be delivered, which is
> correct because it has a higher priority, and then the asynchronous
> interrupt will be masked.
>
> We don't ever generate system reset or machine check interrupts to the
> guest, but if we did, then we would need to make sure they got delivered
> rather than the program interrupt. The current code would be wrong in
> this situation anyway since it would deliver the program interrupt as
> well as the reset/machine check interrupt.
>
> Signed-off-by: Paul Mackerras <pau...@samba.org>
> ---
>  arch/powerpc/kvm/book3s.c |    8 +++-
>  1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 163e3e1..f68a34d 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -129,8 +129,8 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
>  void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
>  {
> -	to_book3s(vcpu)->prog_flags = flags;

Now that prog_flags is unused, please remove it from the headers.

Alex
Re: [PATCH 0/17] Hypervisor-mode KVM on POWER7 and PPC970
On 29.06.2011, at 12:15, Paul Mackerras wrote:

> The first patch of the following series is a pure bug-fix for 32-bit
> kernels.
>
> The remainder of the following series of patches enable KVM to exploit
> the hardware hypervisor mode on 64-bit Power ISA Book3S machines. At
> present, POWER7 and PPC970 processors are supported. (Note that the
> PPC970 processors in Apple G5 machines don't have a usable hypervisor
> mode and are not supported by these patches.)
>
> Running the KVM host in hypervisor mode means that the guest can use
> both supervisor mode and user mode. That means that the guest can
> execute supervisor-privilege instructions and access
> supervisor-privilege registers. In addition the hardware directs most
> exceptions to the guest. Thus we don't need to emulate any
> instructions in the host. Generally, the only times we need to exit
> the guest are when it does a hypercall or when an external interrupt
> or host timer (decrementer) interrupt occurs.
>
> The focus of this KVM implementation is to run guests that use the
> PAPR (Power Architecture Platform Requirements) paravirtualization
> interface, which is the interface supplied by PowerVM on IBM pSeries
> machines. Currently the pseries machine type in qemu is only supported
> by book3s_hv KVM, and book3s_hv KVM only supports the pseries machine
> type. That will hopefully change in future.
>
> These patches are against the master branch of the kvm tree.

Something seems to be broken with signals. When running without
io-thread, I can't even do ctrl-c on -nographic while the guest is in
sleep mode. But that might not be related to your patches.

I've applied 01-16 now. Sending them through some more testing and if
they're good, sending a pull request.

Alex