Re: [libvirt] [virt-devel] Cache and memory bandwidth graphs: native versus guest
Bill Gray wrote:
> See attachment with two graphs: (1) cache bandwidth, (2) blowup of
> sustained memory bandwidth region...

Bill, I had some difficulty with this document under ooffice.  A recent
version seized and an older version didn't seem to render correctly.
Could you export it as pdf?  Thanks,

-john

> - X axis has a log scale
> - Light blue line is an older system with 32K L1 and 6M L2 caches
> - All other measurements on perf34: 32K L1, 256K L2, 30M L3 caches
> - Majority of variation in the L1 cache region is from the two guest
>   measurements done with no taskset to a VCPU: yellow and maroon
>   lines.  Perhaps this reflects the test bouncing between VCPUs in
>   the guest.
> - The sustained memory bandwidth for the guest with no pinning is
>   only 80% of native (maroon line), which motivates more convenient
>   and comprehensive numactl for guests.
> - Virtualized bandwidth is otherwise nearly in line with native,
>   which confirms the importance of the virtual CPUID communicating
>   actual native cache sizes to cache-size-aware guest applications,
>   since guest apps could benefit from the full size of the native
>   cache.  (Guest was started with -cpu host, but lscpu in the guest
>   showed a 4M cache despite the actual 30M cache.)

--
john.coo...@redhat.com

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] Add huge page support to libvirt..
This patch allows passing of a -mem-path arg flag to qemu for support
of huge page backed guests.  A guest may request this option by
specifying:

    <hugepage>on</hugepage>

in its domain definition xml file.  The request for huge page backing
will be attempted within libvirt if the host system has indicated a
hugetlbfs mount point in qemu.conf, for example:

    hugepage_mount = "/hugetlbfs"

_and_ the target qemu executable is aware of the -mem-path flag.
Otherwise this request by a guest will result in an error.

This patch does not address setup of the required host hugetlbfs mount
point, verifying the mount point is correct/usable, nor assuring
sufficient free huge pages are available; these are assumed to be
addressed by other means.

Signed-off-by: john cooper <john.coo...@redhat.com>
---

diff --git a/src/domain_conf.c b/src/domain_conf.c
index f3e4c6c..04d6911 100644
--- a/src/domain_conf.c
+++ b/src/domain_conf.c
@@ -2369,6 +2369,17 @@ static virDomainDefPtr virDomainDefParseXML(virConnectPtr conn,
     if (virXPathULong(conn, "string(./currentMemory[1])", ctxt, &def->memory) < 0)
         def->memory = def->maxmem;
 
+    tmp = virXPathString(conn, "string(./hugepage[1])", ctxt);
+    if (!tmp || STREQ(tmp, "off"))
+        def->hugepage_backed = 0;
+    else if (STREQ(tmp, "on"))
+        def->hugepage_backed = 1;
+    else {
+        virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+                             _("invalid hugepage mode %s"), tmp);
+        goto error;
+    }
+
     if (virXPathULong(conn, "string(./vcpu[1])", ctxt, &def->vcpus) < 0)
         def->vcpus = 1;
@@ -3933,6 +3944,8 @@ char *virDomainDefFormat(virConnectPtr conn,
     virBufferVSprintf(&buf, "  <memory>%lu</memory>\n", def->maxmem);
     virBufferVSprintf(&buf, "  <currentMemory>%lu</currentMemory>\n", def->memory);
+    if (def->hugepage_backed)
+        virBufferVSprintf(&buf, "  <hugepage>%s</hugepage>\n", "on");
 
     for (n = 0 ; n < def->cpumasklen ; n++)
         if (def->cpumask[n] != 1)
diff --git a/src/domain_conf.h b/src/domain_conf.h
index 6e111fa..d6bdcdb 100644
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -481,6 +481,7 @@ struct _virDomainDef {
     unsigned long memory;
     unsigned long maxmem;
+    unsigned char hugepage_backed;
     unsigned long vcpus;
 
     int cpumasklen;
     char *cpumask;
diff --git a/src/qemu.conf b/src/qemu.conf
index 3009725..a3387f1 100644
--- a/src/qemu.conf
+++ b/src/qemu.conf
@@ -95,3 +95,10 @@
 # The group ID for QEMU processes run by the system instance
 #group = "root"
+
+# If provided by the host and this hugetlbfs mount point is
+# configured, a guest may request huge page backing.  When this
+# mount point is undefined, huge page backing is disabled.
+
+hugepage_mount = "/hugetlbfs"
+
diff --git a/src/qemu_conf.c b/src/qemu_conf.c
index 4043d70..632b784 100644
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -218,6 +218,17 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
     }
     VIR_FREE(group);
 
+    p = virConfGetValue (conf, "hugepage_mount");
+    CHECK_TYPE ("hugepage_mount", VIR_CONF_STRING);
+    if (p && p->str) {
+        VIR_FREE(driver->hugepage_mount);
+        if (!(driver->hugepage_mount = strdup(p->str))) {
+            virReportOOMError(NULL);
+            virConfFree(conf);
+            return -1;
+        }
+    }
+
     virConfFree (conf);
     return 0;
 }
@@ -500,6 +511,8 @@ static unsigned int qemudComputeCmdFlags(const char *help,
         flags |= QEMUD_CMD_FLAG_VGA;
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
+    if (strstr(help, "-mem-path"))
+        flags |= QEMUD_CMD_FLAG_MEM_PATH;
     if (version >= 9000)
         flags |= QEMUD_CMD_FLAG_VNC_COLON;
@@ -1125,6 +1138,15 @@ int qemudBuildCommandLine(virConnectPtr conn,
         ADD_ARG_LIT("-no-kvm");
     ADD_ARG_LIT("-m");
     ADD_ARG_LIT(memory);
+    if (def->hugepage_backed) {
+        if (!driver->hugepage_mount ||
+            !(qemuCmdFlags & QEMUD_CMD_FLAG_MEM_PATH)) {
+            qemudReportError(conn, NULL, NULL, VIR_ERR_NO_SUPPORT,
+                             "%s", _("hugepage backing not supported"));
+            goto error;
+        }
+        ADD_ARG_LIT("-mem-path");
+        ADD_ARG_LIT(driver->hugepage_mount);
+    }
     ADD_ARG_LIT("-smp");
     ADD_ARG_LIT(vcpus);
diff --git a/src/qemu_conf.h b/src/qemu_conf.h
index fbf2ab9..847597f 100644
--- a/src/qemu_conf.h
+++ b/src/qemu_conf.h
@@ -58,6 +58,7 @@ enum qemud_cmd_flags {
     QEMUD_CMD_FLAG_KVM          = (1 << 13), /* Whether KVM is compiled in */
     QEMUD_CMD_FLAG_DRIVE_FORMAT = (1 << 14), /* Is -drive format= avail */
     QEMUD_CMD_FLAG_VGA          = (1 << 15), /* Is -vga avail */
+    QEMUD_CMD_FLAG_MEM_PATH     = (1 << 16), /* mmap'ped guest backing supported */
 };
 
 /* Main driver state */
@@ -86,6 +87,7 @@ struct qemud_driver {
     char *vncListen;
     char *vncPassword;
     char *vncSASLdir;
+    char *hugepage_mount;
 
     virCapsPtr caps;
diff --git a/src/qemu_driver.c b/src/qemu_driver.c
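For reference, a minimal domain-definition fragment exercising the option described in this patch might look as follows; the memory values are illustrative, and placement alongside `<memory>` follows the formatting code above:

```xml
<!-- fragment of a domain definition requesting huge page backing -->
<memory>1048576</memory>
<currentMemory>1048576</currentMemory>
<hugepage>on</hugepage>
```

With `hugepage_mount` set in qemu.conf and a -mem-path-capable qemu, this results in `-mem-path <mount>` being appended to the qemu command line.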
Re: [libvirt] [PATCH] Add huge page support to libvirt..
Mark McLoughlin wrote:
> Other options include:
>
>   - <hugepages/>
>
>   - <memory hugepages="yes">X</memory>

Yes, I'd expect additional options will need to be addressed.
Currently the only additional qemu-resident knob is the -mem-prealloc
flag, which is enabled by default.  I've removed support for it here in
the interest of simplicity, but it will fall into the existing scheme.

> hugepage_mount = "/hugetlbfs"
>
> I'd suggest /dev/hugepages as the default - /hugetlbfs has a
> seriously unstandard whiff about it

That was intended as an example for no other reason than it is what I
happen to use.  I suspected it wouldn't be the mount point's final
resting place.

Thanks,

-john
Re: [libvirt] [PATCH] Add huge page support to libvirt..
Daniel P. Berrange wrote:
> On Wed, Jul 22, 2009 at 09:25:02PM -0400, john cooper wrote:
>> This patch allows passing of a -mem-path arg flag to qemu for
>> support of huge page backed guests.  A guest may request this option
>> by specifying <hugepage>on</hugepage> in its domain definition xml
>> file.
>
> This really opens a can of worms.  While obviously this maps very
> simply onto KVM's -mem-path argument, I can't help thinking things
> are going to get much more advanced later.  For example, I don't
> think a boolean on/off is sufficient for this, since Xen already has
> a 3rd option of 'best effort' it uses by default, where it'll try to
> allocate hugepages and fall back to normal pages - in fact you can't
> tell Xen not to use hugepages AFAIK.

I agree growing beyond a simple on/off switch is likely.  The patch
originally had a prealloc option (since removed) to accommodate passing
of the existing -mem-prealloc flag.  That option defaults to enabled,
which is desired in general, and therefore was dropped for the sake of
patch simplicity.

> I'm also wondering whether we need to be concerned about different
> hugepage sizes for guest configs, eg 2M vs 1GB, vs a mix of both -
> who decides?

Not a consideration currently, but it does underscore the argument for
a more extensible option syntax.

> KVM also seems to have the ability to request that huge pages are
> pre-allocated upfront, vs on demand, though I'm not sure what happens
> to a VM if it doesn't pre-allocate and a later allocation can't be
> satisfied.

That was the motivation for -mem-prealloc, prior to which a VM would
terminate with SIGSEGV if a huge page fault couldn't be satisfied at
runtime.  The (now default) preallocation behavior guarantees the guest
has its working set present at startup, but at the potential cost of
overly pessimistic memory allocation.
>> The request for huge page backing will be attempted within libvirt
>> if the host system has indicated a hugetlbfs mount point in
>> qemu.conf, for example:  hugepage_mount = "/hugetlbfs"
>
> Seems like it would be simpler to just open /proc/mounts and scan it
> to find whether/where hugetlbfs is mounted, so it would 'just work'
> if the user had mounted it.

Checking /proc/mounts alone seemed a bit too speculative, which is
where the qemu.conf option arose.  But I can see both being useful:
check whether the qemu.conf mount point exists, otherwise attempt to
glean the information from /proc/mounts, and if neither is satisfied
flag the error.

> It looks like the argument is not available in upstream QEMU, only
> part of the KVM fork?  Any idea why it hasn't been sent upstream,
> and/or whether it will be soon?

I'd hazard it is due to the existing huge page support being closely
tied to kvm, and no motivation as of yet to reconcile this upstream.

> I'm loathe to add more KVM specific options since we've been burnt
> every time we've done this in the past, with semantics changing when
> merged to QEMU :-(

Quite understandable.  It is also why I was attempting to be as generic
(and simple) as possible here, and not excessively cast the existing
kvm implementation specifics into the exported libvirt option.

> I agree that setting up hugetlbfs is out of scope for libvirt.  We
> should just probe to see whether it's available or not.  We ought to
> have some way of reporting available hugepages though, both at a host
> level, and likely per NUMA node too.  Without this a mgmt app using
> libvirt has no clue whether it'll be able to actually use hugepages
> successfully or not.

Agree.  Extracting this host information will be needed when more
comprehensive management of the same exists.  Still, it would seem a
case of best-effort enforcement without some sort of additional
coordination: the process of gleaning the number of free huge pages
from the host and launching the guest currently has an inherent race.
Thanks,

-john
Re: [libvirt] [PATCH] Support configuration of huge pages in guests
Daniel P. Berrange wrote:
> * configure.in: Add check for mntent.h
> * qemud/libvirtd_qemu.aug, qemud/test_libvirtd_qemu.aug,
>   src/qemu.conf: Add 'hugetlbfs_mount' config parameter
> * src/qemu_conf.c, src/qemu_conf.h: Check for -mem-path flag in QEMU,
>   and pass it when hugepages are requested.  Load hugetlbfs_mount
>   config parameter, search for mount if not given.
> * src/qemu_driver.c: Free hugetlbfs_mount/path parameter in driver
>   shutdown.  Create directory for QEMU hugepage usage, chowning if
>   required.
> * docs/formatdomain.html.in: Document memoryBacking/hugepages
>   elements
> * docs/schemas/domain.rng: Add memoryBacking/hugepages elements to
>   schema
> * src/util.c, src/util.h, src/libvirt_private.syms: Add
>   virFileFindMountPoint helper API
> * tests/qemuhelptest.c: Add -mem-path constants
> * tests/qemuxml2argvtest.c, tests/qemuxml2xmltest.c: Add tests for
>   hugepage handling
> * tests/qemuxml2argvdata/qemuxml2argv-hugepages.xml,
>   tests/qemuxml2argvdata/qemuxml2argv-hugepages.args: Data files for
>   hugepage tests
> ---
>  configure.in                                       |    2 +-
>  docs/formatdomain.html                             |    8 +++-
>  docs/formatdomain.html.in                          |    8 +++
>  docs/schemas/domain.rng                            |    9 ++++
>  qemud/libvirtd_qemu.aug                            |    1 +
>  qemud/test_libvirtd_qemu.aug                       |    4 ++
>  src/domain_conf.c                                  |   10 ++++-
>  src/domain_conf.h                                  |    1 +
>  src/libvirt_private.syms                           |    1 +
>  src/qemu.conf                                      |   13 ++++
>  src/qemu_conf.c                                    |   49 ++++++++
>  src/qemu_conf.h                                    |    3 +
>  src/qemu_driver.c                                  |   34 ++++++
>  src/util.c                                         |   37 ++++++-
>  src/util.h                                         |    4 ++
>  tests/qemuhelptest.c                               |    6 ++-
>  tests/qemuxml2argvdata/qemuxml2argv-hugepages.args |    1 +
>  tests/qemuxml2argvdata/qemuxml2argv-hugepages.xml  |   25 ++++++
>  tests/qemuxml2argvtest.c                           |    7 ++-
>  tests/qemuxml2xmltest.c                            |    1 +
>  20 files changed, 217 insertions(+), 7 deletions(-)
>  create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-hugepages.args
>  create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-hugepages.xml

I did a fetch on libvirt.git before reviewing, and it appears there is
some code motion relative to the version this patch was against,
although AFAICT nothing which appears to result in more than patch
bounce.  Looks good to me.
ACK.

-john
Re: [libvirt] [RFC] Support for CPUID masking
Dor Laor wrote:
> What about another approach for the cpuid issue: I think that dealing
> with specific flags is pretty error prone on all levels - virt-mgr,
> libvirt, qemu, migration, and even the guest.

..and performance verification, QA, and the average end user.  Unless
we reduce all possible combinations of knob settings into a few well
thought out lumped models, the complexity can be overwhelming.  It is
probably a reasonable compromise to initially support the most obvious
use cases, with more fringe models added on an as-needed, as-justified
basis.

> We can't just 'invent' new cpu models that will be a combination of
> flags we have on the host.  It might work for some kernels/apps but
> it might break others.  In addition to cpuid we have the stepping and
> more MSRs that need to be hidden/exposed.

In whatever abstraction, exposing all of that low-level detail as the
sole configuration means will needlessly exaggerate the complexity as
above.

> The offer: Libvirt, virt-mgr, qemu will not deal with low-level bits
> of the cpuid.  We'll use predefined cpu models to be emulated.  These
> predefined models will represent a real model that was once shipped
> by Intel/AMD.  We'll write an additional tool, not under libvirt,
> that will be able to calculate all the virtual cpus that can run over
> the physical cpu.  This tool will be the one messing with
> /proc/cpuinfo, etc.  Example (theoretical): the physical cpu is an
> Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz; the output of the tool
> will be:
>
>     shell> computeVirtCpuCapabilities
>     core2duo
>     core solo
>     486
>     pentium3
>     ..
>
> Libvirt will only expose the real physical cpu and all of the outputs
> above.  Higher level mgmt will compute the best -cpu to pick for the
> VM, and it will take into account the user's needs for performance or
> for migration flexibility.
> The pros:
>
> - Simple for all levels
> - Fast implementation
> - An accurate representation of real cpus
>
> The cons:
>
> - None - we should write the tool anyway
> - Maybe we hide some of the possibilities, but as said above, it is
>   safer and friendlier.

I think paving over the complexity to export the most common use cases
is a reasonable approach.  We can allow some sort of
fingers-in-the-gears mode for experimentation and tuning as needed.
But supporting features such as safe migration could be a non-goal in
these scenarios.

-john

> I also recommend adding a field next to the cpu for flexibility and
> possible future changes + dealing with a problematic bios with the NX
> bit disabled.
>
> Regards,
> Dor
>
> Thanks Mukesh
>
> On Thu, Sep 3, 2009 at 3:09 PM, Daniel P. Berrange
> <berra...@redhat.com> wrote:
>> On Thu, Sep 03, 2009 at 11:19:47AM +0300, Dor Laor wrote:
>>> On 09/02/2009 07:09 PM, Daniel P. Berrange wrote:
>>>> On Wed, Sep 02, 2009 at 11:59:39AM -0400, Jim Paris wrote:
>>>>> Jiri Denemark wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We need to provide support for CPU ID masking.  Xen and VMware
>>>>>> ESX are examples of current hypervisors which support such
>>>>>> masking.  My proposal is to define a new 'cpuid' feature
>>>>>> advertised in guest capabilities:
>>>>>> ...
>>>>>>     <domain type='xen' id='42'>
>>>>>>       ...
>>>>>>       <features>
>>>>>>         <pae/>
>>>>>>         <acpi/>
>>>>>>         <apic/>
>>>>>>         <cpuid>
>>>>>>           <mask level='1' register='ebx'>
>>>>>>             :::1010::::
>>>>>>           </mask>
>>>>>>         </cpuid>
>>>>>>       ...
>>>>>>
>>>>>> What are your opinions about this?
>>>>>
>>>>> I think it's too low-level, and the structure is x86-specific.
>>>>> QEMU and KVM compute their CPUID response based on arguments to
>>>>> the -cpu argument, e.g.:
>>>>>
>>>>>     -cpu core2duo,model=23,+ssse3,+lahf_lm
>>>>>
>>>>> I think a similar structure makes more sense for libvirt, where
>>>>> the configuration generally avoids big blocks of binary data, and
>>>>> the XML format should suit other architectures as well.
>>>>
>>>> I'm going back and forth on this too.  We essentially have 3
>>>> options:
>>>>
>>>>   - Named CPU + flags/features
>>>>   - CPUID masks
>>>>   - Allow either
>>>>
>>>> If we do either of the first two, we have to translate between the
>>>> two formats for one or more of the hypervisors.
>>>> For the last one we are just punting the problem off to
>>>> applications.  If we choose CPUID, and made the QEMU driver
>>>> convert to named CPU + flags, we'd be stuck for non-x86 as you
>>>> say.
>>>
>>> Why is that?  cpu model + flags may apply for other arches too.
>>
>> If we have CPUID in the XML, there is no meaningful CPUID register
>> representation for sparc/ppc/arm/etc.  It is an x86 concept, which
>> is almost certainly why QEMU uses named CPU models + named flags
>> instead of CPUID as its public facing config.  Xen/VMware of course
>> don't have this limitation since they only really care about x86.
>> So really QEMU's CPU model + flags approach is more generic, albeit
>> being much more verbose to achieve the same level of expressivity.
>>
>> If we chose named CPU + flags, and made VMware/Xen convert to raw
>> CPUID, we'd potentially lose information if the user had defined a
>> config with a raw CPUID mask outside the context of libvirt.  The
>> other thing to remember is that CPUID also encodes sockets/cores/threads
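At its core, the capability tool Dor proposes reduces to a subset test: a host can present a named model iff every feature the model requires is natively available. A sketch under the simplifying assumption (mine, not the thread's) that each model is reduced to a single bitmask of required feature flags - real models also involve family/model/stepping and per-leaf CPUID registers:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a named CPU model reduced to a bitmask of
 * required feature flags. */
struct cpu_model {
    const char *name;
    uint64_t required_features;
};

/* A host can emulate a model iff the model requires no feature the
 * host lacks: required & ~host == 0. */
int model_runnable(uint64_t host_features, const struct cpu_model *m)
{
    return (m->required_features & ~host_features) == 0;
}
```

A `computeVirtCpuCapabilities`-style tool would iterate a table of such models against the host's feature set and print the names of those that pass.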
[libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.
We have found certain application scenarios where overriding the
default qemu host cache mode provides a substantial improvement in
guest performance, in particular disabling host caching of the
file/dev backing a guest drive.  A summary of performance metrics may
be found below.

Attached is a simple patch which adds general support for a cache=*
attribute to be passed to qemu from libvirt as part of an xml driver
element.  It is parsed in a domain's xml definition, as well as being
generated during output of the same.  For example:

    <disk type='file' device='disk'>
      <source file='/guest_roots/fc8.root'/>
      <target dev='hda'/>
      <driver name='qemu' type='qwerty' cache='none'/>
    </disk>

where both the existing type=* and added cache=* attributes are
optional and independent.  Note this is only intended to serve as an
internal hook allowing override of qemu's default cache behavior; at
present it is not proposed as an end-user documented/visible feature.

-john

Mark Wagner wrote:
> This data is provided by the Red Hat Performance team.  It is
> intended to support the request for adding the required functionality
> to libvirt to allow for setting a storage parameter to control the
> caching behavior of QEMU.  This value would be passed on the QEMU
> command line.
>
> Our testing of commercial databases in a larger, Enterprise
> configuration (MSA, 16 cores, etc) has shown that the default QEMU
> caching behavior is not always the best.  This set of data compares
> the drop off in performance of both writethrough (WT) and writeback
> (WB) cache relative to the nocache option.  The main metric is
> transactions per minute (tpm); the other axis is the number of
> simulated users (x1000).
>
>     % of nocache TPM results
>
>     K Users    WT     WB
>       10       60%    64%
>       20       66%    71%
>       40       68%    72%
>       60       71%    75%
>       80       74%    79%
>      100       76%    83%
>
> From the above set of data, it is clear that the default behavior of
> QEMU is the worst performer out of the three cache options for this
> type of use case.
> It is also clear that we lose, at minimum, 25% of the possible TPM
> performance just due to the cache setting.

 domain_conf.c |   32 +++++++++---
 domain_conf.h |   15 +++++++
 qemu_conf.c   |   32 ++++++++++++
 3 files changed, 68 insertions(+), 11 deletions(-)

===================================================================
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -80,6 +80,20 @@ enum virDomainDiskBus {
     VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+    VIR_DOMAIN_DISK_CACHE_DEFAULT = 0,
+    VIR_DOMAIN_DISK_CACHE_DISABLE,
+    VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+    VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+} host_cachemode;
+
+typedef struct {
+    char *tag;
+    host_cachemode idx;
+} cache_map_t;
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -91,6 +105,7 @@ struct _virDomainDiskDef {
     char *dst;
     char *driverName;
     char *driverType;
+    host_cachemode cachemode;
     unsigned int readonly : 1;
     unsigned int shared : 1;
     int slotnum;            /* pci slot number for unattach */
===================================================================
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -616,6 +616,26 @@ qemudNetworkIfaceConnect(virConnectPtr c
     return NULL;
 }
 
+/* map from internal cache mode to qemu cache arg text
+ */
+static cache_map_t qemu_cache_map[] = {
+    {"none",         VIR_DOMAIN_DISK_CACHE_DISABLE},
+    {"writeback",    VIR_DOMAIN_DISK_CACHE_WRITEBACK},
+    {"writethrough", VIR_DOMAIN_DISK_CACHE_WRITETHRU},
+    {NULL}};
+
+/* return qemu arg text corresponding to idx if found, NULL otherwise
+ */
+static inline char *qemu_cache_mode(host_cachemode idx)
+{
+    int i;
+
+    for (i = 0; qemu_cache_map[i].tag; ++i)
+        if (qemu_cache_map[i].idx == idx)
+            return (qemu_cache_map[i].tag);
+    return (NULL);
+}
+
 static int
 qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
                                char *buf,
                                int buflen)
@@ -946,6 +966,8 @@ int qemudBuildCommandLine(virConnectPtr
         virDomainDiskDefPtr disk = vm->def->disks[i];
         int idx = virDiskNameToIndex(disk->dst);
         const char *bus = virDomainDiskQEMUBusTypeToString(disk->bus);
+        int nc;
+        char *cachemode;
 
         if (disk->bus == VIR_DOMAIN_DISK_BUS_USB) {
             if (disk->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
@@ -980,15 +1002,17 @@ int qemudBuildCommandLine(virConnectPtr
             break;
         }
 
-        snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
+        nc = snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
                  disk->src ? disk->src : "", bus,
                  media ? media
Re: [libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.
Daniel P. Berrange wrote:
> A couple of extra things needed:
>
> - Addition to tests/qemuxml2argvtest.c to validate the XML to struct
>   to QEMU ARGV conversion.
> - Addition to tests/qemuxml2xmltest.c to validate XML to struct to
>   XML round-trip conversions.
> - Addition to the docs/libvirt.rng XML schema.

Ok.  Let's address the above when we're satisfied with the state of the
patch, as we have at least one open issue (as below).

> This static map function is not required.  Instead, the int to char *
> conversion (and its reverse) are automatically generated through use
> of our built-in enum support macros.

Yes, you mentioned this in your earlier (off list) reply to me.  I did
produce a patch version using the VIR_ENUM_DECL/VIR_ENUM_IMPL macros,
but thought they may not be the most natural fit for the task, since
there is no way to multiply define a particular enum value (eg: cache
"none" || "off").  Also a given hypervisor will likely support only a
subset of the internally enumerated option space, forcing dummy-out of
unused cache modes.  But that is speculative and I don't have a strong
bias either way, so I've reverted to use of
VIR_ENUM_DECL/VIR_ENUM_IMPL here.

> There have been two different syntaxes supported in QEMU for caching,
> so we need to detect this and switch between them.  Originally there
> was just cache=on and cache=off (writethrough and no caching).  Now
> it supports cache=writeback, cache=writethrough and cache=none|off.
> So, if we detect the earlier syntax, we need to raise an error if the
> user requested writeback (or translate it to 'cache=off' for safety's
> sake).

I was trying to address part of this as above.  But in general I don't
see how we can force an error in advance of launching qemu without some
information beforehand of whether we're dealing with an old vs. new
syntax qemu.  One way to do so is to test launch "qemu --help" and let
it tell us what it accepts.  Alternatively we can just let qemu error
exit in the case an old/new qemu version doesn't recognize the
alternate syntax.
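The "-help" probing approach discussed above amounts to substring tests against the help text. A minimal standalone sketch; the flag names are shortened stand-ins for the QEMUD_CMD_FLAG_* constants used in the patch:

```c
#include <assert.h>
#include <string.h>

enum {
    FLAG_DRIVE           = 1 << 0,  /* -drive supported at all */
    FLAG_DRIVE_CACHEMODE = 1 << 1,  /* new three-valued cache= syntax */
};

/* Derive capability flags from qemu's -help output.  Old qemu
 * understands only cache=on|off; newer builds advertise the
 * writethrough|writeback|none form in their help text. */
unsigned int drive_flags_from_help(const char *help)
{
    unsigned int flags = 0;

    if (strstr(help, "-drive"))
        flags |= FLAG_DRIVE;
    if (strstr(help, "cache=writethrough|writeback|none"))
        flags |= FLAG_DRIVE_CACHEMODE;
    return flags;
}
```

The brittleness john notes is real: any rewording of the help text silently changes the detected capability set, which is why probing results should be paired with a safe fallback (old syntax) rather than assumed.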
Other than the above, I believe I've incorporated all of your remaining
suggestions.  Attached is an updated patch.

-john

 domain_conf.c |   31 +++---
 domain_conf.h |   16 ++++
 qemu_conf.c   |   27 +++
 3 files changed, 63 insertions(+), 11 deletions(-)

===================================================================
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -80,6 +80,21 @@ enum virDomainDiskBus {
     VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+    VIR_DOMAIN_DISK_CACHE_UNSPECIFIED,  /* reserved */
+    VIR_DOMAIN_DISK_CACHE_OFF,
+    VIR_DOMAIN_DISK_CACHE_ON,
+    VIR_DOMAIN_DISK_CACHE_DISABLE,
+    VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+    VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+
+    VIR_DOMAIN_DISK_CACHE_LAST
+} virDomainDiskCache;
+
+VIR_ENUM_DECL(virDomainDiskCache)
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -91,6 +106,7 @@ struct _virDomainDiskDef {
     char *dst;
     char *driverName;
     char *driverType;
+    int cachemode;
     unsigned int readonly : 1;
     unsigned int shared : 1;
     int slotnum;            /* pci slot number for unattach */
===================================================================
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -616,6 +616,20 @@ qemudNetworkIfaceConnect(virConnectPtr c
     return NULL;
 }
 
+/* map from internal cache mode to qemu cache arg text
+ *
+ * Note: currently this replicates virDomainDiskCache, but will need to
+ * error flag potential new entries in virDomainDiskCache which are
+ * not supported by qemu (raising exceptions as appropriate).
+ */
+VIR_ENUM_IMPL(qemu_cache_map, VIR_DOMAIN_DISK_CACHE_LAST,
+              "",             /* reserved -- mode not specified */
+              "off",          /* deprecated by "none" */
+              "on",           /* obsoleted by "writethrough" */
+              "none",
+              "writeback",
+              "writethrough")
+
 static int
 qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
                                char *buf,
                                int buflen)
@@ -946,6 +960,8 @@ int qemudBuildCommandLine(virConnectPtr
         virDomainDiskDefPtr disk = vm->def->disks[i];
         int idx = virDiskNameToIndex(disk->dst);
         const char *bus = virDomainDiskQEMUBusTypeToString(disk->bus);
+        int nc;
+        char *cachemode;
 
         if (disk->bus == VIR_DOMAIN_DISK_BUS_USB) {
             if (disk->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
@@ -980,15 +996,18 @@ int qemudBuildCommandLine(virConnectPtr
             break;
         }
 
-        snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
+        nc = snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
                  disk->src ? disk->src : "", bus,
                  media ? media : "",
Re: [libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.
Daniel P. Berrange wrote:
> If you look at src/qemu_conf.c, you'll find a nice method called
> qemudExtractVersionInfo, which runs 'qemu -help' and checks for
> certain interesting command line arguments :-)

That problem does seem to be crying for some type of structured
interface, to avoid subtle breakage should someone modify the output
of --help.  I'm sure I'm preaching to the choir here.  So it now
adapts for the cases of old syntax and "writethrough", as well as new
syntax and "on", since qemu will otherwise balk at those cache flag /
version combinations.

> One note about the enums - rather than adding old style CACHE_ON and
> CACHE_OFF options to the main enum in domain_conf, just create a
> second enum in the qemu_conf.c file for recording the mapping of
> virDomainDiskCache values to old style QEMU argument values.

As we are adapting in both directions, I left the single enum
representing the entire option list to simplify things.  Updated patch
is attached.

-john

 domain_conf.c |   31 +++---
 domain_conf.h |   16 ++++
 qemu_conf.c   |   40 +++-
 qemu_conf.h   |    1 +
 4 files changed, 76 insertions(+), 12 deletions(-)

===================================================================
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -80,6 +80,21 @@ enum virDomainDiskBus {
     VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+    VIR_DOMAIN_DISK_CACHE_UNSPECIFIED,  /* reserved */
+    VIR_DOMAIN_DISK_CACHE_OFF,
+    VIR_DOMAIN_DISK_CACHE_ON,
+    VIR_DOMAIN_DISK_CACHE_DISABLE,
+    VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+    VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+
+    VIR_DOMAIN_DISK_CACHE_LAST
+} virDomainDiskCache;
+
+VIR_ENUM_DECL(virDomainDiskCache)
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -91,6 +106,7 @@ struct _virDomainDiskDef {
     char *dst;
     char *driverName;
    char *driverType;
+    int cachemode;
     unsigned int readonly : 1;
     unsigned int shared : 1;
     int slotnum;            /* pci slot number for unattach */
===================================================================
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -449,7 +449,9 @@ int qemudExtractVersionInfo(const char *
     if (strstr(help, "-domid"))
         flags |= QEMUD_CMD_FLAG_DOMID;
     if (strstr(help, "-drive"))
-        flags |= QEMUD_CMD_FLAG_DRIVE;
+        flags |= QEMUD_CMD_FLAG_DRIVE |
+            (strstr(help, "cache=writethrough|writeback|none") ?
+             QEMUD_CMD_FLAG_DRIVE_CACHEMODE : 0);
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
     if (version >= 9000)
@@ -616,6 +618,20 @@ qemudNetworkIfaceConnect(virConnectPtr c
     return NULL;
 }
 
+/* map from internal cache mode to qemu cache arg text
+ *
+ * Note: currently this replicates virDomainDiskCache, but will need to
+ * error flag potential new entries in virDomainDiskCache which are
+ * not supported by qemu (raising exceptions as appropriate).
+ */
+VIR_ENUM_IMPL(qemu_cache_map, VIR_DOMAIN_DISK_CACHE_LAST,
+              "",             /* reserved -- mode not specified */
+              "off",          /* deprecated by "none" */
+              "on",           /* obsoleted by "writethrough" */
+              "none",
+              "writeback",
+              "writethrough")
+
 static int
 qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
                                char *buf,
                                int buflen)
@@ -721,6 +737,7 @@ int qemudBuildCommandLine(virConnectPtr
     const char *emulator;
     char uuid[VIR_UUID_STRING_BUFLEN];
     char domid[50];
+    unsigned int qf;
 
     uname(&ut);
@@ -946,6 +963,7 @@ int qemudBuildCommandLine(virConnectPtr
         virDomainDiskDefPtr disk = vm->def->disks[i];
         int idx = virDiskNameToIndex(disk->dst);
         const char *bus = virDomainDiskQEMUBusTypeToString(disk->bus);
+        int nc, cachemode;
 
         if (disk->bus == VIR_DOMAIN_DISK_BUS_USB) {
             if (disk->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
@@ -980,15 +998,27 @@ int qemudBuildCommandLine(virConnectPtr
             break;
         }
 
-        snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
+        nc = snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
                  disk->src ? disk->src : "", bus,
                  media ? media : "",
                  idx,
                  bootable && disk->device == VIR_DOMAIN_DISK_DEVICE_DISK
-                 ? ",boot=on" : "",
-                 disk->shared && !disk->readonly
-                 ? ",cache=off" : "");
+                 ? ",boot=on" : "");
+
+        if (cachemode = disk->cachemode) {
+            if (qemudExtractVersionInfo(vm->def->emulator, NULL, &qf) < 0)
+                ;   /* error reported */
+            else if (!(qf
Re: [libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.
Jim Meyering wrote:
> Hi John,
>
> I tried to apply that, but failed miserably, since all of the
> following was recently redone to use virBufferVSprintf rather than
> snprintf.

Yea, I suspected the code was likely seeing some motion.  Thanks for
bringing it forward.

> And it's a good thing, because with the changes below there was a
> potential buffer overrun: nc could end up larger than PATH_MAX.

You are correct.  I mistook the return value of snprintf() in the case
of truncation.

> One remaining bit that I haven't done: add tests for this,
> exercising e.g.,
>
>   - cachemode
>   - !cachemode && (disk->shared && !disk->readonly)
>   - !cachemode && (!disk->shared || disk->readonly)

That logic should be correct as it exists in the patch; namely, we'll
generate a cache=* flag in the case cachemode was specified explicitly
in the xml definition, or for the preexisting case of
(shared && !readonly).  Here, if both happen to be true, the explicit
xml cache tag takes precedence.  But with the existing code we need to
remove the clause:

@@ -1007,18 +1025,34 @@ int qemudBuildCommandLine(virConnectPtr
     if (bootable && disk->device == VIR_DOMAIN_DISK_DEVICE_DISK)
         virBufferAddLit(&opt, ",boot=on");
-    if (disk->shared && !disk->readonly)
-        virBufferAddLit(&opt, ",cache=off");
     if (disk->driverType)
         virBufferVSprintf(&opt, ",fmt=%s", disk->driverType);

as this results in generation of cache=* twice on the qemu cmdline,
and possibly of different dispositions.  With this change I think it is
good to go.  The attached patch incorporates the above modification.
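The snprintf() pitfall conceded above is worth spelling out: per C99, on truncation the return value is the length the full string would have had, not the number of bytes written, so using it directly as an offset into the buffer can run past the end. A minimal demonstration (the helper name is illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format `src` into a buffer of `bufsize` bytes and return snprintf's
 * result.  On truncation, the return value exceeds bufsize - 1 even
 * though only bufsize - 1 characters (plus NUL) were stored. */
int format_into(char *buf, size_t bufsize, const char *src)
{
    return snprintf(buf, bufsize, "%s", src);
}
```

This is exactly why `nc` could exceed PATH_MAX in the earlier patch revision, and why the return value must be range-checked before reuse.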
-john
--
john.coo...@redhat.com

 domain_conf.c |   34 ++
 domain_conf.h |   16 +
 qemu_conf.c   |   43 ++-
 qemu_conf.h   |    3 ++-
 4 files changed, 82 insertions(+), 14 deletions(-)

===================================================================
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -81,6 +81,21 @@ enum virDomainDiskBus {
     VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+    VIR_DOMAIN_DISK_CACHE_UNSPECIFIED,  /* reserved */
+    VIR_DOMAIN_DISK_CACHE_OFF,
+    VIR_DOMAIN_DISK_CACHE_ON,
+    VIR_DOMAIN_DISK_CACHE_DISABLE,
+    VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+    VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+
+    VIR_DOMAIN_DISK_CACHE_LAST
+} virDomainDiskCache;
+
+VIR_ENUM_DECL(virDomainDiskCache)
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -92,6 +107,7 @@ struct _virDomainDiskDef {
     char *dst;
     char *driverName;
     char *driverType;
+    int cachemode;
     unsigned int readonly : 1;
     unsigned int shared : 1;
     int slotnum;            /* pci slot number for unattach */
===================================================================
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -1,7 +1,7 @@
 /*
  * config.c: VM configuration management
  *
- * Copyright (C) 2006, 2007, 2008 Red Hat, Inc.
+ * Copyright (C) 2006, 2007, 2008, 2009 Red Hat, Inc.
  * Copyright (C) 2006 Daniel P. Berrange
  *
  * This library is free software; you can redistribute it and/or
@@ -399,7 +399,9 @@ int qemudExtractVersionInfo(const char *
     if (strstr(help, "-domid"))
         flags |= QEMUD_CMD_FLAG_DOMID;
     if (strstr(help, "-drive"))
-        flags |= QEMUD_CMD_FLAG_DRIVE;
+        flags |= QEMUD_CMD_FLAG_DRIVE |
+            (strstr(help, "cache=writethrough|writeback|none") ?
+             QEMUD_CMD_FLAG_DRIVE_CACHEMODE : 0);
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
     if (version >= 9000)
@@ -591,6 +593,22 @@ qemudNetworkIfaceConnect(virConnectPtr c
     return NULL;
 }
 
+VIR_ENUM_DECL(qemu_cache_map)
+
+/* map from internal cache mode to qemu cache arg text
+ *
+ * Note: currently this replicates virDomainDiskCache, but will need to
+ * error-flag potential new entries in virDomainDiskCache which are
+ * not supported by qemu (raising exceptions as appropriate).
+ */
+VIR_ENUM_IMPL(qemu_cache_map, VIR_DOMAIN_DISK_CACHE_LAST,
+              "",              /* reserved -- mode not specified */
+              "off",           /* deprecated; use "none" */
+              "on",            /* obsolete; use "writethrough" */
+              "none",
+              "writeback",
+              "writethrough")
+
 static int qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
                                           char *buf,
                                           int buflen)
@@ -1007,11 +1025,27 @@ int qemudBuildCommandLine(virConnectPtr
         if (bootable && disk->device == VIR_DOMAIN_DISK_DEVICE_DISK)
             virBufferAddLit(&opt, ",boot=on");
 
-        if (disk->shared && !disk->readonly)
-            virBufferAddLit(&opt, ",cache=off");
 
         if (disk->driverType)
             virBufferVSprintf(&opt, ",fmt=%s", disk->driverType);
 
+        unsigned int qf;
+        int cachemode = disk->cachemode;
+        if (cachemode) {
+            if
Re: [libvirt] [RHEL6 PATCH] Correct cpuid flags and model fields, V2
A question arose in today's kvm meeting concerning any impact to libvirt
from this change.  I've discussed it with Cole and it seems to be a
non-issue, but just to err on the side of caution, here's a summary:

The current upstream cpu model definition of qemu64 is problematic from
kvm's perspective, such that we need to modify it slightly (BZ
justifications listed below).  In doing so we've left the qemu64
definition as-is, but added cpu64-rhel6 and cpu64-rhel5 models, which are
now selected by default via -M machine for RHEL 6.0.0 PC and RHEL 5.x.y
PC respectively.

So the only issue would be libvirt invoking qemu with neither a -cpu nor
a -M argument (which defaults to qemu64), or explicitly requesting -cpu
qemu64.  From my discussion with Cole it appears the use cases where this
may happen fall outside of routine/expected usage and would need to be
explicitly requested by the user.  However, I wanted to call this out
here in the event we're overlooking something.

Thanks,

-john

http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-July/msg01110.html
http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-July/msg00831.html

john cooper wrote:
> Addresses BZs:
>
>   #618332 - CPUID_EXT_POPCNT enabled in qemu64 and qemu32 built-in
>             models.
>   #613892 - [SR-IOV] VF device can not start on 32bit Windows2008 SP2
>
> Summary: CPU feature flags for several processor definitions require
> correction to accurately reflect the corresponding shipped silicon.  In
> particular, overly conservative values for cpuid model fields cause MSI
> support to be disabled in windows guests (BZ #613892).  Also, recent
> upstream changes to qemu64's built-in definition enable POPCNT (for the
> benefit of TCG) but risk kvm migration breakage (BZ #618332).  The
> following patch addresses these issues collectively.
>
> Changes relative to previous version:
>
>   - drop of the qemu32-rhel5 model, as it appears to be unneeded.
>   - rename of qemu64-rhel? to cpu64-rhel?, as we're diverging from
>     upstream qemu64's definition but haven't migrated to kvm64 quite
>     yet.
>   - correction of several fields for the (now) cpu64-rhel5 model.
>
> Further detail may be found in the associated mail thread common to
> both BZ cases, starting about here:
>
> http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-July/msg00831.html
>
> Brew build:
>
> https://brewweb.devel.redhat.com/taskinfo?taskID=2643552
>
> Upstream status: BZ #618332 is a local, interim change and doesn't
> relate to upstream concerns.  Longer-term this qemu64-derived model
> would be displaced by kvm64.  The update of cpu definitions (BZ
> #613892) however needs to be pushed upstream upon validation in rhel6.
> We're currently the sole users of the new cpu models, which fosters
> this inverted process.

--
john.coo...@redhat.com
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list