Re: [libvirt] [PATCH] Add huge page support to libvirt..
On Wed, Jul 22, 2009 at 09:25:02PM -0400, john cooper wrote:
> This patch allows passing of a -mem-path arg flag to qemu for support
> of huge page backed guests. A guest may request this option by
> specifying:
>
>     <hugepage>on</hugepage>
>
> in its domain definition xml file.

This really opens a can of worms. While obviously this maps very simply
onto KVM's -mem-path argument, I can't help thinking things are going to
get much more advanced later. For example, I don't think a boolean
on/off is sufficient for this, since Xen already has a 3rd option of
'best effort' it uses by default, where it will try to allocate huge
pages and fall back to normal pages - in fact you can't tell Xen not to
use huge pages AFAIK.

I'm also wondering whether we need to be concerned about different huge
page sizes for guest configs, e.g. 2 MB vs 1 GB, vs a mix of both - who
decides? KVM also seems to have the ability to request that huge pages
are pre-allocated upfront vs on demand, though I'm not sure what happens
to a VM if it doesn't pre-allocate and the allocation later can't be
satisfied.

> The request for huge page backing will be attempted within libvirt if
> the host system has indicated a hugetlbfs mount point in qemu.conf,
> for example:
>
>     hugepage_mount = "/hugetlbfs"

Seems like it would be simpler to just open /proc/mounts and scan it to
find whether/where hugetlbfs is mounted, so it would 'just work' if the
user had mounted it.

> _and_ the target qemu executable is aware of the -mem-path flag.
> Otherwise this request by a guest will result in an error.

It looks like the argument is not available in upstream QEMU, only in
the KVM fork? Any idea why it hasn't been sent upstream, and/or whether
it will be soon?
I'm loath to add more KVM-specific options, since we've been burnt every
time we've done this in the past, with the semantics changing when
merged to QEMU :-(

> This patch does not address setup of the required host hugetlbfs mount
> point, verifying the mount point is correct/usable, nor assuring
> sufficient free huge pages are available; these are assumed to be
> addressed by other means.

I agree that setting up hugetlbfs is out of scope for libvirt. We should
just probe to see whether it's available or not. We ought to have some
way of reporting available huge pages though, both at a host level and
likely per NUMA node too. Without this, a mgmt app using libvirt has no
clue whether it will actually be able to use huge pages successfully.

Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Add huge page support to libvirt..
Mark McLoughlin wrote:
> Other options include:
>
>   - <hugepages/>
>
>   - <memory hugepages="yes">X</memory>

Yes, I'd expect additional options will need to be addressed. Currently
the only additional qemu-resident knob is the -mem-prealloc flag, which
is enabled by default. I've removed support for it here in the interest
of simplicity, but it will fall into the existing scheme.

> > hugepage_mount = "/hugetlbfs"
>
> I'd suggest /dev/hugepages as the default - /hugetlbfs has a seriously
> unstandard whiff about it

That was intended as an example for no other reason than it is what I
happen to use. I suspected it wouldn't be the mount point's final
resting place.

Thanks,
-john

--
john.coo...@redhat.com
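[Editor's note: written out in domain XML context, the two shapes Mark
suggests would look roughly like this - illustrative only, as neither
form is part of the schema at this point, and the memory value is
arbitrary.]

```xml
<domain type='kvm'>
  <memory>1048576</memory>

  <!-- shape 1: a standalone toggle element, as in the posted patch -->
  <hugepage>on</hugepage>

  <!-- shape 2: an attribute on the existing memory element,
       replacing the plain <memory> line above -->
  <!-- <memory hugepages='yes'>1048576</memory> -->
</domain>
```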
Re: [libvirt] [PATCH] Add huge page support to libvirt..
Daniel P. Berrange wrote:
> On Wed, Jul 22, 2009 at 09:25:02PM -0400, john cooper wrote:
> > This patch allows passing of a -mem-path arg flag to qemu for
> > support of huge page backed guests. A guest may request this option
> > by specifying:
> >
> >     <hugepage>on</hugepage>
> >
> > in its domain definition xml file.
>
> This really opens a can of worms. While obviously this maps very
> simply onto KVM's -mem-path argument, I can't help thinking things are
> going to get much more advanced later. For example, I don't think a
> boolean on/off is sufficient for this, since Xen already has a 3rd
> option of 'best effort' it uses by default, where it will try to
> allocate huge pages and fall back to normal pages - in fact you can't
> tell Xen not to use huge pages AFAIK.

I agree growing beyond a simple on/off switch is likely. The patch
originally had a "prealloc" option (since removed) to accommodate
passing of the existing -mem-prealloc flag. That option defaults to
enabled, which is desired in general, and it was therefore dropped for
the sake of patch simplicity.

> I'm also wondering whether we need to be concerned about different
> huge page sizes for guest configs, e.g. 2 MB vs 1 GB, vs a mix of
> both - who decides?

Not a consideration currently, but it does underscore the argument for a
more extensible option syntax.

> KVM also seems to have the ability to request that huge pages are
> pre-allocated upfront vs on demand, though I'm not sure what happens
> to a VM if it doesn't pre-allocate and the allocation later can't be
> satisfied.

That was the motivation for -mem-prealloc, prior to which a VM would
terminate with SIGSEGV if a huge page fault couldn't be satisfied at
runtime. The (now default) preallocation behavior guarantees the guest
has its working set present at startup, but at the potential cost of
overly pessimistic memory allocation.
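[Editor's note: for concreteness, the qemu-side knobs under discussion
would appear on a KVM command line roughly as follows - a sketch only;
flag availability depends on the qemu-kvm build, and since
-mem-prealloc is the default in the KVM fork it need not be passed
explicitly.]

```
qemu-kvm -m 1024 \
    -mem-path /dev/hugepages \
    -mem-prealloc \
    ...
```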
> > The request for huge page backing will be attempted within libvirt
> > if the host system has indicated a hugetlbfs mount point in
> > qemu.conf, for example:
> >
> >     hugepage_mount = "/hugetlbfs"
>
> Seems like it would be simpler to just open /proc/mounts and scan it
> to find whether/where hugetlbfs is mounted, so it would 'just work' if
> the user had mounted it.

Checking /proc/mounts alone seemed a bit too speculative, which is where
the qemu.conf option arose. But I can see both being useful, as in
checking whether the qemu.conf mount point exists, otherwise attempting
to glean the information from /proc/mounts, and if neither is satisfied
flagging the error.

> It looks like the argument is not available in upstream QEMU, only in
> the KVM fork? Any idea why it hasn't been sent upstream, and/or
> whether it will be soon?

I'd hazard it's due to the existing huge page support being closely tied
to kvm, and no motivation as of yet to reconcile this upstream.

> I'm loath to add more KVM-specific options since we've been burnt
> every time we've done this in the past, with the semantics changing
> when merged to QEMU :-(

Quite understandable. It is also why I was attempting to be as generic
(and simple) as possible here, and not excessively cast the existing kvm
implementation specifics into the exported libvirt option.

> I agree that setting up hugetlbfs is out of scope for libvirt. We
> should just probe to see whether it's available or not. We ought to
> have some way of reporting available hugepages though, both at a host
> level, and likely per NUMA node too. Without this a mgmt app using
> libvirt has no clue whether they'll be able to actually use hugepages
> successfully or not.

Agree. Extracting this host information will be needed when more
comprehensive management of the same exists. Still, it would seem a case
of best-effort enforcement without some sort of additional coordination:
the process of gleaning the number of free huge pages from the host and
launching the guest currently has an inherent race.
Thanks,
-john

--
john.coo...@redhat.com
Re: [libvirt] [PATCH] Add huge page support to libvirt..
* Mark McLoughlin (mar...@redhat.com) wrote:
> I'd suggest /dev/hugepages as the default - /hugetlbfs has a seriously
> unstandard whiff about it

What about /var/lib/libvirt/qemu/hugetlb, and having the whole thing
under libvirt's control? It can allow for better security, I think.

thanks,
-chris
Re: [libvirt] [PATCH] Add huge page support to libvirt..
* Daniel P. Berrange (berra...@redhat.com) wrote:
> On Thu, Jul 23, 2009 at 11:35:17AM -0700, Chris Wright wrote:
> > * Mark McLoughlin (mar...@redhat.com) wrote:
> > > I'd suggest /dev/hugepages as the default - /hugetlbfs has a
> > > seriously unstandard whiff about it
> >
> > What about /var/lib/libvirt/qemu/hugetlb, and having the whole thing
> > under libvirt's control? It can allow for better security, I think.
>
> Does hugetlbfs support extended attributes? If so, sVirt will
> automatically ensure isolation of each VM's backing file. If it
> doesn't support extended attrs, then using hugetlbfs at all would seem
> to blow a huge hole in our security model...

You should be able to specify a per-mount-point security context. There
is no xattr support in hugetlbfs.

thanks,
-chris
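[Editor's note: the per-mount-point context Chris mentions is the
SELinux context= mount option, which labels every file on the
filesystem with one fixed context at mount time. Illustrative
invocation - the context value and mount path are examples, not
settled choices; the actual type would come from the host's sVirt
policy.]

```
# hugetlbfs has no xattr support, so label the whole mount at mount
# time with a single SELinux context instead:
mount -t hugetlbfs -o context=system_u:object_r:svirt_image_t:s0 \
      none /var/lib/libvirt/qemu/hugetlb
```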
[libvirt] [PATCH] Add huge page support to libvirt..
This patch allows passing of a -mem-path arg flag to qemu for support of
huge page backed guests. A guest may request this option by specifying:

    <hugepage>on</hugepage>

in its domain definition xml file. The request for huge page backing
will be attempted within libvirt if the host system has indicated a
hugetlbfs mount point in qemu.conf, for example:

    hugepage_mount = "/hugetlbfs"

_and_ the target qemu executable is aware of the -mem-path flag.
Otherwise this request by a guest will result in an error.

This patch does not address setup of the required host hugetlbfs mount
point, verifying the mount point is correct/usable, nor assuring
sufficient free huge pages are available; these are assumed to be
addressed by other means.

Signed-off-by: john cooper <john.coo...@redhat.com>
---

diff --git a/src/domain_conf.c b/src/domain_conf.c
index f3e4c6c..04d6911 100644
--- a/src/domain_conf.c
+++ b/src/domain_conf.c
@@ -2369,6 +2369,17 @@ static virDomainDefPtr virDomainDefParseXML(virConnectPtr conn,
     if (virXPathULong(conn, "string(./currentMemory[1])", ctxt, &def->memory) < 0)
         def->memory = def->maxmem;
 
+    tmp = virXPathString(conn, "string(./hugepage[1])", ctxt);
+    if (!tmp || STREQ(tmp, "off"))
+        def->hugepage_backed = 0;
+    else if (STREQ(tmp, "on"))
+        def->hugepage_backed = 1;
+    else {
+        virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+                             _("invalid hugepage mode %s"), tmp);
+        goto error;
+    }
+
     if (virXPathULong(conn, "string(./vcpu[1])", ctxt, &def->vcpus) < 0)
         def->vcpus = 1;
 
@@ -3933,6 +3944,8 @@ char *virDomainDefFormat(virConnectPtr conn,
     virBufferVSprintf(&buf, "  <memory>%lu</memory>\n", def->maxmem);
     virBufferVSprintf(&buf, "  <currentMemory>%lu</currentMemory>\n", def->memory);
+    if (def->hugepage_backed)
+        virBufferVSprintf(&buf, "  <hugepage>%s</hugepage>\n", "on");
 
     for (n = 0 ; n < def->cpumasklen ; n++)
         if (def->cpumask[n] != 1)
diff --git a/src/domain_conf.h b/src/domain_conf.h
index 6e111fa..d6bdcdb 100644
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -481,6 +481,7 @@ struct _virDomainDef {
     unsigned long memory;
     unsigned long maxmem;
+    unsigned char hugepage_backed;
     unsigned long vcpus;
     int cpumasklen;
     char *cpumask;
diff --git a/src/qemu.conf b/src/qemu.conf
index 3009725..a3387f1 100644
--- a/src/qemu.conf
+++ b/src/qemu.conf
@@ -95,3 +95,10 @@
 
 # The group ID for QEMU processes run by the system instance
 #group = "root"
+
+# If provided by the host and this hugetlbfs mount point is
+# configured, a guest may request huge page backing.  When this
+# mount point is undefined, huge page backing is disabled.
+
+hugepage_mount = "/hugetlbfs"
+
diff --git a/src/qemu_conf.c b/src/qemu_conf.c
index 4043d70..632b784 100644
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -218,6 +218,17 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
     }
     VIR_FREE(group);
 
+    p = virConfGetValue (conf, "hugepage_mount");
+    CHECK_TYPE ("hugepage_mount", VIR_CONF_STRING);
+    if (p && p->str) {
+        VIR_FREE(driver->hugepage_mount);
+        if (!(driver->hugepage_mount = strdup(p->str))) {
+            virReportOOMError(NULL);
+            virConfFree(conf);
+            return -1;
+        }
+    }
+
     virConfFree (conf);
     return 0;
 }
@@ -500,6 +511,8 @@ static unsigned int qemudComputeCmdFlags(const char *help,
         flags |= QEMUD_CMD_FLAG_VGA;
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
+    if (strstr(help, "-mem-path"))
+        flags |= QEMUD_CMD_FLAG_MEM_PATH;
 
     if (version >= 9000)
         flags |= QEMUD_CMD_FLAG_VNC_COLON;
@@ -1125,6 +1138,15 @@ int qemudBuildCommandLine(virConnectPtr conn,
         ADD_ARG_LIT("-no-kvm");
     ADD_ARG_LIT("-m");
     ADD_ARG_LIT(memory);
+    if (def->hugepage_backed) {
+        if (!driver->hugepage_mount ||
+            !(qemuCmdFlags & QEMUD_CMD_FLAG_MEM_PATH)) {
+            qemudReportError(conn, NULL, NULL, VIR_ERR_NO_SUPPORT,
+                             "%s", _("hugepage backing not supported"));
+            goto error;
+        }
+        ADD_ARG_LIT("-mem-path");
+        ADD_ARG_LIT(driver->hugepage_mount);
+    }
     ADD_ARG_LIT("-smp");
     ADD_ARG_LIT(vcpus);
 
diff --git a/src/qemu_conf.h b/src/qemu_conf.h
index fbf2ab9..847597f 100644
--- a/src/qemu_conf.h
+++ b/src/qemu_conf.h
@@ -58,6 +58,7 @@ enum qemud_cmd_flags {
     QEMUD_CMD_FLAG_KVM          = (1 << 13), /* Whether KVM is compiled in */
     QEMUD_CMD_FLAG_DRIVE_FORMAT = (1 << 14), /* Is -drive format= avail */
     QEMUD_CMD_FLAG_VGA          = (1 << 15), /* Is -vga avail */
+    QEMUD_CMD_FLAG_MEM_PATH     = (1 << 16), /* mmap'ped guest backing supported */
 };
 
 /* Main driver state */
@@ -86,6 +87,7 @@ struct qemud_driver {
     char *vncListen;
     char *vncPassword;
     char *vncSASLdir;
+    char *hugepage_mount;
 
     virCapsPtr caps;
 
diff --git a/src/qemu_driver.c b/src/qemu_driver.c