Re: [libvirt] [PATCH] Add huge page support to libvirt..

2009-07-23 Thread Daniel P. Berrange
On Wed, Jul 22, 2009 at 09:25:02PM -0400, john cooper wrote:
 This patch allows passing of a -mem-path arg
 flag to qemu for support of huge page backed
 guests.  A guest may request this option via
 specifying:
 
 <hugepage>on</hugepage>
 
 in its domain definition xml file. 

This really opens a can of worms. While this obviously maps
very simply onto KVM's -mem-path argument, I can't help
thinking things are going to get much more advanced later.
For example, I don't think a boolean on/off is sufficient for
this, since Xen already has a third option of 'best effort' that
it uses by default, where it'll try to allocate hugepages and
fall back to normal pages - in fact you can't tell Xen not
to use hugepages AFAIK. I'm also wondering whether we need
to be concerned about different hugepage sizes for guest
configs, e.g. 2 MB vs 1 GB vs a mix of both - who decides?
KVM also seems to have the ability to request that huge pages
are pre-allocated upfront, vs on demand, though I'm not
sure what happens to a VM if it doesn't pre-allocate and
a later allocation can't be satisfied.
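
To make the "boolean is not enough" point concrete, here is a
minimal sketch of a tri-state setting. Everything in it is
hypothetical: the enum, the function name and the 'best-effort'
string are illustrations only and appear nowhere in the patch.

```c
#include <string.h>

/* Hypothetical tri-state replacing the patch's on/off boolean. */
enum hugepage_mode {
    HUGEPAGE_MODE_INVALID = -1,
    HUGEPAGE_MODE_OFF = 0,      /* never use huge pages */
    HUGEPAGE_MODE_ON,           /* require huge pages, fail otherwise */
    HUGEPAGE_MODE_BEST_EFFORT   /* try huge pages, fall back to normal */
};

/*
 * Map the text of the XML element to a mode.  A missing element
 * (NULL) is treated as "off", mirroring the patch's behaviour.
 */
enum hugepage_mode hugepage_mode_from_string(const char *s)
{
    if (s == NULL || strcmp(s, "off") == 0)
        return HUGEPAGE_MODE_OFF;
    if (strcmp(s, "on") == 0)
        return HUGEPAGE_MODE_ON;
    if (strcmp(s, "best-effort") == 0)   /* assumed spelling */
        return HUGEPAGE_MODE_BEST_EFFORT;
    return HUGEPAGE_MODE_INVALID;
}
```

An enum like this leaves room for further modes later without
changing the type of the field in the domain definition.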


   The request
 for huge page backing will be attempted within
 libvirt if the host system has indicated a
 hugetlbfs mount point in qemu.conf, for example:
 
 hugepage_mount = "/hugetlbfs"

Seems like it would be simpler to just open /proc/mounts
and scan it to find whether/where hugetlbfs is mounted,
so it would 'just work' if the user had mounted it.
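
A minimal sketch of such a scan, assuming the usual six-field
/proc/mounts line format. The helper name is hypothetical and this
is not code from the patch; it takes a FILE * so the caller decides
which file to read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan a mounts table in /proc/mounts format and copy the first
 * hugetlbfs mount point into 'out'.  Returns 0 on success, -1 if no
 * hugetlbfs entry is found or the path does not fit in 'out'.
 * Note: /proc/mounts escapes spaces in paths as \040; this sketch
 * does not decode them, and assumes lines shorter than 1024 bytes.
 */
int find_hugetlbfs_mount(FILE *fp, char *out, size_t outlen)
{
    char line[1024];
    char dir[512];
    char type[64];

    assert(fp != NULL && out != NULL);

    while (fgets(line, sizeof(line), fp)) {
        /* Fields: device, mount point, fs type, options, dump, pass */
        if (sscanf(line, "%*s %511s %63s", dir, type) != 2)
            continue;
        if (strcmp(type, "hugetlbfs") != 0)
            continue;
        if (strlen(dir) >= outlen)
            return -1;
        strcpy(out, dir);
        return 0;
    }
    return -1;
}
```

A caller would typically pass the result of fopen("/proc/mounts", "r")
and treat a -1 return as "huge page backing unavailable".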

 _and_ the target qemu executable is aware of
 the -mem-path flag.  Otherwise this request
 by a guest will result in an error.

It looks like this argument is not available in upstream QEMU, only
as part of the KVM fork? Any idea why it hasn't been sent upstream,
and/or whether it will be soon? I'm loath to add more KVM-specific
options since we've been burnt every time we've done
this in the past, with the semantics changing when merged into
QEMU :-(

 This patch does not set up the required host
 hugetlbfs mount point, verify that the mount
 point is correct/usable, or ensure that sufficient
 free huge pages are available; these are assumed
 to be addressed by other means.

I agree that setting up hugetlbfs is out of scope for libvirt.
We should just probe to see whether it's available or not.
We ought to have some way of reporting available hugepages
though, both at a host level, and likely per NUMA node too.
Without this a mgmt app using libvirt has no clue whether they'll
be able to actually use hugepages successfully or not.
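
For the host-level number, one possible source is the HugePages_*
counters in /proc/meminfo. The sketch below reads a single counter
from a file in that format; the function name is hypothetical and
nothing like it exists in the patch. Per-NUMA-node figures would
need a different source (the per-node files under
/sys/devices/system/node/ on kernels that provide them), which this
sketch does not attempt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "key: <number>" in a /proc/meminfo style file, for example
 * key = "HugePages_Free".  Stores the number in *value and returns 0,
 * or returns -1 if the key is missing or has no numeric value.
 */
int read_meminfo_field(FILE *fp, const char *key, unsigned long *value)
{
    char line[256];
    size_t keylen;

    assert(fp != NULL && key != NULL && value != NULL);
    keylen = strlen(key);

    while (fgets(line, sizeof(line), fp)) {
        /* Require the ':' so "HugePages_Free" never matches a longer key */
        if (strncmp(line, key, keylen) != 0 || line[keylen] != ':')
            continue;
        return sscanf(line + keylen + 1, "%lu", value) == 1 ? 0 : -1;
    }
    return -1;
}
```

The value is only a snapshot: another process may consume the pages
between this read and the guest being started.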

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Add huge page support to libvirt..

2009-07-23 Thread john cooper
Mark McLoughlin wrote:

 Other options include:
 
   - <hugepages/>
 
   - <memory hugepages="yes">X</memory>

Yes, I'd expect additional options will need to
be addressed.  Currently the only additional
qemu-resident knob is the -mem-prealloc flag
which is enabled by default.  I've removed
support for it here in the interest of simplicity
but it will fall into the existing scheme.
 
 hugepage_mount = "/hugetlbfs"
 
 I'd suggest /dev/hugepages as the default - /hugetlbfs has a seriously
 non-standard whiff about it

That was intended as an example for no other
reason than it is what I happen to use.  I
suspected it wouldn't be the mount point's
final resting place.

Thanks,

-john

-- 
john.coo...@redhat.com



Re: [libvirt] [PATCH] Add huge page support to libvirt..

2009-07-23 Thread john cooper
Daniel P. Berrange wrote:
 On Wed, Jul 22, 2009 at 09:25:02PM -0400, john cooper wrote:
 This patch allows passing of a -mem-path arg
 flag to qemu for support of huge page backed
 guests.  A guest may request this option via
 specifying:

 <hugepage>on</hugepage>

 in its domain definition xml file. 
 
 This really opens a can of worms. While this obviously maps
 very simply onto KVM's -mem-path argument, I can't help
 thinking things are going to get much more advanced later.
 For example, I don't think a boolean on/off is sufficient for
 this, since Xen already has a third option of 'best effort' that
 it uses by default, where it'll try to allocate hugepages and
 fall back to normal pages - in fact you can't tell Xen not
 to use hugepages AFAIK.

I agree growing beyond a simple on/off switch is likely.
The patch originally had a prealloc option (since removed)
to accommodate passing of the existing -mem-prealloc flag.
That option defaults to enabled, which is desired in general
and therefore was dropped for the sake of patch simplicity.

 I'm also wondering whether we need
 to be concerned about different hugepage sizes for guest
 configs, e.g. 2 MB vs 1 GB vs a mix of both - who decides?

Not a consideration currently, but it does underscore the
argument for a more extensible option syntax.

 KVM also seems to have the ability to request that huge pages
 are pre-allocated upfront, vs on demand, though I'm not
 sure what happens to a VM if it doesn't pre-allocate and
 a later allocation can't be satisfied.

That was the motivation for -mem-prealloc, prior to which
a VM would be terminated by SIGSEGV if a huge page fault
couldn't be satisfied at runtime.  The (now default)
preallocation behavior guarantees the guest has its working
set present at startup, but at the potential cost of overly
pessimistic memory allocation.
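
The general idea behind allocating up front can be sketched as
touching one byte in every page of the backing region at startup,
so that any shortage shows up then rather than on a later guest
access. This is an illustration only and not QEMU's -mem-prealloc
code; the function name is hypothetical, and in real use 'base'
would be the mapping of the -mem-path file and 'pagesize' the huge
page size.

```c
#include <stddef.h>

/*
 * Touch one byte per page in [base, base + len) so the kernel has to
 * back each page now.  Each byte is rewritten with its own value, so
 * the contents are unchanged.  Returns the number of pages touched.
 */
size_t touch_pages(void *base, size_t len, size_t pagesize)
{
    volatile unsigned char *p = base;
    size_t off;
    size_t touched = 0;

    if (pagesize == 0)
        return 0;               /* avoid an endless loop */

    for (off = 0; off < len; off += pagesize) {
        p[off] = p[off];        /* read then write forces a real page */
        touched++;
    }
    return touched;
}
```

The cost mentioned above is visible here: every page is claimed
whether or not the guest ever uses it.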

 The request
 for huge page backing will be attempted within
 libvirt if the host system has indicated a
 hugetlbfs mount point in qemu.conf, for example:

 hugepage_mount = "/hugetlbfs"
 
 Seems like it would be simpler to just open /proc/mounts
 and scan it to find whether/where hugetlbfs is mounted,
 so it would 'just work' if the user had mounted it.

Relying solely on /proc/mounts seemed a bit too speculative,
which is where the qemu.conf option arose.  But I can see
both being useful: first check whether the qemu.conf
mount point exists, otherwise attempt to glean the
information from /proc/mounts, and if neither succeeds
flag the error.

 It looks like this argument is not available in upstream QEMU, only
 as part of the KVM fork? Any idea why it hasn't been sent upstream,
 and/or whether it will be soon?

I'd hazard it is due to the existing huge page support
being closely tied to kvm, with no motivation as
yet to reconcile this upstream.

 I'm loath to add more KVM-specific
 options since we've been burnt every time we've done
 this in the past, with the semantics changing when merged into
 QEMU :-(

Quite understandable.  It is also why I was attempting
to be as generic (and simple) as possible here and not
excessively cast the existing kvm implementation specifics
into the exported libvirt option. 

 I agree that setting up hugetlbfs is out of scope for libvirt.
 We should just probe to see whether it's available or not.
 We ought to have some way of reporting available hugepages
 though, both at a host level, and likely per NUMA node too.
 Without this a mgmt app using libvirt has no clue whether they'll
 be able to actually use hugepages successfully or not.

Agreed.  Extracting this host information will be needed
once more comprehensive management of huge pages exists.
Still, it would seem a case of best-effort enforcement
without some sort of additional coordination: there is
currently an inherent race between gleaning the number
of free huge pages from the host and launching the
guest.

Thanks,

-john

-- 
john.coo...@redhat.com



Re: [libvirt] [PATCH] Add huge page support to libvirt..

2009-07-23 Thread Chris Wright
* Mark McLoughlin (mar...@redhat.com) wrote:
 I'd suggest /dev/hugepages as the default - /hugetlbfs has a seriously
 non-standard whiff about it

What about /var/lib/libvirt/qemu/hugetlb and having the whole thing under
libvirt's control?  It can allow for better security I think.

thanks,
-chris



Re: [libvirt] [PATCH] Add huge page support to libvirt..

2009-07-23 Thread Chris Wright
* Daniel P. Berrange (berra...@redhat.com) wrote:
 On Thu, Jul 23, 2009 at 11:35:17AM -0700, Chris Wright wrote:
  * Mark McLoughlin (mar...@redhat.com) wrote:
   I'd suggest /dev/hugepages as the default - /hugetlbfs has a seriously
   non-standard whiff about it
  
  What about /var/lib/libvirt/qemu/hugetlb and having the whole thing under
  libvirt's control?  It can allow for better security I think.
 
 Does hugetlbfs support extended attributes?  If so, then sVirt will
 automatically ensure isolation of each VM's backing file. If it
 doesn't support extended attrs, then using hugetlbfs at all would
 seem to blow a huge hole in our security model ...

You should be able to specify a per mount point security context.
There is no xattr support for hugetlbfs.

thanks,
-chris



[libvirt] [PATCH] Add huge page support to libvirt..

2009-07-22 Thread john cooper
This patch allows passing of a -mem-path arg
flag to qemu for support of huge page backed
guests.  A guest may request this option via
specifying:

<hugepage>on</hugepage>

in its domain definition xml file.  The request
for huge page backing will be attempted within
libvirt if the host system has indicated a
hugetlbfs mount point in qemu.conf, for example:

hugepage_mount = "/hugetlbfs"

_and_ the target qemu executable is aware of
the -mem-path flag.  Otherwise this request
by a guest will result in an error.

This patch does not set up the required host
hugetlbfs mount point, verify that the mount
point is correct/usable, or ensure that sufficient
free huge pages are available; these are assumed
to be addressed by other means.

Signed-off-by: john cooper john.coo...@redhat.com
---

diff --git a/src/domain_conf.c b/src/domain_conf.c
index f3e4c6c..04d6911 100644
--- a/src/domain_conf.c
+++ b/src/domain_conf.c
@@ -2369,6 +2369,17 @@ static virDomainDefPtr virDomainDefParseXML(virConnectPtr conn,
     if (virXPathULong(conn, "string(./currentMemory[1])", ctxt, &def->memory) < 0)
         def->memory = def->maxmem;
 
+    tmp = virXPathString(conn, "string(./hugepage[1])", ctxt);
+    if (!tmp || STREQ(tmp, "off"))
+        def->hugepage_backed = 0;
+    else if (STREQ(tmp, "on"))
+        def->hugepage_backed = 1;
+    else {
+        virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+                             _("invalid hugepage mode %s"), tmp);
+        goto error;
+    }
+
     if (virXPathULong(conn, "string(./vcpu[1])", ctxt, &def->vcpus) < 0)
         def->vcpus = 1;
 
@@ -3933,6 +3944,8 @@ char *virDomainDefFormat(virConnectPtr conn,
     virBufferVSprintf(&buf, "  <memory>%lu</memory>\n", def->maxmem);
     virBufferVSprintf(&buf, "  <currentMemory>%lu</currentMemory>\n",
                       def->memory);
+    if (def->hugepage_backed)
+        virBufferVSprintf(&buf, "  <hugepage>%s</hugepage>\n", "on");
 
     for (n = 0 ; n < def->cpumasklen ; n++)
         if (def->cpumask[n] != 1)
diff --git a/src/domain_conf.h b/src/domain_conf.h
index 6e111fa..d6bdcdb 100644
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -481,6 +481,7 @@ struct _virDomainDef {
 
     unsigned long memory;
     unsigned long maxmem;
+    unsigned char hugepage_backed;
     unsigned long vcpus;
     int cpumasklen;
     char *cpumask;
diff --git a/src/qemu.conf b/src/qemu.conf
index 3009725..a3387f1 100644
--- a/src/qemu.conf
+++ b/src/qemu.conf
@@ -95,3 +95,10 @@
 
 # The group ID for QEMU processes run by the system instance
 #group = root
+
+# If provided by the host and this hugetlbfs mount point is
+# configured, a guest may request huge page backing.  When this
+# mount point is undefined, huge page backing is disabled.
+
+hugepage_mount = "/hugetlbfs"
+
diff --git a/src/qemu_conf.c b/src/qemu_conf.c
index 4043d70..632b784 100644
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -218,6 +218,17 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
     }
     VIR_FREE(group);
 
+    p = virConfGetValue (conf, "hugepage_mount");
+    CHECK_TYPE ("hugepage_mount", VIR_CONF_STRING);
+    if (p && p->str) {
+        VIR_FREE(driver->hugepage_mount);
+        if (!(driver->hugepage_mount = strdup(p->str))) {
+            virReportOOMError(NULL);
+            virConfFree(conf);
+            return -1;
+        }
+    }
+
     virConfFree (conf);
     return 0;
 }
@@ -500,6 +511,8 @@ static unsigned int qemudComputeCmdFlags(const char *help,
         flags |= QEMUD_CMD_FLAG_VGA;
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
+    if (strstr(help, "-mem-path"))
+        flags |= QEMUD_CMD_FLAG_MEM_PATH;
     if (version >= 9000)
         flags |= QEMUD_CMD_FLAG_VNC_COLON;
 
@@ -1125,6 +1138,15 @@ int qemudBuildCommandLine(virConnectPtr conn,
         ADD_ARG_LIT("-no-kvm");
     ADD_ARG_LIT("-m");
     ADD_ARG_LIT(memory);
+    if (def->hugepage_backed) {
+        if (!driver->hugepage_mount || !(qemuCmdFlags & QEMUD_CMD_FLAG_MEM_PATH)) {
+            qemudReportError(conn, NULL, NULL, VIR_ERR_NO_SUPPORT,
+                             "%s", _("hugepage backing not supported"));
+            goto error;
+        }
+        ADD_ARG_LIT("-mem-path");
+        ADD_ARG_LIT(driver->hugepage_mount);
+    }
     ADD_ARG_LIT("-smp");
     ADD_ARG_LIT(vcpus);
 
diff --git a/src/qemu_conf.h b/src/qemu_conf.h
index fbf2ab9..847597f 100644
--- a/src/qemu_conf.h
+++ b/src/qemu_conf.h
@@ -58,6 +58,7 @@ enum qemud_cmd_flags {
     QEMUD_CMD_FLAG_KVM           = (1 << 13), /* Whether KVM is compiled in */
     QEMUD_CMD_FLAG_DRIVE_FORMAT  = (1 << 14), /* Is -drive format= avail */
     QEMUD_CMD_FLAG_VGA           = (1 << 15), /* Is -vga avail */
+    QEMUD_CMD_FLAG_MEM_PATH      = (1 << 16), /* mmap'ped guest backing supported */
 };
 
 /* Main driver state */
@@ -86,6 +87,7 @@ struct qemud_driver {
     char *vncListen;
     char *vncPassword;
     char *vncSASLdir;
+    char *hugepage_mount;
 
     virCapsPtr caps;
 
diff --git a/src/qemu_driver.c b/src/qemu_driver.c