Re: [libvirt] [virt-devel] Cache and memory bandwidth graphs: native versus guest

2011-05-16 Thread john cooper
Bill Gray wrote:
 
 See attachment with two graphs: (1) cache bandwidth, (2) blowup of
 sustained memory bandwidth region...

Bill,
I had some difficulty with this document under ooffice:
a recent version seized up and an older version didn't
render correctly.  Could you export it as pdf?

Thanks,

-john


 - X axis has a log scale
 
 - Light blue line is an older system with 32K L1 and 6M L2 caches
 
 - All other measurements on perf34: 32K L1, 256K L2, 30M L3 caches
 
 - The majority of the variation in the L1 cache region comes from the
 two guest measurements done without tasksetting to a VCPU: yellow and
 maroon lines.  Perhaps this reflects the test bouncing between VCPUs
 in the guest.
 
 - The sustained memory bandwidth for the guest with no pinning is only
 80% of native (maroon line), which motivates more convenient and
 comprehensive numactl for guests.
 
 - Virtualized bandwidth is otherwise nearly in line with native, which
 confirms the importance of the virtual CPUID communicating actual native
 cache sizes to cache-size-aware guest applications, since guest apps
 could benefit from the full size of the native cache.  (Guest was
 started with -cpu host, but lscpu in guest showed 4M cache despite
 actual 30M cache.)
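
For reference, a cache-size-aware application in the guest would
normally read the cache geometry from sysfs rather than lscpu; a
minimal sketch of that probe, assuming a Linux guest (the cpu0 path
and index bound are illustrative):

    /* Sketch: how a cache-size-aware guest app might read cache
     * sizes.  Assumes a Linux guest exposing /sys/devices/system/cpu. */
    #include <stdio.h>

    int main(void)
    {
        char path[128], size[32];
        int i;

        for (i = 0; i < 8; i++) {
            FILE *f;
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cache/index%d/size", i);
            if (!(f = fopen(path, "r")))
                break;              /* no more cache levels */
            if (fgets(size, sizeof(size), f))
                printf("index%d: %s", i, size);
            fclose(f);
        }
        return 0;
    }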


-- 
john.coo...@redhat.com



[libvirt] [PATCH] Add huge page support to libvirt..

2009-07-22 Thread john cooper
This patch allows passing of a -mem-path arg
flag to qemu for support of huge page backed
guests.  A guest may request this option by
specifying:

    <hugepage>on</hugepage>

in its domain definition xml file.  The request
for huge page backing will be attempted within
libvirt if the host system has indicated a
hugetlbfs mount point in qemu.conf, for example:

    hugepage_mount = "/hugetlbfs"

_and_ the target qemu executable is aware of
the -mem-path flag.  Otherwise this request
by a guest will result in an error.

This patch does not address setup of the required
host hugetlbfs mount point, verify that the mount
point is correct/usable, or ensure that sufficient
free huge pages are available; those concerns are
assumed to be addressed by other means.

Signed-off-by: john cooper john.coo...@redhat.com
---

diff --git a/src/domain_conf.c b/src/domain_conf.c
index f3e4c6c..04d6911 100644
--- a/src/domain_conf.c
+++ b/src/domain_conf.c
@@ -2369,6 +2369,17 @@ static virDomainDefPtr virDomainDefParseXML(virConnectPtr conn,
     if (virXPathULong(conn, "string(./currentMemory[1])", ctxt, &def->memory) < 0)
         def->memory = def->maxmem;
 
+    tmp = virXPathString(conn, "string(./hugepage[1])", ctxt);
+    if (!tmp || STREQ(tmp, "off"))
+        def->hugepage_backed = 0;
+    else if (STREQ(tmp, "on"))
+        def->hugepage_backed = 1;
+    else {
+        virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+                             _("invalid hugepage mode %s"), tmp);
+        goto error;
+    }
+
     if (virXPathULong(conn, "string(./vcpu[1])", ctxt, &def->vcpus) < 0)
         def->vcpus = 1;
 
@@ -3933,6 +3944,8 @@ char *virDomainDefFormat(virConnectPtr conn,
     virBufferVSprintf(&buf, "  <memory>%lu</memory>\n", def->maxmem);
     virBufferVSprintf(&buf, "  <currentMemory>%lu</currentMemory>\n",
                       def->memory);
+    if (def->hugepage_backed)
+        virBufferVSprintf(&buf, "  <hugepage>%s</hugepage>\n", "on");
 
     for (n = 0 ; n < def->cpumasklen ; n++)
         if (def->cpumask[n] != 1)
diff --git a/src/domain_conf.h b/src/domain_conf.h
index 6e111fa..d6bdcdb 100644
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -481,6 +481,7 @@ struct _virDomainDef {
 
     unsigned long memory;
     unsigned long maxmem;
+    unsigned char hugepage_backed;
     unsigned long vcpus;
     int cpumasklen;
     char *cpumask;
diff --git a/src/qemu.conf b/src/qemu.conf
index 3009725..a3387f1 100644
--- a/src/qemu.conf
+++ b/src/qemu.conf
@@ -95,3 +95,10 @@
 
 # The group ID for QEMU processes run by the system instance
 #group = "root"
+
+# If provided by the host and this hugetlbfs mount point is
+# configured, a guest may request huge page backing.  When this
+# mount point is undefined, huge page backing is disabled.
+
+hugepage_mount = "/hugetlbfs"
+
diff --git a/src/qemu_conf.c b/src/qemu_conf.c
index 4043d70..632b784 100644
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -218,6 +218,17 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
     }
     VIR_FREE(group);
 
+    p = virConfGetValue (conf, "hugepage_mount");
+    CHECK_TYPE ("hugepage_mount", VIR_CONF_STRING);
+    if (p && p->str) {
+        VIR_FREE(driver->hugepage_mount);
+        if (!(driver->hugepage_mount = strdup(p->str))) {
+            virReportOOMError(NULL);
+            virConfFree(conf);
+            return -1;
+        }
+    }
+
     virConfFree (conf);
     return 0;
 }
@@ -500,6 +511,8 @@ static unsigned int qemudComputeCmdFlags(const char *help,
         flags |= QEMUD_CMD_FLAG_VGA;
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
+    if (strstr(help, "-mem-path"))
+        flags |= QEMUD_CMD_FLAG_MEM_PATH;
     if (version >= 9000)
         flags |= QEMUD_CMD_FLAG_VNC_COLON;
 
@@ -1125,6 +1138,15 @@ int qemudBuildCommandLine(virConnectPtr conn,
         ADD_ARG_LIT("-no-kvm");
     ADD_ARG_LIT("-m");
     ADD_ARG_LIT(memory);
+    if (def->hugepage_backed) {
+        if (!driver->hugepage_mount || !(qemuCmdFlags & QEMUD_CMD_FLAG_MEM_PATH)) {
+            qemudReportError(conn, NULL, NULL, VIR_ERR_NO_SUPPORT,
+                             "%s", _("hugepage backing not supported"));
+            goto error;
+        }
+        ADD_ARG_LIT("-mem-path");
+        ADD_ARG_LIT(driver->hugepage_mount);
+    }
     ADD_ARG_LIT("-smp");
     ADD_ARG_LIT(vcpus);
 
diff --git a/src/qemu_conf.h b/src/qemu_conf.h
index fbf2ab9..847597f 100644
--- a/src/qemu_conf.h
+++ b/src/qemu_conf.h
@@ -58,6 +58,7 @@ enum qemud_cmd_flags {
     QEMUD_CMD_FLAG_KVM           = (1 << 13), /* Whether KVM is compiled in */
     QEMUD_CMD_FLAG_DRIVE_FORMAT  = (1 << 14), /* Is -drive format= avail */
     QEMUD_CMD_FLAG_VGA           = (1 << 15), /* Is -vga avail */
+    QEMUD_CMD_FLAG_MEM_PATH      = (1 << 16), /* mmap'ped guest backing supported */
 };
 
 /* Main driver state */
@@ -86,6 +87,7 @@ struct qemud_driver {
     char *vncListen;
     char *vncPassword;
     char *vncSASLdir;
+    char *hugepage_mount;
 
     virCapsPtr caps;
 
diff --git a/src/qemu_driver.c b/src/qemu_driver.c

Re: [libvirt] [PATCH] Add huge page support to libvirt..

2009-07-23 Thread john cooper
Mark McLoughlin wrote:

 Other options include:
 
   - <hugepages/>
 
   - <memory hugepages="yes">X</memory>

Yes, I'd expect additional options will need to
be addressed.  Currently the only additional
qemu-resident knob is the -mem-prealloc flag
which is enabled by default.  I've removed
support for it here in the interest of simplicity
but it will fall into the existing scheme.
 
 hugepage_mount = "/hugetlbfs"
 
 I'd suggest /dev/hugepages as the default - /hugetlbfs has a seriously
 unstandard whiff about it

That was intended as an example for no other
reason than it is what I happen to use.  I
suspected it wouldn't be the mount point's
final resting place.

Thanks,

-john

-- 
john.coo...@redhat.com



Re: [libvirt] [PATCH] Add huge page support to libvirt..

2009-07-23 Thread john cooper
Daniel P. Berrange wrote:
 On Wed, Jul 22, 2009 at 09:25:02PM -0400, john cooper wrote:
 This patch allows passing of a -mem-path arg
 flag to qemu for support of huge page backed
 guests.  A guest may request this option via
 specifying:

 <hugepage>on</hugepage>

 in its domain definition xml file. 
 
 This really opens a can of worms. While obviously this maps
 very simply onto KVM's  -mem-path argument, I can't help 
 thinking things are going to get much more advanced later.
 For example, I don't think a boolean on/off is sufficient for
 this, since Xen already has a 3rd option of 'best effort' it
 uses by default where it'll try to allocate hugepages and
 fallback to normal pages - in fact you can't tell Xen not
 to use hugepages AFAIK.

I agree growing beyond a simple on/off switch is likely.
The patch originally had a prealloc option (since removed)
to accommodate passing of the existing -mem-prealloc flag.
That option defaults to enabled, which is desired in general
and therefore was dropped for the sake of patch simplicity.

 I'm also wondering whether we need
 to be concerned about different hugepage sizes for guest 
 configs eg 2M vs 1 GB, vs a mix of both - who decides?

Not a consideration currently, but it does underscore the
argument for a more extensible option syntax.

 KVM also seems to have ability to request that huge pages
 are pre-allocated upfront, vs on demand, though I'm not
 sure what happens to  a VM if it doesn't pre-allocate and
 it later can't be satisfied.

That was the motivation for -mem-prealloc, prior to which
a VM would SIGSEGV terminate if a huge page fault couldn't
be satisfied during runtime.  The (now default) preallocation
behavior guarantees the guest has its working set present at
startup but at the potential cost of overly pessimistic
memory allocation.
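
For illustration, preallocation amounts to roughly the following:
map a file on hugetlbfs and touch every page up front, so any
shortage of huge pages surfaces at startup rather than at fault
time.  A sketch only; the mount point, page size, and sizing are
assumptions, not what qemu literally does:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HUGEPAGE_SZ (2 * 1024 * 1024UL)   /* assumed 2M pages */

    int main(void)
    {
        size_t len = 64 * HUGEPAGE_SZ;        /* guest RAM, for example */
        int fd = open("/hugetlbfs/guest0", O_CREAT | O_RDWR, 0600);
        void *p;

        if (fd < 0 || ftruncate(fd, len) < 0)
            return 1;
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;
        memset(p, 0, len);                    /* fault in every page now */
        return 0;
    }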

 The request
 for huge page backing will be attempted within
 libvirt if the host system has indicated a
 hugetlbfs mount point in qemu.conf, for example:

 hugepage_mount = "/hugetlbfs"
 
 Seems like it would be simpler to just open /proc/mounts
 and scan it to find whether/where hugetlbfs is mounted,
 so it would 'just work' if the user had mounted it.

Checking /proc/mounts alone seemed a bit too speculative,
which is where the qemu.conf option arose.  But I can see
both being useful: check whether the qemu.conf
mount point exists, otherwise attempt to glean the
information from /proc/mounts, and if neither is satisfied
flag the error.
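
A sketch of that fallback probe, using getmntent(3) to scan
/proc/mounts when qemu.conf provides nothing (the function name is
illustrative; error handling is minimal):

    #include <mntent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static char *find_hugetlbfs_mount(void)
    {
        FILE *f = setmntent("/proc/mounts", "r");
        struct mntent *m;
        char *path = NULL;

        if (!f)
            return NULL;
        while ((m = getmntent(f)) != NULL) {
            if (strcmp(m->mnt_type, "hugetlbfs") == 0) {
                path = strdup(m->mnt_dir);   /* first match wins */
                break;
            }
        }
        endmntent(f);
        return path;
    }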

 It looks like the argument is not available in upstream QEMU, only
 part of the KVM fork?  Any idea why it hasn't been sent upstream,
 and/or whether it will be soon.

I'd hazard it's due to the existing huge page support
being closely tied to kvm and no motivation as of
yet to reconcile this upstream.

 I'm loath to add more KVM-specific
 options since we've been burnt every time we've done
 this in the past with semantics changing when merged to
 QEMU :-(

Quite understandable.  It is also why I was attempting
to be as generic (and simple) as possible here and not
excessively cast the existing kvm implementation specifics
into the exported libvirt option. 

 I agree that setting up hugetlbfs is out of scope for libvirt.
 We should just probe to see whether its available or not.
 We ought to have some way of reporting available hugepages
 though, both at a host level, and likely per NUMA node too.
 Without this a mgmt app using libvirt has no clue whether they'll
 be able to actually use hugepages successfully or not.

Agree.  Extracting this host information will be needed
when more comprehensive management of the same exists.
Still it would seem a case of best-effort enforcement
without some sort of additional coordination.  The
process of gleaning the number of free huge pages from
the host and launching of the guest currently has an
inherent race.
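
For what it's worth, the host-level probe itself is straightforward
even if inherently racy; a sketch reading the free count from
/proc/meminfo (per-NUMA-node counts live under
/sys/devices/system/node/ on newer kernels):

    #include <stdio.h>

    static long free_hugepages(void)
    {
        char line[256];
        long n = -1;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f))
            if (sscanf(line, "HugePages_Free: %ld", &n) == 1)
                break;    /* count may change before guest launch */
        fclose(f);
        return n;
    }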

Thanks,

-john

-- 
john.coo...@redhat.com



Re: [libvirt] [PATCH] Support configuration of huge pages in guests

2009-09-02 Thread john cooper
Daniel P. Berrange wrote:
 * configure.in: Add check for mntent.h
 * qemud/libvirtd_qemu.aug, qemud/test_libvirtd_qemu.aug, src/qemu.conf
   Add 'hugetlbfs_mount' config parameter
 * src/qemu_conf.c, src/qemu_conf.h: Check for -mem-path flag in QEMU,
   and pass it when hugepages are requested.
   Load hugetlbfs_mount config parameter, search for mount if not given.
 * src/qemu_driver.c: Free hugetlbfs_mount/path parameter in driver shutdown.
   Create directory for QEMU hugepage usage, chowning if required.
 * docs/formatdomain.html.in: Document memoryBacking/hugepages elements
 * docs/schemas/domain.rng: Add memoryBacking/hugepages elements to schema
 * src/util.c, src/util.h, src/libvirt_private.syms: Add virFileFindMountPoint
   helper API
 * tests/qemuhelptest.c: Add -mem-path constants
 * tests/qemuxml2argvtest.c, tests/qemuxml2xmltest.c: Add tests for hugepage
   handling
 * tests/qemuxml2argvdata/qemuxml2argv-hugepages.xml,
   tests/qemuxml2argvdata/qemuxml2argv-hugepages.args: Data files for
   hugepage tests
 ---
  configure.in   |2 +-
  docs/formatdomain.html |8 +++-
  docs/formatdomain.html.in  |8 +++
  docs/schemas/domain.rng|9 
  qemud/libvirtd_qemu.aug|1 +
  qemud/test_libvirtd_qemu.aug   |4 ++
  src/domain_conf.c  |   10 -
  src/domain_conf.h  |1 +
  src/libvirt_private.syms   |1 +
  src/qemu.conf  |   13 +
  src/qemu_conf.c|   49 
 
  src/qemu_conf.h|3 +
  src/qemu_driver.c  |   34 ++
  src/util.c |   37 ++-
  src/util.h |4 ++
  tests/qemuhelptest.c   |6 ++-
  tests/qemuxml2argvdata/qemuxml2argv-hugepages.args |1 +
  tests/qemuxml2argvdata/qemuxml2argv-hugepages.xml  |   25 ++
  tests/qemuxml2argvtest.c   |7 ++-
  tests/qemuxml2xmltest.c|1 +
  20 files changed, 217 insertions(+), 7 deletions(-)
  create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-hugepages.args
  create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-hugepages.xml

I did a fetch on libvirt.git before reviewing and
it appears there is some code motion relative
to the version this patch was against.  Although
AFAICT nothing that should result in more than
patch bounce.

Looks good to me.  ACK.

-john

-- 
john.coo...@redhat.com



Re: [libvirt] [RFC] Support for CPUID masking

2009-10-13 Thread john cooper
Dor Laor wrote:
 What about another approach for the cpuid issue:
 I think that dealing with specific flags is pretty error prone on all
 levels - virt-mgr, libvirt, qemu, migration, and even the guest.

..and performance verification, QA, and the average end user.
Unless we reduce all possible combinations of knob settings
into a few well-thought-out lumped models, the complexity can
be overwhelming.  It is probably a reasonable compromise to
initially support the most obvious use cases, with more
fringe models added on an as-needed, as-justified basis.
  
 We can't just 'invent' new cpu models that will be a combination of
 flags we have on the host. It might work for some kernels/apps but it
 might break others. In addition to cpuid we have the stepping and more
 MSRs that need to be hidden/exposed.

In whatever abstraction, exposing all of that low-level detail
as the sole configuration means will needlessly exaggerate the
complexity as above.

 The offer:
 Libvirt, virt-mgr, qemu will not deal with low-level bits of the cpuid.
 We'll use predefined cpu models to be emulated.
 These predefined models will represent a real model that was once
 shipped by Intel/AMD.
 
 We'll write an additional tool, not under libvirt, that will be able to
 calculate all the virtual cpus that can run over the physical cpu. This
 tool will be the one messing with /proc/cpuinfo, etc.
 
 Example (theoretical): Physical cpu is Intel(R) Core(TM)2 Duo CPU
 T9600 @ 2.80GHz; the output of the tool will be:
 
 shell> computeVirtCpuCapabilities
 core2duo
 core solo
 486
 pentium3
 ..
 
 Libvirt will only expose the real physical cpu and all of the outputs
 above. Higher level mgmt will compute the best -cpu to pick for the VM,
 and it will take account the user needs for performance or for migration
 flexibility.
 
 
 The pros:
  - Simple for all levels
  - Fast implementation
  - Accurate representation of real cpus
 
 The cons:
  - none - we should write the tool anyway
  - Maybe we hide some of the possibilities, but as said above, it is
    safer and friendlier.

I think paving over the complexity to export the most common
use cases is a reasonable approach.  We can allow some sort
of fingers-in-the-gears mode for experimentation and tuning
as needed.  But supporting features such as safe migration
could be a non-goal in these scenarios.
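
For concreteness, the raw material any masking scheme works over is
just a few registers per leaf; a minimal sketch of reading leaf 1
with gcc's cpuid.h (x86 only, purely illustrative):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* leaf 1: family/model/stepping in eax, feature bits in ecx/edx */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;
        printf("leaf 1: eax=%08x ebx=%08x ecx=%08x edx=%08x\n",
               eax, ebx, ecx, edx);
        return 0;
    }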

-john


I also recommend adding a field next to the cpu for
flexibility and possible future changes, plus dealing with
problematic bios with the NX bit disabled.
 
 Regards,
 Dor
 

 Thanks

 Mukesh

 On Thu, Sep 3, 2009 at 3:09 PM, Daniel P.
 Berrange berra...@redhat.com wrote:
 On Thu, Sep 03, 2009 at 11:19:47AM +0300, Dor Laor wrote:
 On 09/02/2009 07:09 PM, Daniel P. Berrange wrote:
 On Wed, Sep 02, 2009 at 11:59:39AM -0400, Jim Paris wrote:
 Jiri Denemark wrote:
 Hi,

 We need to provide support for CPU ID masking. Xen and VMware ESX are
 examples of current hypervisors which support such masking.

 My proposal is to define new 'cpuid' feature advertised in guest
 capabilities:
 ...
 <domain type='xen' id='42'>
   ...
   <features>
     <pae/>
     <acpi/>
     <apic/>
     <cpuid>
       <mask level='1' register='ebx'>
         :::1010::::
       </mask>
 ...
 What are your opinions about this?

 I think it's too low-level, and the structure is x86-specific.  QEMU
 and KVM compute their CPUID response based on arguments to the -cpu
 argument, e.g.:

 -cpu core2duo,model=23,+ssse3,+lahf_lm

 I think a similar structure makes more sense for libvirt, where the
 configuration generally avoids big blocks of binary data, and the
 XML format should suit other architectures as well.

 I'm going back and forth on this too. We essentially have 3 options

   - Named CPU + flags/features
   - CPUID masks
   - Allow either


 If we do either of the first two, we have to translate between the
 two formats for one or more of the hypervisors. For the last one we
 are just punting the problem off to applications.


 If we choose CPUID, and made QEMU driver convert to named CPU + flags
 we'd be stuck for non-x86 as you say.

 Why is that? cpu model + flags may apply for other arch too.

 If we have CPUID in the XML, there is no meaningful CPUID register
 representation for sparc/ppc/arm/etc. It is an x86 concept, which
 is almost certainly why QEMU uses named CPU models + named flags
 instead of CPUID as its public facing config.

 Xen/VMWare of course don't have this limitation since they only
 really care about x86.

 So really QEMU's CPU model + flags approach is more generic, albeit
 much more verbose to achieve the same level of expressivity.

 If we chose named CPU + flags, and made VMWare/Xen convert to raw
 CPUID, we'd potentially lose information if a user had defined a config
 with a raw CPUID mask outside the context of libvirt.

 The other thing to remember is that CPUID also encodes
 sockets/cores/threads 

[libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.

2009-01-16 Thread john cooper

We have found certain application scenarios where
overriding the default qemu host cache mode
provides a substantial improvement in guest
performance; in particular, disabling host caching
of the file/dev backing a guest drive.  A summary
of performance metrics may be found below.

Attached is a simple patch which adds general
support for a cache=* attribute to be passed
to qemu from libvirt as part of an xml driver
element.  It is parsed in a domain's xml definition
as well as being generated during output of the
same.  For example:

   <disk type='file' device='disk'>
     <source file='/guest_roots/fc8.root'/>
     <target dev='hda'/>
     <driver name='qemu' type='qwerty' cache='none'/>
   </disk>

where both the existing type=* and added cache=*
attributes are optional and independent.

Note this is only intended to serve as an internal
hook allowing override of qemu's default cache
behavior.  At present it is not proposed as an
end-user documented/visible feature.

-john


Mark Wagner wrote:

This data is provided by the Red Hat Performance
team. It is intended to support the request for adding the
required functionality to libvirt to allow setting a
storage parameter to control the caching behavior of QEMU.
This value would be passed on the QEMU command line.


Our testing of commercial databases in a larger, Enterprise
configuration (MSA, 16 cores, etc) has shown that the default
QEMU caching behavior is not always the best. This set of data
compares the drop-off in performance of both writethrough
and writeback cache relative to the nocache option. The main
metric is transactions per minute (tpm).  The other axis is
the number of simulated users (x1000).

% of nocache TPM Results

K Users    WT    WB
     10   60%   64%
     20   66%   71%
     40   68%   72%
     60   71%   75%
     80   74%   79%
    100   76%   83%

From the above set of data, it is clear that the default behavior
of QEMU is the worst performer out of the three cache options for
this type of use case.  It is also clear that we lose, at minimum,
25% of the possible TPM performance just due to the cache setting.







--
john.coo...@redhat.com

 domain_conf.c |   32 +---
 domain_conf.h |   15 +++
 qemu_conf.c   |   32 
 3 files changed, 68 insertions(+), 11 deletions(-)
=
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -80,6 +80,20 @@ enum virDomainDiskBus {
 VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+VIR_DOMAIN_DISK_CACHE_DEFAULT = 0,
+VIR_DOMAIN_DISK_CACHE_DISABLE,
+VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+} host_cachemode;
+
+typedef struct {
+char *tag;
+host_cachemode idx;
+} cache_map_t;
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -91,6 +105,7 @@ struct _virDomainDiskDef {
 char *dst;
 char *driverName;
 char *driverType;
+host_cachemode cachemode;
 unsigned int readonly : 1;
 unsigned int shared : 1;
 int slotnum; /* pci slot number for unattach */
=
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -616,6 +616,26 @@ qemudNetworkIfaceConnect(virConnectPtr c
 return NULL;
 }
 
+/* map from internal cache mode to qemu cache arg text
+ */
+static cache_map_t qemu_cache_map[] = {
+    {"none",         VIR_DOMAIN_DISK_CACHE_DISABLE},
+    {"writeback",    VIR_DOMAIN_DISK_CACHE_WRITEBACK},
+    {"writethrough", VIR_DOMAIN_DISK_CACHE_WRITETHRU},
+    {NULL}};
+
+/* return qemu arg text * corresponding to idx if found, NULL otherwise
+ */
+static inline char *qemu_cache_mode(host_cachemode idx)
+{
+    int i;
+
+    for (i = 0; qemu_cache_map[i].tag; ++i)
+        if (qemu_cache_map[i].idx == idx)
+            return (qemu_cache_map[i].tag);
+    return (NULL);
+}
+
 static int qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
   char *buf,
   int buflen)
@@ -946,6 +966,8 @@ int qemudBuildCommandLine(virConnectPtr 
         virDomainDiskDefPtr disk = vm->def->disks[i];
         int idx = virDiskNameToIndex(disk->dst);
         const char *bus = virDomainDiskQEMUBusTypeToString(disk->bus);
+        int nc;
+        char *cachemode;
 
         if (disk->bus == VIR_DOMAIN_DISK_BUS_USB) {
             if (disk->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
@@ -980,15 +1002,17 @@ int qemudBuildCommandLine(virConnectPtr 
             break;
         }
 
-        snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
+        nc = snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
                  disk->src ? disk->src : "", bus,
                  media ? media 

Re: [libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.

2009-01-19 Thread john cooper

Daniel P. Berrange wrote:
A couple of extra things needed 


 - Addition to tests/qemuxml2argvtest.c to validate the XML to
   struct to QEMU ARGV conversion.
 - Addition to tests/qemuxml2xmltest.c to  validate XML to
   struct to XML round-trip conversions.
 - Addition to the docs/libvirt.rng XML schema.

Ok. Let's address the above when we're satisfied with
the state of the patch as we have at least one open
issue (as below).


This static map & function are not required. Instead the int
to char * conversion (and its reverse) are automatically generated
through use of our built-in enum support macros

Yes you mentioned this in your earlier (off list) reply
to me. I did produce a patch version using the
VIR_ENUM_DECL/VIR_ENUM_IMPL macros but thought they may
not be the most natural fit for the task since there
is no way to multiply define a particular enum value
(eg: cache none || off). Also a given hypervisor will
likely support only a subset of the internally enumerated
option space forcing dummy-out of unused cache modes. But
that is speculative and I don't have a strong bias either
way. So I've reverted to use of VIR_ENUM_DECL/VIR_ENUM_IMPL
here.



There have been two different syntaxes supported in QEMU for caching,
so we need to detect this and switch between them. Originally there
was just

  cache=on & cache=off (writethrough and no caching)

Now it supports

  cache=writeback, cache=writethrough and cache=none|off

So, if we detect the earlier syntax we need to raise an error if
the user requested writeback (or translate it to 'cache=off' for
safety's sake).

I was trying to address part of this as above. But in general
I don't see how we can force an error in advance of launching
qemu without some information beforehand on whether we're
dealing with an old- vs. new-syntax qemu. One way to do so is
to test-launch qemu --help and let it tell us what it accepts.
Alternatively we can just let qemu error-exit in the case where
an old/new qemu version doesn't recognize the alternate syntax.
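
A sketch of the --help probe idea (the binary path is a parameter;
a match split across read chunks is ignored here for brevity):

    #include <stdio.h>
    #include <string.h>

    static int qemu_has_new_cache_syntax(const char *qemu)
    {
        char cmd[512], buf[4096];
        size_t n;
        FILE *p;
        int found = 0;

        snprintf(cmd, sizeof(cmd), "%s -help 2>&1", qemu);
        if (!(p = popen(cmd, "r")))
            return -1;
        while ((n = fread(buf, 1, sizeof(buf) - 1, p)) > 0) {
            buf[n] = '\0';
            if (strstr(buf, "cache=writethrough|writeback|none"))
                found = 1;    /* new-style cache= syntax advertised */
        }
        pclose(p);
        return found;
    }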

Other than the above I believe I've incorporated all of
your remaining suggestions. Attached is an updated patch.

-john


--
john.coo...@redhat.com

 domain_conf.c |   31 ---
 domain_conf.h |   16 
 qemu_conf.c   |   27 +++
 3 files changed, 63 insertions(+), 11 deletions(-)
=
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -80,6 +80,21 @@ enum virDomainDiskBus {
 VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+VIR_DOMAIN_DISK_CACHE_UNSPECIFIED,	/* reserved */
+VIR_DOMAIN_DISK_CACHE_OFF,
+VIR_DOMAIN_DISK_CACHE_ON,
+VIR_DOMAIN_DISK_CACHE_DISABLE,
+VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+
+VIR_DOMAIN_DISK_CACHE_LAST
+} virDomainDiskCache;
+
+VIR_ENUM_DECL(virDomainDiskCache)
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -91,6 +106,7 @@ struct _virDomainDiskDef {
 char *dst;
 char *driverName;
 char *driverType;
+int cachemode;
 unsigned int readonly : 1;
 unsigned int shared : 1;
 int slotnum; /* pci slot number for unattach */
=
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -616,6 +616,20 @@ qemudNetworkIfaceConnect(virConnectPtr c
 return NULL;
 }
 
+/* map from internal cache mode to qemu cache arg text
+ *
+ * Note: currently this replicates virDomainDiskCache, but will need to
+ * error flag potential new entries in virDomainDiskCache which are
+ * not supported by qemu (raising exceptions as appropriate).
+ */
+VIR_ENUM_IMPL(qemu_cache_map, VIR_DOMAIN_DISK_CACHE_LAST,
+    "",             /* reserved -- mode not specified */
+    "off",          /* deprecated by "none" */
+    "on",           /* obsoleted by "writethrough" */
+    "none",
+    "writeback",
+    "writethrough")
+
 static int qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
   char *buf,
   int buflen)
@@ -946,6 +960,8 @@ int qemudBuildCommandLine(virConnectPtr 
         virDomainDiskDefPtr disk = vm->def->disks[i];
         int idx = virDiskNameToIndex(disk->dst);
         const char *bus = virDomainDiskQEMUBusTypeToString(disk->bus);
+        int nc;
+        char *cachemode;
 
         if (disk->bus == VIR_DOMAIN_DISK_BUS_USB) {
             if (disk->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
@@ -980,15 +996,18 @@ int qemudBuildCommandLine(virConnectPtr 
             break;
         }
 
-        snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
+        nc = snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
                  disk->src ? disk->src : "", bus,
                  media ? media : "",
                  

Re: [libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.

2009-01-27 Thread john cooper

Daniel P. Berrange wrote:


If you look at src/qemu_conf.c, you'll find a nice method called
qemudExtractVersionInfo, which runs 'qemu -help' and checks for
certain interesting command line arguments :-)

That problem does seem to be crying for some type
of structured interface to avoid subtle breakage
should someone modify the output of --help.
I'm sure I'm preaching to the choir here.

So it now adapts for the cases of old syntax and
"writethrough" as well as new syntax and "on"
since qemu will otherwise balk at those cache
flag / version combinations.

One note about the enums - rather than adding old style CACHE_ON &
CACHE_OFF options to the main enum in domain_conf, just create
a second enum in the qemu_conf.c file for recording the mapping
of virDomainDiskCache values to old style QEMU argument values..

As we are adapting in both directions I left the
single enum representing the entire option list
to simplify things. Updated patch is attached.

-john

--
john.coo...@redhat.com

 domain_conf.c |   31 ---
 domain_conf.h |   16 
 qemu_conf.c   |   40 +++-
 qemu_conf.h   |1 +
 4 files changed, 76 insertions(+), 12 deletions(-)
=
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -80,6 +80,21 @@ enum virDomainDiskBus {
 VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+VIR_DOMAIN_DISK_CACHE_UNSPECIFIED,	/* reserved */
+VIR_DOMAIN_DISK_CACHE_OFF,
+VIR_DOMAIN_DISK_CACHE_ON,
+VIR_DOMAIN_DISK_CACHE_DISABLE,
+VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+
+VIR_DOMAIN_DISK_CACHE_LAST
+} virDomainDiskCache;
+
+VIR_ENUM_DECL(virDomainDiskCache)
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -91,6 +106,7 @@ struct _virDomainDiskDef {
 char *dst;
 char *driverName;
 char *driverType;
+int cachemode;
 unsigned int readonly : 1;
 unsigned int shared : 1;
 int slotnum; /* pci slot number for unattach */
=
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -449,7 +449,9 @@ int qemudExtractVersionInfo(const char *
     if (strstr(help, "-domid"))
         flags |= QEMUD_CMD_FLAG_DOMID;
     if (strstr(help, "-drive"))
-        flags |= QEMUD_CMD_FLAG_DRIVE;
+        flags |= QEMUD_CMD_FLAG_DRIVE |
+            (strstr(help, "cache=writethrough|writeback|none") ?
+             QEMUD_CMD_FLAG_DRIVE_CACHEMODE : 0);
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
     if (version >= 9000)
@@ -616,6 +618,20 @@ qemudNetworkIfaceConnect(virConnectPtr c
 return NULL;
 }
 
+/* map from internal cache mode to qemu cache arg text
+ *
+ * Note: currently this replicates virDomainDiskCache, but will need to
+ * error flag potential new entries in virDomainDiskCache which are
+ * not supported by qemu (raising exceptions as appropriate).
+ */
+VIR_ENUM_IMPL(qemu_cache_map, VIR_DOMAIN_DISK_CACHE_LAST,
+    "",             /* reserved -- mode not specified */
+    "off",          /* deprecated by "none" */
+    "on",           /* obsoleted by "writethrough" */
+    "none",
+    "writeback",
+    "writethrough")
+
 static int qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
   char *buf,
   int buflen)
@@ -721,6 +737,7 @@ int qemudBuildCommandLine(virConnectPtr 
     const char *emulator;
     char uuid[VIR_UUID_STRING_BUFLEN];
     char domid[50];
+    unsigned int qf;
 
     uname(ut);
 
@@ -946,6 +963,7 @@ int qemudBuildCommandLine(virConnectPtr 
         virDomainDiskDefPtr disk = vm->def->disks[i];
         int idx = virDiskNameToIndex(disk->dst);
         const char *bus = virDomainDiskQEMUBusTypeToString(disk->bus);
+        int nc, cachemode;
 
         if (disk->bus == VIR_DOMAIN_DISK_BUS_USB) {
             if (disk->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
@@ -980,15 +998,27 @@ int qemudBuildCommandLine(virConnectPtr 
             break;
         }
 
-        snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
+        nc = snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
                  disk->src ? disk->src : "", bus,
                  media ? media : "",
                  idx,
                  bootable &&
                  disk->device == VIR_DOMAIN_DISK_DEVICE_DISK
-                 ? ",boot=on" : "",
-                 disk->shared && !disk->readonly
-                 ? ",cache=off" : "");
+                 ? ",boot=on" : "");
+
+        if (cachemode = disk->cachemode) {
+            if (qemudExtractVersionInfo(vm->def->emulator, NULL, &qf) < 0)
+                ;    /* error reported */
+            else if (!(qf &

Re: [libvirt] patch: allow disk cache mode to be specified in a domain's xml definition.

2009-01-30 Thread john cooper

Jim Meyering wrote:


Hi John,

I tried to apply that, but failed miserably,
since all of the following was recently redone to
use virBufferVSprintf rather than snprintf.

Yea I suspected the code was likely seeing some
motion. Thanks for bringing it forward.

And it's
a good thing, because with the changes below, there was
potential buffer overrun: nc could end up larger than PATH_MAX.

You are correct. I mistook the return value of
snprintf() in the case of truncation.
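
(On truncation, snprintf() returns the length it would have written,
not the number of bytes stored; a minimal demonstration:)

    #include <stdio.h>

    int main(void)
    {
        char buf[8];
        int nc = snprintf(buf, sizeof(buf), "file=%s,if=ide", "/long/path");

        /* nc exceeds 7 here even though buf holds only 7 chars + NUL */
        printf("nc=%d buf=\"%s\"\n", nc, buf);
        return 0;
    }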


One remaining bit that I haven't done:
add tests for this, exercising e.g.,
  - cachemode
  - !cachemode && (disk->shared && !disk->readonly)
  - !cachemode && (!disk->shared || disk->readonly)
  

That logic should be correct as it exists in the
patch, namely we'll generate a cache=* flag in
the case cachemode was specified explicitly in
the xml definition or for the preexisting case of
(shared && !readonly). Here if both happen to be
true the explicit xml cache tag takes precedence.

But with the existing code we need to remove the
clause of:

@@ -1007,18 +1025,34 @@ int qemudBuildCommandLine(virConnectPtr
     if (bootable &&
         disk->device == VIR_DOMAIN_DISK_DEVICE_DISK)
         virBufferAddLit(&opt, ",boot=on");
-    if (disk->shared && !disk->readonly)
-        virBufferAddLit(&opt, ",cache=off");
     if (disk->driverType)
         virBufferVSprintf(&opt, ",fmt=%s", disk->driverType);

As this would result in cache=* being generated
twice on the qemu cmdline, possibly with
different dispositions.

With this change I think it is good to go.
The attached patch incorporates the above
modification.

-john

--
john.coo...@redhat.com

 domain_conf.c |   34 ++
 domain_conf.h |   16 
 qemu_conf.c   |   43 ++-
 qemu_conf.h   |3 ++-
 4 files changed, 82 insertions(+), 14 deletions(-)
=
--- a/src/domain_conf.h
+++ b/src/domain_conf.h
@@ -81,6 +81,21 @@ enum virDomainDiskBus {
 VIR_DOMAIN_DISK_BUS_LAST
 };
 
+/* summary of all possible host open file cache modes
+ */
+typedef enum {
+VIR_DOMAIN_DISK_CACHE_UNSPECIFIED,	/* reserved */
+VIR_DOMAIN_DISK_CACHE_OFF,
+VIR_DOMAIN_DISK_CACHE_ON,
+VIR_DOMAIN_DISK_CACHE_DISABLE,
+VIR_DOMAIN_DISK_CACHE_WRITEBACK,
+VIR_DOMAIN_DISK_CACHE_WRITETHRU,
+
+VIR_DOMAIN_DISK_CACHE_LAST
+} virDomainDiskCache;
+
+VIR_ENUM_DECL(virDomainDiskCache)
+
 /* Stores the virtual disk configuration */
 typedef struct _virDomainDiskDef virDomainDiskDef;
 typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -92,6 +107,7 @@ struct _virDomainDiskDef {
 char *dst;
 char *driverName;
 char *driverType;
+int cachemode;
 unsigned int readonly : 1;
 unsigned int shared : 1;
 int slotnum; /* pci slot number for unattach */
=
--- a/src/qemu_conf.c
+++ b/src/qemu_conf.c
@@ -1,7 +1,7 @@
 /*
  * config.c: VM configuration management
  *
- * Copyright (C) 2006, 2007, 2008 Red Hat, Inc.
+ * Copyright (C) 2006, 2007, 2008, 2009 Red Hat, Inc.
  * Copyright (C) 2006 Daniel P. Berrange
  *
  * This library is free software; you can redistribute it and/or
@@ -399,7 +399,9 @@ int qemudExtractVersionInfo(const char *
     if (strstr(help, "-domid"))
         flags |= QEMUD_CMD_FLAG_DOMID;
     if (strstr(help, "-drive"))
-        flags |= QEMUD_CMD_FLAG_DRIVE;
+        flags |= QEMUD_CMD_FLAG_DRIVE |
+            (strstr(help, "cache=writethrough|writeback|none") ?
+             QEMUD_CMD_FLAG_DRIVE_CACHEMODE : 0);
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
     if (version >= 9000)
@@ -591,6 +593,22 @@ qemudNetworkIfaceConnect(virConnectPtr c
 return NULL;
 }
 
+VIR_ENUM_DECL(qemu_cache_map)
+
+/* map from internal cache mode to qemu cache arg text
+ *
+ * Note: currently this replicates virDomainDiskCache, but will need to
+ * error flag potential new entries in virDomainDiskCache which are
+ * not supported by qemu (raising exceptions as appropriate).
+ */
+VIR_ENUM_IMPL(qemu_cache_map, VIR_DOMAIN_DISK_CACHE_LAST,
+    "",             /* reserved -- mode not specified */
+    "off",          /* deprecated; use "none" */
+    "on",           /* obsolete;   use "writethrough" */
+    "none",
+    "writeback",
+    "writethrough")
+
 static int qemudBuildCommandLineChrDevStr(virDomainChrDefPtr dev,
   char *buf,
   int buflen)
@@ -1007,11 +1025,27 @@ int qemudBuildCommandLine(virConnectPtr 
     if (bootable &&
         disk->device == VIR_DOMAIN_DISK_DEVICE_DISK)
         virBufferAddLit(&opt, ",boot=on");
-    if (disk->shared && !disk->readonly)
-        virBufferAddLit(&opt, ",cache=off");
     if (disk->driverType)
         virBufferVSprintf(&opt, ",fmt=%s", disk->driverType);
 
+    unsigned int qf;
+    int cachemode = disk->cachemode;
+    if (cachemode) {
+        if 

Re: [libvirt] [RHEL6 PATCH] Correct cpuid flags and model fields, V2

2010-08-03 Thread john cooper
A question arose in today's kvm meeting concerning any
impact to libvirt from this change.  I've discussed it
with Cole and it seems to be a non-issue.  But just to
err on the side of caution, here's a summary:

The current cpu model definition of qemu64 upstream
is problematic from kvm's perspective, such that we need
to modify it slightly (BZ justifications listed below).
In doing so we've left the qemu64 definition as-is, but
added cpu64-rhel6 and cpu64-rhel5 models which
are now selected by default via -M <machine>, for
RHEL 6.0.0 PC and RHEL 5.x.y PC respectively.

So the only issue would be libvirt invoking qemu with
neither a -cpu nor -M argument (which defaults to
qemu64) or explicitly requesting -cpu qemu64.

From my discussion with Cole it appears the use cases
where this may happen fall outside of routine/expected
usage and would need to be explicitly requested by the
user.  However I wanted to call this out here in the
event we're overlooking something.

Thanks,

-john


http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-July/msg01110.html
http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-July/msg00831.html


john cooper wrote:
 Addresses BZs:
 
 #618332 - CPUID_EXT_POPCNT enabled in qemu64 and qemu32 built-in models.
 #613892 - [SR-IOV]VF device can not start on 32bit Windows2008 SP2 
 
 Summary:
 
 CPU feature flags for several processor definitions require correction
 to accurately reflect the corresponding shipped silicon.  In particular,
 overly conservative values for cpuid model fields cause MSI support
 to be disabled in windows guests (BZ #613892).  Also recent upstream changes
 of qemu64's built-in definition enable POPCNT (for the benefit of TCG)
 but risk kvm migration breakage (BZ #618332).  The following patch
 addresses these issues collectively.
 
 Changes relative to previous version:
 
   - drop of the qemu32-rhel5 model as it appears to be unneeded.
 
   - rename of qemu64-rhel? to cpu64-rhel? as we're diverging from
 upstream qemu64's definition but haven't migrated to kvm64 quite yet.
 
   - Correction of several fields for the (now) cpu64-rhel5 model.
 
 Further detail may be found in the associated mail thread common to
 both BZ cases starting about here:
 
 
 http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-July/msg00831.html
 
 Brew Build:
 
 https://brewweb.devel.redhat.com/taskinfo?taskID=2643552
 
 Upstream status:
 
 BZ #618332 is a local, interim change and doesn't relate to upstream
 concerns.  Longer-term this qemu64 derived model would be displaced
 by kvm64.  The update of cpu definitions (BZ #613892) however needs
 to be pushed upstream upon validation in rhel6.  We're currently the
 sole users of the new cpu models which fosters this inverted process.

-- 
john.coo...@redhat.com
