We have a few cases where we need to extend the data returned from the
INFO ioctls in VFIO.  For instance we already have devices exposed
through vfio-pci where VFIO_DEVICE_GET_REGION_INFO reports the region
as mmap-capable, but really only supports sparse mmaps, avoiding the
MSI-X table.  If we wanted to provide in-kernel emulation or extended
functionality for devices, we'd also want the ability to tell the
user not to mmap various regions, rather than forcing them to figure
it out on their own.

Another example is VFIO_IOMMU_GET_INFO.  We'd really like to expose
the actual IOVA capabilities of the IOMMU rather than letting the
user assume the address space they have available to them.  We could
add IOVA base and size fields to struct vfio_iommu_type1_info, but
what if we have multiple IOVA ranges.  For instance x86 uses a range
of addresses at 0xfee00000 for MSI vectors.  These typically are not
available for standard DMA IOVA mappings and splits our available IOVA
space into two ranges.  POWER systems have both an IOVA window below
4G as well as dynamic data window which they can use to remap all of
guest memory.

Representing variable sized arrays within a fixed structure makes it
very difficult to parse, we'd therefore like to put this data beyond
fixed fields within the data structures.  One way to do this is to
emulate capabilities in PCI configuration space.  A new flag indciates
whether capabilties are supported and a new fixed field reports the
offset of the first entry.  Users can then walk the chain to find
capabilities, adding capabilities does not require additional fields
in the fixed structure, and parsing variable sized data becomes

This patch outlines the theory and base header structure, which
should be shared by all future users.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
 include/uapi/linux/vfio.h |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 751b69f..432569f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -59,6 +59,33 @@
 #define VFIO_TYPE      (';')
 #define VFIO_BASE      100
+ * For extension of INFO ioctls, VFIO makes use of a capability chain
+ * designed after PCI/e capabilities.  A flag bit indicates whether
+ * this capability chain is supported and a field defined in the fixed
+ * structure defines the offset of the first capability in the chain.
+ * This field is only valid when the corresponding bit in the flags
+ * bitmap is set.  This offset field is relative to the start of the
+ * INFO buffer, as is the next field within each capability header.
+ * The id within the header is a shared address space per INFO ioctl,
+ * while the version field is specific to the capability id.  The
+ * contents following the header are specific to the capability id.
+ */
+struct vfio_info_cap_header {
+       __u16   id;             /* Identifies capability */
+       __u16   version;        /* Version specific to the capability ID */
+       __u32   next;           /* Offset of next capability */
+ * Callers of INFO ioctls passing insufficiently sized buffers will see
+ * the capability chain flag bit set, a zero value for the first capability
+ * offset (if available within the provided argsz), and argsz will be
+ * updated to report the necessary buffer size.  For compatibility, the
+ * INFO ioctl will not report error in this case, but the capability chain
+ * will not be available.
+ */
 /* -------- IOCTLs for VFIO file descriptor (/dev/vfio/vfio) -------- */

To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to