Hello Zhenzhong,

On 4/18/24 10:42, Duan, Zhenzhong wrote:
Hi Cédric,

-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

Hello Zhenzhong

On 4/17/24 11:24, Duan, Zhenzhong wrote:


-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

On 4/17/24 06:21, Duan, Zhenzhong wrote:


-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

Hello,

On 4/16/24 09:09, Duan, Zhenzhong wrote:
Hi Cédric,

-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

On 4/8/24 10:44, Zhenzhong Duan wrote:
From: Yi Liu <yi.l....@intel.com>

If check fails, the host side device(either vfio or vdpa device) should
not
be passed to guest.

Implementation details for different backends will be in following
patches.

Signed-off-by: Yi Liu <yi.l....@intel.com>
Signed-off-by: Yi Sun <yi.y....@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
---
      hw/i386/intel_iommu.c | 35
+++++++++++++++++++++++++++++++++++
      1 file changed, 35 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4f84e2e801..a49b587c73 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -35,6 +35,7 @@
      #include "sysemu/kvm.h"
      #include "sysemu/dma.h"
      #include "sysemu/sysemu.h"
+#include "sysemu/iommufd.h"
      #include "hw/i386/apic_internal.h"
      #include "kvm/kvm_i386.h"
      #include "migration/vmstate.h"
@@ -3819,6 +3820,32 @@ VTDAddressSpace
*vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
          return vtd_dev_as;
      }

+static int vtd_check_legacy_hdev(IntelIOMMUState *s,
+                                 HostIOMMUDevice *hiod,
+                                 Error **errp)
+{
+    return 0;
+}
+
+static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
+                                  HostIOMMUDevice *hiod,
+                                  Error **errp)
+{
+    return 0;
+}
+
+static int vtd_check_hdev(IntelIOMMUState *s,
VTDHostIOMMUDevice
*vtd_hdev,
+                          Error **errp)
+{
+    HostIOMMUDevice *hiod = vtd_hdev->dev;
+
+    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD))
{
+        return vtd_check_iommufd_hdev(s, hiod, errp);
+    }
+
+    return vtd_check_legacy_hdev(s, hiod, errp);
+}


I think we should be using the .get_host_iommu_info() class handler
instead. Can we refactor the code slightly to avoid this check on
the type ?

There is some difficulty ini avoiding this check, the behavior of
vtd_check_legacy_hdev
and vtd_check_iommufd_hdev are different especially after nesting
support introduced.
vtd_check_iommufd_hdev() has much wider check over cap/ecap bits
besides aw_bits.

I think it is important to fully separate the vIOMMU model from the
host IOMMU backing device.

This comment is true for the structures also.

Could we introduce a new HostIOMMUDeviceClass
handler .check_hdev() handler, which would
call .get_host_iommu_info() ?

This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should be
a common structure 'HostIOMMUDeviceInfo' holding all attributes
for the different backends. Each .get_host_iommu_info() implementation
would translate the specific host iommu device data presentation
into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.

I see, it's just not easy to define the unified elements in HostIOMMUDeviceInfo
so that they maps to bits or fields in host return IOMMU info.

The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a new
API needs to be completely defined for it. The IOMMU backend implementation
could be anything, legacy, iommufd, iommufd v2, some other framework and
the vIOMMU shouldn't be aware of its implementation.

Exposing the kernel structures as done below should be avoided because
they are part of the QEMU <-> kernel IOMMUFD interface.


Different platform returned host IOMMU info is platform specific.
For vtd and siommu:

struct iommu_hw_info_vtd {
         __u32 flags;
         __u32 __reserved;
         __aligned_u64 cap_reg;
         __aligned_u64 ecap_reg;
};

struct iommu_hw_info_arm_smmuv3 {
        __u32 flags;
        __u32 __reserved;
        __u32 idr[6];
        __u32 iidr;
        __u32 aidr;
};

I can think of two kinds of declaration of HostIOMMUDeviceInfo:

struct HostIOMMUDeviceInfo {
     uint8_t aw_bits;
     enum iommu_hw_info_type type;
     union {
         struct iommu_hw_info_vtd vtd;
         struct iommu_hw_info_arm_smmuv3;
         ......
     } data;
}

or

struct HostIOMMUDeviceInfo {
     uint8_t aw_bits;
     enum iommu_hw_info_type type;
     __u32 flags;
     __aligned_u64 cap_reg;
     __aligned_u64 ecap_reg;
     __u32 idr[6];
     __u32 iidr;
     __u32 aidr;
    ......
}

Not clear if any is your expected format.

'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
type attribute and host iommu device type definitions, or as you
suggested with a QOM interface. This is more complex however. In
this case, I would suggest to implement a .compatible() handler to
compare the host iommu device type with the vIOMMU type.

The resulting check_hdev routine would look something like :

static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
*vtd_hdev,
                           Error **errp)
{
     HostIOMMUDevice *hiod = vtd_hdev->dev;
     HostIOMMUDeviceClass *hiodc =
HOST_IOMMU_DEVICE_GET_CLASS(hiod);
     HostIOMMUDevice info;
     int host_aw_bits, ret;

     ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
     if (ret) {
         return ret;
     }

     ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
     if (ret) {
         return ret;
     }

     if (s->aw_bits > info.aw_bits) {
         error_setg(errp, "aw-bits %d > host aw-bits %d",
                    s->aw_bits, info.aw_bits);
         return -EINVAL;
     }
}

and the HostIOMMUDeviceClass::is_compatible() handler would call a
vIOMMUInterface::compatible() handler simply returning
IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?

Not quite get what HostIOMMUDeviceClass::is_compatible() does.

HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend
to determine which IOMMU types are exposed by the host, then calls the
vIOMMUInterface::compatible() handler to do the compare. API is to be
defined.

As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice
interface capabilities, or features, to check more precisely the level
of compatibility between the vIOMMU and the host IOMMU device. This is
similar to what is done between QEMU and KVM.

If you think this is too complex, include type in HostIOMMUDeviceInfo.

Currently legacy and IOMMUFD host device has different check logic, how it can 
help
in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into a single 
vtd_check_hdev()?

IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation, but
if you think the Intel vIOMMU should access directly the iommufd backend
when available, then we should drop this proposal and revisit the design
to take a different approach.

Below is the two functions after nesting series, for your easy reference:

static int vtd_check_legacy_hdev()
{
     if (s->scalable_modern) {
         /* Modern vIOMMU and legacy backend */
         error_setg(errp, "Need IOMMUFD backend in scalable modern mode");
         return -EINVAL;
     }

This part would typically go in the compatible() handler.


     ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
     if (ret) {
         return ret;
     }

     if (s->aw_bits > info.aw_bits) {
         error_setg(errp, "aw-bits %d > host aw-bits %d",
                    s->aw_bits, info.aw_bits);
         return -EINVAL;
     }

     return 0;
}

static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
                                   VTDHostIOMMUDevice *vtd_hdev,
                                   Error **errp)
{
     ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
     if (ret) {
         return ret;
     }

     if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
         error_setg(errp, "IOMMU hardware is not compatible");
         return -EINVAL;
     }

     vtd = &info.data.vtd;
     host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1;
     if (s->aw_bits > host_aw_bits) {
         error_setg(errp, "aw-bits %d > host aw-bits %d",
                    s->aw_bits, host_aw_bits);
         return -EINVAL;
     }

     if (!s->scalable_modern) {
         goto done;
     }

     if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
         error_setg(errp, "Host IOMMU doesn't support nested translation");
         return -EINVAL;
     }

     if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) {
         error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");
         return -EINVAL;
     }

These checks above would typically go in the compatible() handler also.

Now, the question is how useful will that framework be if hotplugging
devices always fail because the vIOMMU and host IOMMU devices have
incompatible settings/capabilities ? Shouldn't we also add properties
at the vIOMMU level ?


Thanks,

C.


Reply via email to