On Thu, Oct 30, 2025 at 08:47:18AM +0000, Tian, Kevin wrote:
> It might need more work to meet this requirement. e.g. after patch4
> I could still spot other errors easily in the attach path:
>
> intel_iommu_attach_device()
> iopf_for_domain_set()
> intel_iommu_enable_iopf():
>
> if (!info->pri_enabled)
> return -ENODEV;
Yea, I missed that.
> intel_iommu_attach_device()
> dmar_domain_attach_device()
> domain_attach_iommu():
>
> curr = xa_cmpxchg(&domain->iommu_array, iommu->seq_id,
> NULL, info, GFP_KERNEL);
> if (curr) {
> ret = xa_err(curr) ? : -EBUSY;
> goto err_clear;
> }
There is actually an xa_load() in this function:
	curr = xa_load(&domain->iommu_array, iommu->seq_id);
	if (curr) {
		curr->refcnt++;
		kfree(info);
		return 0;
	}
[...]
	info->refcnt = 1;
	info->did = num;
	info->iommu = iommu;
	curr = xa_cmpxchg(&domain->iommu_array, iommu->seq_id,
			  NULL, info, GFP_KERNEL);
	if (curr) {
		ret = xa_err(curr) ? : -EBUSY;
		goto err_clear;
	}
It seems that this xa_cmpxchg could be just xa_store()?
> intel_iommu_attach_device()
> dmar_domain_attach_device()
> domain_setup_first_level()
> __domain_setup_first_level()
> intel_pasid_setup_first_level():
Yea. There are a few others along this path too..
> pte = intel_pasid_get_entry(dev, pasid);
> if (!pte) {
> spin_unlock(&iommu->lock);
> return -ENODEV;
> }
>
> if (pasid_pte_is_present(pte)) {
> spin_unlock(&iommu->lock);
> return -EBUSY;
> }
Hmm, this is fenced by iommu->lock and can race with non-attach_dev
callbacks. It might be difficult to shift these checks into test_dev..
> On the other hand, how do we communicate whatever errors returned
> by attach_dev in the reset_done path back to userspace? As noted above
> resource allocation failures could still occur in attach_dev, but userspace
> may think the requested attach in middle of a reset has succeeded as
> long as it passes the test_dev check.
That's a legit point. Jason pointed out at the SMMUv3 patch that we
would also end up with some inconsistency between the driver and the
core. So this test_dev approach doesn't seem to solve our problem
very well..
> Does it work better to block the attaching process upon ongoing reset
> and wake it up later upon reset_done to resume attach?
Yea, I think returning -EBUSY would be the simplest solution, like
we did in the previous version.
But the concern is that a VF might not be aware of a PF reset, so it
could still race an attachment and get -EBUSY as well. Then, if its
driver doesn't retry or defer the attach, this might break it?
FWIW, I am thinking of another design based on Jason's remarks:
https://lore.kernel.org/linux-iommu/aQBopHFub8wyQh5C@Asurada-Nvidia/
So, instead of the core initiating the round trip between the blocking
domain and group->domain, it forwards dev_reset_prepare/done to the
driver, where the driver does a low-level attachment that wouldn't fail:
For SMMUv3, it's an STE update.
For intel_iommu, it seems to be a context table update?
Then, any concurrent attach would be allowed to carry on through all
the compatibility/sanity checks as usual, but it would bypass the
final step: the STE or context table update.
Thanks
Nicolin