On Thu, 13 Apr 2023 08:50:45 -0300
Jason Gunthorpe <j...@nvidia.com> wrote:

> On Thu, Apr 13, 2023 at 08:25:52AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <j...@nvidia.com>
> > > Sent: Thursday, April 13, 2023 4:07 AM
> > > 
> > >   
> > > > in which case we need c) a way to
> > > > report the overall set of affected devices regardless of ownership in
> > > > support of 4), BDF?  
> > > 
> > > Yes, continue to use INFO unmodified.
> > >   
> > > > Are we back to replacing group-ids with dev-ids in the INFO structure,
> > > > where an invalid dev-id either indicates an affected device with
> > > > implied ownership (ok) or a gap in ownership (bad) and a flag somewhere
> > > > is meant to indicate the overall disposition based on the availability
> > > > of reset?  
> > > 
> > > As you explore in the following this gets ugly. I prefer to keep INFO
> > > unchanged and add INFO2.
> > >   
> > 
> > INFO needs a change when VFIO_GROUP is disabled. Now it assumes
> > a valid iommu group always exists:
> > 
> > vfio_pci_fill_devs()
> > {
> >     ...
> >     iommu_group = iommu_group_get(&pdev->dev);
> >     if (!iommu_group)
> >             return -EPERM; /* Cannot reset non-isolated devices */
> >     ...
> > }  
> 
> This can still work in a ugly way. With a INFO2 the only purpose of
> INFO would be debugging, so if someone uses no-iommu, with hotreset
> and misconfigures it then the only downside is they don't get the
> debugging print. But we know of nothing that uses this combination
> anyhow..
> 
> > with that plus BDF cap, I'm curious what is the actual purpose of
> > INFO2 or why cannot requirement#3 reuse the information collected
> > via existing INFO?  
> 
> It can - it is just more complicated for userspace to do it, it has to
> extract and match the BDFs and then run some algorithm to determine if
> the opened devices cover the right set of devices in the reset group,
> and it has to have some special code for no-iommu.
> 
> VS info2 would return the dev_id's and a single yes/no if the right
> set is present. Kernel runs the algorithm instead of userspace, it
> seems more abstract this way.
> 
> Also, if we make iommufd return a 'ioas dev_id group' as well it
> composes nicely that userspace just needs one translation from dev_id.


IIUC, the semantics we're proposing is that an INFO2 ioctl would return
success or failure indicating whether the user has sufficient ownership
of the affected devices, and in the success case returns an array of
affected dev-ids within the user's iommufd_ctx.  Unopened, affected
devices, are not reported via INFO2, and unopened, affected devices
outside the user's scope of ownership (ie. outside the owned IOMMU
group) will generate a failure condition.

As for the INFO ioctl, it's described as unchanged, which does raise
the question of what is reported for IOMMU groups and how does the
value there coherently relate to anything else in the cdev-exclusive
vfio API...

We had already iterated a proposal where the group-id is replaced with
a dev-id in the existing ioctl and a flag indicates when the return
value is a dev-id vs group-id.  This had a gap that userspace cannot
determine if a reset is available given this information since un-owned
devices report an invalid dev-id and userspace can't know if it has
implicit ownership.

It seems cleaner to me though that we would could still re-use INFO in
a similar way, simply defining a new flag bit which is valid only in
the case of returning dev-ids and indicates if the reset is available.
Therefore in one ioctl, userspace knows if hot-reset is available
(based on a kernel determination) and can pull valid dev-ids from the
array to associate affected, owned devices, and still has the
equivalent information to know that one or more of the devices listed
with an invalid dev-id are preventing the hot-reset from being
available.

Is that an option?  Thanks,

Alex

Reply via email to