On Thu, 19 Jan 2023 03:43:36 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Matthew Rosato <[email protected]>
> > Sent: Wednesday, January 18, 2023 10:56 PM
> > 
> > On 1/18/23 4:03 AM, Tian, Kevin wrote:  
> > >> From: Alex Williamson
> > >> Sent: Wednesday, January 18, 2023 5:23 AM
> > >>
> > >> On Fri, 13 Jan 2023 19:03:51 -0500
> > >> Matthew Rosato <[email protected]> wrote:
> > >>  
> > >>>  void vfio_device_group_close(struct vfio_device *device)
> > >>>  {
> > >>> +       void (*put_kvm)(struct kvm *kvm);
> > >>> +       struct kvm *kvm;
> > >>> +
> > >>>         mutex_lock(&device->group->group_lock);
> > >>> +       kvm = device->kvm;
> > >>> +       put_kvm = device->put_kvm;
> > >>>         vfio_device_close(device, device->group->iommufd);
> > >>> +       if (kvm == device->kvm)
> > >>> +               kvm = NULL;  
> > >>
> > >> Hmm, so we're using whether the device->kvm pointer gets cleared in
> > >> last_close to detect whether we should put the kvm reference.  That's a
> > >> bit obscure.  Our get and put is also asymmetric.
> > >>
> > >> Did we decide that we couldn't do this via a schedule_work() from the
> > >> last_close function, ie. implementing our own version of an async put?
> > >> It seems like that potentially has a cleaner implementation, symmetric
> > >> call points, handling all the storing and clearing of kvm related
> > >> pointers within the get/put wrappers, passing only a vfio_device to the
> > >> put wrapper, using the "vfio_device_" prefix for both.  Potentially
> > >> we'd just want an unconditional flush outside of lock here for
> > >> deterministic release.
> > >>
> > >> What's the downside?  Thanks,
> > >>  
> > >
> > > btw I guess this can be also fixed by Yi's work here:
> > >
> > > https://lore.kernel.org/kvm/[email protected]/
> > >
> > > with set_kvm(NULL) moved to the release callback of kvm_vfio device,
> > > such circular lock dependency can be avoided too.  
> > 
> > Oh, interesting...  It seems to me that this would eliminate the reported 
> > call
> > chain altogether:
> > 
> > kvm_put_kvm  
> >  -> kvm_destroy_vm
> >   -> kvm_destroy_devices
> >    -> kvm_vfio_destroy (starting here -- this would no longer be executed)
> >     -> kvm_vfio_file_set_kvm
> >      -> vfio_file_set_kvm
> >       -> group->group_lock/group_rwsem  
> > 
> > because kvm_destroy_devices now can't end up calling kvm_vfio_destroy
> > and friends, it won't try and acquire the group lock a 2nd time making a
> > kvm_put_kvm while the group lock is held OK to do.  The vfio_file_set_kvm
> > call will now always come from a separate thread of execution,
> > kvm_vfio_group_add, kvm_vfio_group_del or the release thread:
> > 
> > kvm_device_release (where the group->group_lock would not be held since
> > vfio does not trigger closing of the kvm fd)  
> >  -> kvm_vfio_destroy (or, kvm_vfio_release)
> >   -> kvm_vfio_file_set_kvm
> >    -> vfio_file_set_kvm
> >     -> group->group_lock/group_rwsem  
> 
> Yes, that's my point. If Alex/Jason are also OK with it probably Yi can
> send that patch separately as a fix to this issue. It's much simpler. 😊

If we can extract that flow separate from the cdev refactoring, ideally
something that matches the stable kernel backport rules, then that
sounds like the preferred solution.  Thanks,

Alex

Reply via email to