On Thu, 20 Nov 2025 13:36:42 +0100 Michał Winiarski <[email protected]> wrote:
> Resetting the migration device state is typically delegated to PCI > .reset_done() callback. > With VFIO, reset is usually called under vdev->memory_lock, which causes > lockdep to report a following circular locking dependency scenario: > > 0: set_device_state > driver->state_mutex -> migf->lock > 1: data_read > migf->lock -> mm->mmap_lock > 2: vfio_pin_dma > mm->mmap_lock -> vdev->memory_lock > 3: vfio_pci_ioctl_reset > vdev->memory_lock -> driver->state_mutex > > Introduce a .migration_reset_state() callback called outside of > vdev->memory_lock to break the dependency chain. > > Signed-off-by: Michał Winiarski <[email protected]> > --- > drivers/vfio/pci/vfio_pci_core.c | 25 ++++++++++++++++++++++--- > include/linux/vfio.h | 4 ++++ > 2 files changed, 26 insertions(+), 3 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci_core.c > b/drivers/vfio/pci/vfio_pci_core.c > index 7dcf5439dedc9..d919636558ec8 100644 > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -553,6 +553,16 @@ int vfio_pci_core_enable(struct vfio_pci_core_device > *vdev) > } > EXPORT_SYMBOL_GPL(vfio_pci_core_enable); > > +static void vfio_pci_dev_migration_reset_state(struct vfio_pci_core_device > *vdev) > +{ > + lockdep_assert_not_held(&vdev->memory_lock); > + > + if (!vdev->vdev.mig_ops->migration_reset_state) mig_ops itself is generally NULL. > + return; > + > + vdev->vdev.mig_ops->migration_reset_state(&vdev->vdev); > +} > + > void vfio_pci_core_disable(struct vfio_pci_core_device *vdev) > { > struct pci_dev *pdev = vdev->pdev; > @@ -662,8 +672,10 @@ void vfio_pci_core_disable(struct vfio_pci_core_device > *vdev) > * overwrite the previously restored configuration information. > */ > if (vdev->reset_works && pci_dev_trylock(pdev)) { > - if (!__pci_reset_function_locked(pdev)) > + if (!__pci_reset_function_locked(pdev)) { > vdev->needs_reset = false; > + vfio_pci_dev_migration_reset_state(vdev); > + } > pci_dev_unlock(pdev); > } > > @@ -1230,6 +1242,8 @@ static int vfio_pci_ioctl_reset(struct > vfio_pci_core_device *vdev, > ret = pci_try_reset_function(vdev->pdev); > up_write(&vdev->memory_lock); > > + vfio_pci_dev_migration_reset_state(vdev); > + > return ret; > } > > @@ -2129,6 +2143,7 @@ int vfio_pci_core_register_device(struct > vfio_pci_core_device *vdev) > if (vdev->vdev.mig_ops) { > if (!(vdev->vdev.mig_ops->migration_get_state && > vdev->vdev.mig_ops->migration_set_state && > + vdev->vdev.mig_ops->migration_reset_state && For bisection purposes it would be better to enforce this after all the drivers are converted. > vdev->vdev.mig_ops->migration_get_data_size) || > !(vdev->vdev.migration_flags & VFIO_MIGRATION_STOP_COPY)) > return -EINVAL; > @@ -2486,8 +2501,10 @@ static int vfio_pci_dev_set_hot_reset(struct > vfio_device_set *dev_set, > > err_undo: > list_for_each_entry_from_reverse(vdev, &dev_set->device_list, > - vdev.dev_set_list) > + vdev.dev_set_list) { > up_write(&vdev->memory_lock); > + vfio_pci_dev_migration_reset_state(vdev); > + } > > list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) > pm_runtime_put(&vdev->pdev->dev); > @@ -2543,8 +2560,10 @@ static void vfio_pci_dev_set_try_reset(struct > vfio_device_set *dev_set) > reset_done = true; > > list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { > - if (reset_done) > + if (reset_done) { > cur->needs_reset = false; > + vfio_pci_dev_migration_reset_state(cur); > + } This and the core_disable path above are only called in the close/open-error path. Do we really need this behavior there? We might need separate reconciliation vs .reset_done for these. As Kevin also noted, we're missing the non-ioctl reset paths. This approach seems a bit error prone. I wonder if instead we need a counterpart of vfio_pci_zap_and_down_write_memory_lock(), ie. vfio_pci_up_write_memory_lock_from_reset(). An equal mouthful, but scopes the problem to be more manageable at memory_lock release. Thanks, Alex > > if (!disable_idle_d3) > pm_runtime_put(&cur->pdev->dev); > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index eb563f538dee5..36aab2df40700 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -213,6 +213,9 @@ static inline bool vfio_device_cdev_opened(struct > vfio_device *device) > * @migration_get_state: Optional callback to get the migration state for > * devices that support migration. It's mandatory for > * VFIO_DEVICE_FEATURE_MIGRATION migration support. > + * @migration_reset_state: Optional callback to reset the migration state for > + * devices that support migration. It's mandatory for > + * VFIO_DEVICE_FEATURE_MIGRATION migration support. > * @migration_get_data_size: Optional callback to get the estimated data > * length that will be required to complete stop copy. It's > mandatory for > * VFIO_DEVICE_FEATURE_MIGRATION migration support. > @@ -223,6 +226,7 @@ struct vfio_migration_ops { > enum vfio_device_mig_state new_state); > int (*migration_get_state)(struct vfio_device *device, > enum vfio_device_mig_state *curr_state); > + void (*migration_reset_state)(struct vfio_device *device); > int (*migration_get_data_size)(struct vfio_device *device, > unsigned long *stop_copy_length); > };

