On Wed, Apr 01, 2020 at 04:38:42PM -0700, Jacob Pan wrote:
> On Wed, 1 Apr 2020 16:03:01 +0200
> Jean-Philippe Brucker <jean-phili...@linaro.org> wrote:
> 
> > Hi Jacob,
> > 
> > On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> > > IOASID was introduced in v5.5 as a generic kernel allocator service
> > > for both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub
> > > Stream ID. In addition to basic ID allocation, ioasid_set was
> > > introduced as a token that is shared by a group of IOASIDs. This
> > > set token can be used for permission checking but lacks some
> > > features needed by guest Shared Virtual Address (SVA). In addition,
> > > IOASID life cycle management is needed among multiple users.
> > > 
> > > This patchset introduces two extensions to the IOASID code,
> > > 1. IOASID set operations
> > > 2. Notifications for IOASID state synchronization  
> > 
> > My main concern with this series is patch 7 changing the spinlock to a
> > mutex, which prevents SVA from calling ioasid_free() from the RCU
> > callback of MMU notifiers. Could we use atomic notifiers, or do the
> > FREE notification another way?
> > 
> Maybe I am looking at the wrong code, but I thought
> mmu_notifier_ops.free_notifier() is called outside the spinlock via
> call_srcu(), which is invoked in thread context.
> In mmu_notifier.c, mmu_notifier_put() does:
>       spin_unlock(&mm->notifier_subscriptions->lock);
> 
>       call_srcu(&srcu, &subscription->rcu, mmu_notifier_free_rcu);

free_notifier() is called from an RCU callback, and according to
Documentation/RCU/checklist.txt:

5.      If call_rcu() or call_srcu() is used, the callback function will
        be called from softirq context.  In particular, it cannot block.
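
Concretely, with patch 7 applied the free path looks roughly like this
(a simplified sketch of the SVA code, field names illustrative):

	/* ->free_notifier() callback, invoked via call_srcu() */
	static void io_mm_free(struct mmu_notifier *mn)
	{
		struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);

		/* With patch 7 this takes ioasid_allocator_lock, now a mutex */
		ioasid_free(io_mm->pasid);
		kfree(io_mm);
	}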

When applying the patch I get the sleep-in-atomic warning:

[   87.861793] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[   87.863293] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 74, name: kworker/6:1
[   87.863993] 2 locks held by kworker/6:1/74:
[   87.864493]  #0: ffffff885ac12538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
[   87.865593]  #1: ffffff88591efd30 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
[   87.866993] CPU: 6 PID: 74 Comm: kworker/6:1 Not tainted 5.6.0-next-20200331+ #121
[   87.867393] Hardware name: FVP Base (DT)
[   87.867893] Workqueue: rcu_gp srcu_invoke_callbacks
[   87.868393] Call trace:
[   87.868793]  dump_backtrace+0x0/0x310
[   87.869293]  show_stack+0x14/0x20
[   87.869693]  dump_stack+0x124/0x180
[   87.870193]  ___might_sleep+0x2ac/0x428
[   87.870693]  __might_sleep+0x88/0x168
[   87.871094]  __mutex_lock+0xa0/0x1270
[   87.871593]  mutex_lock_nested+0x1c/0x28
[   87.872093]  ioasid_free+0x28/0x48
[   87.872493]  io_mm_free+0x1d0/0x608
[   87.872993]  mmu_notifier_free_rcu+0x74/0xe8
[   87.873393]  srcu_invoke_callbacks+0x1d0/0x2c8
[   87.873893]  process_one_work+0x858/0x1880
[   87.874393]  worker_thread+0x314/0xcd0
[   87.874793]  kthread+0x318/0x400
[   87.875293]  ret_from_fork+0x10/0x18

> 
> Anyway, if we have to go atomic: I tried an atomic notifier first;
> there are two subscribers to the free event on x86.
> 1. IOMMU
> 2. KVM
> 
> For #1, the problem is that in the free operation, the VT-d driver
> needs to do a lot of cleanup in thread context:
> - hold a mutex to traverse a list of devices
> - clear the PASID entry and flush caches
> 
> For #2, KVM might be able to deal with spinlocks when updating the
> VMCS PASID translation table. +Hao
> 
> Perhaps two solutions I can think of:
> 1. Use a cyclic IOASID allocator. The main reason for doing cleanup at
> free time is to prevent races with IOASID alloc. As with PIDs, 2M
> IOASIDs would take a long time to overflow. Then we could use an atomic
> notifier and a deferred workqueue to do the IOMMU cleanup. The downside
> is a large and growing PASID table, which may not be a performance
> issue since it is backed by a TLB.

That might be a problem for SMMU, which has 1024 * 64kB leaf PASID tables,
for a total of 64MB per endpoint if there is too much fragmentation in
the IOASID space.
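
FWIW, the atomic notifier plus deferred workqueue part of option 1 seems
doable on its own; a rough sketch (hypothetical names, with
iommu_cleanup_pasid() standing in for the VT-d cleanup):

	struct ioasid_free_work {
		struct work_struct work;
		ioasid_t ioasid;
	};

	static void ioasid_free_deferred(struct work_struct *work)
	{
		struct ioasid_free_work *fw =
			container_of(work, struct ioasid_free_work, work);

		/* Process context: safe to take mutexes, walk device
		 * lists, clear PASID entries and flush caches. */
		iommu_cleanup_pasid(fw->ioasid);
		kfree(fw);
	}

	static int ioasid_free_notify(struct notifier_block *nb,
				      unsigned long ioasid, void *data)
	{
		struct ioasid_free_work *fw = kzalloc(sizeof(*fw), GFP_ATOMIC);

		if (!fw)
			return NOTIFY_DONE;

		fw->ioasid = ioasid;
		INIT_WORK(&fw->work, ioasid_free_deferred);
		schedule_work(&fw->work);
		return NOTIFY_OK;
	}

The IOASID value must not be reused until the deferred work has run,
which is where the cyclic allocator would come in.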

> 2. Let VFIO ensure free always happens after unbind. Then there is no
> need to do cleanup. But that requires VFIO to keep track of all the
> PASIDs within each VM. When the VM terminates, VFIO is responsible for
> the cleanup. That was Yi's original proposal. I also tried to provide
> an IOASID set iterator for VFIO to free the IOASIDs within each VM/set,
> but the private data belongs to the IOMMU driver.

Not really my place to comment on this, but I find it nicer to use the
same gpasid_unbind() path when VFIO frees a PASID as when the guest
explicitly unbinds before freeing. 

Thanks,
Jean

