On 1/14/20 5:00 AM, Jason Gunthorpe wrote:
On Mon, Jan 13, 2020 at 02:47:02PM -0800, Ralph Campbell wrote:
  void
  nouveau_svmm_fini(struct nouveau_svmm **psvmm)
  {
        struct nouveau_svmm *svmm = *psvmm;
+       struct mmu_interval_notifier *mni;
+
        if (svmm) {
                mutex_lock(&svmm->mutex);
+               while (true) {
+                       mni = mmu_interval_notifier_find(svmm->mm,
+                                       &nouveau_svm_mni_ops, 0UL, ~0UL);
+                       if (!mni)
+                               break;
+                       mmu_interval_notifier_put(mni);

Oh, now I really don't like the name 'put'. It looks like mni is
refcounted here, and it isn't. 'put' should be called 'remove_deferred'.

OK.

And then you also need a way to barrier this scheme on driver unload.

Good point. I can add something like

    void mmu_interval_notifier_synchronize(struct mm_struct *mm);

that waits for deferred operations to complete, similar to
mmu_interval_read_begin().
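
To make the deferred-removal and barrier idea concrete, here is a minimal single-threaded userspace model. It is not kernel code. Everything in it (struct model_mm, remove_deferred(), invalidate_start()/invalidate_end(), synchronize_done()) is hypothetical. It assumes that removals requested while an invalidation is running are queued and applied when that invalidation ends, and that an odd sequence count marks an active invalidation. A real mmu_interval_notifier_synchronize() would sleep until the condition checked by synchronize_done() holds. A single thread can only test that condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not the kernel's mmu_notifier code.
 * invalidate_seq is odd while an invalidation is running. Removals
 * requested during an invalidation are queued and applied at its end. */
#define MAX_NOTIFIERS 8

struct model_notifier {
	unsigned long start, last;
	bool in_tree;
};

struct model_mm {
	unsigned long invalidate_seq;   /* odd: invalidation active */
	struct model_notifier *deferred[MAX_NOTIFIERS];
	size_t ndeferred;
};

static bool invalidate_active(const struct model_mm *mm)
{
	return mm->invalidate_seq & 1;
}

static void invalidate_start(struct model_mm *mm)
{
	assert(!invalidate_active(mm));
	mm->invalidate_seq++;           /* becomes odd */
}

static void invalidate_end(struct model_mm *mm)
{
	assert(invalidate_active(mm));
	/* Apply the removals that were requested while the tree was in use. */
	for (size_t i = 0; i < mm->ndeferred; i++)
		mm->deferred[i]->in_tree = false;
	mm->ndeferred = 0;
	mm->invalidate_seq++;           /* back to even */
}

/* Analog of the proposed 'remove_deferred': never blocks. If an
 * invalidation is running, the removal is queued instead. */
static void remove_deferred(struct model_mm *mm, struct model_notifier *mn)
{
	if (!invalidate_active(mm)) {
		mn->in_tree = false;
		return;
	}
	assert(mm->ndeferred < MAX_NOTIFIERS);
	mm->deferred[mm->ndeferred++] = mn;
}

/* Condition a synchronize barrier would wait for: no invalidation
 * running and no queued removals left. A kernel version would sleep
 * until this is true; this model can only report it. */
static bool synchronize_done(const struct model_mm *mm)
{
	return !invalidate_active(mm) && mm->ndeferred == 0;
}
```

In this model, a driver unload path would call remove_deferred() for every interval and then wait on the barrier, so that no queued removal can touch driver memory after it is freed.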

+               }
                svmm->vmm = NULL;
                mutex_unlock(&svmm->mutex);
-               mmu_notifier_put(&svmm->notifier);

While here it was actually a refcount.

+static void nouveau_svmm_do_unmap(struct mmu_interval_notifier *mni,
+                                const struct mmu_notifier_range *range)
+{
+       struct svmm_interval *smi =
+               container_of(mni, struct svmm_interval, notifier);
+       struct nouveau_svmm *svmm = smi->svmm;
+       unsigned long start = mmu_interval_notifier_start(mni);
+       unsigned long last = mmu_interval_notifier_last(mni);

This whole algorithm only works if it is protected by the read side of
the interval tree lock. Deserves at least a comment if not an
assertion too.

This is called from the invalidate() callback while holding the
driver page table lock, so the struct mmu_interval_notifier and
the interval tree can't change.
I will add comments for v7.
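
The unmap update can be sketched as a pure function on one interval. This is an assumption about what nouveau_svmm_do_unmap() might compute, since its body is not quoted here. trim_interval() and struct trim_result are hypothetical, and the tree_locked flag stands in for holding the driver page table lock; the assert() plays the role a lockdep_assert_held() could play in kernel code. The interval uses an inclusive last, like mmu_interval_notifier_last(), while the munmap() range has an exclusive end, like struct mmu_notifier_range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: tree_locked stands in for holding the driver
 * page table lock, so start/last cannot change underneath us. */
struct trim_result {
	int npieces;                /* 0, 1 or 2 surviving pieces */
	unsigned long start[2];
	unsigned long last[2];      /* inclusive, like interval_tree.last */
};

/* Interval is [start, last] inclusive; the unmapped range is
 * [rstart, rend) with an exclusive end. */
static struct trim_result trim_interval(bool tree_locked,
					unsigned long start, unsigned long last,
					unsigned long rstart, unsigned long rend)
{
	struct trim_result r = { 0 };

	assert(tree_locked);        /* only valid under the lock */
	assert(start <= last && rstart < rend);

	if (rend <= start || rstart > last) {
		/* No overlap: the interval is unchanged. */
		r.start[0] = start;
		r.last[0] = last;
		r.npieces = 1;
		return r;
	}
	if (rstart > start) {
		/* Keep the head in front of the unmapped range. */
		r.start[r.npieces] = start;
		r.last[r.npieces] = rstart - 1;
		r.npieces++;
	}
	if (rend - 1 < last) {
		/* Keep the tail after the unmapped range. */
		r.start[r.npieces] = rend;
		r.last[r.npieces] = last;
		r.npieces++;
	}
	return r;   /* npieces == 0 means the whole interval goes away */
}
```

An unmap in the middle of an interval splits it in two, which is one reason the update has to be deferred when it happens during an invalidation.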

  static int nouveau_range_fault(struct nouveau_svmm *svmm,
                               struct nouveau_drm *drm, void *data, u32 size,
-                              u64 *pfns, struct svm_notifier *notifier)
+                              u64 *pfns, u64 start, u64 end)
  {
        unsigned long timeout =
                jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
        /* Have HMM fault pages within the fault window to the GPU. */
        struct hmm_range range = {
-               .notifier = &notifier->notifier,
-               .start = notifier->notifier.interval_tree.start,
-               .end = notifier->notifier.interval_tree.last + 1,
+               .start = start,
+               .end = end,
                .pfns = pfns,
                .flags = nouveau_svm_pfn_flags,
                .values = nouveau_svm_pfn_values,
+               .default_flags = 0,
+               .pfn_flags_mask = ~0UL,
                .pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT,
        };
-       struct mm_struct *mm = notifier->notifier.mm;
+       struct mm_struct *mm = svmm->mm;
        long ret;
        while (true) {
                if (time_after(jiffies, timeout))
                        return -EBUSY;
-               range.notifier_seq = mmu_interval_read_begin(range.notifier);
-               range.default_flags = 0;
-               range.pfn_flags_mask = -1UL;
                down_read(&mm->mmap_sem);

mmap_sem doesn't have to be held for the interval search, and again we
have lifetime issues with the membership here.

I agree mmap_sem isn't needed for the interval search. It is needed when
the search doesn't find a registered interval and one has to be created
to cover the underlying VMA. If an arbitrary-size interval were created
instead, mmap_sem wouldn't be needed.
I don't understand the lifetime/membership issue. The driver is the only thing
that allocates, inserts, or removes a struct mmu_interval_notifier, and thus
it completely controls the lifetime.

+               ret = nouveau_svmm_interval_find(svmm, &range);
+               if (ret) {
+                       up_read(&mm->mmap_sem);
+                       return ret;
+               }
+               range.notifier_seq = mmu_interval_read_begin(range.notifier);
                ret = hmm_range_fault(&range, 0);
                up_read(&mm->mmap_sem);
                if (ret <= 0) {

I'm still not sure this is a better approach than what ODP does. It
looks very expensive on the fault path.

Jason


ODP doesn't have this problem because users have to call ib_reg_mr()
before any I/O can happen to the process address space. That is when
mmu_interval_notifier_insert() / mmu_interval_notifier_remove() can
be called, so the driver doesn't have to worry about the interval
changing size or being removed while I/O is happening.
For GPU-like devices, I'm trying to allow hardware access to any user-level
address without pre-registering it. That means inserting mmu
interval notifiers for the ranges the GPU page faults on and updating
the intervals as munmap() calls remove parts of the address space.
I don't want to register an interval per page, so the logical range
is the underlying VMA.
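
The per-VMA scheme can be sketched as a lookup-or-create step. This is a hypothetical userspace model, not the patch's nouveau_svmm_interval_find(): intervals live in a small array instead of an interval tree, and the mmap_locked flag stands in for holding mmap_sem. It shows why the lock is only needed on a miss, when the VMA has to be looked up to size the new interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "one interval per VMA, created on the first
 * GPU fault". vm_end is exclusive (as in vm_area_struct); intervals
 * use an inclusive last. mmap_locked stands in for holding mmap_sem. */
struct model_vma { unsigned long vm_start, vm_end; };
struct model_interval { unsigned long start, last; };

#define MAX_INTERVALS 16

struct model_svmm {
	const struct model_vma *vmas;
	size_t nvmas;
	struct model_interval itv[MAX_INTERVALS];
	size_t nitv;
};

static struct model_interval *interval_find(struct model_svmm *s,
					    unsigned long addr)
{
	for (size_t i = 0; i < s->nitv; i++)
		if (s->itv[i].start <= addr && addr <= s->itv[i].last)
			return &s->itv[i];
	return NULL;
}

/* Returns the interval covering addr, creating one that spans the
 * whole VMA on a miss. Only the miss path needs the mmap lock. */
static struct model_interval *interval_find_or_create(struct model_svmm *s,
						      unsigned long addr,
						      bool mmap_locked)
{
	struct model_interval *itv = interval_find(s, addr);

	if (itv)
		return itv;             /* fast path: no VMA walk */

	assert(mmap_locked);            /* VMA layout is stable only under the lock */
	for (size_t i = 0; i < s->nvmas; i++) {
		const struct model_vma *v = &s->vmas[i];

		if (v->vm_start <= addr && addr < v->vm_end) {
			if (s->nitv == MAX_INTERVALS)
				return NULL;    /* stand-in for -ENOMEM */
			s->itv[s->nitv] = (struct model_interval){
				.start = v->vm_start, .last = v->vm_end - 1 };
			return &s->itv[s->nitv++];
		}
	}
	return NULL;                    /* no VMA: the fault cannot be serviced */
}
```

A second fault anywhere in the same VMA takes the fast path, so the VMA walk and allocation are paid once per VMA rather than once per fault.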

It isn't that expensive: there is an extra driver lock/unlock as
part of the lookup, and possibly a find_vma() and kmalloc(GFP_ATOMIC)
for new intervals, plus the deferred interval updates for munmap().
Compared to the cost of updating PTEs in the device and GPU fault
handling, this is minimal overhead.
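
The fault-path steps can be sketched with a collision-retry sequence count. The model below is hypothetical and single-threaded: read_begin() and read_retry() mirror the roles of mmu_interval_read_begin() and mmu_interval_read_retry() but are not those functions, a fixed attempt count replaces the jiffies timeout, and a hook simulates an invalidation racing with hmm_range_fault(). In this model the driver lock is taken once per attempt to check the sequence before programming device PTEs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the retry loop: read_begin(),
 * fault pages, then take the driver lock and check read_retry()
 * before programming device PTEs. */
struct seq_notifier {
	unsigned long invalidate_seq;   /* bumped by each invalidation */
	bool driver_locked;
	int ptes_programmed;
};

static unsigned long read_begin(const struct seq_notifier *n)
{
	return n->invalidate_seq;
}

static bool read_retry(const struct seq_notifier *n, unsigned long seq)
{
	assert(n->driver_locked);       /* checked under the driver lock */
	return n->invalidate_seq != seq;
}

typedef void (*fault_hook)(struct seq_notifier *n, int attempt);

/* Test helpers simulating an invalidation that races with the fault. */
static void race_on_first_attempt(struct seq_notifier *n, int attempt)
{
	if (attempt == 1)
		n->invalidate_seq++;
}

static void race_every_attempt(struct seq_notifier *n, int attempt)
{
	(void)attempt;
	n->invalidate_seq++;
}

/* Returns the number of attempts used, or -EBUSY after max_attempts
 * (standing in for the HMM_RANGE_DEFAULT_TIMEOUT deadline). */
static int range_fault(struct seq_notifier *n, int max_attempts,
		       fault_hook during_fault)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		unsigned long seq = read_begin(n);

		/* hmm_range_fault() would run here; a racing munmap()
		 * may invalidate the range meanwhile. */
		if (during_fault)
			during_fault(n, attempt);

		n->driver_locked = true;
		if (read_retry(n, seq)) {
			n->driver_locked = false;
			continue;               /* pfns may be stale: try again */
		}
		n->ptes_programmed++;           /* update GPU page tables */
		n->driver_locked = false;
		return attempt;
	}
	return -EBUSY;
}
```

Without a racing invalidation the loop finishes in one attempt, so the common-case cost is one sequence read and one lock/unlock around the PTE update.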

_______________________________________________
Nouveau mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/nouveau
