On Thu 13-12-18 17:04:00, Johannes Weiner wrote:
[...]
> Acked-by: Johannes Weiner <han...@cmpxchg.org>

Thanks!

> Just one nit:
> 
> > @@ -2993,6 +2993,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> >     struct vm_area_struct *vma = vmf->vma;
> >     vm_fault_t ret;
> >  
> > +   /*
> > +    * Preallocate pte before we take page_lock because this might lead to
> > +    * deadlocks for memcg reclaim which waits for pages under writeback.
> > +    */
> > +   if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
> > +           vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
> > +           if (!vmf->prealloc_pte)
> > +                   return VM_FAULT_OOM;
> > +           smp_wmb(); /* See comment in __pte_alloc() */
> > +   }
> 
> Could you be more specific in the deadlock comment? git blame will
> work fine for a while, but it becomes a pain to find corresponding
> patches after stuff gets moved around for years.
> 
> In particular the race diagram between reclaim with a page lock held
> and the fs doing SetPageWriteback batches before kicking off IO would
> be useful directly in the code, IMO.

This?

diff --git a/mm/memory.c b/mm/memory.c
index bb78e90a9b70..ece221e4da6d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2995,7 +2995,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 
        /*
         * Preallocate pte before we take page_lock because this might lead to
-        * deadlocks for memcg reclaim which waits for pages under writeback.
+        * deadlocks for memcg reclaim which waits for pages under writeback:
+        *                              lock_page(A)
+        *                              SetPageWriteback(A)
+        *                              unlock_page(A)
+        * lock_page(B)
+        *                              lock_page(B)
+        * pte_alloc_one
+        *   shrink_page_list
+        *     wait_on_page_writeback(A)
+        *                              SetPageWriteback(B)
+        *                              unlock_page(B)
+        *                              # flush A, B to clear the writeback
         */
        if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
                vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
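
For illustration only, the ordering the comment argues for can be modelled
in userspace roughly as below. This is a minimal sketch, not kernel code:
struct fault_ctx, alloc_table(), handle_fault() and the page_locked flag
are hypothetical stand-ins for vmf, pte_alloc_one(), __do_fault() and the
page lock. The point is only that the allocation which may block on
reclaim happens before the lock is taken, and the locked section merely
consumes the preallocated table.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct fault_ctx {
	int table_missing;	/* models pmd_none(*vmf->pmd) */
	void *prealloc;		/* models vmf->prealloc_pte */
	void *table;		/* page table installed under the lock */
};

static int page_locked;		/* models the page lock being held */
static int allocs_under_lock;	/* allocations made while "locked" */

/* Models pte_alloc_one(): in the kernel this may enter memcg reclaim. */
static void *alloc_table(void)
{
	if (page_locked)
		allocs_under_lock++;
	return calloc(1, 64);
}

/* Returns 0 on success, -1 on allocation failure (models VM_FAULT_OOM). */
static int handle_fault(struct fault_ctx *ctx)
{
	/* Allocate first, while no page lock is held. */
	if (ctx->table_missing && !ctx->prealloc) {
		ctx->prealloc = alloc_table();
		if (!ctx->prealloc)
			return -1;
	}

	page_locked = 1;	/* models lock_page(B) */
	if (ctx->table_missing) {
		/* Only consume the preallocated table; never allocate here. */
		assert(ctx->prealloc != NULL);
		ctx->table = ctx->prealloc;
		ctx->prealloc = NULL;
		ctx->table_missing = 0;
	}
	page_locked = 0;	/* models unlock_page(B) */
	return 0;
}
```

A caller would zero-initialise a struct fault_ctx, set table_missing, and
call handle_fault(); allocs_under_lock should then still be zero, which is
the property the preallocation is meant to guarantee.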
-- 
Michal Hocko
SUSE Labs
