On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov <[email protected]> wrote:
> > On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka <[email protected]> wrote:
> >> On 29.1.2017 13:44, Dmitry Vyukov wrote:
> >>> Hello,
> >>>
> >>> I've got the following deadlock report while running syzkaller fuzzer
> >>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
> >>>
> >>> [ INFO: possible circular locking dependency detected ]
> >>> 4.10.0-rc5-next-20170125 #1 Not tainted
> >>> -------------------------------------------------------
> >>> syz-executor3/14255 is trying to acquire lock:
> >>>  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff814271c7>]
> >>> get_online_cpus+0x37/0x90 kernel/cpu.c:239
> >>>
> >>> but task is already holding lock:
> >>>  (pcpu_alloc_mutex){+.+.+.}, at: [<ffffffff81937fee>]
> >>> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
> >>>
> >>> which lock already depends on the new lock.
> >>
> >> I suspect the dependency comes from recent changes in drain_all_pages(). They
> >> were later redone (for other reasons, but nice to have another validation) in
> >> the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. Could
> >> you try if it helps?
> >
> > It happened only once on linux-next, so I can't verify the fix. But I
> > will watch out for other occurrences.
> 
> Unfortunately it does not seem to help.

I'm a little stuck on how best to handle this. get_online_cpus() can
block forever if a hotplug operation is holding the mutex when it calls
pcpu_alloc. One option would be to add a try_get_online_cpus() helper that
trylocks the mutex. However, given that the drain is so unlikely to actually
make a difference when racing against parallel allocations, I think the
patch below should be acceptable.

Any objections?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b93879990fd..a3192447e906 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
         */
        if (!page && !drained) {
                unreserve_highatomic_pageblock(ac, false);
-               drain_all_pages(NULL);
+
+               /*
+                * Only drain from contexts allocating for user allocations.
+                * Kernel allocations could be holding a CPU hotplug-related
+                * mutex, particularly hot-add allocating per-cpu structures
+                * while hotplug-related mutexes are held, which would prevent
+                * get_online_cpus() from ever returning.
+                */
+               if (gfp_mask & __GFP_HARDWALL)
+                       drain_all_pages(NULL);
+
                drained = true;
                goto retry;
        }

-- 
Mel Gorman
SUSE Labs
