On Tue, 13 Mar 2018 16:43:47 -0400 Pavel Tatashin <pasha.tatas...@oracle.com> wrote:
> > Soft lockup: kernel has run for too long without rescheduling
> > Hard lockup: kernel has run for too long with interrupts disabled
> > Both of these are detected by the NMI watchdog handler.
> > 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
> > point. Replacing that with touch_nmi_watchdog() won't work (I think).
> > Presumably calling touch_softlockup_watchdog() will "work", in that it
> > suppresses the warning. But it won't fix the thing which the warning
> > is actually warning about: starvation of the CPU scheduler. That's
> > what the cond_resched() does.
> But, unlike memmap_init_zone(), which can be used after boot, here we do
> not need to worry about the kernel running for too long. This is because
> we are booting, and no user programs are running.
> So, it is acceptable to have a long uninterruptible span, as long
> as we are making useful progress. BTW, the boot CPU still has
> interrupts enabled during this span.
> The comment in include/linux/nmi.h states:
> * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
> * may be used to reset the timeout - for code which intentionally
> * disables interrupts for a long time. This call is stateless.
> Which is exactly what we are trying to do here, now that these threads
> run with interrupts disabled.
> Before, they were running with interrupts enabled, and
> cond_resched() was enough to avoid soft lockup warnings.
hm, maybe. But I'm not sure that touch_nmi_watchdog() will hold off a
soft lockup warning. Maybe it will.
And please let's get the above thoughts into the changelog.
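
For reference, the pattern under discussion (a long initialization loop
that periodically resets the watchdog timeout instead of rescheduling)
can be sketched in userspace C. This is only an illustration:
touch_nmi_watchdog_stub(), PAGES_PER_TOUCH and
init_pages_touching_watchdog() are hypothetical names, and the stub
merely counts calls where the real touch_nmi_watchdog() would reset the
timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in: the real touch_nmi_watchdog() is kernel-only. */
static unsigned long nmi_touches;

static void touch_nmi_watchdog_stub(void)
{
	nmi_touches++;	/* the real call would reset the watchdog timeout */
}

/* Hypothetical batch size, chosen only for this illustration. */
#define PAGES_PER_TOUCH 1024

/*
 * Mark nr_pages fake "struct pages" as initialized without ever yielding,
 * touching the watchdog stub once per completed batch.
 * Returns the number of touches made by this call.
 */
static unsigned long init_pages_touching_watchdog(unsigned char *pages,
						  size_t nr_pages)
{
	unsigned long before = nmi_touches;
	size_t i;

	assert(pages != NULL);

	for (i = 0; i < nr_pages; i++) {
		pages[i] = 1;	/* stands in for initializing one struct page */
		if ((i + 1) % PAGES_PER_TOUCH == 0)
			touch_nmi_watchdog_stub();
	}
	return nmi_touches - before;
}
```

Calling init_pages_touching_watchdog() on a 4096-entry array touches the
stub four times, once per 1024-page batch. Whether the real call also
holds off the soft lockup detector is the open question above and should
be checked against include/linux/nmi.h in the tree in question.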
> > I'm not sure what to suggest, really. Your changelog isn't the best:
> > "Vlastimil Babka reported about a window issue during which when
> > deferred pages are initialized, and the current version of on-demand
> > initialization is finished, allocations may fail". Well... where is
> > this mysterious window? Without such detail it's hard for others to
> > suggest alternative approaches.
> Here is hopefully a better description of the problem:
> Currently, during boot we preinitialize some number of struct pages to
> satisfy all boot allocations, even if these allocations happen while we
> initialize the rest of the deferred pages in page_alloc_init_late(). The
> problem is that we do not know how much memory the kernel will need, and
> it also depends on various options.
> So, with this work, we are changing this behavior to initialize struct pages
> on-demand, only when allocations happen.
> During boot, when we try to allocate memory, the on-demand struct page
> initialization code takes care of it. But once the deferred pages are
> being initialized in:
> for_each_node_state(nid, N_MEMORY)
> we cannot use on-demand initialization, as these threads resize pgdat.
> This patch is meant to take care of that window.
> My first version of on-demand deferred page initialization would simply fail
> to allocate memory during this period of time. But this new version waits
> for the threads to finish initializing deferred memory, and then
> successfully performs the allocation.
> Because an interrupt handler would wait for the pgdat resize lock.
OK, thanks. Please also add to changelog.
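
The "wait instead of fail" behaviour described above can be modelled in
userspace with a mutex. This is a rough sketch under stated assumptions,
not the kernel code: struct fake_node, deferred_init_thread(),
alloc_pages_model() and run_model() are hypothetical names, and a
pthread mutex stands in for the pgdat resize lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a node; not the kernel's struct pglist_data. */
struct fake_node {
	pthread_mutex_t resize_lock;	/* stands in for the pgdat resize lock */
	unsigned long initialized;	/* pages whose struct page is set up */
	unsigned long allocated;	/* pages already handed out */
	unsigned long total;		/* pages present on the node */
};

/* Deferred-init thread: holds the lock for the whole initialization. */
static void *deferred_init_thread(void *arg)
{
	struct fake_node *node = arg;

	assert(node != NULL);

	pthread_mutex_lock(&node->resize_lock);
	while (node->initialized < node->total)
		node->initialized++;
	pthread_mutex_unlock(&node->resize_lock);
	return NULL;
}

/*
 * Allocation path: taking the lock blocks while the init thread is
 * running, so the request waits instead of failing. If the thread has
 * not run yet, the needed pages are initialized on demand.
 */
static bool alloc_pages_model(struct fake_node *node, unsigned long nr)
{
	bool ok = false;

	pthread_mutex_lock(&node->resize_lock);
	if (node->allocated + nr <= node->total) {
		if (node->initialized < node->allocated + nr)
			node->initialized = node->allocated + nr;
		node->allocated += nr;
		ok = true;
	}
	pthread_mutex_unlock(&node->resize_lock);
	return ok;
}

/* Race one allocation against the init thread; true if both end up consistent. */
static bool run_model(void)
{
	struct fake_node node = { .initialized = 0, .allocated = 0,
				  .total = 1UL << 20 };
	pthread_t tid;
	bool ok;

	pthread_mutex_init(&node.resize_lock, NULL);
	if (pthread_create(&tid, NULL, deferred_init_thread, &node) != 0) {
		pthread_mutex_destroy(&node.resize_lock);
		return false;
	}
	ok = alloc_pages_model(&node, 64);
	pthread_join(tid, NULL);
	pthread_mutex_destroy(&node.resize_lock);

	return ok && node.allocated == 64 && node.initialized == node.total;
}
```

Whichever side takes the lock first, the allocation succeeds: either the
pages are grown on demand before the thread runs, or the allocator sleeps
on the lock until the thread has initialized everything. On older
toolchains the sketch may need to be built with -pthread.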