On Tue, 13 Mar 2018 16:43:47 -0400 Pavel Tatashin <pasha.tatas...@oracle.com> 
wrote:

> 
> > Soft lockup: kernel has run for too long without rescheduling
> > Hard lockup: kernel has run for too long with interrupts disabled
> > 
> > Both of these are detected by the NMI watchdog handler.
> > 
> > 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
> > point.  Replacing that with touch_nmi_watchdog() won't work (I think). 
> > Presumably calling touch_softlockup_watchdog() will "work", in that it
> > suppresses the warning.  But it won't fix the thing which the warning
> > is actually warning about: starvation of the CPU scheduler.  That's
> > what the cond_resched() does.
> 
> But, unlike memmap_init_zone(), which can be used after boot, here we do
> not worry about kernel running for too long.  This is because we are
> booting, and no user programs are running.
> 
> So, it is acceptable to have a long uninterruptible span, as long
> as we are making useful progress. BTW, the boot CPU still has
> interrupts enabled during this span.
> 
> The comment in include/linux/nmi.h states:
> 
>  * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
>  * may be used to reset the timeout - for code which intentionally
>  * disables interrupts for a long time. This call is stateless.
> 
> Which is exactly what we are trying to do here, now that these threads
> run with interrupts disabled.
> 
> Before, when they were running with interrupts enabled,
> cond_resched() was enough to avoid soft lockups.

hm, maybe.  But I'm not sure that touch_nmi_watchdog() will hold off a
soft lockup warning.  Maybe it will.

And please let's get the above thoughts into the changelog.

> > 
> > I'm not sure what to suggest, really.  Your changelog isn't the best:
> > "Vlastimil Babka reported about a window issue during which when
> > deferred pages are initialized, and the current version of on-demand
> > initialization is finished, allocations may fail".  Well...  where is
> > this mysterious window?  Without such detail it's hard for others to
> > suggest alternative approaches.
> 
> Here is hopefully a better description of the problem:
> 
> Currently, during boot we preinitialize some number of struct pages to
> satisfy all boot allocations, even those that happen while we
> initialize the rest of the deferred pages in page_alloc_init_late(). The
> problem is that we do not know how much memory the kernel will need, and
> it also depends on various options.
> 
> So, with this work, we are changing this behavior to initialize struct pages 
> on-demand, only when allocations happen.
> 
> During boot, when we try to allocate memory, the on-demand struct page
> initialization code takes care of it. But once the deferred pages are
> being initialized in:
> 
> page_alloc_init_late()
>    for_each_node_state(nid, N_MEMORY)
>       kthread_run(deferred_init_memmap())
> 
> We cannot use on-demand initialization, as these threads resize pgdat.
> 
> This whole change is meant to take care of that period of time.
> 
> My first version of on-demand deferred page initialization would simply fail
> to allocate memory during this period of time. But this new version waits
> for the threads to finish initializing deferred memory, and then successfully
> performs the allocation.
> 
> Because an interrupt handler would wait for the pgdat resize lock.

OK, thanks.  Please also add to changelog.
