On Tue, 13 Mar 2018 16:43:47 -0400 Pavel Tatashin <pasha.tatas...@oracle.com> wrote:

>
> > Soft lockup: kernel has run for too long without rescheduling
> > Hard lockup: kernel has run for too long with interrupts disabled
> >
> > Both of these are detected by the NMI watchdog handler.
> >
> > 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
> > point.  Replacing that with touch_nmi_watchdog() won't work (I think).
> > Presumably calling touch_softlockup_watchdog() will "work", in that it
> > suppresses the warning.  But it won't fix the thing which the warning
> > is actually warning about: starvation of the CPU scheduler.  That's
> > what the cond_resched() does.
>
> But, unlike memmap_init_zone(), which can be used after boot, here we do
> not worry about the kernel running for too long.  This is because we are
> booting, and no user programs are running.
>
> So, it is acceptable to have a long uninterruptible span, as long
> as we are making useful progress. BTW, the boot CPU still has
> interrupts enabled during this span.
>
> Comment in: include/linux/nmi.h, states:
>
>  * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
>  * may be used to reset the timeout - for code which intentionally
>  * disables interrupts for a long time. This call is stateless.
>
> Which is exactly what we are trying to do here, now that these threads
> run with interrupts disabled.
>
> Before, they were running with interrupts enabled, and
> cond_resched() was enough to satisfy soft lockups.

hm, maybe.  But I'm not sure that touch_nmi_watchdog() will hold off a
soft lockup warning.  Maybe it will.

And please let's get the above thoughts into the changelog.

> >
> > I'm not sure what to suggest, really.  Your changelog isn't the best:
> > "Vlastimil Babka reported about a window issue during which when
> > deferred pages are initialized, and the current version of on-demand
> > initialization is finished, allocations may fail".  Well...  where is
> > this mysterious window?
> > Without such detail it's hard for others to
> > suggest alternative approaches.
>
> Here is hopefully a better description of the problem:
>
> Currently, during boot we preinitialize some number of struct pages to
> satisfy all boot allocations. Even if these allocations happen when we
> initialize the rest of deferred pages in page_alloc_init_late(). The problem
> is that we do not know how much the kernel will need, and it also depends on
> various options.
>
> So, with this work, we are changing this behavior to initialize struct pages
> on-demand, only when allocations happen.
>
> During boot, when we try to allocate memory, the on-demand struct page
> initialization code takes care of it. But, once the deferred pages are
> being initialized in:
>
> page_alloc_init_late()
>    for_each_node_state(nid, N_MEMORY)
>        kthread_run(deferred_init_memmap())
>
> We cannot use on-demand initialization, as these threads resize pgdat.
>
> This whole thing is to take care of this period of time.
>
> My first version of on-demand deferred page initialization would simply fail
> to allocate memory during this period of time. But, this new version waits
> for threads to finish initializing deferred memory, and successfully performs
> the allocation.
>
> Because the interrupt handler would wait for the pgdat resize lock.

OK, thanks.  Please also add to changelog.
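
For anyone following the thread, the difference between the two versions
described above (fail the allocation while the deferred-init threads run,
versus wait for them to finish) can be modelled in userspace.  This is
only a rough sketch using pthreads, not kernel code: struct node_model,
alloc_fail_fast(), alloc_wait() and the other names are all hypothetical,
and a mutex plus condition variable merely stands in for the pgdat resize
lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one memory node; not a kernel structure. */
struct node_model {
	pthread_mutex_t lock;      /* stands in for the pgdat resize lock */
	pthread_cond_t done;       /* signalled when deferred init finishes */
	bool deferred_running;     /* true while the init thread owns the node */
	size_t initialized_pages;  /* pages whose struct page is set up */
	size_t total_pages;
};

static void node_model_init(struct node_model *n, size_t initialized,
			    size_t total)
{
	pthread_mutex_init(&n->lock, NULL);
	pthread_cond_init(&n->done, NULL);
	n->deferred_running = false;
	n->initialized_pages = initialized;
	n->total_pages = total;
}

/* Models the deferred-init thread: initialize the rest, then wake waiters. */
static void *deferred_init_thread(void *arg)
{
	struct node_model *n = arg;

	pthread_mutex_lock(&n->lock);
	n->initialized_pages = n->total_pages;
	n->deferred_running = false;
	pthread_cond_broadcast(&n->done);
	pthread_mutex_unlock(&n->lock);
	return NULL;
}

/* "First version": give up if deferred init is in progress. */
static bool alloc_fail_fast(struct node_model *n, size_t want)
{
	bool ok;

	pthread_mutex_lock(&n->lock);
	ok = !n->deferred_running && n->initialized_pages >= want;
	pthread_mutex_unlock(&n->lock);
	return ok;
}

/* "New version": wait for deferred init to finish, then retry the check. */
static bool alloc_wait(struct node_model *n, size_t want)
{
	bool ok;

	pthread_mutex_lock(&n->lock);
	while (n->deferred_running)
		pthread_cond_wait(&n->done, &n->lock);
	ok = n->initialized_pages >= want;
	pthread_mutex_unlock(&n->lock);
	return ok;
}
```

In this model a caller sets deferred_running before creating the thread
with pthread_create(); alloc_fail_fast() then returns false for as long as
the flag is set, while alloc_wait() blocks until the thread clears it and
only then checks whether enough pages are initialized.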