> Let me study your trace, perhaps I will be able to figure out the issue
> without reproducing it.
Hi Sasha,

I've been studying this problem more. The issue happens in this stack:

...subsys_init...
 topology_init()
  register_one_node(nid)
   link_mem_sections(nid, pgdat->node_start_pfn, pgdat->node_spanned_pages)
    register_mem_sect_under_node(mem_blk, nid)
     get_nid_for_pfn(pfn)
      pfn_to_nid(pfn)
       page_to_nid(page)
        PF_POISONED_CHECK(page)

We are trying to get the nid from a struct page which has not been
initialized. My patches add this extra scrutiny to make sure that we never
get an invalid nid from a "struct page", by adding PF_POISONED_CHECK() to
page_to_nid(). So, the bug already exists in Linux: an incorrect nid is
being read. The question is: why does this happen?

At first I thought that perhaps the struct page was not yet initialized.
But the initcalls are done after deferred pages are initialized, and thus
every struct page must be initialized by now. Also, if deferred pages were
enabled, we would take a slightly different path and avoid this bug by
getting the nid from memblock instead of from struct page:

get_nid_for_pfn(pfn)
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
	if (system_state < SYSTEM_RUNNING)
		return early_pfn_to_nid(pfn);
#endif

I also verified in your config that CONFIG_DEFERRED_STRUCT_PAGE_INIT is not
set. So, one way to fix this issue is to remove this "#ifdef" (I have not
checked for dependencies), but that simply addresses the symptom and does
not fix the actual issue.

Thus, we have a "struct page" backing memory for this pfn, but we have not
initialized it. For some reason memmap_init_zone() decided to skip it, and
I am not sure why. Looking at the code, we skip initializing if
!early_pfn_valid(pfn), aka !pfn_valid(pfn), or if
!early_pfn_in_nid(pfn, nid). I suspect this has something to do with
!pfn_valid(pfn). But, without having a machine on which I could reproduce
this problem, I cannot study it further to determine exactly why the pfn
is not valid.

Please replace !pfn_valid_within() with !pfn_valid() in get_nid_for_pfn()
and see if the problem still happens.
If it does not happen, let's study the memory map, the pgdat's start and
end, and the value of this pfn.

Thank you,
Pasha
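
For illustration only, here is a minimal userspace sketch of the failure
mode described above. It is not kernel code: every toy_* name, the 0xff
poison byte, the NID_POISONED return value and the early_nid[] table are
assumptions made for this model. It shows a memmap that starts out
poisoned, an init loop that skips pfns reported as invalid, and a nid
lookup that either reads the (possibly still poisoned) page or takes an
early, memblock-like path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_PFNS      8
#define POISON_BYTE  0xff   /* assumed poison pattern for this model */
#define NID_POISONED (-1)   /* model's stand-in for a failed poison check */

/* Toy stand-in for struct page: only the fields this model needs. */
struct toy_page {
	uint64_t flags;
	int nid;
};

static struct toy_page memmap[NR_PFNS];

/* Model of a memblock-style table: node id per pfn, known early in boot. */
static const int early_nid[NR_PFNS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Poison the whole memmap, as if allocated but never initialized. */
static void toy_poison_memmap(void)
{
	memset(memmap, POISON_BYTE, sizeof(memmap));
}

/* Model of the init loop: pfns that fail the validity test are skipped. */
static void toy_memmap_init(const bool *pfn_is_valid)
{
	for (size_t pfn = 0; pfn < NR_PFNS; pfn++) {
		if (!pfn_is_valid[pfn])
			continue;	/* this struct page stays poisoned */
		memmap[pfn].flags = 0;
		memmap[pfn].nid = early_nid[pfn];
	}
}

/* A page counts as poisoned while its flags still hold the all-ones fill. */
static bool toy_page_poisoned(const struct toy_page *page)
{
	return page->flags == UINT64_MAX;
}

/* Model of a page-based nid lookup with the added poison check. */
static int toy_page_to_nid(size_t pfn)
{
	assert(pfn < NR_PFNS);
	if (toy_page_poisoned(&memmap[pfn]))
		return NID_POISONED;
	return memmap[pfn].nid;
}

/* Model of the lookup with an optional early path that bypasses the page. */
static int toy_get_nid_for_pfn(size_t pfn, bool use_early_path)
{
	assert(pfn < NR_PFNS);
	if (use_early_path)
		return early_nid[pfn];
	return toy_page_to_nid(pfn);
}
```

In this model, if pfn 3 is marked invalid before toy_memmap_init() runs,
toy_get_nid_for_pfn(3, false) returns NID_POISONED, while
toy_get_nid_for_pfn(3, true) returns 0 from the early table. That mirrors
why taking the early path would hide the symptom without explaining why
the page was skipped in the first place.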