On 05/16/2019 03:53 PM, Mark Rutland wrote:
> Hi Michal,
>
> On Wed, May 15, 2019 at 06:58:47PM +0200, Michal Hocko wrote:
>> On Tue 14-05-19 14:30:05, Anshuman Khandual wrote:
>>> The arm64 pagetable dump code can race with concurrent modification of the
>>> kernel page tables. When leaf entries are modified concurrently, the dump
>>> code may log stale or inconsistent information for a VA range, but this is
>>> otherwise not harmful.
>>>
>>> When intermediate levels of table are freed, the dump code will continue to
>>> use memory which has been freed and potentially reallocated for another
>>> purpose. In such cases, the dump code may dereference bogus addresses,
>>> leading to a number of potential problems.
>>>
>>> Intermediate levels of table may be freed during memory hot-remove, or when
>>> installing a huge mapping in the vmalloc region. To avoid racing with these
>>> cases, take the memory hotplug lock when walking the kernel page table.
>>
>> Why is this a problem only on arm64
>
> It looks like it's not -- I think we're just the first to realise this.
>
> AFAICT x86's debugfs ptdump has the same issue if run concurrently with
> memory hot remove. If 32-bit arm supported hot-remove, its ptdump code
> would have the same issue.
>
>> and why do we even care for debugfs? Does anybody rely on this thing
>> to be reliable? Do we even need it? Who is using the file?
>
> The debugfs part is used intermittently by a few people working on the
> arm64 kernel page tables. We use that both to sanity-check that kernel
> page tables are created/updated correctly after changes to the arm64 mmu
> code, and also to debug issues if/when we encounter issues that appear
> to be the result of kernel page table corruption.
>
> So while it's rare to need it, it's really useful to have when we do
> need it, and I'd rather not remove it. I'd also rather that it didn't
> have latent issues where we can accidentally crash the kernel when using
> it, which is what this patch is addressing.
>
>> I am asking because I would really love to make mem hotplug locking less
>> scattered outside of the core MM than more. Most users simply shouldn't
>> care. Pfn walkers should rely on pfn_to_online_page.
>
> I'm not sure if that would help us here; IIUC pfn_to_online_page() alone
> doesn't ensure that the page remains online. Is there a way to achieve
> that other than get_online_mems()?
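For reference, the change under discussion just brackets the debugfs walk with
the hotplug lock. A minimal sketch (assuming the arm64 ptdump entry point looks
roughly like ptdump_show() calling ptdump_walk_pgd(), as in arch/arm64/mm/dump.c;
treat the names as illustrative rather than the exact diff):

	#include <linux/memory_hotplug.h>
	#include <linux/seq_file.h>
	#include <asm/ptdump.h>		/* arm64: struct ptdump_info */

	/*
	 * Hold the memory hotplug lock for the duration of the kernel page
	 * table walk, so intermediate table pages cannot be freed by a
	 * concurrent memory hot-remove underneath the walker.
	 */
	static int ptdump_show(struct seq_file *m, void *v)
	{
		struct ptdump_info *info = m->private;

		get_online_mems();		/* block concurrent hot-remove */
		ptdump_walk_pgd(m, info);	/* walk and dump kernel page tables */
		put_online_mems();
		return 0;
	}

The cost is only paid on the (rare) debugfs read, so the lock does not show up
on any hot path.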
Still wondering how pfn_to_online_page() is applicable here. It validates a given PFN and checks whether it is online from the sparse section mapping perspective before returning its struct page. IIUC it is used during a linear scan of a physical address range, not during a page table walk. So how can it solve the problem where a struct page which was being used as an intermediate level page table page gets released back to the buddy allocator by another concurrent thread?
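For contrast, this is the pattern pfn_to_online_page() is designed for, as I
understand it: a linear scan over a PFN range, where each PFN is checked against
the sparse/online state before its struct page is touched. My own illustrative
sketch, not code from any particular caller:

	#include <linux/memory_hotplug.h>
	#include <linux/mmzone.h>

	/*
	 * Hypothetical pfn walker: pfn_to_online_page() only confirms that the
	 * PFN falls in an online sparse section before handing back its struct
	 * page. It says nothing about a page that is currently in use as an
	 * intermediate page table and may be freed back to the buddy allocator
	 * by a concurrent hot-remove while the ptdump walker still holds a
	 * pointer into it.
	 */
	static void scan_pfn_range(unsigned long start_pfn, unsigned long end_pfn)
	{
		unsigned long pfn;

		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
			struct page *page = pfn_to_online_page(pfn);

			if (!page)
				continue;	/* PFN not backed by online memory */

			/* ... inspect the struct page here ... */
		}
	}

The ptdump walker never has a PFN in hand to validate; it follows table entries
by virtual address, which is why holding the hotplug lock across the whole walk
seems necessary rather than per-entry validation.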