On 11/02/2017 09:32 AM, Michal Hocko wrote:
On Tue 31-10-17 11:50:02, Pavel Tatashin wrote:
[...]
The problem happens in this path:

page_alloc_init_late
   deferred_init_memmap
     deferred_init_range
       __def_free
         deferred_free_range
           __free_pages_boot_core(page, order)
             __free_pages()
               __free_pages_ok()
                 free_one_page()
                   __free_one_page(page, pfn, zone, order, migratetype);

deferred_init_range() initializes one page at a time by calling
__init_single_page(). Once it has initialized pageblock_nr_pages pages, it
calls deferred_free_range() to free the initialized pages to the buddy
allocator. Eventually, we reach __free_one_page(), where we compute the buddy
page:
        buddy_pfn = __find_buddy_pfn(pfn, order);
        buddy = page + (buddy_pfn - pfn);

buddy_pfn is computed as pfn ^ (1 << order), or pfn + pageblock_nr_pages.
Therefore, the buddy page is the page immediately after the range that was
just initialized, and we access this page in this function. Also, later when we
return to deferred_init_range(), the buddy page is initialized again.
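
The buddy arithmetic above can be sketched in user space. This is only an
illustration under stated assumptions, not the kernel code: find_buddy_pfn()
copies the XOR used by __find_buddy_pfn(), buddy_follows_block() is a
hypothetical helper, and the order (9, i.e. 512 pages) and PFNs in the note
below are example values.

```c
#include <assert.h>
#include <limits.h>

/* User-space copy of the XOR performed by __find_buddy_pfn(). */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	/* Shifting by the full width of the type would be undefined. */
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return pfn ^ (1UL << order);
}

/*
 * Hypothetical helper: returns 1 when the buddy is the block that
 * immediately follows the block starting at pfn, 0 otherwise.
 */
static int buddy_follows_block(unsigned long pfn, unsigned int order)
{
	return find_buddy_pfn(pfn, order) == pfn + (1UL << order);
}
```

With order 9, find_buddy_pfn(0x1000, 9) is 0x1200, so the buddy is the next
block, which has not been initialized yet. For pfn 0x1200 the buddy is
0x1000, the block before it. The XOR equals pfn + (1 << order) only when bit
`order` of pfn is clear.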

So, in order to avoid this issue, we must initialize the buddy page prior
to calling deferred_free_range().

How come we didn't have this problem previously? I am really confused.


Hi Michal,

Previously, as in before my project? That is because memory for all struct pages was always zeroed in memblock, so in __free_one_page() page_is_buddy() always returned false, and we never tried to incorrectly remove the buddy from the list:

                        list_del(&buddy->lru);

Now that memory is not zeroed, page_is_buddy() can return true after kexec, when memory is dirty (unfortunately, memset(1) with CONFIG_DEBUG_VM does not catch this case), and we then proceed to incorrectly remove the buddy from the list.

This is why we must initialize the computed buddy page beforehand.
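
To illustrate why zeroed memory hid the problem, here is a toy model of the
check. Everything in it is an assumption made for the sketch: struct toy_page,
TOY_BUDDY_MARK and toy_page_is_buddy() are hypothetical stand-ins, not the
kernel's struct page or page_is_buddy(). The model only captures the idea
that a page looks like a free buddy when a marker field and the stored order
both match.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical marker meaning "this page is free in the buddy allocator". */
#define TOY_BUDDY_MARK (-128)

/* Toy stand-in for struct page; the field names are illustrative only. */
struct toy_page {
	int mapcount;          /* TOY_BUDDY_MARK when the page looks free */
	unsigned long private; /* order of the free block */
};

/* Fill the toy page with one byte value, as memset() on raw memory would. */
static void toy_fill(struct toy_page *page, int byte)
{
	memset(page, byte, sizeof(*page));
}

/* Returns 1 if the page looks like a free buddy of the given order. */
static int toy_page_is_buddy(const struct toy_page *buddy, unsigned int order)
{
	assert(buddy != NULL);
	return buddy->mapcount == TOY_BUDDY_MARK && buddy->private == order;
}
```

In this model a zero-filled page never passes the check, and neither does a
page filled with 0xff bytes. Only leftover contents that happen to hold the
marker and a matching order pass, which is the stale-data case described
above.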

Pasha
