On Tue 20-09-16 10:37:04, Mike Kravetz wrote:
> On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
> > dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> > list corruption and addressing exception when trying to set a memory
> > block offline that is part (but not the first part) of a gigantic
> > hugetlb page with a size > memory block size.
> > 
> > When no other smaller hugepage sizes are present, the VM_BUG_ON() will
> > trigger directly. In the other case we will run into an addressing
> > exception later, because dissolve_free_huge_page() will not use the head
> > page of the compound hugetlb page which will result in a NULL hstate
> > from page_hstate(). list_del() would also not work well on a tail page.
> > 
> > To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> > use the compound head page in dissolve_free_huge_page().
> > 
> > However, this all assumes that it is the desired behaviour to remove
> > a (gigantic) unused hugetlb page from the pool, just because a small
> > (in relation to the hugepage size) memory block is going offline. Not
> > sure if this is the right thing, and it doesn't look very consistent
> > given that in this scenario it is _not_ possible to migrate
> > such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
> > will return false in both cases, i.e. the memory block will be reported
> > as removable, no matter if the hugepage that it is part of is unused or
> > in use.
> > 
> > This patch is assuming that it would be OK to remove the hugepage,
> > i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
> > 
> > Any thoughts?
> Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
> this thread:
> https://lkml.org/lkml/2016/9/13/146
> Their approach (I believe) would be to fail the offline operation in
> this case.  However, I could argue that either failing the operation
> or dissolving the unused huge page containing the area to be offlined
> is the right thing to do.
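
For readers following along, the fix described in the quoted patch
(resolve a tail page to its compound head before dissolving) can be
illustrated with a small userspace toy model. This is only a sketch and
not the kernel code: struct toy_page, the toy_* functions and the sizes
below are all hypothetical, and the real hstate and free-list handling
is reduced to a single counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#define NR_PAGES       64  /* pages in the modelled range */
#define HUGE_NR_PAGES  16  /* pages per "gigantic" page */
#define BLOCK_NR_PAGES  4  /* pages per memory block, smaller than the huge page */

struct toy_page {
    size_t head;     /* index of the compound head (own index if not compound) */
    bool   huge;     /* true while the page belongs to a hugetlb page */
    int    refcount; /* meaningful on the head only; 0 means unused */
};

struct toy_page pages[NR_PAGES];
int free_huge_pages; /* size of the modelled free pool */

/* Reset the model: no huge pages, every page is its own head. */
void toy_init(void)
{
    for (size_t i = 0; i < NR_PAGES; i++)
        pages[i] = (struct toy_page){ .head = i, .huge = false, .refcount = 0 };
    free_huge_pages = 0;
}

/* Build one unused gigantic page starting at an aligned index. */
void toy_add_free_huge_page(size_t start)
{
    assert(start % HUGE_NR_PAGES == 0 && start + HUGE_NR_PAGES <= NR_PAGES);
    for (size_t i = start; i < start + HUGE_NR_PAGES; i++)
        pages[i] = (struct toy_page){ .head = start, .huge = true, .refcount = 0 };
    free_huge_pages++;
}

/* Dissolve the huge page containing pfn if it is unused; returns 1 if dissolved. */
int toy_dissolve_free_huge_page(size_t pfn)
{
    size_t head = pages[pfn].head; /* key step: never operate on a tail page */

    if (!pages[head].huge || pages[head].refcount != 0)
        return 0;
    for (size_t i = head; i < head + HUGE_NR_PAGES; i++)
        pages[i] = (struct toy_page){ .head = i, .huge = false, .refcount = 0 };
    free_huge_pages--;
    return 1;
}

/* Walk the range [start, end) that is going offline; returns pages dissolved. */
int toy_dissolve_free_huge_pages(size_t start, size_t end)
{
    int dissolved = 0;

    for (size_t pfn = start; pfn < end; pfn++)
        dissolved += toy_dissolve_free_huge_page(pfn);
    return dissolved;
}
```

In this model, offlining the block [20, 20 + BLOCK_NR_PAGES) inside an
unused gigantic page that starts at index 16 dissolves the whole 16-page
unit, which is the behaviour under discussion. With a non-zero refcount
on the head, the same call leaves the page alone.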

I am sorry, I noticed this thread only now. I was arguing about this
in the original thread. I would be rather reluctant to free a gigantic
page just because somebody wants to offline a small part of it, because
the setup is really expensive and a lost page would be really hard to
get back.

I would even question the per-block offlining itself. Why would
anybody want to offline a few blocks rather than the whole node? What is
the use case here?

Michal Hocko
