On Fri 27-07-18 12:32:59, John Allen wrote:
> On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> > > Does a failure in do_migrate_range indicate that the range is unmigratable
> > > and the loop in __offline_pages should terminate and goto failed_removal?
> > > Or should we allow a certain number of retries before we
> > > give up on migrating the range?
> >
> > Unfortunately not. Migration code doesn't tell a difference between
> > ephemeral and permanent failures. We are relying on
> > start_isolate_page_range to tell us this. So the question is, what kind
> > of page is not migratable and for what reason.
> >
> > Are you able to add some debugging to give us more information? The
> > current debugging code in the hotplug/migration sucks...
>
> After reproducing the problem a couple times, it seems that it can occur for
> different types of pages. Running page-types on the offending page over two
> separate instances produced the following:
>
> # tools/vm/page-types -a 307968-308224
>              flags  page-count  MB  symbolic-flags                               long-symbolic-flags
> 0x0000000000000400           1   0  __________B________________________________  buddy
>              total           1   0

Huh! How come a buddy page has a non-zero reference count?

> And the following on a separate run:
>
> # tools/vm/page-types -a 313088-313344
>              flags  page-count  MB  symbolic-flags                               long-symbolic-flags
> 0x000000000000006c           1   0  __RU_lA____________________________________  referenced,uptodate,lru,active
>              total           1   0

Hmm, what is the expected page count in this case? Seeing 1 doesn't look
particularly wrong.
-- 
Michal Hocko
SUSE Labs