On Wed 05-12-18 16:58:02, Linus Torvalds wrote:
[...]
> I realize that we probably do want to just have explicit policies that
> do not exist right now, but what are (a) sane defaults, and (b) sane
> policies?

I would focus on the current default first (which is defrag=madvise).
This means that without MADV_HUGEPAGE we only try the cheapest possible
THP allocation. If there is none we simply fall back. We do restrict
that attempt to the local node. I guess there is general agreement that
this is a sane default.

MADV_HUGEPAGE changes the picture because the caller has expressed a
need for THP and is willing to go the extra mile to get it. That
involves allocation latency and, as of now, also a potential remote
access. We do not have complete agreement on the latter, but the
prevailing argument is that any strong NUMA locality is just
reinventing the node-reclaim story again, or sends the THP success rate
down the toilet (to quote Mel). I agree that we do not want to fall
back to a remote node overeagerly. I believe that something like the
below would be sensible:
	1) THP on a local node with compaction not giving up too early
	2) THP on a remote node in NOWAIT mode - so no direct
	   compaction/reclaim (trigger kswapd/kcompactd only for
	   defrag=defer+madvise)
	3) fallback to the base page allocation

This would allow full memory utilization while trying to be as local
as possible. Whoever strongly prefers NUMA locality should be using
MPOL_NODE_RECLAIM (or similar), and that would skip 2) and make 1) and
3) use more aggressive compaction and reclaim.

This would also fit into our existing NUMA API. MPOL_NODE_RECLAIM
wouldn't be restricted to THP, obviously. It would act on base pages as
well, and it would basically use the same implementation as we have for
the global node_reclaim and make it usable again.

Does this sound at least remotely sane?
-- 
Michal Hocko
SUSE Labs
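
To make the proposed order for a MADV_HUGEPAGE fault concrete, here is a
rough userspace toy model of steps 1) to 3). It is only a sketch of the
decision order, not kernel code: struct node_state, enum alloc_result,
thp_fault_alloc() and the strict_local flag (standing in for the
MPOL_NODE_RECLAIM idea, which does not exist today) are all made up for
illustration. The more aggressive compaction and reclaim under strict
locality is not modelled, and step 3) is assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2	/* hypothetical two-node machine */

/* Hypothetical per-node state for this toy model, not a kernel structure. */
struct node_state {
	bool thp_free;		/* a huge page is free right now */
	bool thp_after_compact;	/* direct compaction/reclaim could produce one */
};

enum alloc_result {
	ALLOC_THP_LOCAL,
	ALLOC_THP_REMOTE,
	ALLOC_BASE_PAGE,
};

/*
 * Decide where a MADV_HUGEPAGE fault on node 'local' gets its memory.
 * strict_local models a caller that strongly prefers NUMA locality.
 */
enum alloc_result thp_fault_alloc(const struct node_state nodes[NR_NODES],
				  int local, bool strict_local)
{
	assert(local >= 0 && local < NR_NODES);

	/* 1) THP on the local node, direct compaction allowed */
	if (nodes[local].thp_free || nodes[local].thp_after_compact)
		return ALLOC_THP_LOCAL;

	/*
	 * 2) THP on a remote node in NOWAIT mode: only an already free
	 *    huge page counts, no direct compaction/reclaim. Skipped
	 *    entirely when strict locality is requested.
	 */
	if (!strict_local) {
		for (int nid = 0; nid < NR_NODES; nid++) {
			if (nid != local && nodes[nid].thp_free)
				return ALLOC_THP_REMOTE;
		}
	}

	/* 3) fall back to the base page allocation */
	return ALLOC_BASE_PAGE;
}
```

With node 0 as the local node, a remote node that has a free huge page
is used only after local compaction has failed, and never when
strict_local is set. A remote node that would need compaction to produce
a huge page is not used at all.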