On Mon, 3 Dec 2018, Andrea Arcangeli wrote:

> It's trivial to reproduce the badness by running a memhog process that
> allocates more than the RAM of 1 NUMA node, under defrag=always
> setting (or by changing memhog to use MADV_HUGEPAGE) and it'll create
> swap storms despite 75% of the RAM is completely free in a 4 node NUMA
> (or 50% of RAM free in a 2 node NUMA) etc..
>
> How can it be ok to push the system into gigabytes of swap by default
> without any special capability despite 50% - 75% or more of the RAM is
> free? That's the downside of the __GFP_THISNODE optimization.
>

The swap storm is the issue that is being addressed. If your remote
memory is as low as local memory, the patch to clear __GFP_THISNODE has
done nothing to fix it: you still get swap storms, and memory compaction
can still fail if the per-zone freeing scanner cannot utilize the
reclaimed memory. Recall that I measured a 40% increase in allocation
latency with this patch for fragmented remote memory on Haswell. It
makes the problem much, much worse.

> __GFP_THISNODE helps increasing NUMA locality if your app can fit in a
> single node which is the common David's workload. But if his workload
> would more often than not fit in a single node, he would also run into
> an unacceptable slowdown because of the __GFP_THISNODE.
>

Which is why I have suggested that we do not do direct reclaim: the
page allocator implementation expects all thp page fault allocations to
have __GFP_NORETRY set, because no amount of reclaim can be shown to be
useful to the memory compaction freeing scanner if that memory is
iterated over by the migration scanner.

> I think there's lots of room for improvement for the future, but in my
> view that __GFP_THISNODE as it was implemented was an incomplete hack,
> that opened the door for bad VM corner cases that should not happen.
>

__GFP_THISNODE is there specifically because of the increase in remote
access latency that you encounter if you fault remote hugepages instead
of local pages of the native page size.
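
For reference, the memhog-style reproducer described in the quoted text
can be sketched roughly as below. This is only an illustration, not the
memhog source: the function name hog() and its arguments are
hypothetical, it is written in Python for brevity, and it assumes a
Linux kernel with THP available and Python 3.8 or later (for
mmap.madvise). An actual reproduction would use a size larger than the
RAM of one NUMA node and would keep the mapping alive.

```python
import mmap


def hog(size_bytes, want_thp=True):
    """Map private anonymous memory, optionally hint THP, touch every page.

    Hypothetical helper for illustration only. Returns (advised, touched):
    whether the MADV_HUGEPAGE hint was accepted, and how many base pages
    were written to.
    """
    # fileno -1 requests an anonymous mapping; MAP_PRIVATE matches what a
    # malloc/mmap based hog would normally get.
    region = mmap.mmap(-1, size_bytes, flags=mmap.MAP_PRIVATE)
    try:
        advised = False
        madv_hugepage = getattr(mmap, "MADV_HUGEPAGE", None)
        if want_thp and madv_hugepage is not None:
            try:
                # Ask for hugepages at fault time for this range.
                region.madvise(madv_hugepage)
                advised = True
            except OSError:
                # For example, THP is not configured in this kernel.
                advised = False

        # Write one byte per base page so every page is actually faulted in.
        touched = 0
        for offset in range(0, size_bytes, mmap.PAGESIZE):
            region[offset] = 1
            touched += 1
        return advised, touched
    finally:
        region.close()


if __name__ == "__main__":
    # Small size so the sketch is safe to run; scale up only deliberately.
    print(hog(8 * 1024 * 1024))
```

Whether the faults are then served by local hugepages, remote hugepages,
or local base pages depends on the defrag setting and on the gfp mask
under discussion; the sketch only generates the fault pattern.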