On 2017-01-17 Michal Hocko wrote:
> On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:
> > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > [...]
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 532a2a750952..46aac487b89a 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> > > >  				continue;
> > > >
> > > >  			if (sc->priority != DEF_PRIORITY &&
> > > > +			    !buffer_heads_over_limit &&
> > > >  			    !pgdat_reclaimable(zone->zone_pgdat))
> > > >  				continue;	/* Let kswapd poll it */
> > >
> > > I think we should rather remove pgdat_reclaimable here. This
> > > sounds like a wrong layer to decide whether we want to reclaim
> > > and how much.
> >
> > I had considered that but it'd also be important to add the other
> > 32-bit patches you have posted to see the impact. Because of the
> > ratio of LRU pages to slab pages, it may not have an impact but
> > it'd need to be eliminated.
>
> OK, Trevor you can pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> fixes/highmem-node-fixes branch. This contains the current mmotm tree
> + the latest highmem fixes. I also do not expect this would help much
> in your case but as Mel've said we should rule that out at least.
Hi!

The kernel built from the git tree above oom'd after < 24 hours (at 3:02am),
so it doesn't solve the bug. If you need an oom message dump, let me know.
Let me know what to try next, guys, and I'll test it out.

> > Before prototyping such a thing, I'd like to hear the outcome of
> > this heavy hack and then add your 32-bit patches onto the list. If
> > the problem is still there then I'd next look at taking slab pages
> > into account in pgdat_reclaimable() instead of an outright removal
> > that has a much wider impact. If that doesn't work then I'll
> > prototype a heavy-handed forced slab reclaim when lower zones are
> > almost all slab pages.

I don't think I've tried the "heavy hack" patch yet? It's not in the mhocko
tree I just tried. Should I try the heavy hack on top of the mhocko git tree,
or on vanilla, or what? (I've sketched my reading of what that hack does to
shrink_zones() at the bottom of this mail.)

I also want to mention that these PAE boxes suffer from another problem/bug
that I've worked around for almost a year now, and it keeps gnawing at me
that it might be related. Disk I/O goes to pot on these PAE boxes after a
certain amount of disk writes (some unknown number of GB, around 10-ish
maybe): writes drop from 500MB/s to 10MB/s!! Reboot and it's magically
500MB/s again. I detail this here:

https://muug.ca/pipermail/roundtable/2016-June/004669.html

My fix was to boot with mem=XG, where X < 8 (like 4 or 6), to force the PAE
kernel to be more sane about its highmem choices. I never filed a bug because
I read a ton of stuff saying Linus hates PAE, don't use over 4G, blah blah.
But the other fix is to set /proc/sys/vm/highmem_is_dirtyable to 1 (a sketch
of how I set that is at the bottom as well).

I'm not bringing this up to draw attention to a new bug; I bring it up
because it smells like it might be related. If something slowly eats away at
the box's vm to the point that I/O gets horribly slow, perhaps it's related
to the slab and highmem/lowmem issue we have here? And if it is related, it
may help to solve the oom bug. If I'm way off base here, just ignore my
tangent!

The funny thing is I thought mem=XG with X < 8 solved the problem, but it
doesn't! It greatly mitigates it, but I still get a subtle slowdown that gets
worse over time (on a scale of weeks instead of days). I now use
highmem_is_dirtyable=1 on most boxes, and that seems to solve it for good in
combination with mem=XG.

Let me note, however, that I have NOT set highmem_is_dirtyable=1 on the test
box I am using for all of this building/testing, as I wanted the config to
stay static while I work through this oom bug. (I'm really curious to see
whether highmem_is_dirtyable=1 would have any impact on the oom, though!)

Thanks!
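P.S. To make sure I understand Mel's "heavy hack" from the quoted diff above,
here is my paraphrase of what the check in shrink_zones() ends up looking
like. This is my reading, not a verbatim copy of any tree, so treat the
comment as an assumption on my part:

		/*
		 * Normally direct reclaim skips a zone once priority has been
		 * raised and pgdat_reclaimable() says there is nothing worth
		 * scanning, leaving it to kswapd. With the extra test, the
		 * zone is no longer skipped while buffer heads are over their
		 * limit, so lowmem buffer_head/slab pressure can still get
		 * direct-reclaim attention.
		 */
		if (sc->priority != DEF_PRIORITY &&
		    !buffer_heads_over_limit &&
		    !pgdat_reclaimable(zone->zone_pgdat))
			continue;	/* Let kswapd poll it */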

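P.P.S. For completeness, this is how I flip the dirtyable knob on the boxes
where I do use it. The little helper below is just my own sketch (the file
name and program are mine, nothing from this thread) and is equivalent to
echoing 1 into /proc/sys/vm/highmem_is_dirtyable; the mem=4G / mem=6G half of
the workaround has to stay on the kernel command line at boot instead.

	/* set_highmem_dirtyable.c - assumes it is run as root. */
	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/proc/sys/vm/highmem_is_dirtyable", "w");

		if (!f) {
			perror("/proc/sys/vm/highmem_is_dirtyable");
			return 1;
		}
		fputs("1\n", f);	/* let highmem count toward the dirty limits */
		fclose(f);
		return 0;
	}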
