On Fri, 22 Sep 2017, Tejun Heo wrote: > > If you have this low priority maintenance job charging memory to the high > > priority hierarchy, you're already misconfigured unless you adjust > > /proc/pid/oom_score_adj because it will oom kill any larger process than > > itself in today's kernels anyway. > > > > A better configuration would be attach this hypothetical low priority > > maintenance job to its own sibling cgroup with its own memory limit to > > avoid exactly that problem: it going berserk and charging too much memory > > to the high priority container that results in one of its processes > > getting oom killed. > > And how do you guarantee that across delegation boundaries? The > points you raise on why the priority should be applied level-by-level > are exactly the same points why this doesn't really work. OOM killing > priority isn't something which can be distributed across cgroup > hierarchy level-by-level. The resulting decision tree doesn't make > any sense. >
It works very well in practice with real world usecases, and Roman has developed the same design independently that we have used for the past four years. Saying it doesn't make any sense doesn't hold a lot of weight when we both independently designed and implemented the same solution to address our usecases. > I'm not against adding something which works but strict level-by-level > comparison isn't the solution. > Each of the eight versions of Roman's cgroup aware oom killer has done comparisons between siblings at each level. Userspace influence on that comparison would thus also need to be done at each level. It's a very powerful combination in practice. Thanks.