On Fri, 22 Sep 2017, Tejun Heo wrote:
> > If you have this low priority maintenance job charging memory to the high
> > priority hierarchy, you're already misconfigured unless you adjust
> > /proc/pid/oom_score_adj because it will oom kill any process larger than
> > itself in today's kernels anyway.
> > A better configuration would be to attach this hypothetical low priority
> > maintenance job to its own sibling cgroup with its own memory limit to
> > avoid exactly that problem: it going berserk and charging too much memory
> > to the high priority container that results in one of its processes
> > getting oom killed.
> And how do you guarantee that across delegation boundaries? The
> points you raise on why the priority should be applied level-by-level
> are exactly the reasons why this doesn't really work. OOM killing
> priority isn't something which can be distributed across cgroup
> hierarchy level-by-level. The resulting decision tree doesn't make
> any sense.
It works very well in practice with real-world use cases, and Roman has
independently developed the same design that we have used for the past
four years. Saying it doesn't make any sense doesn't hold a lot of weight
when we both independently designed and implemented the same solution to
address our use cases.
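For concreteness, the sibling-cgroup setup described in the quoted text
above might look roughly like this under cgroup v2 (a sketch only: the
cgroup names and the limit value are illustrative, and CGROUP_ROOT is
assumed to be a mounted cgroup2 filesystem):

```shell
# Create a sibling cgroup for the low priority maintenance job with its
# own memory limit, so a runaway job cannot charge memory to the high
# priority container's hierarchy. (Sketch; cgroup v2 interface files.)
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

setup_maintenance_sibling() {
    # $1: parent cgroup (relative to CGROUP_ROOT), $2: memory limit
    mkdir -p "$CGROUP_ROOT/$1/maintenance"
    printf '%s\n' "$2" > "$CGROUP_ROOT/$1/maintenance/memory.max"
}
```

The maintenance job would then be moved in with something like
`echo $PID > $CGROUP_ROOT/workload/maintenance/cgroup.procs` (requires
appropriate privileges over the delegated subtree).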
> I'm not against adding something which works but strict level-by-level
> comparison isn't the solution.
Each of the eight versions of Roman's cgroup-aware oom killer has done
comparisons between siblings at each level. Userspace influence on that
comparison would thus also need to be done at each level. It's a very
powerful combination in practice.
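To make the level-by-level sibling comparison concrete, here is a minimal
sketch in shell: at each level it picks the sibling cgroup with the
largest memory.current and descends into it. This is an illustration
only; the actual proposal is implemented in the kernel and compares
siblings on usage adjusted by the per-level userspace priority being
discussed, not raw usage alone.

```shell
# pick_victim_path: walk a cgroup hierarchy level by level, at each
# level selecting the sibling with the largest memory.current, and
# return the path of the leaf ultimately selected.
pick_victim_path() {
    local node="$1" best="" best_usage=-1 d usage
    for d in "$node"/*/; do
        # skip non-cgroup directories (no memory.current file)
        [ -f "$d/memory.current" ] || continue
        usage=$(cat "$d/memory.current")
        if [ "$usage" -gt "$best_usage" ]; then
            best_usage=$usage
            best=$d
        fi
    done
    if [ -n "$best" ]; then
        # descend into the winning sibling and compare at the next level
        pick_victim_path "${best%/}"
    else
        # no children: this cgroup is the selection at this branch
        echo "$node"
    fi
}
```

Note that the comparison is strictly between siblings under one parent:
a cgroup is never compared against cgroups in another subtree, which is
what keeps the decision local to each delegation boundary.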