On Thu 29-08-19 19:14:46, Tetsuo Handa wrote:
> On 2019/08/29 16:11, Michal Hocko wrote:
> > On Wed 28-08-19 12:46:20, Edward Chron wrote:
> >> Our belief is that if you really think eBPF is the preferred mechanism,
> >> then you should move OOM reporting to eBPF.
> >
> > I've said that all this additional information has to be dynamically
> > extensible rather than a part of the core kernel. Whether eBPF is the
> > suitable tool, I do not know. I haven't explored that. There are other
> > ways to inject code into the kernel: systemtap/kprobes, kernel modules and
> > probably others.
>
> As for SystemTap, guru mode (an expert mode which disables the protection
> provided by SystemTap, allowing the kernel to crash when something goes
> wrong) could be used for holding a spinlock. However, as far as I know,
> holding a mutex (or doing any operation that might sleep) from such dynamic
> hooks is not allowed. Also, we will need to export various symbols in order
> to allow access from such dynamic hooks.
This is the OOM path and it had better not use any sleeping locks in
the first place.
> I'm not familiar with eBPF, but I guess that eBPF is similar.
>
> But please be aware that, I REPEAT AGAIN, I don't think either eBPF or
> SystemTap will be suitable for dumping OOM information. An OOM situation
> means that even a single page fault cannot complete, and temporary memory
> allocations for reading from the kernel or writing to files cannot complete.
And I repeat that no such reporting is going to write to files. This is
an OOM path after all.
> Therefore, we will need to hold all information in kernel memory (without
> allocating any memory when the OOM event happens). Dynamic hooks could hold
> a few lines of output, but not all the lines we want. The only buffer that
> is preallocated and large enough would be printk()'s buffer. Thus,
> I believe that we will have to use printk() in order to dump OOM information.
> At that point,
Yes, this is what I've had in mind.
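To make the "no allocation at OOM time" constraint concrete, here is a small userspace sketch of reporting into a buffer that is reserved up front and simply truncates when full. printk()'s log buffer plays this role in the real kernel; `report_buf`, `report_reset()` and `report_line()` are hypothetical names for illustration only, not kernel APIs.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reporting buffer, reserved up front (statically here),
 * so that emitting a report never needs to allocate memory while the
 * system is out of memory. printk()'s log buffer plays this role in
 * the real kernel. */
#define REPORT_BUF_SIZE 4096
static char report_buf[REPORT_BUF_SIZE];
static size_t report_len;

/* Start a fresh report. */
static void report_reset(void)
{
	memset(report_buf, 0, sizeof(report_buf));
	report_len = 0;
}

/* Append one formatted line. Output that does not fit is truncated
 * instead of growing the buffer. Returns the number of bytes stored. */
static size_t report_line(const char *fmt, ...)
{
	size_t room = sizeof(report_buf) - report_len;
	va_list ap;
	int n;

	if (room <= 1)
		return 0;	/* full: only room for the terminating NUL */

	va_start(ap, fmt);
	n = vsnprintf(report_buf + report_len, room, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;	/* formatting error: store nothing */
	if ((size_t)n >= room)
		n = (int)(room - 1);	/* vsnprintf truncated the line */

	report_len += (size_t)n;
	return (size_t)n;
}
```

Usage: call report_reset(), then report_line() once per line of the report. The trade-off is the one described above: the buffer size is fixed in advance, so a report larger than the reservation loses its tail rather than triggering an allocation.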
>
> static bool (*oom_handler)(struct oom_control *oc) = default_oom_killer;
>
> bool out_of_memory(struct oom_control *oc)
> {
> 	return oom_handler(oc);
> }
>
> and letting in-tree kernel modules override the current OOM killer would be
> the only practical choice (if we refuse to add many knobs).
Or simply provide a hook that is called with the oom_control to report,
without replacing the whole OOM killer behavior. Replacing it is not necessary.
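A rough userspace sketch of the difference between the two proposals: instead of swapping out the whole handler, out_of_memory() calls an optional, read-only reporting hook and then always runs the default kill logic. The struct below is a simplified stand-in for the kernel's struct oom_control (the real one has more fields), and `oom_report_fn`, `register_oom_report_hook()`, `default_oom_kill()` and `example_report()` are hypothetical names. A real in-kernel version would also have to handle registration racing with an OOM event, and the hook must not sleep or allocate, which this sketch does not enforce.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct oom_control; only the
 * fields this sketch needs. */
struct oom_control {
	int order;			/* order of the failed allocation */
	unsigned long totalpages;
};

/* Hypothetical hook type: it sees the OOM context read-only and cannot
 * change which task gets killed. */
typedef void (*oom_report_fn)(const struct oom_control *oc);

static oom_report_fn oom_report_hook;	/* NULL until registered */
static unsigned int reports_seen;	/* counts example_report() calls */

static void register_oom_report_hook(oom_report_fn fn)
{
	oom_report_hook = fn;
}

/* Stand-in for the existing victim selection and kill logic. */
static bool default_oom_kill(struct oom_control *oc)
{
	(void)oc;
	return true;
}

static bool out_of_memory(struct oom_control *oc)
{
	/* Extra reporting runs first, if someone registered a hook... */
	if (oom_report_hook)
		oom_report_hook(oc);
	/* ...but the kill policy itself is never replaced. */
	return default_oom_kill(oc);
}

/* Example hook: one summary line; printf() stands in for printk(). */
static void example_report(const struct oom_control *oc)
{
	reports_seen++;
	printf("oom: order=%d totalpages=%lu\n", oc->order, oc->totalpages);
}
```

The design point is that the hook only observes: the const pointer means a reporting extension can add output but cannot alter the kill decision, which keeps the core OOM killer behavior in one place.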
--
Michal Hocko
SUSE Labs