On (01/10/18 19:21), Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages. printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking. Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
>
> Why not kill recursive OOM (msgs) ?

hm... do I understand it correctly that there is a

	console_unlock()->call_console_drivers()->FOO_write()->kmalloc()->printk()

recursion?

we call console drivers from printk-safe context now, so those printks
from kmalloc are redirected to a per-CPU printk-safe buffer. that buffer
is limited in size (so we might start losing some of those OOM messages)
and is flushed (log_store()) from another context.

	-ss
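
to illustrate the redirection described above, here is a rough user-space
sketch. it is not the kernel code: all names (printk_sim(), log_store_sim(),
flush_safe_buf(), console_write_sim(), SAFE_BUF_SIZE) are hypothetical, a
single static buffer stands in for one CPU's printk-safe buffer, and the
64-byte limit is picked only to make the overflow visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAFE_BUF_SIZE 64	/* hypothetical; tiny so overflow is easy to hit */
#define LOG_BUF_SIZE  1024	/* stands in for the main printk ring */

struct safe_buf {
	char data[SAFE_BUF_SIZE];
	size_t len;
	unsigned int lost;	/* messages dropped because the buffer was full */
};

static struct safe_buf cpu_buf;		/* models one CPU's printk-safe buffer */
static int in_safe_context;		/* nonzero while a "console driver" runs */
static char log_buf[LOG_BUF_SIZE];
static size_t log_len;

/* Append to the main log; silently drop if it does not fit. */
static void log_store_sim(const char *msg, size_t n)
{
	if (log_len + n > LOG_BUF_SIZE)
		return;
	memcpy(log_buf + log_len, msg, n);
	log_len += n;
}

/* In safe context, redirect to the limited buffer instead of the main log. */
static void printk_sim(const char *msg)
{
	size_t n = strlen(msg);

	if (!in_safe_context) {
		log_store_sim(msg, n);
		return;
	}
	if (cpu_buf.len + n > SAFE_BUF_SIZE) {
		cpu_buf.lost++;		/* buffer full: this message is lost */
		return;
	}
	memcpy(cpu_buf.data + cpu_buf.len, msg, n);
	cpu_buf.len += n;
}

/* Called later, from another context: move buffered text to the main log. */
static void flush_safe_buf(void)
{
	assert(cpu_buf.len <= SAFE_BUF_SIZE);
	log_store_sim(cpu_buf.data, cpu_buf.len);
	cpu_buf.len = 0;
}

/* A console write whose "allocation" fails and warns via printk. */
static void console_write_sim(void)
{
	in_safe_context = 1;
	printk_sim("alloc failed\n");
	in_safe_context = 0;
}
```

calling console_write_sim() repeatedly never touches log_buf directly, so
the nested warnings cannot grow the main log while it is being flushed.
once the small buffer fills up, further warnings are only counted as lost,
and nothing reaches log_buf until flush_safe_buf() runs.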