On Tue 2016-10-11 16:35:28, Sergey Senozhatsky wrote:
> On (10/10/16 13:17), Petr Mladek wrote:
> > > it may look that lockdep *probably* can report the issues via 'safe'
> > > printk,
> > > but that's a notably huge behavior breakage -- if lockdep report comes
> > > from
> > > an about-to-deadlock irq handler, then we won't see anything from that CPU
> > > unless there is a panic/nmi panic.
> > >
> > > so it probably has to be semi-automatic/semi-manual:
> > > - add might_printk() that would acquire/release console sem; or
> > > logbuf_lock (which is probably even better)
> > > - find all functions that do printk/WARN in kernel/time and kernel/sched
> > > - add might_printk() to those functions (just like might_sleep())
> > > - run the kernel
> > > - ...
> > > - profit
> > I like the idea with might_printk(). I hope that it will be acceptable
> > for the scheduler/timekeeping people.
> > JFYI, I could work on the printk-context handling in lockdep.
> > I am just working on a lockdep support in NMI and am getting
> > kind of familiar with that code.
> sorry, what do you mean by 'printk-context handling in lockdep'?
> wouldn't `lockdep + might_printk() + printk_safe' be enough? am I
> missing something?
Good question. It was my intuition that we would need some extra
support in lockdep. It is quite complicated and I have not thought
about it deep enough yet.
Let's unwind some of my thoughts:
Well, the nice thing about the lockdep is that it is able to detect
dangerous lock chains in advance. It means even without being
in the deadlock situation.
IMHO, the printk-related deadlocks will be solved in two ways.
Either by replacing the classic printk() with the deferred
variant or by surrounding a problematic code with
printk_safe_enter()/exit() calls. If we do the fix correctly,
the lockdep warning should disappear.
Now, replacing printk() with printk_deferred() will have effect
even with the current lockdep code. It is because printk_deferred()
will not longer take the console-related locks, so they will not
longer appear in the call chain.
But what about adding printk_safe_enter()/exit()? Of course,
if we do it correctly, it will prevent a deadlock. But lockdep
is supposed to detect this in advance. And these functions just
handle the printk_context variable. The guarded code will still
take the same locks. Therefore my feeling is that lockdep will
need to be aware of the printk_context. It would have similar
effect like taking lock with interrupts enabled or disabled.
By other words, safe printk context would prevent entering
the same chain of locks recursively. It is like the disabled
interrupts prevent a recursion.
Finally, note that the proposed might_printk() emulates the classic
printk() call. It the classic printk() causes problems somewhere,
it will be replaced by printk_deferred() and might_printk()
will be removed. By other words, might_printk() shows where
we need the first solution (using printk_deferred) but it
does not tell the lockdep about the entire story.
I am not sure if the above makes sense. I need much more
coffee, sleep, and thinking to sort the problem completely.
Let me look at it from a slightly different angle:
The deadlock is caused by a chain of taken locks. lockdep
is able to warn about it in advance because it knows that
two partial lock chains might eventually happen together.
For this, the information about the interrupt context
and disabled interrupts is important. It tells what
chains of locks might happen together and what are
prevented. IMHO, the printk_context has similar effect
like the interrupt context stuff.
Well, there is a difference. Being in interrupt context
and having disabled interrupts are two separate values.
printk_context defines just one value. It either means
that the situation with printk is easier or that we
will need one more variable for the lockdep handling
of the printk state.
Anyway, I would not solve lockdep in this patchset.
Let's keep it as another challenge.