On Wed 2016-10-05 10:27:14, Sergey Senozhatsky wrote: > On (10/04/16 16:52), Petr Mladek wrote: > > > > > > Or is there any other catch that I do not see at the moment? > > > > And there is :-( The above logic looked at the problem only from > > one side. It was about errors starting from the printk() > > code itself, for example: > > yes, like I said - printk recursion and printk deadlock are different > things. and recursion cases are a subset of deadlock cases.
I see. > > The only thing that might help here is to call > > alt_printk_enter()/exit() in wake_up_process() itself. Otherwise, > > we still would need to keep the printk_deferred() stuff. > > yes. > or > - combine alt_printk and DEFERRED_WARN/etc. The question is if alt_printk brings any win after all, see below. > or > - rewrite printk() to be lock-less by default (for all invocations). We already have it and it is called trace_printk(). But it is very tricky and have some limitations. For example, it does not support random parallel readers. Also printing the trace log is noticeably slow. > > By other words, we might need to put alt_printk_enter()/exit() > > into the scheduler and timekeeping code. In theory it might > > be easier to maintain than the separated printk_deferred() calls. > > But there might be some catches because we need to disable > > the interrupts, ... > > right. and I have some doubts that people will be willing to put > alt_printk_enter/exit into those hot paths. I have the same doubts. > > Sigh, this 2nd scenario is much more likely than the 1st one. > > I guess that warnings in the scheduler/timekeeping code > > will be triggered outside printk() most of the time. > > hm. may be. but the reports we received so far starts from printk() > and end up in printk() - IOW, recursion. My statement might have been too strong. Well, my thinking was the following: I am not aware of any real life bug reports caused by recursion inside locbuf_lock garded section. I guess that it is because the most sensitive one is guarded by that logbuf_cpu check. I am not aware of any generic recursions inside the console code. In fact, it is not easily possible because we console_trylock(). It means that we do not call console if it is already being handled. We are basically down to the recursion/deadlock caused by the wake_up_process() call. And there are much more such calls outside printk(). In fact, I am aware only about one report. It was related to the async printk patchset, the added wake_up_process(), and used RT scheduler. This one started from printk() almost by definition. > > It means that this approach might be much harder to sell > > after all :-( > > well, it solves a number of problems that the existing implementation > cannot handle. Please, provide a summary. I wonder if these are real life problems. Note that we need to put aside all problems that are solvable with printk_deferred(). It seems that printk_deferred() will need to stay because it avoids the deadlock caused by scheduler/timekeeping code locks. By other words, if there is a missing printk_deferred() we need to put it there anyway because the same code might get first called outside printk(). Or do I miss something? Best Regards, Petr