On Fri, 15 Dec 2017 15:52:05 +0900 Sergey Senozhatsky <sergey.senozhatsky.w...@gmail.com> wrote:
> On (12/15/17 14:06), Sergey Senozhatsky wrote: > [..] > > > Where do we do the above? And has this been proven to be an issue? > > > > um... hundreds of cases. > > > > deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus) > > happening at the same moment + NMI backtraces from all the CPUs (more > > than 3 cpus) that follows the lockups, over not-so-fast serial console. > > exactly the bug report I received two days ago. so which one of the CPUs > > here is a good candidate to successfully emit all of the pending logbuf > > entries? none. all of them either have local IRQs disabled, or dump_stack() > > from either backtrace IPI or backtrace NMI (depending on the > > configuration). > > > and, Steven, one more thing. wondering what's your opinion. > > > suppose we have consoe_owner hand off enabled, 1 non-atomic CPU doing > printk-s and several atomic CPUs doing printk-s. Is proposed hand off > scheme really useful in this case? CPUs will now > > a) print their lines (a potentially slow call_console_drivers()) My question is, is this an actual problem. Because we haven't fully evaluated if my patch is enough to make this a non issue. > > and > > b) spin in vprintk_emit on console_owner with local IRQs disabled > waiting for either non-atomic printk CPU or another atomic CPU > to finish printing its line (call_console_drivers()) and to hand > off printing. so current CPU, after busy-waiting for foreign CPU's > call_console_drivers(), will go and do his own call_console_drivers(). > which, time-wise, simply doubles (roughly) the amount of time that > CPU spends in printk()->console_unlock(). agreed? Worse case is doubling, if the two printks happen at the same time. Today, it is unbounded, which means my patch is much better than what we do today, and hence we still don't know if it is "good enough". This is where I say you are trying to solve another problem than what we are encountering today. Because we don't know if the problem is due to the unbounded nature of printk, or just a slight longer (but bounded) time. > > if we previously could have a case when non-atomic printk CPU would > grab the console_sem and print all atomic printk CPUs messages first, > and then its own messages, thus atomic printk CPUs would have just > log_store(), now we will have CPUs to call_console_driver() and to > spin on console_sem owner waiting for call_console_driver() on a foreign > CPU [not all of them: it's one CPU doing the print out and one CPU > spinning console_owner. but overall I think all CPUs will experience > that spin on console_sem waiting for call_console_driver() and then do > its own call_console_driver()]. > > > even two CPUs case is not so simple anymore. see below. > > - first, assume one CPU is atomic and one is non-atomic. > - second, assume that both CPUs are atomic CPUs, and go thought it again. > > > CPU0 CPU1 > > printk() printk() > log_store() > log_store() > console_unlock() > set console_owner > sees console_owner > sets console_waiter > spin > call_console_drivers() > sees console_waiter > break > > printk() > log_store() > console_unlock() > set console_owner > sees console_owner > sets console_waiter > spin > call_console_drivers() > sees console_waiter > break > > printk() > log_store() > console_unlock() > set console_owner > sees console_owner > sets console_waiter > spin > call_console_drivers() > sees console_waiter > break > > printk() > log_store() > console_unlock() > set console_owner > sees console_owner > sets console_waiter > spin > > .... > > > that "wait for call_console_drivers() on another CPU and then do > its own call_console_drivers()" pattern does look dangerous. the > benefit of hand-off is really fragile sometimes, isn't it? Actually, I see your scenario as a perfect example of my patch working well. You have two CPUs spamming the console with printks, and instead of one CPU stuck doing nothing but outputting both CPUs work, the two share the load. Again, I'm not convinced that all the issues you have encountered so far can not be solved with my patch. I would like to see a real example of where it fails. Lets not make this a theoretical approach, lets do it incrementally and solve one problem at a time. I know for a fact that the one CPU outputting all other CPUs printks is a problem. That has been shown time and time again (and I believe is the cause for the problems you stated). The cause of the above issue is the unbounded nature of printk. The fact that there's no guarantee that printk will ever finish. My patch makes it have to finish and makes it bounded. In RT, that's what we always work on, and that usually is good enough. Priority inheritance can take a long time too, but it causes everything to have a guaranteed max latency. My patch makes printk guaranteed to stop, and I want to see if this is good enough for all current issues we've come across before we go into any more elaborate algorithms. -- Steve