On (01/29/16 15:37), Sergey Senozhatsky wrote:
>
> panic()->console_panic_mode()->{for_each_console()->reset(),
> 	zap_locks()}->console_trylock()->console_unlock().

Hello,

This is not a final submission, just an RFC, so we can settle on a
better plan. The patches are not signed off and have known problems
(and likely some unknown ones).

I put a summary here and will send the patches out as replies to this
email, so it'll be easier to review/comment/discuss.

patch 0001
***************

A CPU stop IPI issued from panic() on CPUA can leave console_sem locked
on CPUB if that CPU was holding console_sem at the time the IPI arrived.
console_flush_on_panic() tries to work around this by ignoring the
return status of console_trylock() and unconditionally executing
console_unlock(). However, console_unlock() depends on at least one more
lock, `logbuf_lock', which can be corrupted, for example, so
console_unlock() may not be able to print anything after all.

Introduce a console_reset_on_panic() function to zap (re-init) the
printk locks, and call this function from panic().

WARNING
=======
This must be improved. console_reset_on_panic() is called before
smp_send_stop(), so:
a) we can have several CPUs looping in console_unlock(), which is not
   so critical.
b) we can re-init logbuf_lock while another CPU is holding it, which is
   more serious and needs to be fixed.

The reason console_reset_on_panic() is called this early is that the
panicking CPU does pr_emerg("Kernel panic...") and dump_stack() before
it calls smp_send_stop(). So if console_sem, logbuf_lock, or some
console device driver lock is corrupted, then panic() may never reach
smp_send_stop().

patch 0002
***************

Console driver(-s) can be in any state when the CPU stop IPI arrives
from panic() issued on another CPU, so
console_flush_on_panic()->console_unlock() can call the con->write()
callback on a locked console driver.

Introduce reset_console_drivers(), which attempts to reset every console
via a console driver specific ->reset() call. Invoke
reset_console_drivers() from console_reset_on_panic().

WARNING
=======
console_reset_on_panic() needs to be fixed.

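To make the intended flow of patches 0001 and 0002 easier to discuss,
here is a minimal user-space sketch of the idea: re-init the printk
locks, then walk the console list and call each driver's optional
->reset() hook. This is not the patch code. Every toy_* name, the
toy_lock type and the struct layout are assumptions made up for
illustration, and the sketch deliberately ignores the problem from the
WARNING above (re-initializing a lock that another CPU still holds).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for a kernel lock: 0 = unlocked, 1 = locked. */
struct toy_lock {
	atomic_int locked;
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_store(&l->locked, 0);
}

/* Returns nonzero if the lock was acquired, 0 if it was already held. */
static int toy_trylock(struct toy_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

/* Hypothetical console with an optional driver-specific reset hook. */
struct toy_console {
	const char *name;
	struct toy_lock port_lock;
	void (*reset)(struct toy_console *con);
	struct toy_console *next;
};

static struct toy_lock toy_console_sem;
static struct toy_lock toy_logbuf_lock;
static struct toy_console *toy_console_drivers;

/* Patch 0002 idea: ask every console that has a ->reset() to reset itself. */
static void toy_reset_console_drivers(void)
{
	struct toy_console *con;

	for (con = toy_console_drivers; con; con = con->next)
		if (con->reset)
			con->reset(con);
}

/* Patch 0001 idea: zap (re-init) the printk locks, then reset the drivers. */
static void toy_console_reset_on_panic(void)
{
	toy_lock_init(&toy_logbuf_lock);
	toy_lock_init(&toy_console_sem);
	toy_reset_console_drivers();
}

/* Example driver hook: drop whatever state the stopped CPU left behind. */
static void toy_uart_reset(struct toy_console *con)
{
	toy_lock_init(&con->port_lock);
}

/* Walks through the "CPUB stopped while holding the locks" scenario. */
int toy_demo_panic_path(void)
{
	struct toy_console uart = { .name = "ttyS0", .reset = toy_uart_reset };
	int got;

	toy_lock_init(&uart.port_lock);
	toy_lock_init(&toy_console_sem);
	toy_lock_init(&toy_logbuf_lock);
	toy_console_drivers = &uart;

	/* "CPUB" takes all three locks and is then stopped by the IPI. */
	if (!toy_trylock(&toy_console_sem) ||
	    !toy_trylock(&toy_logbuf_lock) ||
	    !toy_trylock(&uart.port_lock))
		return -1;

	/* Without a reset, the panicking CPU cannot get console_sem. */
	got = toy_trylock(&toy_console_sem);
	assert(!got);

	toy_console_reset_on_panic();

	/* After the reset, every lock can be taken again. */
	if (!toy_trylock(&toy_console_sem) ||
	    !toy_trylock(&toy_logbuf_lock) ||
	    !toy_trylock(&uart.port_lock))
		return -1;

	toy_console_drivers = NULL;
	return 0;
}
```

Calling toy_demo_panic_path() returns 0 when the locks are usable again
after the reset. Making ->reset() optional per console in this sketch
means drivers without a hook are simply skipped.
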
patch 0003 -- detect recursive spin_dump() and panic() the system
***************

spin_dump() calls printk(), which can attempt to reacquire the 'buggy'
lock (one of printk's locks, a console device driver lock, etc.), and
thus spin_dump() will recurse into itself.

Steal the most significant bit of spin_lock->owner_cpu to keep a mark
there that spin_dump() is in progress for that particular spin_lock.
spin_dump() will now set the SPIN_DUMP_IN_PROGRESS bit on entry and
clear it on exit, so it's possible to detect recursive spin_dump() calls
by checking whether the lock's owner_cpu already has the
SPIN_DUMP_IN_PROGRESS bit set. panic() the system when spin_dump()
recursion occurs.

	-ss
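
For reference, the recursion check from patch 0003 can be sketched in
user space roughly as follows. This is only an illustration of the
"steal the top bit of owner_cpu" idea, not the patch itself. The
toy_* names, the assumption that owner_cpu is an unsigned int, the
exact value of SPIN_DUMP_IN_PROGRESS, and the boolean that simulates
printk() hitting the same lock are all made up here. Unlike the real
panic(), toy_panic() returns, so the sketch only counts the event.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed value: the most significant bit of an unsigned int. */
#define SPIN_DUMP_IN_PROGRESS \
	(1U << (sizeof(unsigned int) * CHAR_BIT - 1))

struct toy_spinlock {
	unsigned int owner_cpu;	/* low bits: CPU id, top bit: dump marker */
};

static int toy_panic_count;
static int toy_dump_count;

/* Stand-in for panic(); the real one never returns. */
static void toy_panic(const char *msg)
{
	(void)msg;
	toy_panic_count++;
}

/* Readers of owner_cpu must mask the marker bit out. */
static unsigned int toy_owner_cpu(const struct toy_spinlock *lock)
{
	return lock->owner_cpu & ~SPIN_DUMP_IN_PROGRESS;
}

static void toy_spin_dump(struct toy_spinlock *lock, bool printk_hits_same_lock)
{
	/* Marker already set: we re-entered spin_dump() for this lock. */
	if (lock->owner_cpu & SPIN_DUMP_IN_PROGRESS) {
		toy_panic("recursive spin_dump()");
		return;
	}

	lock->owner_cpu |= SPIN_DUMP_IN_PROGRESS;
	toy_dump_count++;

	/* Simulates printk() trying to take the same buggy lock again. */
	if (printk_hits_same_lock)
		toy_spin_dump(lock, true);

	lock->owner_cpu &= ~SPIN_DUMP_IN_PROGRESS;
}

int toy_demo_spin_dump(void)
{
	struct toy_spinlock lock = { .owner_cpu = 3 };

	toy_panic_count = 0;
	toy_dump_count = 0;

	/* Ordinary dump: no recursion, no panic. */
	toy_spin_dump(&lock, false);
	assert(toy_panic_count == 0 && toy_dump_count == 1);

	/* Dump whose printk() path hits the same lock: detected once. */
	toy_spin_dump(&lock, true);
	assert(toy_panic_count == 1 && toy_dump_count == 2);

	/* The marker is gone and the owner id is intact afterwards. */
	assert(!(lock.owner_cpu & SPIN_DUMP_IN_PROGRESS));
	assert(toy_owner_cpu(&lock) == 3);
	return 0;
}
```

One consequence this sketch makes visible: any code that reads
owner_cpu while a dump is in progress has to mask out the marker bit,
as toy_owner_cpu() does.
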