Hi Sergey,

first, thanks a lot for the detailed description. I have finally understood what was important about the "non-important" messages and how you used them. I am sorry that I did not get it earlier.
On Tue 2017-01-10 17:49:39, Sergey Senozhatsky wrote:
> On (01/09/17 17:56), Petr Mladek wrote:
> > It is possible that your fix is fine. If we lose messages,
> > we are screwed anyway. But I still have problems to accept
> > that we would start printing less important messages (that would
> > normally be ignored) in situation when we have troubles
> > to print the more important ones. This logic rings warning
> > bells in my head and this is why I suggest more conservative
> > solution and ask the many questions.
>
> once the system is in "oh, let me drop some of the messages for you"
> mood, loglevel filtering is unreliable and in some cases unneeded.
> it's so unreliable that I'm even considering disabling it in *in-house*
> builds when console_unlock() detects that there was no room for all
> 'yet to be seen' messages.
>
> those are another messages, with 'visible' loglevel or with 'suppressed'
> loglevel or both 'visible' and 'suppressed' loglevels, that caused the
> logbuf overflow.
>
> now, if the loss of messages was caused by:
>
> a) flood of suppressed loglevel messages
> then printing at least some of those messages makes *a lot* of sense.
>
> b) flood of visible loglevel messages
> then may be those messages are not so important. there a whole logbuf of
> them. per my experience, it is quite hard to overflow the logbuf with
> really important, unique, sensible messages of 'visible' loglevel with
> active loglevel filtering.

Just for the record, I guess that the same is also true for messages with a lower level; they tend to repeat as well. It would be great to make it easier to throttle repeated messages, or to do it in a generic way. But this is food for future work.

> once the system is out of logbuf space it is impossible to clearly
> distinguish between 'important' and 'not so important' messages.
> all we know in console_unlock(), when we pick up next_idx message, is that
> there is an abnormal/unusual/weird/unexpected/sick/whatever amount of
> messages - 'suppressed' or 'visible' or both. and that's the problem.

It is true that lost messages are a "serious" problem because you might miss a message about a "really" serious problem. Even the normally important messages are less useful once they are incomplete. It then makes sense to debug what causes the flood, and the key to that is to ignore the loglevel and print whatever is being stored. Your patch makes perfect sense from this point of view. Please, mention this explanation in the next iteration of the patch.

Ah, you will kill me, but I still have one more thing. The levels are defined like this:

    #define KERN_EMERG  KERN_SOH "0"  /* system is unusable */
    #define KERN_ALERT  KERN_SOH "1"  /* action must be taken immediately */
    #define KERN_CRIT   KERN_SOH "2"  /* critical conditions */
    #define KERN_ERR    KERN_SOH "3"  /* error conditions */

A flood of messages usually means that something is pretty wrong. But it might also be caused by too many, or forgotten, debug messages. I think that lost messages belong to level "2". Note that the warnings about lost NMI messages and about the recent printk recursion are printed with loglevel '2' as well.

Would it make sense, and be acceptable, to ignore the loglevel only when console_loglevel allows showing KERN_CRIT messages?

Best Regards,
Petr
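P.S. To make the question concrete, here is a minimal userspace sketch of the rule I have in mind. It is not the real printk code: console_should_suppress() and its lost_messages flag are hypothetical names. The only thing borrowed from the kernel is the usual filter, where a message reaches the console when its level is numerically lower than console_loglevel.

```c
#include <stdbool.h>

/* Numeric levels matching the KERN_* prefixes quoted above. */
#define LOGLEVEL_EMERG 0
#define LOGLEVEL_ALERT 1
#define LOGLEVEL_CRIT  2
#define LOGLEVEL_ERR   3
#define LOGLEVEL_DEBUG 7

/*
 * Hypothetical helper, not a kernel API: decide whether a message of
 * level msg_level should be hidden from the console.
 *
 * lost_messages: true when records were overwritten in the log buffer
 * before the console could print them.
 */
static bool console_should_suppress(int msg_level, int console_loglevel,
                                    bool lost_messages)
{
    /*
     * Proposed rule: treat "messages were lost" as a KERN_CRIT event.
     * Only when the console would show KERN_CRIT do we stop filtering
     * and print everything that is still stored.
     */
    if (lost_messages && LOGLEVEL_CRIT < console_loglevel)
        return false;

    /* Normal loglevel filtering. */
    return msg_level >= console_loglevel;
}
```

With console_loglevel set to 4, a KERN_DEBUG message is normally filtered but gets printed once messages were lost. With console_loglevel set to 2, KERN_CRIT itself is hidden, so the loss alone does not switch the filtering off.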

