On Mon,  8 Feb 2016 21:35:03 +0100
Denys Vlasenko <[email protected]> wrote:


> This patch is reported to make affected user's machine survive.

Would be nice to have a test case for this. Make a test module to
reproduce the issue?

> 
> Signed-off-by: Denys Vlasenko <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: Steven Rostedt <[email protected]>
> CC: Tejun Heo <[email protected]>
> CC: Peter Hurley <[email protected]>
> ---
>  kernel/printk/printk.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c963ba5..ca4f9d55 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2235,6 +2235,7 @@ void console_unlock(void)
>       unsigned long flags;
>       bool wake_klogd = false;
>       bool do_cond_resched, retry;
> +     unsigned cnt;
>  
>       if (console_suspended) {
>               up_console_sem();
> @@ -2257,6 +2258,7 @@ void console_unlock(void)
>       /* flush buffered message fragment immediately to console */
>       console_cont_flush(text, sizeof(text));
>  again:
> +     cnt = 5;
>       for (;;) {
>               struct printk_log *msg;
>               size_t ext_len = 0;
> @@ -2284,6 +2286,9 @@ skip:
>               if (console_seq == log_next_seq)
>                       break;
>  
> +             if (--cnt == 0)
> +                     break;  /* Someone else printk's like crazy */
> +
>               msg = log_from_idx(console_idx);
>               if (msg->flags & LOG_NOCONS) {
>                       /*
> @@ -2350,6 +2355,26 @@ skip:
>       if (retry && console_trylock())
>               goto again;
>  
> +     if (cnt == 0) {
> +             /*
> +              * Other CPU(s) printk like crazy, filling log_buf[].
> +              * Try to get rid of the "honor" of servicing their data:
> +              * give _them_ time to grab console_sem and start working.
> +              */
> +             cnt = 9999;

I'll ignore that this looks very hacky.

> +             while (--cnt != 0) {
> +                     cpu_relax();
> +                     if (console_seq == log_next_seq) {

First, console_seq needs logbuf_lock protection. On some archs, this
loop may hit 9999 every time, as console_seq is most likely in cache
and isn't being updated. Not to mention the race with another task
moving log_next_seq as well: another CPU could have changed both
console_seq and log_next_seq in the meantime.

Perhaps just save off console_seq and see if it changes at all.
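
A rough userspace sketch of the "save it off and see if it changes"
idea, not kernel code. It assumes C11 atomics as a stand-in for reading
console_seq under logbuf_lock; model_console_seq and the helper names
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for console_seq; not the kernel variable. */
static _Atomic uint64_t model_console_seq;

/* Record the sequence number as seen on entry. */
static uint64_t seq_snapshot(_Atomic uint64_t *seq)
{
	return atomic_load_explicit(seq, memory_order_acquire);
}

/* True if the sequence number no longer matches the snapshot. */
static bool seq_moved(_Atomic uint64_t *seq, uint64_t snapshot)
{
	return atomic_load_explicit(seq, memory_order_acquire) != snapshot;
}

/*
 * Poll for at most max_spins iterations. Returns true as soon as the
 * counter differs from the snapshot (someone else is printing), false
 * if the spin budget runs out without any progress being seen.
 */
static bool wait_for_progress(_Atomic uint64_t *seq, uint64_t snapshot,
			      unsigned int max_spins)
{
	assert(max_spins > 0);

	while (max_spins--) {
		if (seq_moved(seq, snapshot))
			return true;
	}
	return false;
}
```

Comparing against a saved value only asks "did anybody advance the
console at all", so it does not depend on where log_next_seq happens to
be at that moment. The spin budget is still a heuristic.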


> +                             /* Good, other CPU entered "for(;;)" loop */
> +                             goto out;
> +                     }
> +             }
> +             /* No one seems to be willing to take it... */
> +             if (console_trylock())
> +                     goto again; /* we took it */

Perhaps retry taking the console sem a few times. But again, this just
sounds like playing with heuristics, and I hate heuristics.
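
For reference, a bounded retry could look roughly like the userspace
model below. It is only a sketch: a C11 atomic_flag stands in for
console_sem, and model_trylock(), model_unlock() and trylock_retry()
are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for console_sem: flag set means "held". */
static atomic_flag model_console_sem = ATOMIC_FLAG_INIT;

/* Model of a trylock: returns true if the caller took the lock. */
static bool model_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

/* Release the model lock. */
static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Attempt the lock up to max_tries times. Returns true if the caller
 * now holds it, false if every attempt found it already held.
 */
static bool trylock_retry(atomic_flag *lock, unsigned int max_tries)
{
	assert(max_tries > 0);

	while (max_tries--) {
		if (model_trylock(lock))
			return true;
	}
	return false;
}
```

The retry count is one more magic number to tune, which is exactly the
problem with this kind of approach.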

There's gotta be a better solution.

-- Steve

> +             /* Nope, someone else holds console_sem! Good */
> +     }
> +out:
>       if (wake_klogd)
>               wake_up_klogd();
>  }
