On Tue, 19 Dec 2017 09:52:48 +0900
Sergey Senozhatsky <sergey.senozhatsky.w...@gmail.com> wrote:

> > The case here, you are talking about a CPU doing console_lock() from a
> > non printk() case. Which is what I was asking about how often this
> > happens.  
> 
> I'd say often enough. but the point I was trying to make is that we can
> have non-atomic CPUs which can do the print out, instead of "sharing the
> load" between atomic CPUs.

We don't even know if sharing between "atomic" and "non-atomic" is an
issue. Anything that does a printk() in an atomic location, is going to
have latency to begin with.

> 
> > As for why there's no handoff. Does the non printk()
> > console_lock/unlock ever happen from a critical location? I don't think
> > it does (but I haven't checked). Then it is the perfect candidate to do
> > all the printing.  
> 
> that's right. that is the point I was trying making. we can have better
> candidates to do all the printing.

Sure, but we don't even know if we have to. A problem scenario hasn't
come up that wasn't due to the current implementation (which my patch
changes).


> I did tests yesterday, traces are available. I can't conclude that
> the patch fixes the unfairness of printk().

It doesn't fix the "unfairness" it fixes the unboundedness of printk.
That is what has been triggering all the issues from before.



> consider the following case
> 
> we have console_lock() from non-atomic context. console_sem owner is
> getting preempted, under console_sem. which is totally possible and
> happens a lot. in the mean time we have OOM, which can print a lot of
> info. by the time console_sem returns back to TASK_RUNNING logbuf
> contains some number of pending messages [lets say 10 seconds worth
> of printing]. console owner goes to console_unlock(). accidentally
> we have printk from IRQ on CPUz. console_owner hands over printing
> duty to CPUz. so now we have to print 10 seconds worth of OOM messages
> from irq.

Yes that can happen. But printk's from irq context is not nice to have
either, and should only happen when things are going wrong to begin
with.

> 
> 
> 
> CPU0                        CPU1 ~ CPUx                     CPUz
> 
> console_lock
> 
>  << preempted >>
> 
> 
>    OOM                    OOM printouts, lots
>                           of OOM traces, etc.
> 
>                           OOM end [progress done].
> 
>  << back to RUNNING >>
> 
>   console_unlock()
>     
>     for (;;)
>       sets console_owner
>       call_console_drivers()                            IRQ
>                                                          printk
>                                                         sees console_owner
>                                                         sets console_waiter
> 
>       clears console_owner
>       sees console_waier
>       handoff
>                                                           for (;;) {
>                                                            
> call_console_drivers()
>                                                            ??? lockup
>                                                         }
>                                                         up()
> 
> this is how we down() from non-atomic and up() from atomic [if we make
> it to up(). we might end up in NMI panic]. this scenario is totally possible,

The printk buffer needs to be very big, and bad things have to happen
first. This is a theoretical scenario, and I'd like to see it happen in
the real world before we try to fix it. My patch should make printk
behave *MUCH BETTER* than it currently does.

If you are worried about NMI panics, then we could add a touch nmi
within the printk loop.


> isn't it? the optimistic expectation here is that some other printk() from
> non-atomic CPU will jump in and take over printing from atomic CPUz. but I
> don't see why we are counting on it.

I don't see why we even care. Placing a printk in an atomic context is
a problem to begin with, and should only happen if there's issues in
the system.

-- Steve

Reply via email to