On Wed 2015-06-10 21:23:04, Peter Zijlstra wrote:
> Below a version which does x-cpu stuff to allow the
> trigger_all*_cpu_backtrace() initiator to flush buffers on behalf of
> other CPUs.
>
> Compile tested only.

The output from "echo l >/proc/sysrq-trigger" looks reasonable.
It does not mix output from different CPUs. This is expected
because of the @lock.

The other observation is that it prints CPUs in _random_ order:
28, 24, 25, 1, 26, 2, 27, 3, ...

The order is fine when I disable the irq_work.

It means that irq_works are usually faster than printk_nmi_flush() =>
printk_nmi_flush() is not that useful => all the complexity with
the three atomic variables (head, tail, read) did not bring
much win.

Anyway, I think that the current solution is racy and it cannot be fixed
easily, see below.


> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c099b082cd02..99bfc1e3f32a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1821,13 +1821,200 @@ int vprintk_default(const char *fmt, va_list args)
> +static void __printk_nmi_flush(struct irq_work *work)
> +{
> +     static raw_spinlock_t lock = __RAW_SPIN_LOCK_INITIALIZER(lock);
> +     struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> +     int len, head, size, i, last_i;
> +
> +again:
> +     /*
> +      * vprintk_nmi()        truncate
> +      *
> +      * [S] head             [S] head
> +      *     wmb              mb
> +      * [S] tail             [S] read

BTW, this is quite cryptic for me. Coffee did not help :-)

         *
> +      * therefore:
> +      */
> +     i = atomic_read(&s->read);
> +     len = atomic_read(&s->tail); /* up to the tail is stable */
> +     smp_rmb();
> +     head = atomic_read(&s->head);
> +
> +     /*
> +      * We cannot truncate tail because it could overwrite a store from
> +      * vprintk_nmi(), however vprintk_nmi() will always update tail to the
> +      * correct value.
> +      *
> +      * Therefore if head < tail, we missed a truncate and should do so now.
> +      */
> +     if (head < len)
> +             len = 0;

This is a bit confusing. It is a complicated way how to return on the next test.

If I get this correctly. This might happen only inside
_printk_nmi_flush() called on another CPU (from
printk_nmi_flush()) when it interferes with the queued
irq_work. The irq_work is faster and truncates the buffer.

So, the return is fine after all because the irq_work printed
everything.


> +     if (len - i <= 0) /* nothing to do */
> +             return;
> +     /*
> +      * 'Consume' this chunk, avoids concurrent callers printing the same
> +      * stuff.
> +      */
> +     if (atomic_cmpxchg(&s->read, i, len) != i)
> +             goto again;

I think that this is racy:

CPU0                                    CPU7

printk_nmi_flush()

  __printk_nmi_flush(for CPU7)

    i = atomic_read(&s->read);     (100)
    len = atomic_read(&s->tail);   (200)
    head = atomic_read(&s->head);  (200)

    if (atomic_cmpxchg(&s->read, i, len) != i)

    we pass but we get interrupted
    or rescheduled on preemptive kernel

                                        another vprintk_nmi()
                                        leaves: head(400), tail(400)

                                        __printk_nmi_flush() in irq_work

                                        it prints string between 200-400
                                        truncate buffer: head(0), read(0)

                                        another vprintk_nmi()
                                        returns: head(150), tail(150)

    print string between (100-200) =>
    part of the new and part of old message
    and modifies @head and @read a wrong way

I think that such races are hard to avoid without indexing the printed
messages. But it would make the approach too complicated.

I think that ordering CPUs is not worth it. I would go back to the
first solution, add the @lock there, and double check races with
seq_buf().

I stop here with commenting the code for now.

Best Regards,
Petr

PS: I had two cups of coffee and hope that my comments are smaller fiasco
than yesterday.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to