On Tue, Oct 18, 2016 at 5:10 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> Adding Andy to the cc, because this *might* be triggered by the
> vmalloc stack code itself. Maybe the re-use of stacks showing some
> problem? Maybe Chris (who can't see the problem) doesn't have

I bet it's the plug itself that is the stack address. In fact, it's
probably that mq_list head pointer.

I think every single user of block plugging uses the pattern

        struct blk_plug plug;

        blk_start_plug(&plug);

and then we'll have

        INIT_LIST_HEAD(&plug->mq_list);

which initializes that mq_list head with the stack addresses pointing to itself.
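
To make the self-pointing head concrete, here is a minimal userspace sketch. It assumes a cut-down list_head and a hypothetical "struct demo_plug" that carries only the mq_list field; neither is the real kernel definition, and the helper name is made up for illustration.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* Same idea as the kernel helper: an empty list points at itself. */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Hypothetical cut-down plug holding only the field discussed here. */
struct demo_plug {
	struct list_head mq_list;
};

/*
 * Returns true if the head of an on-stack plug points back at itself
 * right after initialization. Both pointers then hold a stack address.
 */
static bool demo_plug_head_is_self_pointing(void)
{
	struct demo_plug plug;	/* lives on this function's stack */

	INIT_LIST_HEAD(&plug.mq_list);
	return plug.mq_list.next == &plug.mq_list &&
	       plug.mq_list.prev == &plug.mq_list;
}
```

With the plug on the stack, both next and prev of the empty head hold the address of the head, which is itself a stack address.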

So when we see something like this:

  list_add corruption. prev->next should be next (ffffe8ffff806648), but was ffffc9000067fcd8. (prev=ffff880503878b80)

and it comes from

    list_add_tail(&rq->queuelist, &plug->mq_list);

which will expand to

    __list_add(new, head->prev, head)

which in this case *should* be:

    __list_add(&rq->queuelist, plug->mq_list.prev, &plug->mq_list);

so in fact we *should* have "next" be a stack address.

So that debug message is really really odd. I would expect that "next"
is the stack address (because we're adding to the tail of the list, so
"next" is the list head itself), but the corruption printout says that
the "but was" value is the stack address, and "next" isn't.
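
For reference, the check behind that message can be sketched in userspace roughly as follows. This is a simplified model, not the kernel source: the function names are hypothetical, and it returns false where the kernel would warn. The point is which value lands in which slot of the message: "should be next" prints the next argument (the list head for a tail add), and "but was" prints prev->next.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/*
 * Sketch of a debug-checked insert between prev and next.
 * Returns false (and prints) instead of warning on corruption.
 */
static bool checked_list_add(struct list_head *new,
			     struct list_head *prev,
			     struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p)\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p)\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	new->next = next;
	new->prev = prev;
	next->prev = new;
	prev->next = new;
	return true;
}

/* Tail add: "next" is the head itself, "prev" is the current last entry. */
static bool checked_list_add_tail(struct list_head *new,
				  struct list_head *head)
{
	return checked_list_add(new, head->prev, head);
}
```

In a healthy list the last entry's next pointer equals the head, so for an on-stack head both printed values would be expected to be the same stack address.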

Weird. The "but was" value actually looks like what the right address
should look like, but the actual "next" address (which *should* be just
"&plug->mq_list" and really should be on the stack) looks bogus.

I'm now very confused.

