On 05/09/2014 06:12 AM, Shaohua Li wrote:
> On Thu, May 08, 2014 at 09:27:42PM -0600, Jens Axboe wrote:
>> On 2014-05-08 21:22, Sasha Levin wrote:
>>> On 05/07/2014 11:55 AM, Jens Axboe wrote:
>>>> On 05/07/2014 09:53 AM, Sasha Levin wrote:
>>>>> On 05/07/2014 11:45 AM, Jens Axboe wrote:
>>>>>> On 05/07/2014 09:37 AM, Sasha Levin wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> While fuzzing with trinity inside a KVM tools guest running the latest -next
>>>>>>> kernel I've stumbled on the following spew:
>>>>>>>
>>>>>>> [  986.962569] WARNING: CPU: 41 PID: 41607 at block/blk-mq.c:585 __blk_mq_run_hw_queue+0x90/0x500()
>>>>>>
>>>>>> I'm going to need more info than this. What were you running? How was kvm
>>>>>> invoked (nr cpus)?
>>>>>
>>>>> Sure!
>>>>>
>>>>> It's running in a KVM tools guest (not qemu), with the following options:
>>>>>
>>>>> '--rng --balloon -m 28000 -c 48 -p "numa=fake=32 init=/virt/init zcache 
>>>>> ftrace_dump_on_oops debugpat kvm.mmu_audit=1 slub_debug=FZPU 
>>>>> rcutorture.rcutorture_runnable=0 loop.max_loop=64 zram.num_devices=4 
>>>>> rcutorture.nreaders=8 oops=panic nr_hugepages=1000 numa_balancing=enable'.
>>>>>
>>>>> So basically 48 vcpus (the host has 128 physical ones), and ~28G of RAM.
>>>>>
>>>>> I've been running trinity as a fuzzer, which doesn't handle logging too well,
>>>>> so I can't reproduce its actions easily.
>>>>>
>>>>> There was an additional stress of hotplugging CPUs and memory during this
>>>>> recent fuzzing run, so it's fair to suspect that this happened as a result
>>>>> of that.
>>>>
>>>> Aha!
>>>>
>>>>> Anything else that might be helpful?
>>>>
>>>> No, not too surprising given the info that cpu hotplug was being
>>>> stressed at the same time. blk-mq doesn't quiesce when this happens, so
>>>> it's very likely that there are races between updating the cpu masks
>>>> and flushing out the previously queued work.
>>>
>>> So this warning is something you'd expect when CPUs go up/down?
>>
>> Let me put it this way - I'm not surprised that it triggered, but it
>> will of course be fixed up.
> 
> Does reverting 1eaade629f5c47 change anything?
> 
> ctx->online isn't changed immediately when a cpu goes offline, so likely
> something is wrong there. I'm wondering why we need that patch?

We don't strictly need it. That commit isn't in what Sasha tested, however.
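
For anyone following along, the race described earlier in the thread (cpu
masks updated by hotplug while previously queued work is still pending) can
be sketched in userspace roughly like this. This is only a toy model under
stated assumptions: struct toy_hctx and all toy_* helpers are hypothetical
names, not blk-mq code, and it assumes the warning amounts to a check that
the queue is being run on a CPU that is still in its mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a hardware queue context; not the kernel struct. */
struct toy_hctx {
    uint64_t cpumask;   /* bit n set => CPU n may run this queue (n < 64) */
    int pending_cpu;    /* CPU a deferred run was queued on, -1 if none */
};

static bool toy_cpu_in_mask(const struct toy_hctx *h, int cpu)
{
    return ((h->cpumask >> cpu) & 1u) != 0;
}

/* Queue a deferred run; at queue time the CPU must be in the mask. */
static void toy_queue_run(struct toy_hctx *h, int cpu)
{
    assert(toy_cpu_in_mask(h, cpu));
    h->pending_cpu = cpu;
}

/* Execute the pending run. Returns false if the sanity check trips,
 * i.e. the run lands on a CPU that is no longer in the mask. */
static bool toy_run_queue(struct toy_hctx *h)
{
    int cpu = h->pending_cpu;

    if (cpu < 0)
        return true;            /* nothing was queued */
    h->pending_cpu = -1;
    if (!toy_cpu_in_mask(h, cpu)) {
        fprintf(stderr, "toy WARNING: queue run on cpu %d outside mask\n", cpu);
        return false;
    }
    return true;
}

/* Hotplug path without quiescing: the mask changes under pending work. */
static void toy_remap(struct toy_hctx *h, uint64_t new_mask)
{
    h->cpumask = new_mask;
}

/* Hotplug path with quiescing: flush pending work first, then remap. */
static bool toy_remap_quiesced(struct toy_hctx *h, uint64_t new_mask)
{
    bool ok = toy_run_queue(h);

    h->cpumask = new_mask;
    return ok;
}
```

Queueing a run on CPU 1, calling toy_remap() with a mask that drops CPU 1,
and then calling toy_run_queue() trips the check; the same sequence with
toy_remap_quiesced() does not, because the pending run is flushed while the
old mask is still in place.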

