On 1/23/18 6:48 AM, David Zarzycki wrote:
>
>
>> On Jan 22, 2018, at 20:20, Jens Axboe <[email protected]> wrote:
>>
>> All of these are off the blk-wbt completion path. I suggested earlier to
>> try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
>> see if the pattern changes.
>
> Hi Jens,
>
> Bingo! Disabling CONFIG_BLK_WBT makes the problem go away.

Interesting. The only thing I can think of is
block/blk-wbt.c:get_rq_wait() returning a bogus pointer, but your
compiler would need to be broken for that. And I think your lockdep
would have exploded if that was the case. See below for a quick'n dirty
patch you can try and run to disprove that theory.
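
To spell the theory out: rq_wait has two slots and is indexed directly
by the bool, so any value other than 0 or 1 would point past the array.
C's conversion to bool already maps every nonzero value to 1, which is
why only a miscompile could produce a bad index. Here is a minimal
user-space sketch of that, not kernel code; struct wb, struct wait_slot,
get_slot() and slot_index() are made-up stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rq_wait and struct rq_wb. */
struct wait_slot {
	int inflight;
};

struct wb {
	struct wait_slot slots[2];	/* [0] = normal writers, [1] = kswapd */
};

/* Same shape as the patched helper: force the index to exactly 0 or 1. */
static struct wait_slot *get_slot(struct wb *w, bool is_kswapd)
{
	return &w->slots[!!is_kswapd];
}

/*
 * Report which slot a raw integer flag selects. The int-to-bool
 * conversion at the call already turns any nonzero value into 1,
 * so the "!!" above is redundant with a conforming compiler.
 */
static int slot_index(struct wb *w, int raw_flag)
{
	return (int)(get_slot(w, raw_flag) - w->slots);
}
```

With a conforming compiler, slot_index(&w, 0x80) selects slot 1, the
same as slot_index(&w, 1), with or without the "!!".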
>>> I’m open to trying anything at this point. Thanks for helping,
>>
>> I'd try other types of stress testing. Has the machine otherwise been
>> stable, or is it a new box?
>
> It is a new box. Other than the CONFIG_BLK_WBT problem, it handles
> stress just fine. If you want to debug this further, I’m willing to
> run instrumented code.

The below is a long shot, but I'll try and think about it some more. I
haven't had any reports like this, ever, so it's very puzzling.

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index ae8de9780085..5a45e9245d89 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -103,7 +103,7 @@ static bool wb_recent_wait(struct rq_wb *rwb)
 
 static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
 {
-	return &rwb->rq_wait[is_kswapd];
+	return &rwb->rq_wait[!!is_kswapd];
 }
 
 static void rwb_wake_all(struct rq_wb *rwb)

--
Jens Axboe