Hi Jens,

The bug still reproduces with this change. How confident are we that kernel
objects are properly reference-counted while they are throttled?

Dave


> On Jan 23, 2018, at 10:34, Jens Axboe <[email protected]> wrote:
> 
> On 1/23/18 6:48 AM, David Zarzycki wrote:
>> 
>> 
>>> On Jan 22, 2018, at 20:20, Jens Axboe <[email protected]> wrote:
>>> 
>>> All of these are off the blk-wbt completion path. I suggested earlier to
>>> try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
>>> see if the pattern changes.
>> 
>> Hi Jens,
>> 
>> Bingo! Disabling CONFIG_BLK_WBT makes the problem go away.
> 
> Interesting. The only thing I can think of is
> block/blk-wbt.c:get_rq_wait() returning a bogus pointer, but your
> compiler would need to be broken for that. And I think your lockdep
> would have exploded if that was the case. See below for a quick'n dirty
> you can try and run to disprove that theory.
> 
>>>> I’m open to trying anything at this point. Thanks for helping,
>>> 
>>> I'd try other types of stress testing. Has the machine otherwise been
>>> stable, or is it a new box?
>> 
>> It is a new box. Other than the CONFIG_BLK_WBT problem, it handles
>> stress just fine. If you want to debug this further, I’m willing to
>> run instrumented code.
> 
> The below is a long shot, but I'll try and think about it some more. I
> haven't had any reports like this, ever, so it's very puzzling.
> 
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index ae8de9780085..5a45e9245d89 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -103,7 +103,7 @@ static bool wb_recent_wait(struct rq_wb *rwb)
> 
> static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
> {
> -     return &rwb->rq_wait[is_kswapd];
> +     return &rwb->rq_wait[!!is_kswapd];
> }
> 
> static void rwb_wake_all(struct rq_wb *rwb)
> 
> -- 
> Jens Axboe
> 
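
For reference, here is a minimal userspace sketch of what the one-line patch above is testing. It is not the kernel code: the demo_* names and the inflight field are hypothetical stand-ins, and the assert() only approximates what a WARN_ON_ONCE() might do in an instrumented build. In standard C a bool converts to 0 or 1, so the !! should be a no-op unless the compiler or the stored value is broken, which is exactly the theory being ruled out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the array shape matters. */
struct demo_rq_wait {
	int inflight;
};

struct demo_rq_wb {
	struct demo_rq_wait rq_wait[2];	/* [0] = normal, [1] = kswapd */
};

/* Same shape as the patched helper: !! forces the index to 0 or 1. */
static inline struct demo_rq_wait *demo_get_rq_wait(struct demo_rq_wb *rwb,
						    bool is_kswapd)
{
	size_t idx = !!is_kswapd;

	/* Userspace stand-in for a bounds check in instrumented code. */
	assert(idx < sizeof(rwb->rq_wait) / sizeof(rwb->rq_wait[0]));
	return &rwb->rq_wait[idx];
}
```

If the returned pointer is always one of the two array slots, as it should be here, then an out-of-bounds index is not the cause and the lifetime of the surrounding object becomes the more likely suspect.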
