On 3/7/19 2:37 PM, Josef Bacik wrote:
> We discovered a problem in newer kernels where a disconnect of an NBD
> device while the flush request was pending would result in a hang.  This
> is because the blk-mq timeout handler does
> 
>         if (!refcount_inc_not_zero(&rq->ref))
>                 return true;
> 
> to determine if it's ok to run the timeout handler for the request.
> Flush requests (flush_rq) don't have a refcount set, so we'd skip running
> the timeout handler for such a request and it would just sit there in
> limbo forever.
> 
> Fix this by always setting the refcount of any request going through
> blk_rq_init() to 1.  I tested this with an nbd-server that dropped flush
> requests to verify that it hung, and then tested with this patch to
> verify I got the timeout as expected and the error handling kicked in.
> Thanks,
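
For illustration, here is a minimal user-space sketch of the failure mode
described above. It is not kernel code: toy_request, toy_ref_inc_not_zero()
and toy_timeout_would_run() are hypothetical names, and C11 atomics stand in
for the kernel's refcount_t. It only models the check quoted above, assuming
the timeout path takes a temporary reference and drops it afterwards.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request; only the refcount is modeled. */
struct toy_request {
	atomic_uint ref;
};

/*
 * Model of refcount_inc_not_zero(): take a reference only if the
 * count is currently nonzero. Returns true if a reference was taken.
 */
static bool toy_ref_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Model of the timeout check: returns true if the timeout handler
 * would actually run for this request, false if it is skipped.
 */
static bool toy_timeout_would_run(struct toy_request *rq)
{
	if (!toy_ref_inc_not_zero(&rq->ref))
		return false;	/* refcount was 0: request is skipped */

	/* ... the timeout handler would run here ... */

	atomic_fetch_sub(&rq->ref, 1);	/* drop the temporary reference */
	return true;
}
```

With ref left at 0, toy_timeout_would_run() returns false, so the request
never times out. With ref initialized to 1, it returns true and leaves the
count at 1.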

Looks good to me, thanks Josef.

-- 
Jens Axboe
