On Tue, Jan 22, 2019 at 5:13 AM Florian Stecker <[email protected]> wrote:
>
> Hi everyone,
>
> on my laptop, I am experiencing occasional hangs of applications during
> fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS
> which spans two partitions on the same SSD (one of them used to contain
> a Windows, but I removed it and added the partition to the BTRFS volume
> instead). Also, the problem only occurs when an I/O scheduler
> (mq-deadline) is in use. I'm running kernel version 4.20.3.
>
> From what I understand so far, what happens is that a sync request
> fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a
> "Non-NCQ command" and can not be queued together with other commands.
> This propagates up into blk_mq_dispatch_rq_list(), where the call
>
> ret = q->mq_ops->queue_rq(hctx, &bd);
>
> returns BLK_STS_DEV_RESOURCE. Later in blk_mq_dispatch_rq_list(), there
> is the piece of code
>
>         needs_restart = blk_mq_sched_needs_restart(hctx);
>         if (!needs_restart ||
>             (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>                 blk_mq_run_hw_queue(hctx, true);
>         else if (needs_restart && (ret == BLK_STS_RESOURCE))
>                 blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
>
> which restarts the queue after a delay if BLK_STS_RESOURCE was returned,
> but somehow not for BLK_STS_DEV_RESOURCE. Instead, nothing happens and
> fsync() seems to hang until some other process wants to do I/O.
>
> So if I do
>
> -       else if (needs_restart && (ret == BLK_STS_RESOURCE))
> +       else if (needs_restart && (ret == BLK_STS_RESOURCE ||
> +                                  ret == BLK_STS_DEV_RESOURCE))
>
> it fixes my problem. But was there a reason why BLK_STS_DEV_RESOURCE was
> treated differently than BLK_STS_RESOURCE here?

Please see the comment:

/*
 * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
 * device related resources are unavailable, but the driver can guarantee
 * that the queue will be rerun in the future once resources become
 * available again. This is typically the case for device specific
 * resources that are consumed for IO. If the driver fails allocating these
 * resources, we know that inflight (or pending) IO will free these
 * resource upon completion.
 *
 * This is different from BLK_STS_RESOURCE in that it explicitly references
 * a device specific resource. For resources of wider scope, allocation
 * failure can happen without having pending IO. This means that we can't
 * rely on request completions freeing these resources, as IO may not be in
 * flight. Examples of that are kernel memory allocations, DMA mappings, or
 * any other system wide resources.
 */
#define BLK_STS_DEV_RESOURCE    ((__force blk_status_t)13)
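
To make the difference concrete, here is a minimal user-space sketch of the
rerun decision at the end of blk_mq_dispatch_rq_list() as quoted above. It is
not kernel code: all model_* names are hypothetical stand-ins, and the three
booleans stand for needs_restart, no_tag and the list_empty_careful() result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two blk_status_t values discussed here. */
enum model_status { MODEL_STS_RESOURCE, MODEL_STS_DEV_RESOURCE };

/* What the block layer does with the hardware queue after a failed dispatch. */
enum model_action { MODEL_NO_RERUN, MODEL_RUN_NOW, MODEL_RUN_DELAYED };

/* Mirrors the if/else-if chain from the quoted 4.20 snippet. */
static enum model_action model_rerun_decision(bool needs_restart, bool no_tag,
                                              bool wait_entry_empty,
                                              enum model_status ret)
{
        if (!needs_restart || (no_tag && wait_entry_empty))
                return MODEL_RUN_NOW;
        if (ret == MODEL_STS_RESOURCE)
                return MODEL_RUN_DELAYED;
        /* DEV_RESOURCE: the block layer relies on the driver's guarantee. */
        return MODEL_NO_RERUN;
}

/* The reported case: restart marked, a tag was available. */
static void model_reported_case(void)
{
        assert(model_rerun_decision(true, false, false,
                                    MODEL_STS_DEV_RESOURCE) == MODEL_NO_RERUN);
        assert(model_rerun_decision(true, false, false,
                                    MODEL_STS_RESOURCE) == MODEL_RUN_DELAYED);
}
```

So with BLK_STS_DEV_RESOURCE the block layer schedules nothing by itself; the
rerun has to come from a completion, which is only safe if the driver's
guarantee really holds.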
>
> In any case, it seems wrong to me that ret is used here at all, as it
> just contains the return value of the last request in the list, and
> whether we rerun the queue should probably not only depend on the last
> request?
>
> Can any of the experts tell me whether this makes sense or whether I got
> something completely wrong?

This sounds like a bug in the SCSI or ATA driver.

I remember there is a hole in SCSI with respect to returning
BLK_STS_DEV_RESOURCE, but I have never been lucky enough to reproduce it.

scsi_queue_rq():
        ......
        case BLK_STS_RESOURCE:
                if (atomic_read(&sdev->device_busy) ||
                    scsi_device_blocked(sdev))
                        ret = BLK_STS_DEV_RESOURCE;

All in-flight requests may complete between reading 'sdev->device_busy'
and setting ret to 'BLK_STS_DEV_RESOURCE'. In that case no completion is
left to rerun the queue, and this IO hang may be triggered.
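
The window can be replayed deterministically in user space. The sketch below
is only a model under simplifying assumptions: the race_* names are
hypothetical, scsi_device_blocked() is ignored, and "hang" simply means that
BLK_STS_DEV_RESOURCE was chosen although nothing is in flight any more.

```c
#include <assert.h>
#include <stdbool.h>

enum race_status { RACE_STS_RESOURCE, RACE_STS_DEV_RESOURCE };

struct race_dev {
        int device_busy;        /* models sdev->device_busy */
};

/* Status chosen from a (possibly stale) sample of device_busy. */
static enum race_status race_pick_status(int sampled_busy)
{
        return sampled_busy ? RACE_STS_DEV_RESOURCE : RACE_STS_RESOURCE;
}

/*
 * Replay: sample device_busy, let `completions` requests finish inside
 * the window, then commit the status derived from the stale sample.
 * Returns true if nothing is left in flight to rerun the queue.
 */
static bool race_replay(int in_flight, int completions)
{
        struct race_dev dev = { .device_busy = in_flight };
        int sampled;
        enum race_status ret;

        assert(completions >= 0 && completions <= in_flight);

        sampled = dev.device_busy;              /* the atomic_read() */
        dev.device_busy -= completions;         /* completions in the window */
        ret = race_pick_status(sampled);        /* ret = BLK_STS_DEV_RESOURCE */

        return ret == RACE_STS_DEV_RESOURCE && dev.device_busy == 0;
}
```

In this model race_replay(1, 1) reports a hang, while race_replay(2, 1) does
not, because one request is still in flight and its completion can rerun the
queue.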
Thanks,
Ming Lei