On Wed, Sep 13, 2017 at 11:50 AM, Ilya Dryomov <[email protected]> wrote:
> On Wed, Sep 6, 2017 at 7:38 PM, Ilya Dryomov <[email protected]> wrote:
>> sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
>> permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
>> is set. This means blkdev_issue_zeroout() must cope with WRITE SAME
>> failing with BLK_STS_TARGET/-EREMOTEIO and explicitly write zeroes,
>> unless BLKDEV_ZERO_NOFALLBACK is specified.
>>
>> Commit 71027e97d796 ("block: stop using discards for zeroing") added
>> the following to __blkdev_issue_zeroout() comment:
>>
>> "Note that this function may fail with -EOPNOTSUPP if the driver signals
>> zeroing offload support, but the device fails to process the command (for
>> some devices there is no non-destructive way to verify whether this
>> operation is actually supported). In this case the caller should call
>> retry the call to blkdev_issue_zeroout() and the fallback path will be
>> used."
>>
>> But __blkdev_issue_zeroout() doesn't fail in this case: if WRITE SAME
>> support is indicated, a REQ_OP_WRITE_ZEROES bio is built and returned
>> to blkdev_issue_zeroout(). -EREMOTEIO is then propagated up:
>>
>> $ fallocate -zn -l 1k /dev/sdg
>> fallocate: fallocate failed: Remote I/O error
>> $ fallocate -zn -l 1k /dev/sdg # OK
>>
>> (The second fallocate(1) succeeds because sd_done() sets ->no_write_same
>> in response to a sense that would become BLK_STS_TARGET.)
>>
>> Retry __blkdev_issue_zeroout() if the I/O fails with -EREMOTEIO. This
>> is roughly what we did until 4.12, sans BLKDEV_ZERO_NOFALLBACK knob.
>>
>> Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
>> Cc: Christoph Hellwig <[email protected]>
>> Cc: "Martin K. Petersen" <[email protected]>
>> Cc: Hannes Reinecke <[email protected]>
>> Cc: [email protected] # 4.12+
>> Signed-off-by: Ilya Dryomov <[email protected]>
>> ---
>> block/blk-lib.c | 49 +++++++++++++++++++++++++++++++++----------------
>> 1 file changed, 33 insertions(+), 16 deletions(-)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 3fe0aec90597..876b5478c1a2 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -287,12 +287,6 @@ static unsigned int
>> __blkdev_sectors_to_bio_pages(sector_t nr_sects)
>> * Zero-fill a block range, either using hardware offload or by explicitly
>> * writing zeroes to the device.
>> *
>> - * Note that this function may fail with -EOPNOTSUPP if the driver signals
>> - * zeroing offload support, but the device fails to process the command
>> (for
>> - * some devices there is no non-destructive way to verify whether this
>> - * operation is actually supported). In this case the caller should call
>> - * retry the call to blkdev_issue_zeroout() and the fallback path will be
>> used.
>> - *
>> * If a device is using logical block provisioning, the underlying space
>> will
>> * not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
>> *
>> @@ -343,6 +337,34 @@ int __blkdev_issue_zeroout(struct block_device *bdev,
>> sector_t sector,
>> }
>> EXPORT_SYMBOL(__blkdev_issue_zeroout);
>>
>> +/*
>> + * This function may fail with -EREMOTEIO if the driver signals zeroing
>> + * offload support, but the device fails to process the command (for
>> + * some devices there is no non-destructive way to verify whether this
>> + * operation is actually supported). In this case the caller should
>> + * retry and the fallback path in __blkdev_issue_zeroout() will be used
>> + * unless %flags contains BLKDEV_ZERO_NOFALLBACK.
>> + */
>> +static int __blkdev_issue_zeroout_wait(struct block_device *bdev,
>> + sector_t sector, sector_t nr_sects,
>> + gfp_t gfp_mask, unsigned flags)
>> +{
>> + int ret;
>> + struct bio *bio = NULL;
>> + struct blk_plug plug;
>> +
>> + blk_start_plug(&plug);
>> + ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
>> + &bio, flags);
>> + if (ret == 0 && bio) {
>> + ret = submit_bio_wait(bio);
>> + bio_put(bio);
>> + }
>> + blk_finish_plug(&plug);
>> +
>> + return ret;
>> +}
>> +
>> /**
>> * blkdev_issue_zeroout - zero-fill a block range
>> * @bdev: blockdev to write
>> @@ -360,17 +382,12 @@ int blkdev_issue_zeroout(struct block_device *bdev,
>> sector_t sector,
>> sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
>> {
>> int ret;
>> - struct bio *bio = NULL;
>> - struct blk_plug plug;
>>
>> - blk_start_plug(&plug);
>> - ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
>> - &bio, flags);
>> - if (ret == 0 && bio) {
>> - ret = submit_bio_wait(bio);
>> - bio_put(bio);
>> - }
>> - blk_finish_plug(&plug);
>> + ret = __blkdev_issue_zeroout_wait(bdev, sector, nr_sects,
>> + gfp_mask, flags);
>> + if (ret == -EREMOTEIO)
>> + ret = __blkdev_issue_zeroout_wait(bdev, sector, nr_sects,
>> + gfp_mask, flags);
>>
>> return ret;
>> }
>
> Ping... This fixes a regression introduced in 4.12. Christoph,
> Martin, could you please take a look?
Christoph, the regression got introduced with your "always use
REQ_OP_WRITE_ZEROES for zeroing offload" series, could you please look
at this?
Thanks,
Ilya