On Thu, Sep 21, 2017 at 07:12:52PM +0200, Ilya Dryomov wrote:
> sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
> permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
> is set.  Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE
> SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:
> 
>   $ fallocate -zn -l 1k /dev/sdg
>   fallocate: fallocate failed: Remote I/O error
>   $ fallocate -zn -l 1k /dev/sdg  # OK
>   $ fallocate -zn -l 1k /dev/sdg  # OK
> 
> The following calls succeed because sd_done() sets ->no_write_same in
> response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing
> __blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.
> 
> This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing
> and fall back to manually zeroing, unless BLKDEV_ZERO_NOFALLBACK is
> specified.  For BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if
> sd_done() has just set ->no_write_same thus indicating lack of offload
> support.
> 
> Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
> Cc: Christoph Hellwig <[email protected]>
> Cc: "Martin K. Petersen" <[email protected]>
> Cc: Hannes Reinecke <[email protected]>
> Signed-off-by: Ilya Dryomov <[email protected]>
> ---
>  block/blk-lib.c | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 6b97feb71065..1cb402beb983 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -316,12 +316,6 @@ static void __blkdev_issue_zero_pages(struct block_device *bdev,
>   *  Zero-fill a block range, either using hardware offload or by explicitly
>   *  writing zeroes to the device.
>   *
> - *  Note that this function may fail with -EOPNOTSUPP if the driver signals
> - *  zeroing offload support, but the device fails to process the command (for
> - *  some devices there is no non-destructive way to verify whether this
> - *  operation is actually supported).  In this case the caller should call
> - *  retry the call to blkdev_issue_zeroout() and the fallback path will be used.
> - *
>   *  If a device is using logical block provisioning, the underlying space will
>   *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
>   *
> @@ -374,6 +368,27 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>                       &bio, flags);
>       if (ret == 0 && bio) {
>               ret = submit_bio_wait(bio);
> +             /*
> +              * Fall back to a manual zeroout on any error, if allowed.
> +              *
> +              * Particularly, WRITE ZEROES may fail with -EREMOTEIO if the
> +              * driver signals zeroing offload support, but the device
> +              * fails to process the command (for some devices there is no
> +              * non-destructive way to verify whether this operation is
> +              * actually supported).
> +              */
> +             if (ret && bio_op(bio) == REQ_OP_WRITE_ZEROES) {

No need for the additional levels of indentation here.  Also, I
really do not like the logic; we shouldn't have to duplicate so much
of it.

I'd rather go for something like this (sketched in mail):

        bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);

retry:
        bio = NULL;
        blk_start_plug(&plug);
        if (try_write_zeroes)
                ret = __blkdev_issue_write_zeroes(...);
        else
                ret = __blkdev_issue_zero_pages(...);
        if (ret == 0 && bio) {
                ret = submit_bio_wait(bio);
                bio_put(bio);
        }
        blk_finish_plug(&plug);
        if (ret && try_write_zeroes) {
                try_write_zeroes = false;
                goto retry;
        }
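
To make the control flow easier to poke at, here is a minimal userspace
sketch of the same retry loop.  It is not kernel code: struct fake_dev,
fake_write_zeroes(), fake_zero_pages() and fake_zeroout() are hypothetical
stand-ins, and the nofallback handling is an assumption modelled on the
BLKDEV_ZERO_NOFALLBACK behaviour described in the patch, not something
from the sketch above.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, for hosts that lack it */
#endif

/* Hypothetical model of a device whose driver may over-advertise offload. */
struct fake_dev {
	bool advertises_write_zeroes;	/* what the driver currently claims */
	bool honours_write_zeroes;	/* what the hardware really supports */
	int offload_attempts;
	int manual_attempts;
};

/* Stand-in for issuing WRITE ZEROES and waiting for completion. */
static int fake_write_zeroes(struct fake_dev *dev)
{
	dev->offload_attempts++;
	if (!dev->honours_write_zeroes) {
		/* Mimics sd_done() setting ->no_write_same on failure. */
		dev->advertises_write_zeroes = false;
		return -EREMOTEIO;
	}
	return 0;
}

/* Stand-in for the manual path that writes zero-filled pages. */
static int fake_zero_pages(struct fake_dev *dev)
{
	dev->manual_attempts++;
	return 0;
}

/* Try the offload once; on failure retry manually, if allowed. */
static int fake_zeroout(struct fake_dev *dev, bool nofallback)
{
	bool try_write_zeroes = dev->advertises_write_zeroes;
	int ret;

retry:
	if (try_write_zeroes)
		ret = fake_write_zeroes(dev);
	else if (!nofallback)
		ret = fake_zero_pages(dev);
	else
		ret = -EOPNOTSUPP;	/* no offload and no fallback allowed */

	if (ret && try_write_zeroes) {
		if (!nofallback) {
			try_write_zeroes = false;
			goto retry;
		}
		/* Offload was just withdrawn: report lack of support. */
		if (!dev->advertises_write_zeroes)
			ret = -EOPNOTSUPP;
	}
	return ret;
}
```

With a device that advertises but does not honour the offload, the first
fake_zeroout(&dev, false) call makes one offload attempt and one manual
attempt and returns 0; later calls go straight to the manual path.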
