On Fri, Nov 07, 2014 at 12:08:14AM -0500, Martin K. Petersen wrote:
> blkdev_issue_zeroout() will zero a given block range on disk. This is
> done by way of either WRITE SAME or regular WRITE. I.e. the blocks on
> disk will be written and thus provisioned.
>
> There are use cases where the desired behavior is to zero the blocks but
> unprovision them if possible. The blocks must deterministically contain
> zeroes when they are subsequently read back.
>
> This patch introduces a blkdev_issue_zeroout_discard() call that
> provides this functionality. If a block device guarantees
> discard_zeroes_data the new function will use discard to clear the block
> range. If the device does not support discard_zeroes_data or if the
> discard request fails we will fall back to blkdev_issue_zeroout() to
> ensure predictable results.

Can this be plumbed into a BLK* ioctl too? I'll write a patch if this is OK
with everyone:

struct blkzeroout_t {
	__u64 start;
	__u64 end;
	__u32 flags;
};

#define BLKZEROOUT_DISCARD_OK	1
/* userspace passes the struct in; the _IO* macros take the type, not sizeof() */
#define BLKZEROOUT_V2		_IOW(0x12, 127, struct blkzeroout_t)

...and make it zap the page cache per earlier discussion. This seems to be a
good fit with what we've been discussing for mke2fs.
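
A rough userspace sketch of how a caller might drive such an ioctl, for illustration only. Nothing here exists in any kernel: struct blkzeroout_t, BLKZEROOUT_DISCARD_OK and BLKZEROOUT_V2 are just the names proposed above, zeroout_range_v2() is a hypothetical helper, and rejecting end < start is an assumption about semantics that are still to be decided. The sketch uses _IOW with the struct type, since the caller passes the struct in and the _IO* macros apply sizeof to their third argument themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical ABI mirroring the proposal; not part of any kernel. */
struct blkzeroout_t {
	__u64 start;	/* start of the range (units still to be decided) */
	__u64 end;	/* end of the range (units still to be decided) */
	__u32 flags;	/* BLKZEROOUT_* flags */
};

#define BLKZEROOUT_DISCARD_OK	1
#define BLKZEROOUT_V2		_IOW(0x12, 127, struct blkzeroout_t)

/* The ioctl number encodes the argument size, so it must match the struct. */
static_assert(_IOC_SIZE(BLKZEROOUT_V2) == sizeof(struct blkzeroout_t),
	      "ioctl number must encode the struct size");

/*
 * Ask the kernel to zero a range on the block device open at fd.
 * If discard_ok is nonzero, the kernel may deprovision the blocks
 * instead of writing zeroes. Returns 0, or -1 with errno set.
 */
int zeroout_range_v2(int fd, uint64_t start, uint64_t end, int discard_ok)
{
	struct blkzeroout_t arg = {
		.start = start,
		.end = end,
		.flags = discard_ok ? BLKZEROOUT_DISCARD_OK : 0,
	};

	/* Assumption: an inverted range is a caller bug, not a no-op. */
	if (end < start) {
		errno = EINVAL;
		return -1;
	}

	return ioctl(fd, BLKZEROOUT_V2, &arg);
}
```

On a kernel without the ioctl, the call would simply fail, so a caller like mke2fs would still need its existing fallback path.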
--D
>
> Signed-off-by: Martin K. Petersen <[email protected]>
> ---
> block/blk-lib.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
> include/linux/blkdev.h | 2 ++
> 2 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 8411be3c19d3..2ffec6a01c71 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -278,14 +278,18 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> }
>
> /**
> - * blkdev_issue_zeroout - zero-fill a block range
> + * blkdev_issue_zeroout - zero-fill and provision a block range
> * @bdev: blockdev to write
> * @sector: start sector
> * @nr_sects: number of sectors to write
> * @gfp_mask: memory allocation flags (for bio_alloc)
> *
> * Description:
> - * Generate and issue number of bios with zerofiled pages.
> + * Zero-fill a block range. The blocks will be provisioned
> + * (allocated/anchored) and are guaranteed to return zeroes when read
> + * back. This function will attempt to use WRITE SAME to optimize the
> + * process if the block device supports it. Otherwise it will fall back
> + * to zeroing the blocks using regular WRITE calls.
> */
>
> int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> @@ -305,3 +309,39 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> }
> EXPORT_SYMBOL(blkdev_issue_zeroout);
> +
> +/**
> + * blkdev_issue_zeroout_discard - zero-fill and attempt to discard block range
> + * @bdev: blockdev to write
> + * @sector: start sector
> + * @nr_sects: number of sectors to write
> + * @gfp_mask: memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + * Zero-fill a block range. In contrast to blkdev_issue_zeroout() this
> + * function will attempt to deprovision (deallocate/discard) the blocks
> + * in question. It will only do so if the underlying device guarantees
> + * that subsequent READ operations to the block range in question will
> + * return zeroes. If the device does not provide hard guarantees or if
> + * the DISCARD attempt should fail the block range will be explicitly
> + * zeroed using blkdev_issue_zeroout().
> + */
> +
> +int blkdev_issue_zeroout_discard(struct block_device *bdev, sector_t sector,
> +				 sector_t nr_sects, gfp_t gfp_mask)
> +{
> +	struct request_queue *q = bdev_get_queue(bdev);
> +
> +	if (blk_queue_discard(q) && q->limits.discard_zeroes_data) {
> +		unsigned char bdn[BDEVNAME_SIZE];
> +
> +		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> +			return 0;
> +
> +		bdevname(bdev, bdn);
> +		pr_err("%s: DISCARD failed. Manually zeroing.\n", bdn);
> +	}
> +
> +	return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> +}
> +EXPORT_SYMBOL(blkdev_issue_zeroout_discard);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index aac0f9ea952a..078b6e5f488a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1164,6 +1164,8 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask, struct page *page);
> extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask);
> +extern int blkdev_issue_zeroout_discard(struct block_device *bdev,
> + sector_t sector, sector_t nr_sects, gfp_t gfp_mask);
> static inline int sb_issue_discard(struct super_block *sb, sector_t block,
> sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
> {
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html