On 10/28/25 10:30, Chao Yu via Linux-f2fs-devel wrote:
On 10/27/25 21:06, Yongpeng Yang wrote:
On 10/27/25 16:35, Chao Yu via Linux-f2fs-devel wrote:
On 10/24/25 22:37, Yongpeng Yang wrote:
From: Yongpeng Yang <[email protected]>
When F2FS uses multiple block devices, each device may have a
different discard granularity. The minimum trim granularity must be
at least the maximum discard granularity across all devices, excluding
zoned devices. Use max_t() instead of the max() macro to compute the
maximum, since range.minlen (__u64) and the granularity (unsigned int)
have different types.
Signed-off-by: Yongpeng Yang <[email protected]>
---
fs/f2fs/f2fs.h | 12 ++++++++++++
fs/f2fs/file.c | 12 ++++++------
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 32fb2e7338b7..064bdbf463f7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4762,6 +4762,18 @@ static inline bool f2fs_hw_support_discard(struct f2fs_sb_info *sbi)
return false;
}
+static inline unsigned int f2fs_hw_discard_granularity(struct f2fs_sb_info *sbi)
+{
+	int i = 1;
+	unsigned int discard_granularity = bdev_discard_granularity(sbi->sb->s_bdev);
Yongpeng,
The patch makes sense to me.
One extra question: if a zoned device contains both conventional zones and
sequential zones, what discard granularity will it expose?
Thanks,
I don't have such a device. I think the exposed discard granularity should be
that of the conventional zones, since the sequential zones have a default reset
granularity of 1 zone, and no additional information is needed to indicate that.
I guess you can have a virtual one simulated by the null_blk driver?
https://zonedstorage.io/docs/getting-started/zbd-emulation#zoned-block-device-emulation-with-null_blk
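For reference, the linked guide boils down to a handful of configfs writes. A
rough, untested sketch (attribute names are from the null_blk configfs
interface; the sizes are arbitrary) that creates a nullb device with both
conventional and sequential zones and then checks the exposed granularity:

  modprobe null_blk nr_devices=0
  mkdir /sys/kernel/config/nullb/nullb0
  cd /sys/kernel/config/nullb/nullb0
  echo 4096 > blocksize      # logical/physical block size in bytes
  echo 1024 > size           # capacity in MiB
  echo 1 > memory_backed
  echo 1 > zoned
  echo 64 > zone_size        # zone size in MiB
  echo 4 > zone_nr_conv      # number of conventional zones
  echo 1 > power             # instantiate the device
  cat /sys/block/nullb*/queue/discard_granularity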
1. When using QEMU to emulate a ZNS SSD, a namespace cannot contain both
conventional zones and sequential zones at the same time. Additionally, for
the emulated zoned device, discard_granularity cannot be configured manually;
it defaults to the maximum of logical_block_size and 4KiB.
static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
{
    ...
    if (ns->blkconf.discard_granularity == -1) {
        ns->blkconf.discard_granularity =
            MAX(ns->blkconf.logical_block_size, MIN_DISCARD_GRANULARITY);
    }
    ...
}
In the nvme driver, the default value of discard_granularity is set to
logical_block_size:
static void nvme_config_discard(struct nvme_ns *ns, struct queue_limits *lim)
{
	...
	lim->discard_granularity = lim->logical_block_size;
	...
}
2. QEMU cannot emulate SMR HDDs. From the SCSI driver code, I found that the
discard_granularity of a SCSI device is computed as follows. The value of
sdkp->unmap_granularity is shared across multiple LUNs, meaning that both
conventional LUNs and sequential LUNs have the same sdkp->unmap_granularity.
As a result, the discard_granularity is also the same for both types of
zones. Therefore, from the driver's perspective, a zoned device that contains
both conventional zones and sequential zones will have the same
discard_granularity as other conventional devices.
static void sd_config_discard(struct scsi_disk *sdkp, struct queue_limits *lim,
		unsigned int mode)
{
	...
	lim->discard_granularity = max(sdkp->physical_block_size,
			sdkp->unmap_granularity * logical_block_size);
	...
}
static void sd_read_block_limits(struct scsi_disk *sdkp,
		struct queue_limits *lim)
{
	...
	sdkp->unmap_granularity = get_unaligned_be32(&vpd->data[28]);
	...
}
3. It seems that discard_granularity is related to logical_block_size and
physical_block_size, and is not associated with the zone size. For zoned
devices, discard_granularity is meaningless.
- nullblk_create.sh 512 2 1024 1024
- cat /sys/block/nullb1/queue/discard_*
0
0
0
0
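(As a side note, lsblk -D shows the same information per device; the
DISC-GRAN column corresponds to queue/discard_granularity, so a quick check
here would be:

  lsblk -D /dev/nullb1

which should report DISC-GRAN as 0 for this device, matching the sysfs
values above.)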
I didn't dig into more details, though. :)
Thanks,
I found that the null_blk device doesn't configure the discard_* limits.
static int null_add_dev(struct nullb_device *dev)
{
	...
	struct queue_limits lim = {
		.logical_block_size = dev->blocksize,
		.physical_block_size = dev->blocksize,
		.max_hw_sectors = dev->max_sectors,
	};
	...
}

Yongpeng
+
+	if (f2fs_is_multi_device(sbi))
+		for (; i < sbi->s_ndevs && !bdev_is_zoned(FDEV(i).bdev); i++)
+			discard_granularity = max_t(unsigned int, discard_granularity,
+					bdev_discard_granularity(FDEV(i).bdev));
+	return discard_granularity;
+}
+
static inline bool f2fs_realtime_discard_enable(struct f2fs_sb_info *sbi)
{
return (test_opt(sbi, DISCARD) && f2fs_hw_support_discard(sbi)) ||
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6d42e2d28861..ced0f78532c9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2588,14 +2588,14 @@ static int f2fs_keep_noreuse_range(struct inode *inode,
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
- struct super_block *sb = inode->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct fstrim_range range;
int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
- if (!f2fs_hw_support_discard(F2FS_SB(sb)))
+ if (!f2fs_hw_support_discard(sbi))
return -EOPNOTSUPP;
if (copy_from_user(&range, (struct fstrim_range __user *)arg,
@@ -2606,9 +2606,9 @@ static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
if (ret)
return ret;
- range.minlen = max((unsigned int)range.minlen,
- bdev_discard_granularity(sb->s_bdev));
- ret = f2fs_trim_fs(F2FS_SB(sb), &range);
+ range.minlen = max_t(unsigned int, range.minlen,
+ f2fs_hw_discard_granularity(sbi));
+ ret = f2fs_trim_fs(sbi, &range);
mnt_drop_write_file(filp);
if (ret < 0)
return ret;
@@ -2616,7 +2616,7 @@ static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
if (copy_to_user((struct fstrim_range __user *)arg, &range,
sizeof(range)))
return -EFAULT;
- f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+ f2fs_update_time(sbi, REQ_TIME);
return 0;
}
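As a usage note (the mount point and minimum size below are hypothetical): on
a multi-device f2fs where a non-zoned secondary device reports a larger
discard granularity than sb->s_bdev, a request such as

  fstrim -v -m 4096 /mnt/f2fs

will now have range.minlen clamped via f2fs_hw_discard_granularity() to the
largest granularity among the non-zoned devices, instead of only to the
granularity of the first device.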
_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel