On 10/28/25 10:30, Chao Yu via Linux-f2fs-devel wrote:
On 10/27/25 21:06, Yongpeng Yang wrote:
On 10/27/25 16:35, Chao Yu via Linux-f2fs-devel wrote:
On 10/24/25 22:37, Yongpeng Yang wrote:
From: Yongpeng Yang <[email protected]>

When F2FS uses multiple block devices, each device may have a
different discard granularity. The minimum trim granularity must be
at least the maximum discard granularity of all devices, excluding
zoned devices. Use max_t instead of the max() macro to compute the
maximum value.

Signed-off-by: Yongpeng Yang <[email protected]>
---
   fs/f2fs/f2fs.h | 12 ++++++++++++
   fs/f2fs/file.c | 12 ++++++------
   2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 32fb2e7338b7..064bdbf463f7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4762,6 +4762,18 @@ static inline bool f2fs_hw_support_discard(struct f2fs_sb_info *sbi)
       return false;
   }
+static inline unsigned int f2fs_hw_discard_granularity(struct f2fs_sb_info *sbi)
+{
+    int i = 1;
+    unsigned int discard_granularity = bdev_discard_granularity(sbi->sb->s_bdev);

Yongpeng,

The patch makes sense to me.

One extra question: if a zoned device contains both conventional zones and
sequential zones, what discard granularity will it expose?

Thanks,
I don't have such a device. I think the exposed discard granularity should be
that of the conventional zones, since sequential zones are reset at a fixed
granularity of one zone and need no additional information to describe it.

I guess you can have a virtual one simulated by the null_blk driver?

https://zonedstorage.io/docs/getting-started/zbd-emulation#zoned-block-device-emulation-with-null_blk
1. When using QEMU to emulate a ZNS SSD, a namespace cannot contain both conventional zones and sequential zones at the same time. In addition, the discard_granularity of the emulated zoned device cannot be configured manually; it defaults to the maximum of the logical_block_size and 4KiB:

static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
{
...
    if (ns->blkconf.discard_granularity == -1) {
        /* MIN_DISCARD_GRANULARITY is 4 KiB in QEMU's nvme emulation */
        ns->blkconf.discard_granularity =
            MAX(ns->blkconf.logical_block_size, MIN_DISCARD_GRANULARITY);
    }
...
}

In the kernel nvme driver, the default value of discard_granularity is set to logical_block_size:

static void nvme_config_discard(struct nvme_ns *ns, struct queue_limits *lim)
{
...
        lim->discard_granularity = lim->logical_block_size;
...
}

2. QEMU cannot emulate SMR HDDs. From the SCSI driver code, the discard_granularity of a SCSI device is computed as shown below. The value of sdkp->unmap_granularity is shared across multiple LUNs, so conventional LUNs and sequential LUNs end up with the same sdkp->unmap_granularity and therefore the same discard_granularity for both types of zones. In other words, from the driver's perspective, a zoned device that contains both conventional zones and sequential zones exposes the same discard_granularity as any other conventional device.


static void sd_config_discard(struct scsi_disk *sdkp, struct queue_limits *lim,
                unsigned int mode)
{
...
    lim->discard_granularity = max(sdkp->physical_block_size,
                        sdkp->unmap_granularity * logical_block_size);
...
}

static void sd_read_block_limits(struct scsi_disk *sdkp,
                struct queue_limits *lim)
{
...
    sdkp->unmap_granularity = get_unaligned_be32(&vpd->data[28]);
...
}

3. It seems that discard_granularity is derived from logical_block_size and physical_block_size and is not associated with the zone size. For zoned devices, discard_granularity is essentially meaningless.


- nullblk_create.sh 512 2 1024 1024
- cat /sys/block/nullb1/queue/discard_*
0
0
0
0

I didn't dig into more details, though. :)

Thanks,


I found that the null_blk driver doesn't configure the discard_* limits here:
static int null_add_dev(struct nullb_device *dev)
{
...
        struct queue_limits lim = {
                .logical_block_size     = dev->blocksize,
                .physical_block_size    = dev->blocksize,
                .max_hw_sectors         = dev->max_sectors,
        };
...
}
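
For context (as of recent kernels, my understanding): bdev_discard_granularity() simply reports the queue limit, so when a driver never sets it up, f2fs sees whatever the block layer left there, i.e. the 0 shown in the sysfs output above:

/* include/linux/blkdev.h */
static inline unsigned int bdev_discard_granularity(struct block_device *bdev)
{
	return bdev_get_queue(bdev)->limits.discard_granularity;
}
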
Yongpeng
+
+    if (f2fs_is_multi_device(sbi))
+        for (; i < sbi->s_ndevs && !bdev_is_zoned(FDEV(i).bdev); i++)
+            discard_granularity = max_t(unsigned int, discard_granularity,
+                        bdev_discard_granularity(FDEV(i).bdev));
+    return discard_granularity;
+}
+
   static inline bool f2fs_realtime_discard_enable(struct f2fs_sb_info *sbi)
   {
       return (test_opt(sbi, DISCARD) && f2fs_hw_support_discard(sbi)) ||
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6d42e2d28861..ced0f78532c9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2588,14 +2588,14 @@ static int f2fs_keep_noreuse_range(struct inode *inode,
   static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
   {
       struct inode *inode = file_inode(filp);
-    struct super_block *sb = inode->i_sb;
+    struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
       struct fstrim_range range;
       int ret;
         if (!capable(CAP_SYS_ADMIN))
           return -EPERM;
   -    if (!f2fs_hw_support_discard(F2FS_SB(sb)))
+    if (!f2fs_hw_support_discard(sbi))
           return -EOPNOTSUPP;
         if (copy_from_user(&range, (struct fstrim_range __user *)arg,
@@ -2606,9 +2606,9 @@ static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
       if (ret)
           return ret;
   -    range.minlen = max((unsigned int)range.minlen,
-               bdev_discard_granularity(sb->s_bdev));
-    ret = f2fs_trim_fs(F2FS_SB(sb), &range);
+    range.minlen = max_t(unsigned int, range.minlen,
+            f2fs_hw_discard_granularity(sbi));
+    ret = f2fs_trim_fs(sbi, &range);
       mnt_drop_write_file(filp);
       if (ret < 0)
           return ret;
@@ -2616,7 +2616,7 @@ static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
       if (copy_to_user((struct fstrim_range __user *)arg, &range,
                   sizeof(range)))
           return -EFAULT;
-    f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+    f2fs_update_time(sbi, REQ_TIME);
       return 0;
   }
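
For anyone who wants to double-check the effect from userspace, here is a minimal FITRIM sketch (the /mnt/f2fs mount point is only an example, and CAP_SYS_ADMIN is required); with this patch, the minlen copied back should be at least the largest discard granularity among the non-zoned devices:

#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	/* Trim the whole filesystem; minlen = 0 lets the kernel raise it. */
	struct fstrim_range range = {
		.start = 0,
		.len = UINT64_MAX,
		.minlen = 0,
	};
	int fd = open("/mnt/f2fs", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, FITRIM, &range) < 0) {
		perror("FITRIM");
		close(fd);
		return 1;
	}
	/* range.minlen and range.len are copied back by the kernel. */
	printf("trimmed %llu bytes, effective minlen %llu\n",
	       (unsigned long long)range.len,
	       (unsigned long long)range.minlen);
	close(fd);
	return 0;
}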


