On 9/5/2025 4:07 PM, Chao Yu wrote:
On 9/4/2025 5:35 PM, Wang Xiaojun wrote:
Hi Chao,

We previously thought that triggering a checkpoint for fsync after
"falloc -k" could solve this problem.

But I found that this approach fails in the following scenarios.

case 1:
write fileA 2M |  falloc -k 2M 100M | truncate 10M
At this point, the file size is 10MB, while the disk space consumed is
100MB.

case 2:
write fileA 2M |  falloc -k 2M 100M | truncate 1M
At this point, the file size is 1MB, while the disk space consumed is 1MB.

Even if we perform a checkpoint after falloc,

case 1:
write fileA 2M |  falloc -k 2M 100M | checkpoint | truncate 10M | SPO

case 2:
write fileA 2M |  falloc -k 2M 100M | checkpoint | truncate 1M | SPO

But during the recovery process,

we cannot determine whether the 100MB space pre-allocated by falloc
needs to be retained.

Xiaojun, thanks for mentioning this issue.

So we need an on-disk flag to indicate whether there are fallocated blkaddrs
beyond i_size, right?

For above case:
falloc -k 2M 100M -> set the flag
checkpoint
truncate 1M -> clear the flag
recovery: truncate blkaddrs after i_size as the flag is not set?
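
The flag lifecycle listed above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct model_inode`, its `prealloc_past_eof` field, and the `model_*` helpers are hypothetical names, not f2fs structures or APIs, and recovery is reduced to "take i_size and the flag from the fsynced inode, take the block map from the checkpoint".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk view of an inode; not the real f2fs_inode layout. */
struct model_inode {
    uint64_t i_size;            /* file size in bytes */
    uint64_t last_blk_end;      /* end offset of the last allocated block */
    bool     prealloc_past_eof; /* hypothetical flag: blocks beyond i_size are wanted */
};

/* falloc -k: allocate [offset, offset + len) without changing i_size. */
static void model_falloc_keep_size(struct model_inode *ino,
                                   uint64_t offset, uint64_t len)
{
    uint64_t end = offset + len;

    if (end > ino->last_blk_end)
        ino->last_blk_end = end;
    if (ino->last_blk_end > ino->i_size)
        ino->prealloc_past_eof = true;  /* "set the flag" */
}

/* truncate: shrinking drops blocks past the new size and clears the flag. */
static void model_truncate(struct model_inode *ino, uint64_t new_size)
{
    if (new_size < ino->i_size) {
        if (ino->last_blk_end > new_size)
            ino->last_blk_end = new_size;
        ino->prealloc_past_eof = false; /* "clear the flag" */
    }
    ino->i_size = new_size;
}

/*
 * Recovery after SPO: the checkpointed block map is still on disk, while
 * i_size and the flag come from the last fsynced inode. Blocks beyond
 * i_size are dropped only when the flag is not set.
 */
static struct model_inode model_recover(const struct model_inode *checkpointed,
                                        const struct model_inode *fsynced)
{
    struct model_inode out;

    assert(checkpointed != NULL && fsynced != NULL);
    out = *fsynced;
    out.last_blk_end = checkpointed->last_blk_end;
    if (!out.prealloc_past_eof && out.last_blk_end > out.i_size)
        out.last_blk_end = out.i_size;
    return out;
}
```

In this model, "truncate 1M" after the checkpoint recovers with no blocks beyond 1M, while "truncate 10M" leaves the flag set and keeps the preallocated range.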

Thoughts?

Thanks,

Hi Chao,

I agree with your idea.

I think the core of this issue is that the truncate information is lost during the recovery process.

Currently, only the file_size information is available, but the file_size cannot reflect the fallocated size.

Therefore, I think the solution you provided can work.

Thanks,



Perhaps we need to research a new mechanism to solve this problem.


Thanks,

On 8/28/2025 12:49 PM, 王晓珺 wrote:
On 8/28/2025 9:44 AM, Chao Yu wrote:
On 8/26/25 09:48, 王晓珺 wrote:
On 8/25/2025 10:08 AM, Chao Yu wrote:
On 8/20/25 15:54, Wang Xiaojun wrote:
This patch fixes missing space reclamation during the recovery process.

In the following scenarios, F2FS cannot reclaim truncated space.
case 1:
write file A, size is 1G | CP | truncate A to 1M | fsync A | SPO

case 2:
CP | write file A, size is 1G | fsync A | truncate A to 1M | fsync A | SPO

During the recovery process, F2FS will recover file A,
but the 1M-1G space cannot be reclaimed.

But the combination of truncate and falloc complicates the recovery
process.
For example, in the following scenario:
write fileA 2M | fsync | truncate 256K | falloc -k 256K 1M | fsync A | SPO
The falloc (256K, 1M) needs to be recovered as pre-allocated space.

However, in the following scenario, the situation is the opposite.
write fileA 2M | fsync | falloc -k 2M 10M | fsync A | truncate 256K |
fsync A | SPO
In this scenario, the space allocated by falloc needs to be truncated.

During the recovery process, it is difficult to distinguish
between the above two types of falloc.

So in this case of falloc -k we need to trigger a checkpoint for fsync.
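
The difficulty described above can be seen in a small user-space model of the size comparison done during recovery. This is a sketch under assumptions, not the kernel code: `model_needs_truncate` is a hypothetical helper, and `i_sizes[]` stands in for the i_size values seen while replaying fsynced inode records, oldest first, with `i_sizes[0]` being the size known at the last checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Track the largest i_size seen during roll-forward and compare it with
 * the final i_size. A larger earlier size means the file shrank after
 * the checkpoint, so blocks beyond the final size look reclaimable.
 */
static bool model_needs_truncate(const uint64_t *i_sizes, size_t n)
{
    uint64_t max_i_size;
    size_t i;

    assert(i_sizes != NULL && n > 0);
    max_i_size = i_sizes[0];
    for (i = 1; i < n; i++) {
        if (i_sizes[i] > max_i_size)
            max_i_size = i_sizes[i];
    }
    return max_i_size > i_sizes[n - 1];
}
```

Both falloc scenarios above produce a size history that ends at 256K after having been 2M, so this comparison alone answers "truncate" for both, even though only the second one should lose its preallocated blocks. That is the ambiguity the checkpoint on fsync is meant to avoid.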

Fixes: d624c96fb3249 ("f2fs: add recovery routines for roll-forward")

Signed-off-by: Wang Xiaojun <wangxiao...@vivo.com>
---
v4: Trigger checkpoint for fsync after falloc -k
v3: Add a Fixes line.
v2: Apply Chao's suggestion from v1. No logical changes.
v1: Fix missing space reclamation during the recovery process.
---
     fs/f2fs/checkpoint.c |  3 +++
     fs/f2fs/f2fs.h       |  3 +++
     fs/f2fs/file.c       |  8 ++++++--
     fs/f2fs/recovery.c   | 18 +++++++++++++++++-
     4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index db3831f7f2f5..775e3333097e 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1151,6 +1151,9 @@ static int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi)
             if (inode) {
                 sync_inode_metadata(inode, 0);

+            if (is_inode_flag_set(inode, FI_FALLOC_KEEP_SIZE))
+                clear_inode_flag(inode, FI_FALLOC_KEEP_SIZE);
+
                 /* it's on eviction */
                 if (is_inode_flag_set(inode, FI_DIRTY_INODE))
                     f2fs_update_inode_page(inode);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 46be7560548c..f5a54bc848d5 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -459,6 +459,7 @@ struct fsync_inode_entry {
         struct inode *inode;    /* vfs inode pointer */
         block_t blkaddr;    /* block address locating the last fsync */
         block_t last_dentry;    /* block address locating the last dentry */
+    loff_t max_i_size;    /* previous max file size for truncate */
     };

     #define nats_in_cursum(jnl) (le16_to_cpu((jnl)->n_nats))
@@ -835,6 +836,7 @@ enum {
         FI_ATOMIC_REPLACE,    /* indicate atomic replace */
         FI_OPENED_FILE,        /* indicate file has been opened */
         FI_DONATE_FINISHED,    /* indicate page donation of file has been finished */
+    FI_FALLOC_KEEP_SIZE,    /* file allocate reserved space and keep size */
         FI_MAX,            /* max flag, never be used */
     };

@@ -1193,6 +1195,7 @@ enum cp_reason_type {
         CP_SPEC_LOG_NUM,
         CP_RECOVER_DIR,
         CP_XATTR_DIR,
+    CP_FALLOC_FILE,
     };

     enum iostat_type {
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 42faaed6a02d..f0820f817824 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -236,6 +236,8 @@ static inline enum cp_reason_type need_do_checkpoint(struct inode *inode)
         else if (f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino,
                                 XATTR_DIR_INO))
             cp_reason = CP_XATTR_DIR;
+    else if (is_inode_flag_set(inode, FI_FALLOC_KEEP_SIZE))
+        cp_reason = CP_FALLOC_FILE;

         return cp_reason;
     }
@@ -1953,10 +1955,12 @@ static int f2fs_expand_inode_data(struct inode *inode, loff_t offset,
         }

         if (new_size > i_size_read(inode)) {
-        if (mode & FALLOC_FL_KEEP_SIZE)
+        if (mode & FALLOC_FL_KEEP_SIZE) {
+            set_inode_flag(inode, FI_FALLOC_KEEP_SIZE);
Xiaojun,

Well, what about this case?

falloc -k ofs size file
flush all data and metadata of file
Hi Chao,
Flush all data and metadata of file, but without using fsync or CP?
Xiaojun,

I think so, or am I missing something?

Thanks,
Hi Chao,
I think this case is possible. Thank you for pointing out this issue.
I will fix it in the next version.

Thanks,

Thanks,

evict inode
write file & fsync file won't trigger a checkpoint?

Or am I missing something?
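
The sequence in question can be modelled in user space as follows. This is a hedged sketch: it assumes the new flag lives only in the in-memory inode (the patch as posted does not add on-disk state for it), and `struct model_mem_inode` and the `model_*` helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory inode state; the flag has no on-disk counterpart. */
struct model_mem_inode {
    bool falloc_keep_size; /* stands in for FI_FALLOC_KEEP_SIZE */
};

/* falloc -k sets the in-memory flag. */
static void model_falloc_keep_size(struct model_mem_inode *ino)
{
    assert(ino != NULL);
    ino->falloc_keep_size = true;
}

/*
 * Eviction drops the in-memory inode. A later open rebuilds it from disk,
 * so a flag that was never persisted starts out cleared again.
 */
static void model_evict_and_reload(struct model_mem_inode *ino)
{
    assert(ino != NULL);
    ino->falloc_keep_size = false;
}

/* fsync only forces a checkpoint while the in-memory flag is still set. */
static bool model_fsync_needs_checkpoint(const struct model_mem_inode *ino)
{
    assert(ino != NULL);
    return ino->falloc_keep_size;
}
```

Under that assumption, an fsync issued after the inode has been evicted and reloaded no longer sees the flag and takes the normal roll-forward path.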

Thanks,

                 file_set_keep_isize(inode);
-        else
+        } else {
                 f2fs_i_size_write(inode, new_size);
+        }
         }

         return err;
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 4cb3a91801b4..68b62c8a74d3 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -95,6 +95,7 @@ static struct fsync_inode_entry *add_fsync_inode(struct f2fs_sb_info *sbi,
         entry = f2fs_kmem_cache_alloc(fsync_entry_slab,
                         GFP_F2FS_ZERO, true, NULL);
         entry->inode = inode;
+    entry->max_i_size = i_size_read(inode);
         list_add_tail(&entry->list, head);

         return entry;
@@ -796,6 +797,7 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list,
         while (1) {
             struct fsync_inode_entry *entry;
             struct folio *folio;
+        loff_t i_size;

             if (!f2fs_is_valid_blkaddr(sbi, blkaddr, META_POR))
                 break;
@@ -828,6 +830,9 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list,
                     break;
                 }
                 recovered_inode++;
+            i_size = i_size_read(entry->inode);
+            if (entry->max_i_size < i_size)
+                entry->max_i_size = i_size;
             }
             if (entry->last_dentry == blkaddr) {
                 err = recover_dentry(entry->inode, folio, dir_list);
@@ -844,8 +849,19 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list,
             }
             recovered_dnode++;

-        if (entry->blkaddr == blkaddr)
+        if (entry->blkaddr == blkaddr) {
+            i_size = i_size_read(entry->inode);
+            if (entry->max_i_size > i_size) {
+                err = f2fs_truncate_blocks(entry->inode,
+                            i_size, false);
+                if (err) {
+                    f2fs_folio_put(folio, true);
+                    break;
+                }
+                f2fs_mark_inode_dirty_sync(entry->inode, true);
+            }
                 list_move_tail(&entry->list, tmp_inode_list);
+        }
     next:
             ra_blocks = adjust_por_ra_blocks(sbi, ra_blocks, blkaddr,
                         next_blkaddr_of_node(folio));


