On 12/1/25 19:24, Xiaole He wrote:
Hi Shengyong, Hi Jinbao,

Thank you for providing the patch. I have tested the patch and found a critical 
regression that can lead to data loss or unexpected failure during a no-op 
resize.

The issue arises because the fixed HDIO_GETGEO calculation, when applied to a 
legacy image created with start_sector=4097s, causes the segment_count to 
unexpectedly shrink due to the new alignment calculation.

[snip]


Hi, Xiaole,

Thanks for your very detailed testing and analysis!


2. Analysis and Proposed Solution
This issue confirms that the corrected geometry calculation can lead to 
undesirable outcomes for legacy images. When the new correct alignment requires 
a larger offset (zone_align_start_offset), it causes a reduction in the space 
allocated for the main_area.

I observe that this volume shrink and subsequent failure primarily occur during 
a no-op resize (resize.f2fs attempting to resize to the same size) when the 
filesystem is near full capacity. When normally growing the partition size, 
this issue can typically be avoided.

Could you please confirm if this failure during a no-op resize scenario is 
considered acceptable behavior for resize.f2fs?

Right, for a no-op resize, it should indeed do nothing. So if segment_count 
changes, we
should adjust the "target device size" (rather than the real device size) to 
detect
whether a resize is actually needed. I haven't tested the following change yet, 
but I'll
do it as soon as I can :-(

diff --git a/fsck/resize.c b/fsck/resize.c
index 78082e90e9d2..c7a644498655 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -814,6 +814,32 @@ int f2fs_resize(struct f2fs_sb_info *sbi)
 {
        struct f2fs_super_block *sb = F2FS_RAW_SUPER(sbi);
        uint64_t target_blocks = c.target_sectors * c.sector_size >> 
get_sb(log_blocksize);
+       uint32_t blk_size_bytes = 1 << get_sb(log_blocksize);
+       uint32_t segment_size_bytes = 1 << (get_sb(log_blocksize) +
+                                                get_sb(log_blocks_per_seg));
+       uint32_t segs_per_zone = get_sb(segs_per_sec) * get_sb(secs_per_zone);
+       uint32_t zone_size_bytes = segment_size_bytes * segs_per_zone;
+       uint32_t diff = 0;
+
+       /* adjust target size if segment count will be changed
+        * FIXME: it's better to check old/new seg0_blkaddr?
+        */
+       if ((target_blocks * blk_size_bytes) % zone_size_bytes) {
+               delta_blks = ((target_blocks * blk_size_bytes) %one_size_bytes) 
/ blk_size_bytes;
+               target_blocks -= delta_blks;
+
+               /* because of the new alignment, there will have at most 3 
segments(3 or 2?)
+                * that are regarded as padding areas:
+                *  - less than 2 segs between sb and cp
+                *  - tail blocks which are less than a seg
+                */
+               diff = 3 << get_sb(log_blocks_per_seg);
+       }
+       if (target_blocks < diff) {
+               ASSERT_MSG(0, "Nothing to resize, not allowed?\n");
+               return -1;
+       }
+       target_blocks -= diff;
MSG(0, "Info: reszie wanted target FS sectors = %"PRIu64" (%"PRIu64" MB)\n" ,
                        target_blocks << sbi->log_sectors_per_block,


If this behavior is considered an issue that needs to be addressed, I have two 
proposals regarding the geometric calculation for legacy images:

A. Proactive Geometry Override (Preferred Approach):
We should attempt to deduce the original start_sector from the existing Superblock 
(specifically by referencing old_sb->segment0_blkaddr). If the deduced legacy 
start_sector is inconsistent with the sector reported by HDIO_GETGEO (e.g., if the 
image was created using the old buggy logic), we should use the deduced/legacy 
start_sector to override the HDIO_GETGEO result for the duration of the resize 
calculation. This ensures the geometric calculation matches the existing on-disk 
layout, preventing the unexpected shrink and failure.

This seems to be the simplest workaround for resize, while we're trying to fix 
the
HDIO_GETGEO typo for now.


B. Maintain Legacy Bug (If deduction is too complex):
If implementing the reliable deduction logic (Proposal A) proves to be overly 
complex or risky, would it be better to maintain the original, incorrect 
HDIO_GETGEO behavior for resize.f2fs to prioritize compatibility with existing 
images over geometric correctness?

I think proposal A won't be too complex, if I miss anything please correct me.
seg0_blkadd can help us to calculate the "start_sector".
Yeah, keep the incorrect HDIO_GETGEO behavior may be the safest way for now :-(


3. Resource Constraints & Test Coverage
The root cause of this issue is that after the bug in HDIO_GETGEO was fixed, 
the obtained value for start_sector differs from its previous value (which was 
implicitly 0). Consequently, the zone_align_start_offset calculation, which 
depends on start_sector, changed. Many crucial fields within the superblock, in 
turn, depend on zone_align_start_offset.

Further analysis reveals that the zone_align_start_offset exhibits a periodic 
change relative to start_sector. For instance, the zone_align_start_offset 
generated when start_sector is in the range of 0 to 4095 is identical to the 
one generated when start_sector is in the range of 4096 to 8191.

Based on this analysis, I have developed a test script that iterates through 
start_sector values from 4096 to 8191 to observe the behavior produced by 
resize.f2fs. The script is as follows:


Thanks again for the test script. I’ll run it when I have a free slot.

thanks,
shengyong

[snip]

But as an individual researcher with limited hardware, my testing environment 
is currently constrained by my academic work.

Please note that I only have access to a single SSD around 100GB for my 
testing, which limits the scope of my regression validation (especially for 
large-scale performance or capacity checks). Given your corporate environment, 
your team would be much better positioned to run extensive, large-scale 
regression tests on the final fix. Of course, whenever I have spare time, I 
will also continue running this script for further testing.

I kindly request that you use these reproducers and the test script to validate 
the fix on your end once implemented. If there is anything further I can do to 
assist with the analysis or implementation, please do not hesitate to reach out.

Note on Script Validation Logic:

The validation logic within this script for checking new vs. old superblock 
fields (specifically the difference in field values) may not be perfectly 
accurate or comprehensive. However, this is acceptable, as these checks are 
primarily for informational purposes and will not be used as the definitive 
criteria for judging the bug.

The true criteria for bug judgment remain:
1.  Whether resize.f2fs executes successfully without failure.
2.  More importantly, whether the original created files remains fully 
consistent after the execution of resize.f2fs.

Please feel free to reach out to me if you have any questions or require 
clarification regarding the script or the expected outcomes.

Thanks,
Xiaole He




_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to