On 12/1/25 19:24, Xiaole He wrote:
Hi Shengyong, Hi Jinbao,
Thank you for providing the patch. I have tested the patch and found a critical
regression that can lead to data loss or unexpected failure during a no-op
resize.
The issue arises because the fixed HDIO_GETGEO calculation, when applied to a
legacy image created with start_sector=4097s, causes the segment_count to
unexpectedly shrink due to the new alignment calculation.
[snip]
Hi, Xiaole,
Thanks for your very detailed testing and analysis!
2. Analysis and Proposed Solution
This issue confirms that the corrected geometry calculation can lead to
undesirable outcomes for legacy images. When the new correct alignment requires
a larger offset (zone_align_start_offset), it causes a reduction in the space
allocated for the main_area.
I observe that this volume shrink and subsequent failure primarily occur during
a no-op resize (resize.f2fs attempting to resize to the same size) when the
filesystem is near full capacity. When normally growing the partition size,
this issue can typically be avoided.
Could you please confirm if this failure during a no-op resize scenario is
considered acceptable behavior for resize.f2fs?
Right, for a no-op resize, it should indeed do nothing. So if segment_count
changes, we
should adjust the "target device size" (rather than the real device size) to
detect
whether a resize is actually needed. I haven't tested the following change yet,
but I'll
do it as soon as I can :-(
diff --git a/fsck/resize.c b/fsck/resize.c
index 78082e90e9d2..c7a644498655 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -814,6 +814,32 @@ int f2fs_resize(struct f2fs_sb_info *sbi)
{
struct f2fs_super_block *sb = F2FS_RAW_SUPER(sbi);
uint64_t target_blocks = c.target_sectors * c.sector_size >>
get_sb(log_blocksize);
+ uint32_t blk_size_bytes = 1 << get_sb(log_blocksize);
+ uint32_t segment_size_bytes = 1 << (get_sb(log_blocksize) +
+ get_sb(log_blocks_per_seg));
+ uint32_t segs_per_zone = get_sb(segs_per_sec) * get_sb(secs_per_zone);
+ uint32_t zone_size_bytes = segment_size_bytes * segs_per_zone;
+ uint32_t diff = 0;
+
+ /* adjust target size if segment count will be changed
+ * FIXME: it's better to check old/new seg0_blkaddr?
+ */
+ if ((target_blocks * blk_size_bytes) % zone_size_bytes) {
+ delta_blks = ((target_blocks * blk_size_bytes) %one_size_bytes)
/ blk_size_bytes;
+ target_blocks -= delta_blks;
+
+ /* because of the new alignment, there will have at most 3
segments(3 or 2?)
+ * that are regarded as padding areas:
+ * - less than 2 segs between sb and cp
+ * - tail blocks which are less than a seg
+ */
+ diff = 3 << get_sb(log_blocks_per_seg);
+ }
+ if (target_blocks < diff) {
+ ASSERT_MSG(0, "Nothing to resize, not allowed?\n");
+ return -1;
+ }
+ target_blocks -= diff;
MSG(0, "Info: reszie wanted target FS sectors = %"PRIu64" (%"PRIu64" MB)\n" ,
target_blocks << sbi->log_sectors_per_block,
If this behavior is considered an issue that needs to be addressed, I have two
proposals regarding the geometric calculation for legacy images:
A. Proactive Geometry Override (Preferred Approach):
We should attempt to deduce the original start_sector from the existing Superblock
(specifically by referencing old_sb->segment0_blkaddr). If the deduced legacy
start_sector is inconsistent with the sector reported by HDIO_GETGEO (e.g., if the
image was created using the old buggy logic), we should use the deduced/legacy
start_sector to override the HDIO_GETGEO result for the duration of the resize
calculation. This ensures the geometric calculation matches the existing on-disk
layout, preventing the unexpected shrink and failure.
This seems to be the simplest workaround for resize, while we're trying to fix
the
HDIO_GETGEO typo for now.
B. Maintain Legacy Bug (If deduction is too complex):
If implementing the reliable deduction logic (Proposal A) proves to be overly
complex or risky, would it be better to maintain the original, incorrect
HDIO_GETGEO behavior for resize.f2fs to prioritize compatibility with existing
images over geometric correctness?
I think proposal A won't be too complex, if I miss anything please correct me.
seg0_blkadd can help us to calculate the "start_sector".
Yeah, keep the incorrect HDIO_GETGEO behavior may be the safest way for now :-(
3. Resource Constraints & Test Coverage
The root cause of this issue is that after the bug in HDIO_GETGEO was fixed,
the obtained value for start_sector differs from its previous value (which was
implicitly 0). Consequently, the zone_align_start_offset calculation, which
depends on start_sector, changed. Many crucial fields within the superblock, in
turn, depend on zone_align_start_offset.
Further analysis reveals that the zone_align_start_offset exhibits a periodic
change relative to start_sector. For instance, the zone_align_start_offset
generated when start_sector is in the range of 0 to 4095 is identical to the
one generated when start_sector is in the range of 4096 to 8191.
Based on this analysis, I have developed a test script that iterates through
start_sector values from 4096 to 8191 to observe the behavior produced by
resize.f2fs. The script is as follows:
Thanks again for the test script. I’ll run it when I have a free slot.
thanks,
shengyong
[snip]
But as an individual researcher with limited hardware, my testing environment
is currently constrained by my academic work.
Please note that I only have access to a single SSD around 100GB for my
testing, which limits the scope of my regression validation (especially for
large-scale performance or capacity checks). Given your corporate environment,
your team would be much better positioned to run extensive, large-scale
regression tests on the final fix. Of course, whenever I have spare time, I
will also continue running this script for further testing.
I kindly request that you use these reproducers and the test script to validate
the fix on your end once implemented. If there is anything further I can do to
assist with the analysis or implementation, please do not hesitate to reach out.
Note on Script Validation Logic:
The validation logic within this script for checking new vs. old superblock
fields (specifically the difference in field values) may not be perfectly
accurate or comprehensive. However, this is acceptable, as these checks are
primarily for informational purposes and will not be used as the definitive
criteria for judging the bug.
The true criteria for bug judgment remain:
1. Whether resize.f2fs executes successfully without failure.
2. More importantly, whether the original created files remains fully
consistent after the execution of resize.f2fs.
Please feel free to reach out to me if you have any questions or require
clarification regarding the script or the expected outcomes.
Thanks,
Xiaole He
_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel