On Nov 11, 2019 / 11:14, Chao Yu wrote: > On 2019/11/6 17:45, Shinichiro Kawasaki wrote: > > On Nov 05, 2019 / 19:06, Chao Yu wrote: > >> On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote: > >>> On sudden f2fs shutdown, write pointers of zoned block devices can go > >>> further but f2fs meta data keeps current segments at positions before the > >>> write operations. After remounting the f2fs, this inconsistency causes > >>> write operations not at write pointers and "Unaligned write command" > >>> error is reported. > >>> > >>> To avoid the error, have f2fs.fsck check consistency of write pointers > >>> of open zones that current segments point to. Compare each current > >>> segment's position and the write pointer position of the open zone. If > >>> inconsistency is found and 'fix_on' flag is set, assign a new zone to the > >>> current segment and check the newly assigned zone has write pointer at > >>> the zone start. Leave the original zone as is to keep data recorded in > >>> it. > >>> > >>> To care about fsync data, refer each seg_entry's ckpt_valid_map to get > >>> the last valid block in the zone. If the last valid block is beyond the > >>> current segments position, fsync data exits in the zone. In case fsync > >>> data exists, do not assign a new zone to the current segment not to lose > >>> the fsync data. It is expected that the kernel replay the fsync data and > >>> fix the write pointer inconsistency at mount time. > >>> > >>> Also check consistency between write pointer of the zone the current > >>> segment points to with valid block maps of the zone. If the last valid > >>> block is beyond the write pointer position, report to indicate f2fs bug. > >>> If 'fix_on' flag is set, assign a new zone to the current segment. > >>> > >>> When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to > >>> ask users to fix them or not. When inconsistencies get fixed, turn on > >>> 'force' flag in fsck_verify() to enforce fixes in following checks. > >>> > >>> This check and fix is done twice. The first is done at the beginning of > >>> do_fsck() function so that other fixes can reflect the current segment > >>> modification. The second is done in fsck_verify() to reflect updated meta > >>> data by other fixes. > >>> > >>> Signed-off-by: Shin'ichiro Kawasaki <[email protected]> > >>> --- > >>> fsck/f2fs.h | 5 ++ > >>> fsck/fsck.c | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> fsck/fsck.h | 3 + > >>> fsck/main.c | 2 + > >>> fsck/mount.c | 49 +++++++++++++++- > >>> 5 files changed, 212 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/fsck/f2fs.h b/fsck/f2fs.h > >>> index 399c74d..07513cb 100644 > >>> --- a/fsck/f2fs.h > >>> +++ b/fsck/f2fs.h > >>> @@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct > >>> f2fs_sb_info *sbi) > >>> #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr) > >>> \ > >>> (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1)) > >>> > >>> +#define GET_SEC_FROM_SEG(sbi, segno) > >>> \ > >>> + ((segno) / (sbi)->segs_per_sec) > >>> +#define GET_SEG_FROM_SEC(sbi, secno) > >>> \ > >>> + ((secno) * (sbi)->segs_per_sec) > >>> + > >>> #define FREE_I_START_SEGNO(sbi) > >>> \ > >>> GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr) > >>> #define GET_R2L_SEGNO(sbi, segno) (segno + > >>> FREE_I_START_SEGNO(sbi)) > >>> diff --git a/fsck/fsck.c b/fsck/fsck.c > >>> index 2ae3bd5..e0eda4e 100644 > >>> --- a/fsck/fsck.c > >>> +++ b/fsck/fsck.c > >>> @@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info > >>> *sbi) > >>> fix_checkpoint(sbi); > >>> } > >>> > >>> +#ifdef HAVE_LINUX_BLKZONED_H > >>> + > >>> +/* > >>> + * Refer valid block map and return offset of the last valid block in > >>> the zone. > >>> + * Obtain valid block map from SIT and fsync data. > >>> + * If there is no valid block in the zone, return -1. > >>> + */ > >>> +static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi, > >>> + unsigned int zone_segno) > >>> +{ > >>> + unsigned int s; > >>> + unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone; > >>> + struct seg_entry *se; > >>> + block_t b; > >>> + int ret = -1; > >>> + > >>> + for (s = 0; s < segs_per_zone; s++) { > >>> + se = get_seg_entry(sbi, zone_segno + s); > >>> + > >>> + /* > >>> + * Refer not cur_valid_map but ckpt_valid_map which reflects > >>> + * fsync data. > >>> + */ > >>> + ASSERT(se->ckpt_valid_map); > >>> + for (b = 0; b < sbi->blocks_per_seg; b++) > >>> + if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) > >>> + ret = b + (s << sbi->log_blocks_per_seg); > >> > >> Minor thing, I guess we only need to find last valid block in target zone? > >> > >> int s, b; > >> > >> for (s = segs_per_zone - 1; s >= 0; s--) { > >> for (b = sbi->blocks_per_seg - 1; b >= 0; b--) > >> if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) { > >> ret = b + (s << sbi->log_blocks_per_seg); > >> break; > >> } > >> } > > > > Yes, reveresed search is the better. Will modify the code as suggested. > > > >> > >>> + } > >>> + > >>> + return ret; > >>> +} > >>> + > >>> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) > >>> +{ > >>> + struct curseg_info *curseg = CURSEG_I(sbi, type); > >>> + struct f2fs_fsck *fsck = F2FS_FSCK(sbi); > >>> + struct blk_zone blkz; > >>> + block_t cs_block, wp_block, zone_last_vblock; > >>> + u_int64_t cs_sector, wp_sector; > >>> + int i, ret; > >>> + unsigned int zone_segno; > >>> + int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; > >>> + > >>> + /* get the device the curseg points to */ > >>> + cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff; > >>> + for (i = 0; i < MAX_DEVICES; i++) { > >>> + if (!c.devices[i].path) > >>> + break; > >>> + if (c.devices[i].start_blkaddr <= cs_block && > >>> + cs_block <= c.devices[i].end_blkaddr) > >>> + break; > >>> + } > >>> + > >>> + if (i >= MAX_DEVICES) > >>> + return -EINVAL; > >>> + > >>> + /* get write pointer position of the zone the curseg points to */ > >>> + cs_sector = (cs_block - c.devices[i].start_blkaddr) > >>> + << log_sectors_per_block; > >>> + ret = f2fs_report_zone(i, cs_sector, &blkz); > >>> + if (ret) > >>> + return ret; > >>> + > >>> + if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ) > >>> + return 0; > >>> + > >>> + /* check consistency between the curseg and the write pointer */ > >>> + wp_block = c.devices[i].start_blkaddr + > >>> + (blk_zone_wp_sector(&blkz) >> log_sectors_per_block); > >>> + wp_sector = blk_zone_wp_sector(&blkz); > >>> + > >>> + if (cs_sector == wp_sector) > >>> + return 0; > >>> + > >>> + if (cs_sector > wp_sector) { > >>> + MSG(0, "Inconsistent write pointer with curseg %d: " > >>> + "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n", > >>> + type, type, curseg->segno, curseg->next_blkoff, > >>> + GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block)); > >>> + fsck->chk.wp_inconsistent_zones++; > >>> + return -EINVAL; > >>> + } > >>> + > >>> + MSG(0, "Write pointer goes advance from curseg %d: " > >>> + "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n", > >>> + type, type, curseg->segno, curseg->next_blkoff, > >>> + GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block)); > >>> + > >>> + zone_segno = GET_SEG_FROM_SEC(sbi, > >>> + GET_SEC_FROM_SEG(sbi, curseg->segno)); > >>> + zone_last_vblock = START_BLOCK(sbi, zone_segno) + > >>> + last_vblk_off_in_zone(sbi, zone_segno); > >>> + > >>> + /* > >>> + * If fsync data exists between the curseg and the last valid block, > >>> + * it is not an error to fix. Leave it for kernel to recover later. > >>> + */ > >>> + if (cs_block <= zone_last_vblock) { > >> > >> According to above comments, should condition be? > >> > >> if (cs_block < zone_last_vblock && zone_last_vblock <= wp_block) > >> > > > > To be precise, cs_block points to curseg->next_blkoff, which is the block > > curseg will write in the next write request. Then, if cs_block equals to > > zone_last_vblock, it means that the block curseg->next_blkoff points to > > already have valid block and fsync data. Then, comparator between cs_block > > and zone_last_vblock should be "<=". > > You're right. > > > > > I agree that it is the better to check zone_last_vblock with wp_block. > > wp_block corresponds to the write pointer position that next write will be > > made. It wp_block equals to zone_last_vblock, it means that unexpected data > > is written beyond the write pointer. Then, comparator should be "<" between > > zone_last_vblock and wp_block. > > Oh, so wp has almost the same meaning to .next_blkoff in f2fs, it points to > next > free block/sector. I will keep that in mind.
Right, wp and .next_blkoff are really similar, or same :) > > > > > In short, I suggest the condition check below as the good one. > > > > if (cs_block <= zone_last_vblock && zone_last_vblock < wp_block) > > It's fine to me. :) Ok, will reflect in the next patch post. -- Best Regards, Shin'ichiro Kawasaki _______________________________________________ Linux-f2fs-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
