On Nov 13, 2019 / 10:41, Shin'ichiro Kawasaki wrote: > On Nov 11, 2019 / 11:27, Chao Yu wrote: > > On 2019/11/8 12:27, Shinichiro Kawasaki wrote: > > > On Nov 05, 2019 / 20:22, Chao Yu wrote: > > >> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: > > >>> To catch f2fs bugs in write pointer handling code for zoned block > > >>> devices, check write pointers of non-open zones that current segments do > > >>> not point to. Do this check at mount time, after the fsync data recovery > > >>> and current segments' write pointer consistency fix. Check two items > > >>> comparing write pointers with valid block maps in SIT. > > >>> > > >>> The first item is check for zones with no valid blocks. When there is no > > >>> valid blocks in a zone, the write pointer should be at the start of the > > >>> zone. If not, next write operation to the zone will cause unaligned > > >>> write > > >>> error. If write pointer is not at the zone start, make mount fail and > > >>> ask > > >>> users to run fsck. > > >>> > > >>> The second item is check between the write pointer position and the last > > >>> valid block in the zone. It is unexpected that the last valid block > > >>> position is beyond the write pointer. In such a case, report as the bug. > > >>> Fix is not required for such zone, because the zone is not selected for > > >>> next write operation until the zone get discarded. > > >>> > > >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it > > >>> in segment.c also. > > >>> > > >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawas...@wdc.com> > > >>> --- > > >>> fs/f2fs/f2fs.h | 3 + > > >>> fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++ > > >>> fs/f2fs/super.c | 11 ++-- > > >>> 3 files changed, 157 insertions(+), 4 deletions(-) > > >>> > > >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > >>> index 0216282c5b80..e8524be17852 100644 > > >>> --- a/fs/f2fs/f2fs.h > > >>> +++ b/fs/f2fs/f2fs.h > > >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct > > >>> f2fs_journal *journal, int type, > > >>> unsigned int val, int alloc); > > >>> void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct > > >>> cp_control *cpc); > > >>> int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool > > >>> check_only); > > >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); > > >>> int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); > > >>> void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); > > >>> int __init f2fs_create_segment_manager_caches(void); > > >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct > > >>> f2fs_sb_info *sbi, int devi, > > >>> > > >>> return test_bit(zno, FDEV(devi).blkz_seq); > > >>> } > > >>> + > > >>> +#define F2FS_REPORT_NR_ZONES 4096 > > >>> #endif > > >>> > > >>> static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) > > >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > > >>> index 2b6e637dd6d3..20ef5b3705e1 100644 > > >>> --- a/fs/f2fs/segment.c > > >>> +++ b/fs/f2fs/segment.c > > >>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct > > >>> f2fs_sb_info *sbi) > > >>> > > >>> #ifdef CONFIG_BLK_DEV_ZONED > > >>> > > >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi, > > >>> + struct f2fs_dev_info *fdev, > > >>> + struct blk_zone *zone) > > >>> +{ > > >>> + unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, > > >>> segno; > > >>> + block_t zone_block, wp_block, last_valid_block, b; > > >>> + unsigned int log_sectors_per_block = sbi->log_blocksize - > > >>> SECTOR_SHIFT; > > >>> + int i; > > >>> + struct seg_entry *se; > > >>> + > > >>> + wp_block = fdev->start_blk + (zone->wp >> > > >>> log_sectors_per_block); > > >>> + wp_segno = GET_SEGNO(sbi, wp_block); > > >>> + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); > > >>> + zone_block = fdev->start_blk + (zone->start >> > > >>> log_sectors_per_block); > > >>> + zone_segno = GET_SEGNO(sbi, zone_block); > > >>> + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); > > >>> + > > >>> + if (zone_segno >= MAIN_SEGS(sbi)) > > >>> + return 0; > > >>> + > > >>> + /* > > >>> + * If a curseg points to the zone, skip check because the zone > > >>> + * may have fsync data that valid block map does not mark. > > >> > > >> None-curseg zone may also contain fsynced data as well? Maybe we can > > >> only verify > > >> on clean image or recovered image? > > > > > > Right. This function for none-curseg zones should be called after fsync > > > data > > > > If so, any place to check recovery is done? You know, user can choose to > > skip > > recovery by using disable_roll_forward/norecovery mount option. > > Ah, disable_roll_forward/norecovery mount option needs some care. The patch I > posted did not care it well enough. > > When disable_roll_forward/norecovery mount option is set, I think the function > for none-curseg zones should be called only if there is no fsync data. If > fsync > data exists, mount fails regardless of write pointer status. > > I will modify the function calling step and conditions in the next patch.
I have sent out the v3 patch series, which reflects the comments above. > > > > > > recovery. I think my comment above is confusing. The point is that this > > > function is for none-curseg zones, and other function covers check for > > > curseg > > > zones. Let me revise the comment as follows: > > > > > > Skip check of zones cursegs point to, since > > > fix_curseg_write_pointer() > > > checks them. > > > > > >> > > >>> + */ > > >>> + for (i = 0; i < NO_CHECK_TYPE; i++) > > >>> + if (zone_secno == GET_SEC_FROM_SEG(sbi, > > >>> + CURSEG_I(sbi, > > >>> i)->segno)) > > >>> + return 0; > > >>> + > > >>> + /* > > >>> + * Get last valid block of the zone. > > >>> + */ > > >>> + last_valid_block = zone_block - 1; > > >>> + for (s = 0; s < sbi->segs_per_sec; s++) { > > >>> + segno = zone_segno + s; > > >>> + se = get_seg_entry(sbi, segno); > > >>> + for (b = 0; b < sbi->blocks_per_seg; b++) > > >>> + if (f2fs_test_bit(b, se->cur_valid_map)) > > >>> + last_valid_block = START_BLOCK(sbi, > > >>> segno) + b; > > >>> + } > > >> > > >> We search bitmap table reversedly. > > > > > > Yes, will reverse the loops in the next post. > > > > > >> > > >>> + > > >>> + /* > > >>> + * If last valid block is beyond the write pointer, report the > > >>> + * inconsistency. This inconsistency does not cause write error > > >>> + * because the zone will not be selected for write operation > > >>> until > > >>> + * it get discarded. Just report it. > > >>> + */ > > >>> + if (last_valid_block >= wp_block) { > > >>> + f2fs_notice(sbi, "Valid block beyond write pointer: " > > >>> + "valid block[0x%x,0x%x] wp[0x%x,0x%x]", > > >>> + GET_SEGNO(sbi, last_valid_block), > > >>> + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), > > >>> + wp_segno, wp_blkoff); > > >>> + return 0; > > >>> + } > > >>> + > > >>> + /* > > >>> + * If there is no valid block in the zone and if write pointer > > >>> is > > >>> + * not at zone start, report the error to run fsck and mark the > > >>> + * zone as used. > > >>> + */ > > >>> + if (last_valid_block + 1 == zone_block && zone->wp != > > >>> zone->start) { > > >>> + f2fs_notice(sbi, > > >>> + "Zone without valid block has non-zero > > >>> write " > > >>> + "pointer, run fsck to fix: wp[0x%x,0x%x]", > > >>> + wp_segno, wp_blkoff); > > >>> + __set_inuse(sbi, zone_segno); > > >> > > >> Why do we need to set it inused? if this is necessary, we need to call > > >> this > > >> under free_i->segmap_lock. > > > > > > I once thought that I need to set inconsistent zones in use, because I > > > observed > > > that write operation happened after zone consistency check failure (in > > > fill_super() after free_meta label). It caused unaligned writer error. To > > > avoid > > > it, I added __set_inuse() to keep inconsistent zones not selected for the > > > write > > > target. > > > > > > But that write operation happened because the write pointer fix curseg > > > was done > > > out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can > > > protect > > > write operation, and I don't think set in use for the inconsistent zones > > > is > > > required. Will remove __set_inuse() calls from this patch and the first > > > patch. > > > > Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd > > be > > safe here. > > Ok, I note it. Now I'm looking at the code again, and I think the write > pointer > fix for cursegs can be done within SBI_POR_DOING protection. At this moment, I > think f2fs_stop_checkpoint() is not necessary. My comment above could be confusing, which is about the 1st patch. I removed f2fs_stop_checkpoint() from the v3 1st patch. As for none-curseg zones cannot be done within SBI_POR_DOING, still f2fs_stop_checkpoint() is required. I kept it in v3 2nd patch. Thanks in advance for the reviews and comments on the v3 seires. -- Best Regards, Shin'ichiro Kawasaki _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel