On Nov 13, 2019 / 10:41, Shin'ichiro Kawasaki wrote:
> On Nov 11, 2019 / 11:27, Chao Yu wrote:
> > On 2019/11/8 12:27, Shinichiro Kawasaki wrote:
> > > On Nov 05, 2019 / 20:22, Chao Yu wrote:
> > >> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
> > >>> To catch f2fs bugs in write pointer handling code for zoned block
> > >>> devices, check write pointers of non-open zones that current segments do
> > >>> not point to. Do this check at mount time, after the fsync data recovery
> > >>> and current segments' write pointer consistency fix. Check two items
> > >>> comparing write pointers with valid block maps in SIT.
> > >>>
> > >>> The first item is check for zones with no valid blocks. When there is no
> > >>> valid blocks in a zone, the write pointer should be at the start of the
> > >>> zone. If not, next write operation to the zone will cause unaligned 
> > >>> write
> > >>> error. If write pointer is not at the zone start, make mount fail and 
> > >>> ask
> > >>> users to run fsck.
> > >>>
> > >>> The second item is check between the write pointer position and the last
> > >>> valid block in the zone. It is unexpected that the last valid block
> > >>> position is beyond the write pointer. In such a case, report as the bug.
> > >>> Fix is not required for such zone, because the zone is not selected for
> > >>> next write operation until the zone get discarded.
> > >>>
> > >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> > >>> in segment.c also.
> > >>>
> > >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawas...@wdc.com>
> > >>> ---
> > >>>  fs/f2fs/f2fs.h    |   3 +
> > >>>  fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++
> > >>>  fs/f2fs/super.c   |  11 ++--
> > >>>  3 files changed, 157 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > >>> index 0216282c5b80..e8524be17852 100644
> > >>> --- a/fs/f2fs/f2fs.h
> > >>> +++ b/fs/f2fs/f2fs.h
> > >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct 
> > >>> f2fs_journal *journal, int type,
> > >>>                         unsigned int val, int alloc);
> > >>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct 
> > >>> cp_control *cpc);
> > >>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool 
> > >>> check_only);
> > >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> > >>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> > >>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> > >>>  int __init f2fs_create_segment_manager_caches(void);
> > >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct 
> > >>> f2fs_sb_info *sbi, int devi,
> > >>>  
> > >>>         return test_bit(zno, FDEV(devi).blkz_seq);
> > >>>  }
> > >>> +
> > >>> +#define F2FS_REPORT_NR_ZONES   4096
> > >>>  #endif
> > >>>  
> > >>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> > >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > >>> index 2b6e637dd6d3..20ef5b3705e1 100644
> > >>> --- a/fs/f2fs/segment.c
> > >>> +++ b/fs/f2fs/segment.c
> > >>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct 
> > >>> f2fs_sb_info *sbi)
> > >>>  
> > >>>  #ifdef CONFIG_BLK_DEV_ZONED
> > >>>  
> > >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> > >>> +                                   struct f2fs_dev_info *fdev,
> > >>> +                                   struct blk_zone *zone)
> > >>> +{
> > >>> +       unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, 
> > >>> segno;
> > >>> +       block_t zone_block, wp_block, last_valid_block, b;
> > >>> +       unsigned int log_sectors_per_block = sbi->log_blocksize - 
> > >>> SECTOR_SHIFT;
> > >>> +       int i;
> > >>> +       struct seg_entry *se;
> > >>> +
> > >>> +       wp_block = fdev->start_blk + (zone->wp >> 
> > >>> log_sectors_per_block);
> > >>> +       wp_segno = GET_SEGNO(sbi, wp_block);
> > >>> +       wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> > >>> +       zone_block = fdev->start_blk + (zone->start >> 
> > >>> log_sectors_per_block);
> > >>> +       zone_segno = GET_SEGNO(sbi, zone_block);
> > >>> +       zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> > >>> +
> > >>> +       if (zone_segno >= MAIN_SEGS(sbi))
> > >>> +               return 0;
> > >>> +
> > >>> +       /*
> > >>> +        * If a curseg points to the zone, skip check because the zone
> > >>> +        * may have fsync data that valid block map does not mark.
> > >>
> > >> None-curseg zone may also contain fsynced data as well? Maybe we can 
> > >> only verify
> > >> on clean image or recovered image?
> > > 
> > > Right. This function for none-curseg zones should be called after fsync 
> > > data
> > 
> > If so, any place to check recovery is done? You know, user can choose to 
> > skip
> > recovery by using disable_roll_forward/norecovery mount option.
> 
> Ah, disable_roll_forward/norecovery mount option needs some care. The patch I
> posted did not care it well enough.
> 
> When disable_roll_forward/norecovery mount option is set, I think the function
> for none-curseg zones should be called only if there is no fsync data. If 
> fsync
> data exists, mount fails regardless of write pointer status.
> 
> I will modify the function calling step and conditions in the next patch.

I have sent out the v3 patch series, which reflects the comments above.

> 
> > 
> > > recovery. I think my comment above is confusing. The point is that this
> > > function is for none-curseg zones, and other function covers check for 
> > > curseg
> > > zones. Let me revise the comment as follows:
> > > 
> > >      Skip check of zones cursegs point to, since 
> > > fix_curseg_write_pointer()
> > >      checks them.
> > > 
> > >>
> > >>> +        */
> > >>> +       for (i = 0; i < NO_CHECK_TYPE; i++)
> > >>> +               if (zone_secno == GET_SEC_FROM_SEG(sbi,
> > >>> +                                                  CURSEG_I(sbi, 
> > >>> i)->segno))
> > >>> +                       return 0;
> > >>> +
> > >>> +       /*
> > >>> +        * Get last valid block of the zone.
> > >>> +        */
> > >>> +       last_valid_block = zone_block - 1;
> > >>> +       for (s = 0; s < sbi->segs_per_sec; s++) {
> > >>> +               segno = zone_segno + s;
> > >>> +               se = get_seg_entry(sbi, segno);
> > >>> +               for (b = 0; b < sbi->blocks_per_seg; b++)
> > >>> +                       if (f2fs_test_bit(b, se->cur_valid_map))
> > >>> +                               last_valid_block = START_BLOCK(sbi, 
> > >>> segno) + b;
> > >>> +       }
> > >>
> > >> We search bitmap table reversedly.
> > > 
> > > Yes, will reverse the loops in the next post.
> > > 
> > >>
> > >>> +
> > >>> +       /*
> > >>> +        * If last valid block is beyond the write pointer, report the
> > >>> +        * inconsistency. This inconsistency does not cause write error
> > >>> +        * because the zone will not be selected for write operation 
> > >>> until
> > >>> +        * it get discarded. Just report it.
> > >>> +        */
> > >>> +       if (last_valid_block >= wp_block) {
> > >>> +               f2fs_notice(sbi, "Valid block beyond write pointer: "
> > >>> +                           "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> > >>> +                           GET_SEGNO(sbi, last_valid_block),
> > >>> +                           GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> > >>> +                           wp_segno, wp_blkoff);
> > >>> +               return 0;
> > >>> +       }
> > >>> +
> > >>> +       /*
> > >>> +        * If there is no valid block in the zone and if write pointer 
> > >>> is
> > >>> +        * not at zone start, report the error to run fsck and mark the
> > >>> +        * zone as used.
> > >>> +        */
> > >>> +       if (last_valid_block + 1 == zone_block && zone->wp != 
> > >>> zone->start) {
> > >>> +               f2fs_notice(sbi,
> > >>> +                           "Zone without valid block has non-zero 
> > >>> write "
> > >>> +                           "pointer, run fsck to fix: wp[0x%x,0x%x]",
> > >>> +                           wp_segno, wp_blkoff);
> > >>> +               __set_inuse(sbi, zone_segno);
> > >>
> > >> Why do we need to set it inused? if this is necessary, we need to call 
> > >> this
> > >> under free_i->segmap_lock.
> > > 
> > > I once thought that I need to set inconsistent zones in use, because I 
> > > observed
> > > that write operation happened after zone consistency check failure (in
> > > fill_super() after free_meta label). It caused unaligned writer error. To 
> > > avoid
> > > it, I added __set_inuse() to keep inconsistent zones not selected for the 
> > > write
> > > target.
> > > 
> > > But that write operation happened because the write pointer fix curseg 
> > > was done
> > > out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can 
> > > protect
> > > write operation, and I don't think set in use for the inconsistent zones 
> > > is
> > > required. Will remove __set_inuse() calls from this patch and the first 
> > > patch.
> > 
> > Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd 
> > be
> > safe here.
> 
> Ok, I note it. Now I'm looking at the code again, and I think the write 
> pointer
> fix for cursegs can be done within SBI_POR_DOING protection. At this moment, I
> think f2fs_stop_checkpoint() is not necessary.

My comment above could be confusing, which is about the 1st patch. I removed
f2fs_stop_checkpoint() from the v3 1st patch.

As for none-curseg zones cannot be done within SBI_POR_DOING, still
f2fs_stop_checkpoint() is required. I kept it in v3 2nd patch.

Thanks in advance for the reviews and comments on the v3 seires.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to