On Nov 11, 2019 / 11:14, Chao Yu wrote:
> On 2019/11/6 17:45, Shinichiro Kawasaki wrote:
> > On Nov 05, 2019 / 19:06, Chao Yu wrote:
> >> On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
> >>> On sudden f2fs shutdown, write pointers of zoned block devices can go
> >>> further but f2fs meta data keeps current segments at positions before the
> >>> write operations. After remounting the f2fs, this inconsistency causes
> >>> write operations not at write pointers and "Unaligned write command"
> >>> error is reported.
> >>>
> >>> To avoid the error, have f2fs.fsck check consistency of write pointers
> >>> of open zones that current segments point to. Compare each current
> >>> segment's position and the write pointer position of the open zone. If
> >>> inconsistency is found and 'fix_on' flag is set, assign a new zone to the
> >>> current segment and check the newly assigned zone has write pointer at
> >>> the zone start. Leave the original zone as is to keep data recorded in
> >>> it.
> >>>
> >>> To care about fsync data, refer each seg_entry's ckpt_valid_map to get
> >>> the last valid block in the zone. If the last valid block is beyond the
> >>> current segments position, fsync data exits in the zone. In case fsync
> >>> data exists, do not assign a new zone to the current segment not to lose
> >>> the fsync data. It is expected that the kernel replay the fsync data and
> >>> fix the write pointer inconsistency at mount time.
> >>>
> >>> Also check consistency between write pointer of the zone the current
> >>> segment points to with valid block maps of the zone. If the last valid
> >>> block is beyond the write pointer position, report to indicate f2fs bug.
> >>> If 'fix_on' flag is set, assign a new zone to the current segment.
> >>>
> >>> When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
> >>> ask users to fix them or not. When inconsistencies get fixed, turn on
> >>> 'force' flag in fsck_verify() to enforce fixes in following checks.
> >>>
> >>> This check and fix is done twice. The first is done at the beginning of
> >>> do_fsck() function so that other fixes can reflect the current segment
> >>> modification. The second is done in fsck_verify() to reflect updated meta
> >>> data by other fixes.
> >>>
> >>> Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
> >>> ---
> >>>  fsck/f2fs.h  |   5 ++
> >>>  fsck/fsck.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  fsck/fsck.h  |   3 +
> >>>  fsck/main.c  |   2 +
> >>>  fsck/mount.c |  49 +++++++++++++++-
> >>>  5 files changed, 212 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/fsck/f2fs.h b/fsck/f2fs.h
> >>> index 399c74d..07513cb 100644
> >>> --- a/fsck/f2fs.h
> >>> +++ b/fsck/f2fs.h
> >>> @@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct 
> >>> f2fs_sb_info *sbi)
> >>>  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)                              
> >>> \
> >>>   (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
> >>>  
> >>> +#define GET_SEC_FROM_SEG(sbi, segno)                                     
> >>> \
> >>> + ((segno) / (sbi)->segs_per_sec)
> >>> +#define GET_SEG_FROM_SEC(sbi, secno)                                     
> >>> \
> >>> + ((secno) * (sbi)->segs_per_sec)
> >>> +
> >>>  #define FREE_I_START_SEGNO(sbi)                                          
> >>> \
> >>>   GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
> >>>  #define GET_R2L_SEGNO(sbi, segno)        (segno + 
> >>> FREE_I_START_SEGNO(sbi))
> >>> diff --git a/fsck/fsck.c b/fsck/fsck.c
> >>> index 2ae3bd5..e0eda4e 100644
> >>> --- a/fsck/fsck.c
> >>> +++ b/fsck/fsck.c
> >>> @@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info 
> >>> *sbi)
> >>>   fix_checkpoint(sbi);
> >>>  }
> >>>  
> >>> +#ifdef HAVE_LINUX_BLKZONED_H
> >>> +
> >>> +/*
> >>> + * Refer valid block map and return offset of the last valid block in 
> >>> the zone.
> >>> + * Obtain valid block map from SIT and fsync data.
> >>> + * If there is no valid block in the zone, return -1.
> >>> + */
> >>> +static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi,
> >>> +                          unsigned int zone_segno)
> >>> +{
> >>> + unsigned int s;
> >>> + unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> >>> + struct seg_entry *se;
> >>> + block_t b;
> >>> + int ret = -1;
> >>> +
> >>> + for (s = 0; s < segs_per_zone; s++) {
> >>> +         se = get_seg_entry(sbi, zone_segno + s);
> >>> +
> >>> +         /*
> >>> +          * Refer not cur_valid_map but ckpt_valid_map which reflects
> >>> +          * fsync data.
> >>> +          */
> >>> +         ASSERT(se->ckpt_valid_map);
> >>> +         for (b = 0; b < sbi->blocks_per_seg; b++)
> >>> +                 if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map))
> >>> +                         ret = b + (s << sbi->log_blocks_per_seg);
> >>
> >> Minor thing, I guess we only need to find last valid block in target zone?
> >>
> >> int s, b;
> >>
> >> for (s = segs_per_zone - 1; s >= 0; s--) {
> >>    for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> >>            if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) {
> >>                    ret = b + (s << sbi->log_blocks_per_seg);
> >>                    break;
> >>            }
> >> }
> > 
> > Yes, reveresed search is the better. Will modify the code as suggested.
> > 
> >>
> >>> + }
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> >>> +{
> >>> + struct curseg_info *curseg = CURSEG_I(sbi, type);
> >>> + struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> >>> + struct blk_zone blkz;
> >>> + block_t cs_block, wp_block, zone_last_vblock;
> >>> + u_int64_t cs_sector, wp_sector;
> >>> + int i, ret;
> >>> + unsigned int zone_segno;
> >>> + int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >>> +
> >>> + /* get the device the curseg points to */
> >>> + cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
> >>> + for (i = 0; i < MAX_DEVICES; i++) {
> >>> +         if (!c.devices[i].path)
> >>> +                 break;
> >>> +         if (c.devices[i].start_blkaddr <= cs_block &&
> >>> +             cs_block <= c.devices[i].end_blkaddr)
> >>> +                 break;
> >>> + }
> >>> +
> >>> + if (i >= MAX_DEVICES)
> >>> +         return -EINVAL;
> >>> +
> >>> + /* get write pointer position of the zone the curseg points to */
> >>> + cs_sector = (cs_block - c.devices[i].start_blkaddr)
> >>> +         << log_sectors_per_block;
> >>> + ret = f2fs_report_zone(i, cs_sector, &blkz);
> >>> + if (ret)
> >>> +         return ret;
> >>> +
> >>> + if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
> >>> +         return 0;
> >>> +
> >>> + /* check consistency between the curseg and the write pointer */
> >>> + wp_block = c.devices[i].start_blkaddr +
> >>> +         (blk_zone_wp_sector(&blkz) >> log_sectors_per_block);
> >>> + wp_sector = blk_zone_wp_sector(&blkz);
> >>> +
> >>> + if (cs_sector == wp_sector)
> >>> +         return 0;
> >>> +
> >>> + if (cs_sector > wp_sector) {
> >>> +         MSG(0, "Inconsistent write pointer with curseg %d: "
> >>> +             "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n",
> >>> +             type, type, curseg->segno, curseg->next_blkoff,
> >>> +             GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> >>> +         fsck->chk.wp_inconsistent_zones++;
> >>> +         return -EINVAL;
> >>> + }
> >>> +
> >>> + MSG(0, "Write pointer goes advance from curseg %d: "
> >>> +     "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
> >>> +     type, type, curseg->segno, curseg->next_blkoff,
> >>> +     GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> >>> +
> >>> + zone_segno = GET_SEG_FROM_SEC(sbi,
> >>> +                               GET_SEC_FROM_SEG(sbi, curseg->segno));
> >>> + zone_last_vblock = START_BLOCK(sbi, zone_segno) +
> >>> +         last_vblk_off_in_zone(sbi, zone_segno);
> >>> +
> >>> + /*
> >>> +  * If fsync data exists between the curseg and the last valid block,
> >>> +  * it is not an error to fix. Leave it for kernel to recover later.
> >>> +  */
> >>> + if (cs_block <= zone_last_vblock) {
> >>
> >> According to above comments, should condition be?
> >>
> >> if (cs_block < zone_last_vblock && zone_last_vblock <= wp_block)
> >>
> > 
> > To be precise, cs_block points to curseg->next_blkoff, which is the block
> > curseg will write in the next write request. Then, if cs_block equals to
> > zone_last_vblock, it means that the block curseg->next_blkoff points to
> > already have valid block and fsync data. Then, comparator between cs_block
> > and zone_last_vblock should be "<=".
> 
> You're right.
> 
> > 
> > I agree that it is the better to check zone_last_vblock with wp_block.
> > wp_block corresponds to the write pointer position that next write will be
> > made. It wp_block equals to zone_last_vblock, it means that unexpected data
> > is written beyond the write pointer. Then, comparator should be "<" between
> > zone_last_vblock and wp_block.
> 
> Oh, so wp has almost the same meaning to .next_blkoff in f2fs, it points to 
> next
> free block/sector. I will keep that in mind.

Right, wp and .next_blkoff are really similar, or same :)

> 
> > 
> > In short, I suggest the condition check below as the good one.
> > 
> > if (cs_block <= zone_last_vblock && zone_last_vblock < wp_block)
> 
> It's fine to me. :)

Ok, will reflect in the next patch post.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to