On Aug 27, 2019 / 10:25, Chao Yu wrote:
> On 2019/8/21 12:48, Shin'ichiro Kawasaki wrote:
> > When sudden f2fs shutdown happens on zoned block devices, write
> > pointers can be inconsistent with valid blocks counts in meta data.
> > The failure scenario is as follows:
> > 
> > - Just before a sudden shutdown, a new segment in a new zone is selected
> >   for a current segment. Write commands were executed to the segment.
> >   and the zone has a write pointer not at zone start.
> > - Before the write commands complete, shutdown happens. Meta data is
> >   not updated and still keeps zero valid blocks count for the zone.
> > - After next mount of the file system, the zone is selected for the next
> >   write target because it has zero valid blocks count. However, it has
> >   the write pointer not at zone start. Then "Unaligned write command"
> >   error happens.
> > 
> > To avoid this potential error path, reset write pointers if the zone
> > does not have a current segment, the write pointer is not at the zone
> > start and the zone has no valid blocks.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawas...@wdc.com>
> > ---
> >  fsck/fsck.c | 30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fsck/fsck.c b/fsck/fsck.c
> > index 21a06ac..cc9bbc0 100644
> > --- a/fsck/fsck.c
> > +++ b/fsck/fsck.c
> > @@ -2595,6 +2595,7 @@ static int fsck_chk_write_pointer(int i, struct 
> > blk_zone *blkz, void *opaque)
> >     int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >     unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> >     void *zero_blk;
> > +   block_t zone_valid_blocks = 0;
> >  
> >     if (blk_zone_conv(blkz))
> >             return 0;
> > @@ -2615,8 +2616,35 @@ static int fsck_chk_write_pointer(int i, struct 
> > blk_zone *blkz, void *opaque)
> >                     break;
> >     }
> >  
> > -   if (cs_index >= NR_CURSEG_TYPE)
> > +   if (cs_index >= NR_CURSEG_TYPE) {
> > +           for (b = zone_block; b < zone_block + c.zone_blocks &&
> > +                        IS_VALID_BLK_ADDR(sbi, b); b += c.blks_per_seg) {
> > +                   se = get_seg_entry(sbi, GET_SEGNO(sbi, b));
> > +                   zone_valid_blocks += se->valid_blocks;
> > +           }
> > +           if (wp_block == zone_block || zone_valid_blocks)
> > +                   return 0;
> > +
> > +           /*
> > +            * The write pointer is not at zone start but there is no valid
> > +            * block in the zone. Segments in the zone can be selected for
> > +            * next write. Need to reset the write pointer to avoid
> > +            * unaligned write command error.
> 
> In SPOR (sudden power-off recovery) of kernel side, we may revalidate blocks
> belong to fsynced file in such zone within range of [0, write pointer], if we
> just reset zone, will we lose those data for ever?

Yes. This patch resets zone and the data will be lost. I walked through
fs/f2fs/recovery.c and learned that nodes with fsync mark are recovered at
remount. Such fsync recovery cannot be done after zone reset. To avoid the
data loss, I would like to drop this fourth patch at this moment.

Later on, I will consider safer approach not to reset the zone, but to set next
write target block at the write pointer. I guess this approach will need kernel
side patch to change block selection logic.

> 
> BTW, how you think enabling f2fs kernel module to recover incorrect write
> pointer of zone? Once f2fs-tools doesn't upgrade, however kernel does...

Current f2fs allows to mount zoned block devices even when they have
inconsistency with f2fs meta data. This is not good. Then I believe kernel side
needs the feature to check write pointer inconsistency at mount time and fix it.

As you indicate, fix by kernel is more handy than notice to run fsck, especially
when users do not have latest f2fs-tools. Still fix by fsck is needed when users
use the kernel without the fix feature. I think both approaches are required:
fix by kernel and fix by fsck.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to