On Sep 10, 2019 / 17:12, Chao Yu wrote:
> On 2019/9/10 16:10, Shinichiro Kawasaki wrote:
> > On Sep 09, 2019 / 15:14, Chao Yu wrote:
> >> On 2019/9/6 16:31, Shinichiro Kawasaki wrote:
> >>> On Sep 05, 2019 / 17:58, Chao Yu wrote:
> >>>> Hi Shinichiro,
> >>>>
> >>>> Sorry for the delay.
> >>>>
> >>>> On 2019/9/3 16:37, Shinichiro Kawasaki wrote:
> >>>>> On Sep 02, 2019 / 15:02, Chao Yu wrote:
> >>>>>> On 2019/8/30 18:19, Shin'ichiro Kawasaki wrote:
> >>>>>>> On sudden f2fs shutdown, zoned block device status and f2fs current
> >>>>>>> segment positions in meta data can be inconsistent. When f2fs shutdown
> >>>>>>> happens before write operations completes, write pointers of zoned 
> >>>>>>> block
> >>>>>>> devices can go further but f2fs meta data keeps current segments at
> >>>>>>> positions before the write operations. After remounting the f2fs, the
> >>>>>>> inconsistency causes write operations not at write pointers and
> >>>>>>> "Unaligned write command" error is reported. This error was observed 
> >>>>>>> when
> >>>>>>> xfstests test case generic/388 was run with f2fs on a zoned block 
> >>>>>>> device.
> >>>>>>>
> >>>>>>> To avoid the error, have f2fs.fsck check consistency between each 
> >>>>>>> current
> >>>>>>> segment's position and the write pointer of the zone the current 
> >>>>>>> segment
> >>>>>>> points to. If the write pointer goes advance from the current segment,
> >>>>>>> fix the current segment position setting at same as the write pointer
> >>>>>>> position. If the write pointer goes to the zone end, find a new zone 
> >>>>>>> and
> >>>>>>> set the current segment position at the new zone start. In case the 
> >>>>>>> write
> >>>>>>> pointer is behind the current segment, write zero data at the write
> >>>>>>> pointer position to make write pointer position at same as the current
> >>>>>>> segment.
> >>>>>>>
> >>>>>>> When inconsistencies are found, turn on c.bug_on flag in 
> >>>>>>> fsck_verify() to
> >>>>>>> ask users to fix them or not. When inconsistencies get fixed, turn on
> >>>>>>> 'force' flag in fsck_verify() to enforce fixes in following checks. 
> >>>>>>> This
> >>>>>>> position fix is done at the beginning of do_fsck() function so that 
> >>>>>>> other
> >>>>>>> checks reflect the current segment modification.
> >>>>>>>
> >>>>>>> Also add GET_SEC_FROM_SEG and GET_SEG_FROM_SEC macros in fsck/fsck.h 
> >>>>>>> to
> >>>>>>> simplify the code.
> >>>>>>>
> >>>>>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawas...@wdc.com>
> >>>>>>> ---
> >>>>>>>  fsck/f2fs.h |   5 ++
> >>>>>>>  fsck/fsck.c | 198 
> >>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>  fsck/fsck.h |   3 +
> >>>>>>>  fsck/main.c |   2 +
> >>>>>>>  4 files changed, 208 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/fsck/f2fs.h b/fsck/f2fs.h
> >>>>>>> index 4dc6698..2c1c2b3 100644
> >>>>>>> --- a/fsck/f2fs.h
> >>>>>>> +++ b/fsck/f2fs.h
> >>>>>>> @@ -337,6 +337,11 @@ static inline block_t __end_block_addr(struct 
> >>>>>>> f2fs_sb_info *sbi)
> >>>>>>>  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)                          
> >>>>>>> \
> >>>>>>>       (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 
> >>>>>>> 1))
> >>>>>>>  
> >>>>>>> +#define GET_SEC_FROM_SEG(sbi, segno)                                 
> >>>>>>> \
> >>>>>>> +     ((segno) / (sbi)->segs_per_sec)
> >>>>>>> +#define GET_SEG_FROM_SEC(sbi, secno)                                 
> >>>>>>> \
> >>>>>>> +     ((secno) * (sbi)->segs_per_sec)
> >>>>>>> +
> >>>>>>>  #define FREE_I_START_SEGNO(sbi)                                      
> >>>>>>>         \
> >>>>>>>       GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
> >>>>>>>  #define GET_R2L_SEGNO(sbi, segno)    (segno + 
> >>>>>>> FREE_I_START_SEGNO(sbi))
> >>>>>>> diff --git a/fsck/fsck.c b/fsck/fsck.c
> >>>>>>> index 8953ca1..a0f6849 100644
> >>>>>>> --- a/fsck/fsck.c
> >>>>>>> +++ b/fsck/fsck.c
> >>>>>>> @@ -2574,6 +2574,190 @@ out:
> >>>>>>>       return cnt;
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> +/*
> >>>>>>> + * Search a free section in main area. Start search from the section 
> >>>>>>> specified
> >>>>>>> + * with segno argument toward main area end. Return first segment of 
> >>>>>>> the found
> >>>>>>> + * section in segno argument.
> >>>>>>> + */
> >>>>>>> +static int find_next_free_section(struct f2fs_sb_info *sbi,
> >>>>>>> +                               unsigned int *segno)
> >>>>>>> +{
> >>>>>>> +     unsigned int i, sec, section_valid_blocks;
> >>>>>>> +     unsigned int end_segno = GET_SEGNO(sbi, SM_I(sbi)->main_blkaddr)
> >>>>>>> +             + SM_I(sbi)->main_segments;
> >>>>>>> +     unsigned int end_sec = GET_SEC_FROM_SEG(sbi, end_segno);
> >>>>>>> +     struct seg_entry *se;
> >>>>>>> +     struct curseg_info *cs;
> >>>>>>> +
> >>>>>>> +     for (sec = GET_SEC_FROM_SEG(sbi, *segno); sec < end_sec; sec++) 
> >>>>>>> {
> >>>>>>> +             /* find a section without valid blocks */
> >>>>>>> +             section_valid_blocks = 0;
> >>>>>>> +             for (i = 0; i < sbi->segs_per_sec; i++) {
> >>>>>>> +                     se = get_seg_entry(sbi, GET_SEG_FROM_SEC(sbi, 
> >>>>>>> sec) + i);
> >>>>>>> +                     section_valid_blocks += se->valid_blocks;
> >>>>>>
> >>>>>> If we want to find a 'real' free section, we'd better to use
> >>>>>> se->ckpt_valid_blocks (wrapped with get_seg_vblocks()) in where we has 
> >>>>>> recorded
> >>>>>> fsynced data count.
> >>>>>
> >>>>> Thanks. When I create next patch series, I will use get_seg_vblocks().
> >>>>> I will rebase to dev-test branch in which get_seg_vblocks() is 
> >>>>> available.
> >>>>>
> >>>>>>
> >>>>>>> +             }
> >>>>>>> +             if (section_valid_blocks)
> >>>>>>> +                     continue;
> >>>>>>> +
> >>>>>>> +             /* check the cursegs do not use the section */
> >>>>>>> +             for (i = 0; i < NO_CHECK_TYPE; i++) {
> >>>>>>> +                     cs = &SM_I(sbi)->curseg_array[i];
> >>>>>>> +                     if (GET_SEC_FROM_SEG(sbi, cs->segno) == sec)
> >>>>>>> +                             break;
> >>>>>>> +             }
> >>>>>>> +             if (i >= NR_CURSEG_TYPE) {
> >>>>>>> +                     *segno = GET_SEG_FROM_SEC(sbi, sec);
> >>>>>>> +                     return 0;
> >>>>>>> +             }
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>> +     return -1;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +struct write_pointer_check_data {
> >>>>>>> +     struct f2fs_sb_info *sbi;
> >>>>>>> +     struct device_info *dev;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +static int fsck_chk_write_pointer(int i, struct blk_zone *blkz, void 
> >>>>>>> *opaque)
> >>>>>>> +{
> >>>>>>> +     struct write_pointer_check_data *wpd = opaque;
> >>>>>>> +     struct f2fs_sb_info *sbi = wpd->sbi;
> >>>>>>> +     struct device_info *dev = wpd->dev;
> >>>>>>> +     struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> >>>>>>> +     block_t zone_block, wp_block, wp_blkoff, cs_block, b;
> >>>>>>> +     unsigned int zone_segno, wp_segno, new_segno;
> >>>>>>> +     struct seg_entry *se;
> >>>>>>> +     struct curseg_info *cs;
> >>>>>>> +     int cs_index, ret;
> >>>>>>> +     int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >>>>>>> +     unsigned int segs_per_zone = sbi->segs_per_sec * 
> >>>>>>> sbi->secs_per_zone;
> >>>>>>> +     void *zero_blk;
> >>>>>>> +
> >>>>>>> +     if (blk_zone_conv(blkz))
> >>>>>>> +             return 0;
> >>>>>>> +
> >>>>>>> +     zone_block = dev->start_blkaddr
> >>>>>>> +             + (blk_zone_sector(blkz) >> log_sectors_per_block);
> >>>>>>> +     zone_segno = GET_SEGNO(sbi, zone_block);
> >>>>>>> +     wp_block = dev->start_blkaddr
> >>>>>>> +             + (blk_zone_wp_sector(blkz) >> log_sectors_per_block);
> >>>>>>> +     wp_segno = GET_SEGNO(sbi, wp_block);
> >>>>>>> +     wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> >>>>>>> +
> >>>>>>> +     /* find the curseg which points to the zone */
> >>>>>>> +     for (cs_index = 0; cs_index < NO_CHECK_TYPE; cs_index++) {
> >>>>>>> +             cs = &SM_I(sbi)->curseg_array[cs_index];
> >>>>>>> +             if (zone_segno <= cs->segno &&
> >>>>>>> +                 cs->segno < zone_segno + segs_per_zone)
> >>>>>>> +                     break;
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>> +     if (cs_index >= NR_CURSEG_TYPE)
> >>>>>>> +             return 0;
> >>>>>>> +
> >>>>>>> +     /* check write pointer consistency with the curseg in the zone 
> >>>>>>> */
> >>>>>>> +     cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
> >>>>>>> +     if (wp_block == cs_block)
> >>>>>>> +             return 0;
> >>>>>>> +
> >>>>>>> +     if (!c.fix_on) {
> >>>>>>> +             MSG(0, "Inconsistent write pointer: "
> >>>>>>> +                 "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
> >>>>>>> +                 cs_index, cs->segno, cs->next_blkoff, wp_segno, 
> >>>>>>> wp_blkoff);
> >>>>>>> +             fsck->chk.wp_inconsistent_zones++;
> >>>>>>> +             return 0;
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>> +     /*
> >>>>>>> +      * If the curseg is in advance from the write pointer, write 
> >>>>>>> zero to
> >>>>>>> +      * move the write pointer forward to the same position as the 
> >>>>>>> curseg.
> >>>>>>> +      */
> >>>>>>> +     if (wp_block < cs_block) {
> >>>>>>> +             ret = 0;
> >>>>>>> +             zero_blk = calloc(BLOCK_SZ, 1);
> >>>>>>> +             if (!zero_blk)
> >>>>>>> +                     return -EINVAL;
> >>>>>>> +
> >>>>>>> +             FIX_MSG("Advance write pointer to match with curseg %d: 
> >>>>>>> "
> >>>>>>> +                     "[0x%x,0x%x]->[0x%x,0x%x]",
> >>>>>>> +                     cs_index, wp_segno, wp_blkoff,
> >>>>>>> +                     cs->segno, cs->next_blkoff);
> >>>>>>> +             for (b = wp_block; b < cs_block && !ret; b++)
> >>>>>>> +                     ret = dev_write_block(zero_blk, b);
> >>>>>>> +
> >>>>>>> +             fsck->chk.wp_fixed_zones++;
> >>>>>>> +             free(zero_blk);
> >>>>>>> +             return ret;
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>> +     if (wp_segno == zone_segno + segs_per_zone) {
> >>>>>>> +             /*
> >>>>>>> +              * If the write pointer is in advance from the curseg 
> >>>>>>> and at
> >>>>>>> +              * the zone end (section end), search a new free zone 
> >>>>>>> (section)
> >>>>>>> +              * between the curseg and main area end.
> >>>>>>> +              */
> >>>>>>> +             new_segno = wp_segno;
> >>>>>>> +             ret = find_next_free_section(sbi, &new_segno);
> >>>>>>> +             if (ret) {
> >>>>>>> +                     /* search again from main area start */
> >>>>>>> +                     new_segno = GET_SEGNO(sbi, 
> >>>>>>> SM_I(sbi)->main_blkaddr);
> >>>>>>> +                     ret = find_next_free_section(sbi, &new_segno);
> >>>>>>> +             }
> >>>>>>
> >>>>>> If curseg's type is warm node, relocating curseg would ruin warm node 
> >>>>>> chain,
> >>>>>> result in losing fsynced data for ever as well.
> >>>>>>
> >>>>>> So I guess we should migrate all dnode blocks chained with 
> >>>>>> cs->next_blkoff in
> >>>>>> current section.
> >>>>>>
> >>>>>>> +             if (ret) {
> >>>>>>> +                     MSG(0, "Free section not found\n");
> >>>>>>> +                     return ret;
> >>>>>>> +             }
> >>>>>>> +             FIX_MSG("New section for curseg %d: 
> >>>>>>> [0x%x,0x%x]->[0x%x,0x%x]",
> >>>>>>> +                     cs_index, cs->segno, cs->next_blkoff, 
> >>>>>>> new_segno, 0);
> >>>>>>> +             cs->segno = new_segno;
> >>>>>>> +             cs->next_blkoff = 0;
> >>>>>>> +     } else {
> >>>>>>> +             /*
> >>>>>>> +              * If the write pointer is in advance from the curseg 
> >>>>>>> within
> >>>>>>> +              * the zone, modify the curseg position to be same as 
> >>>>>>> the
> >>>>>>> +              * write pointer.
> >>>>>>> +              */
> >>>>>>> +             ASSERT(wp_segno < zone_segno + segs_per_zone);
> >>>>>>> +             FIX_MSG("Advance curseg %d: [0x%x,0x%x]->[0x%x,0x%x]",
> >>>>>>> +                     cs_index, cs->segno, cs->next_blkoff,
> >>>>>>> +                     wp_segno, wp_blkoff);
> >>>>>>> +             cs->segno = wp_segno;
> >>>>>>> +             cs->next_blkoff = wp_blkoff;
> >>>>>>
> >>>>>> The same data lose problem here, I guess we'd better handle it with 
> >>>>>> the same way
> >>>>>> as above.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>
> >>>>> I created f2fs status with fsync data and warm node chain, and 
> >>>>> confirmed the v4
> >>>>> implementation ruins the dnode blocks chain. Hmm.
> >>>>>
> >>>>> My understanding is that when f2fs kernel module recovers the fsync 
> >>>>> data, it
> >>>>> sets the warm node curseg position at the start of the fsync data, and 
> >>>>> the fsync
> >>>>> data will be overwritten as new nodes created. Is this understanding 
> >>>>> correct?
> >>>>
> >>>> Sorry, I'm not sure I've understood you correctly.
> >>>
> >>> Apology is mine, my question was not clear enough.
> >>> And thanks for the explanation below. It helps me to understand better.
> >>>
> >>>> Current recovery flow is:
> >>>> 1)   find all valid fsynced dnode in warm node chain
> >>>> 2.a) recover fsynced dnode in memory, and set it dirty
> >>>> 2.b) recover directory entry in memory, and set it dirty
> >>>> 2.c) during regular's dnode recovery, physical blocks indexed by 
> >>>> recovered dnode
> >>>> will be revalidated
> >>>> Note: we didn't move any cursegs before 3)
> >>>> 3)   relocate all cursegs to new segments
> >>>> 4)   trigger checkpoint to persist all recovered data(fs' meta, file's 
> >>>> meta/data)
> >>>
> >>> Question, does the step 3) correspond to f2fs_allocate_new_segments(sbi) 
> >>> call
> >>> at the end of recover_data()? The f2fs_allocate_new_segments() function
> >>
> >> Yeah, I meant that function.
> >>
> >>> relocates new segments only for DATA cursegs, and it keeps NODE cursegs 
> >>> same as
> >>> before the fsync data recovery. Then I thought WARM NODE curseg would not 
> >>> change
> >>> by recovery (and still keeps pointing to the fsync recovery data).
> >>
> >> Yes, that's correct. WARM NODE curseg won't change until step 4).
> > 
> > Thanks. Following your idea "we can simply adjust to reset all invalid zone 
> > and
> > allocate new zone for curseg before data/meta writes" for fix by kernel, I 
> > think
> > kernel code change is required to allocate new zones for NODE cursegs also 
> > at
> > step 3). WARM NODE curseg should be kept untouched by step 2 completion to 
> > refer
> > fsynced dnodes at WARM NODE curseg's next_blkaddr. And at step 4, the fsyced
> > dnodes recovered and set dirty will be written back with one of NODE cursegs
> > (HOT NODE curseg?). At that time, we need to make sure the NODE curseg 
> > points to
> 
> Directory's dnode goes to hot node area, other file's dnode goes to warm node
> area, the left node goes to cold node area.
> 
> > the position consistent with its zone's write pointer.
> 
> Yes, before step 4), we should keep f2fs and zoned block device's write 
> pointer
> being consistent.

Ok, thanks.

> 
> > 
> >>>
> >>>>>
> >>>>> If this is the case, I think write pointer inconsistency will happen 
> >>>>> even if
> >>>>> fsck does "migrate all dnode blocks" (I assume that the warm node 
> >>>>> curseg's next
> >>>>
> >>>> Actually, I notice that scheme's problem is: we've changed zone's write 
> >>>> pointer
> >>>> during dnode blocks migration, however if kernel drops recovery, we will 
> >>>> still
> >>>> face inconsistent status in between .next_blkaddr and .write_pointer, 
> >>>> once we
> >>>> start to write data from .next_blkaddr. So in fsck, after migration, we 
> >>>> need to
> >>>> reset .write_pointer to .next_blkaddr.
> >>>>
> >>>> I guess we need to consider four cases:
> >>>>
> >>>> o: support .write_pointer recovery
> >>>> x: not support .write_pointer recovery
> >>>>
> >>>> 1) kernel: o, fsck: x, trigger recovery in kernel
> >>>> 2) kernel: o, fsck: x, not trigger recovery in kernel
> >>>> 3) kernel: x, fsck: o, trigger recovery in kernel
> >>>> 4) kernel: x, fsck: o, not trigger recovery in kernel
> >>>>
> >>>> For 1) and 2), we can simply adjust to reset all invalid zone and 
> >>>> allocate new
> >>>> zone for curseg before data/meta writes.
> >>>
> >>> Thanks for the clarification. This approach for case 1) and 2) is simple. 
> >>> Let me
> >>> try to create a patch for it.
> >>>
> >>>>
> >>>> For 3) and 4), I haven't figured out a way handling all cases perfectly.
> >>>
> >>> For 3), I suppose fsck cannot fix write pointer inconsistency without 
> >>> fsync data
> >>> loss, since recovery is judged and done by kernel. The write pointer 
> >>> consistency
> >>> fix after recovery can be done only by kernel.
> >>>
> >>> It is not a good one but one idea is to confirm fsck users to enforce 
> >>> fsync data
> >>> loss or not. This could be better than nothing.
> >>>
> >>> For 4), my understanding is that fsync data is not left in the file 
> >>> system. I
> >>> think fsck can check fsync data existence and fix write pointer 
> >>> consistency, as
> >>> my patch series shows.
> >>
> >> Yeah.
> >>
> >> Let's think about that whether there is a way to cover all cases.
> >>
> >> 1) For non-opened zones, we need to adjust all such zones' write pointer to
> >> zero. I assume that once write pointer is zero, we still can access valid 
> >> block
> >> in zone. (recovery may need to revalidate blocks in free segments)
> > 
> > When write pointer is reset to zero, all data in the zone gets lost. When we
> > read data in the zone beyond the write pointer, just zero data is read. 
> > When any
> > valid block or fsync data is left in a non-opened zone, I think the zone's 
> > write
> > pointer should be left as is. Otherwise, if the zone do not have valid 
> > block and
> > fsync data, the zone should be reset to avoid unaligned write error.
> 
> Okay, if data beyond write pointer is invalid, we should keep write pointer as
> it is if there are fsynced data in that zone.
> 
> > 
> > One additional check I can think of is to check the last valid block in the 
> > zone
> > and write pointer position of the zone. If .write_pointer >= 
> > .last_valid_block,
> > , it is ok. If .write_pointer < .last_valid_block, this is a bug of f2fs. In
> 
> Sounds reasonable, how can we find last valid block, as you said, content of
> block beyond write pointer is all zero... or you mean curseg's next_blkaddr?
> like the condition 2.c) described?

I think we can get each zone's last valid block referring each segment's valid
block bitmap in SIT. In other words, this is a consistency check between write
pointer and SIT. Is this feasible approach?

> 
> > this case, the data in the valid blocks beyond write pointer is just lost, 
> > and
> > there is no way to recover this. I think this zone will not be selected for
> > cursegs for further data write in the zone until the zone get discarded. No
> > need to fix write pointer position to avoid unaligned write errors. I wonder
> 
> Yes,
> 
> > if fsck or kernel should detect and report this case, because users still 
> > use
> > the f2fs partition without any action. May I ask your opinion?
> 
> If we can detect that, I think it should be reported.

I see, thanks for the comment.

> 
> > 
> >>
> >> 2) For opened zones (cursegs point to)
> >> 2.a) .write_pointer == .next_blkaddr, no need to handle
> >> 2.b) .write_pointer > .next_blkaddr, that should be the common state in 
> >> sudden
> >> power-off case. It needs to reset .write_pointer to .next_blkaddr.

Sorry but let me amend. I have just noticed that the fix above is not possible.
We cannot set .write_pointer at .next_blkaddr, because write pointers cannot be
reset to desired position. It only can be reset to zero (at the zone start).
Instead of resetting .write_pointer, how about to fix by allocating a new zone 
to
the curseg for 2.b) in same manner as 2.c)?

> > 
> > Agreed. To be more precise, if fsync data is recorded after .next_blkaddr, 
> > we
> > need to allocate a new zone to the curseg after fsync data recovery.
> > 
> >> 2.c) .write_pointer < .next_blkaddr, should be a bug of f2fs, we should 
> >> have
> >> lost datas in the range of [.write_pointer, .next_blkaddr] at least, if 
> >> we're
> >> sure about that there is no valid data existed out side of .write_pointer? 
> >> can we?
> > 
> > As you note, this is a bug and we lost the data in the range of 
> > [.write_pointer,
> > .next_blkaddr]. We should leave the write pointer as is to keep the data in 
> > the
> > zone, and allocate a new zone to the curseg to avoid unaligned write errors.
> 
> Okay.
> 
> > 
> >>
> >> 3) f2fs_allocate_new_segments() may reset data type cursegs' position, I'd 
> >> like
> >> to know what's the value of -z option we configured, if we configured as 
> >> one
> >> zone contains one section, no matter kernel triggers recovery or not (via 
> >> set
> >> disable_roll_forward option), the write pointer consistency should has 
> >> been kept
> >> in all cases. Otherwise (one zone contains multiple sections), if kernel 
> >> doesn't
> >> support .write_pointer recovery, f2fs_allocate_new_segments() can make
> >> .next_blkaddr larger than .write_pointer in the same zone.
> > 
> > "Single section per zone" is a key design of the f2fs zoned block device
> > support. For zoned block devices, mkfs.f2fs requires -m option. When -m
> > options is set, mkfs sets '1' to c.secs_per_zone regardless of the -z option
> > (refer f2fs_get_device_info() in lib/libf2fs.c). With this prerequisite,
> 
> Oh, correct, thanks for noticing that. I did miss that we only use -m option 
> for
> zoned block device.
> 
> > I think new zones are allocated for DATA cursegs in
> > f2fs_allocate_new_segment() and its sub-function call path.
> > 
> >>
> >> Let me know, if I'm missing sth.
> > 
> > Assuming fsync data is the only one exceptional data which is read beyond 
> > the
> > cursegs' next_blkoff, this discussion looks comprehensive for me:
> > 
> >  - with and without kernel write pointer fix
> >  - with and without fsync data recovery
> >  - open/non-open zones (zones cursegs point to, zones cursegs do not point 
> > to)
> > 
> > All fs meta data in front of main segments are stored in conventional zones
> > which do not have write pointers (refer f2fs_prepare_super_block() in mkfs/
> > f2fs_format.c). Just we need to care about write pointer consistency for the
> > main data area.
> 
> Agreed,

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to