>>>>>>>>>>>>>>>> When selecting a victim using next_victim_seg in a large >>>>>>>>>>>>>>>> section, the selected section might already have been >>>>>>>>>>>>>>>> cleared and designated as the new current section, >>>>>>>>>>>>>>>> making it actively in >>>>>>>>> use. >>>>>>>>>>>>>>>> This behavior causes inconsistency between the SIT and SSA. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, does this fix your issue? >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is an issue that arises when dividing a large section >>>>>>>>>>>>>> into segments for garbage collection. >>>>>>>>>>>>>> caused by the background GC (garbage collection) thread in >>>>>>>>>>>>>> large section >>>>>>>>>>>>>> f2fs_gc(victim_section) -> >>>>>>>>>>>>>> f2fs_clear_prefree_segments(victim_section)-> >>>>>>>>>>>>>> cursec(victim_section) -> f2fs_gc(victim_section by >>>>>>>>>>>>>> next_victim_seg) >>>>>>>>>>>>> >>>>>>>>>>>>> I didn't get it, why f2fs_get_victim() will return section >>>>>>>>>>>>> which is used by curseg? It should be avoided by checking >>>>>>>>>>>>> w/ >>>>>>> sec_usage_check(). >>>>>>>>>>>>> >>>>>>>>>>>>> Or we missed to check gcing section which next_victim_seg >>>>>>>>>>>>> points to during get_new_segment()? >>>>>>>>>>>>> >>>>>>>>>>>>> Can this happen? >>>>>>>>>>>>> >>>>>>>>>>>>> e.g. >>>>>>>>>>>>> - bggc selects sec #0 >>>>>>>>>>>>> - next_victim_seg: seg #0 >>>>>>>>>>>>> - migrate seg #0 and stop >>>>>>>>>>>>> - next_victim_seg: seg #1 >>>>>>>>>>>>> - checkpoint, set sec #0 free if sec #0 has no valid blocks >>>>>>>>>>>>> - allocate seg #0 in sec #0 for curseg >>>>>>>>>>>>> - curseg moves to seg #1 after allocation >>>>>>>>>>>>> - bggc tries to migrate seg #1 >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>> That's correct >>>>>>>>>>>> In f2fs_get_victim, we use next_victim_seg to directly jump >>>>>>>>>>>> to got_result, thereby bypassing sec_usage_check What do you >>>>>>>>>>>> think about this change? >>>>>>>>>>>> >>>>>>>>>>>> @@ -850,15 +850,20 @@ int f2fs_get_victim(struct >>>>>>>>>>>> f2fs_sb_info *sbi, >>>>>>>>>>> unsigned int *result, >>>>>>>>>>>> p.min_segno = >>>>>>>>>>>> sbi->next_victim_seg[BG_GC]; >>>>>>>>>>>> *result = p.min_segno; >>>>>>>>>>>> sbi->next_victim_seg[BG_GC] = >>>>>>>>>>>> NULL_SEGNO; >>>>>>>>>>>> - goto got_result; >>>>>>>>>>>> } >>>>>>>>>>>> if (gc_type == FG_GC && >>>>>>>>>>>> >>>>>>>>>>>> sbi->next_victim_seg[FG_GC] != NULL_SEGNO) >>>>>>> { >>>>>>>>>>>> p.min_segno = >>>>>>>>>>>> sbi->next_victim_seg[FG_GC]; >>>>>>>>>>>> *result = p.min_segno; >>>>>>>>>>>> sbi->next_victim_seg[FG_GC] = >>>>>>>>>>>> NULL_SEGNO; >>>>>>>>>>>> - goto got_result; >>>>>>>>>>>> } >>>>>>>>>>>> + >>>>>>>>>>>> + secno = GET_SEC_FROM_SEG(sbi, segno); >>>>>>>>>>>> + >>>>>>>>>>>> + if (sec_usage_check(sbi, secno)) >>>>>>>>>>>> + goto next; >>>>>>>>>>>> + >>>>>>>>>>>> + goto got_result; >>>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> But still allocator can assign this segment after >>>>>>>>>>> sec_usage_check() in race condition, right? >>>>>>>>>> Since the BG GC using next_victim takes place after the SIT >>>>>>>>>> update in do_checkpoint, it seems unlikely that a race >>>>>>>>>> condition with >>>>>>>>> sec_usage_check will occur. >>>>>>>>> >>>>>>>>> I mean this: >>>>>>>>> >>>>>>>>> - gc_thread >>>>>>>>> - f2fs_gc >>>>>>>>> - f2fs_get_victim >>>>>>>>> - sec_usage_check --- segno #1 is not used in any cursegs >>>>>>>>> - f2fs_allocate_data_block >>>>>>>>> - new_curseg >>>>>>>>> - get_new_segment find segno >>>>>>>>> #1 >>>>>>>>> >>>>>>>>> - do_garbage_collect >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>> >>>>>>>> do_checkpoint sec0 >>>>>>>> free >>>>>>>> If sec0 is not freed, >>>>>>>> then >>>>>>> segno1 within sec0 cannot be >>>>>>>> allocated >>>>>>>> - gc_thread >>>>>>>> - f2fs_gc >>>>>>>> - f2fs_get_victim >>>>>>>> - sec_usage_check --- segno #1 is not used in any cursegs >>>>>>>> (but >>>>>>>> sec0 >>>>>>> is already used) >>>>>>>> - >f2fs_allocate_data_block >>>>>>>> - new_curseg >>>>>>>> - >>>>>>>> get_new_segment find >>>>>>> segno #1 >>>>>>>> >>>>>>>> - do_garbage_collect >>>>>>>> >>>>>>>> I appreciate your patch, it is under testing. >>>>>>>> but I'm wondering if there's a risk of a race condition in this >>>>>>>> situation >>>>>>> >>>>>>> Oh, yes, I may missed that get_new_segment can return a free >>>>>>> segment in partial used section. >>>>>>> >>>>>>> So what do you think of this? >>>>>>> - check CURSEG() in do_garbage_collect() and get_victim() >>>>>>> - reset next_victim_seg[] in get_new_segment() and >>>>>>> __set_test_and_free() during checkpoint. >>>>>>> >>>>>>> Thanks, >>>>>> >>>>>> How about using victim_secmap? >>>>>> gc_thread >>>>>> mutex_lock(&DIRTY_I(sbi)->seglist_lock); >>>>>> __set_test_and_free >>>>>> check cur section next_victim clear >>>>>> mutex_unlock(&dirty_i->seglist_lock); >>>>>> >>>>>> mutex_lock(&dirty->seglist_lock); >>>>>> f2fs_get_victim >>>>>> mutex_unlock(&dirty_i->seglist_lock); >>>>>> >>>>>> static inline void __set_test_and_free(struct f2fs_sb_info *sbi, >>>>>> if (next >= start_segno + usable_segs) { >>>>>> if (test_and_clear_bit(secno, >>>>>> free_i->free_secmap)) >>>>>> free_i->free_sections++; >>>>>> + >>>>>> + if (test_and_clear_bit(secno, dirty_i- >>>> victim_secmap)) >>>>>> + sbi->next_victim_seg[BG_GC] = >>>>>> + NULL_SEGNO; >>>>> >>>>> Can this happen? >>>>> >>>>> segs_per_sec=2 >>>>> >>>>> - seg#0 and seg#1 are all dirty >>>>> - all valid blocks are removed in seg#1 >>>>> - checkpoint -> seg#1 becomes free >>>>> - gc select this sec and next_victim_seg=seg#0 >>>>> - migrate seg#0, next_victim_seg=seg#1 >>>>> - allocator assigns seg#1 to curseg >>>>> - gc tries to migrate seg#1 >>> >>> I meant for above case, below change still can not catch it, right? >>> since next_victim_seg[] was assigned after checkpoint. >>> >>> + if (test_and_clear_bit(secno, dirty_i->victim_secmap)) >>> + sbi->next_victim_seg[BG_GC] = NULL_SEGNO; >>> >>> Thanks, >>> >> I understood what you said. >> It seems that without a checkpoint, >> we won't be able to allocate seg#1 in sec0 because when requesting a >> segment allocation, if it's not within the same section, it checks in >> a new secmap. >> please let me know, if there's anything I've missed > >Oh, I see, it always try to allocate segments sequentially in a large >section-size image, thanks for your explanation. :) > >static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type) >{ ... > if (__is_large_section(sbi)) { >... > return curseg->segno; >... >} > >static int new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) >{ ... > segno = __get_next_segno(sbi, type); > ret = get_new_segment(sbi, &segno, new_sec, pinning); ... >} > >BTW, I guess it could be more efficient if we can reuse free segment in >dirty section for conventional device.a It might cause file fragmentation, so we should consider it > >> Thanks >> >> static int get_new_segment(struct f2fs_sb_info *sbi, >> unsigned int *newseg, bool new_sec, bool pinning) >> >> if (!new_sec && ((*newseg + 1) % SEGS_PER_SEC(sbi))) { >> segno = find_next_zero_bit(free_i->free_segmap, >> GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1); >> if (segno < GET_SEG_FROM_SEC(sbi, hint + 1)) >> goto got_it; >> } >> >> find_other_zone: >> secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), >> hint); >> >>>>> >>>>> Thanks, >>>> The detailed scenario >>>> segs_per_sec=2 >>>> - seg#0 and seg#1 are all dirty >>>> - all valid blocks are removed in seg#1 >>>> - gc select this sec and next_victim_seg=seg#0 >>>> - migrate seg#0, next_victim_seg=seg#1 >>>> - checkpoint -> sec(seg#0, seg#1) becomes free > >So, for this case, we can handle it only in checkpoint. > >>> + if (test_and_clear_bit(secno, dirty_i->victim_secmap)) >>> + sbi->next_victim_seg[BG_GC] = NULL_SEGNO; > >Do we need to handle sbi->next_victim_seg[FG_GC] as well? I'll add it and submit the patch. Thanks. > >Thanks, > >>>> - allocator assigns sec(seg#0, seg#1) to curseg >>>> - gc tries to migrate seg#1 >>>>> >>>>>> } >>>>>> } >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> IMO, we can clear next_victim_seg[] once section is free in >>>>>>>>>>> __set_test_and_free()? something like this: >>>>>>>>>> I will test it according to your suggestion. >>>>>>>>>> If there are no issues, can I submit it again with the patch? >>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> --- >>>>>>>>>>> fs/f2fs/segment.h | 13 ++++++++++--- >>>>>>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index >>>>>>>>>>> 0465dc00b349..826e37999085 100644 >>>>>>>>>>> --- a/fs/f2fs/segment.h >>>>>>>>>>> +++ b/fs/f2fs/segment.h >>>>>>>>>>> @@ -473,9 +473,16 @@ static inline void >>>>>>>>>>> __set_test_and_free(struct f2fs_sb_info *sbi, >>>>>>>>>>> goto skip_free; >>>>>>>>>>> next = find_next_bit(free_i->free_segmap, >>>>>>>>>>> start_segno + SEGS_PER_SEC(sbi), >>>>>>> start_segno); >>>>>>>>>>> - if (next >= start_segno + usable_segs) { >>>>>>>>>>> - if (test_and_clear_bit(secno, free_i- >>>>>> free_secmap)) >>>>>>>>>>> - free_i->free_sections++; >>>>>>>>>>> + if ((next >= start_segno + usable_segs) && >>>>>>>>>>> + test_and_clear_bit(secno, free_i->free_secmap)) >>>>> { >>>>>>>>>>> + free_i->free_sections++; >>>>>>>>>>> + >>>>>>>>>>> + if >>>>>>>>>>> (GET_SEC_FROM_SEG(sbi->next_victim_seg[BG_GC]) >>>>> == >>>>>>>>>>> + >>>>>>>>>>> secno) >>>>>>>>>>> + sbi->next_victim_seg[BG_GC] = >>>>>>>>>>> NULL_SEGNO; >>>>>>>>>>> + if >>>>>>>>>>> (GET_SEC_FROM_SEG(sbi->next_victim_seg[FG_GC]) >>>>> == >>>>>>>>>>> + >>>>>>>>>>> secno) >>>>>>>>>>> + sbi->next_victim_seg[FG_GC] = >>>>>>>>>>> NULL_SEGNO; >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> skip_free: >>>>>>>>>>> -- >>>>>>>>>>> 2.40.1 >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Because the call stack is different, I think that in order >>>>>>>>>>>>>> to handle everything at once, we need to address it within >>>>>>>>>>>>>> do_garbage_collect, or otherwise include it on both sides. >>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>> >>>>>>>>>>>>>> [30146.337471][ T1300] F2FS-fs (dm-54): Inconsistent >>>>>>>>>>>>>> segment >>>>>>>>>>>>>> (70961) type [0, 1] in SSA and SIT [30146.346151][ T1300] >>>>>>>>>>>>>> Call >>>>>>> trace: >>>>>>>>>>>>>> [30146.346152][ T1300] dump_backtrace+0xe8/0x10c >>>>>>>>>>>>>> [30146.346157][ T1300] show_stack+0x18/0x28 >>>>>>>>>>>>>> [30146.346158][ T1300] dump_stack_lvl+0x50/0x6c >>>>>>>>>>>>>> [30146.346161][ T1300] >>>>>>>>>>>>>> dump_stack+0x18/0x28 [30146.346162][ T1300] >>>>>>>>>>>>>> f2fs_stop_checkpoint+0x1c/0x3c [30146.346165][ T1300] >>>>>>>>>>>>>> do_garbage_collect+0x41c/0x271c [30146.346167][ T1300] >>>>>>>>>>>>>> f2fs_gc+0x27c/0x828 [30146.346168][ T1300] >>>>>>>>>>>>>> gc_thread_func+0x290/0x88c [30146.346169][ T1300] >>>>>>>>>>>>>> kthread+0x11c/0x164 [30146.346172][ T1300] >>>>>>>>>>>>>> ret_from_fork+0x10/0x20 >>>>>>>>>>>>>> >>>>>>>>>>>>>> struct curseg_info : 0xffffff803f95e800 { >>>>>>>>>>>>>> segno : 0x11531 : 70961 >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> struct f2fs_sb_info : 0xffffff8811d12000 { >>>>>>>>>>>>>> next_victim_seg[0] : 0x11531 : 70961 } >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://lore.kernel.org/linux-f2fs-devel/20250325080646.3 >>>>>>>>>>>>>>> 29 >>>>>>>>>>>>>>> 19 >>>>>>>>>>>>>>> 47 >>>>>>>>>>>>>>> -2 >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> c...@kernel.org >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Signed-off-by: Yohan Joung <yohan.jo...@sk.com> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>> fs/f2fs/gc.c | 4 ++++ >>>>>>>>>>>>>>>> 1 file changed, 4 insertions(+) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index >>>>>>>>>>>>>>>> 2b8f9239bede..4b5d18e395eb 100644 >>>>>>>>>>>>>>>> --- a/fs/f2fs/gc.c >>>>>>>>>>>>>>>> +++ b/fs/f2fs/gc.c >>>>>>>>>>>>>>>> @@ -1926,6 +1926,10 @@ int f2fs_gc(struct f2fs_sb_info >>>>>>>>>>>>>>>> *sbi, struct >>>>>>>>>>>>>>> f2fs_gc_control *gc_control) >>>>>>>>>>>>>>>> goto stop; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> + if (__is_large_section(sbi) && >>>>>>>>>>>>>>>> + IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, >>> segno))) >>>>>>>>>>>>>>>> + goto stop; >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> seg_freed = do_garbage_collect(sbi, segno, >>>>>>>>>>>>>>>> &gc_list, >>>>> gc_type, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> gc_control->should_migrate_blocks, >>>>>>>>>>>>>>>> gc_control->one_time); >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>> >>>> >> >>
_______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel