>-----Original Message----- >From: Chao Yu <c...@kernel.org> >Sent: Tuesday, April 1, 2025 2:42 PM >To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; linux-f2fs- >de...@lists.sourceforge.net >Cc: c...@kernel.org; jaeg...@kernel.org; jyh...@gmail.com; linux- >ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE <pilhyun....@sk.com> >Subject: [External Mail] Re: [External Mail] Re: [f2fs-dev] [External Mail] >Re: [External Mail] Re: [PATCH] f2fs: prevent the current section from >being selected as a victim during garbage collection > >On 4/1/25 09:51, yohan.joung wrote: >>> From: Chao Yu <c...@kernel.org> >>> Sent: Monday, March 31, 2025 8:36 PM >>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; linux-f2fs- >>> de...@lists.sourceforge.net >>> Cc: c...@kernel.org; jaeg...@kernel.org; jyh...@gmail.com; linux- >>> ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE >>> <pilhyun....@sk.com> >>> Subject: [External Mail] Re: [External Mail] Re: [f2fs-dev] [External >>> Mail] >>> Re: [External Mail] Re: [PATCH] f2fs: prevent the current section >>> from being selected as a victim during garbage collection >>> >>> On 3/31/25 13:13, yohan.joung wrote: >>>>> On 2025/3/28 15:25, yohan.joung wrote: >>>>>>> On 2025/3/28 11:40, yohan.joung wrote: >>>>>>>>> From: Chao Yu <c...@kernel.org> >>>>>>>>> Sent: Thursday, March 27, 2025 10:48 PM >>>>>>>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; Yohan >>>>>>>>> Joung <jyh...@gmail.com>; jaeg...@kernel.org; daeh...@gmail.com >>>>>>>>> Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; >>>>>>>>> linux- ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE >>>>>>>>> <pilhyun....@sk.com> >>>>>>>>> Subject: [External Mail] Re: [External Mail] Re: [External Mail] >Re: >>>>>>>>> [PATCH] f2fs: prevent the current section from being selected >>>>>>>>> as a victim during garbage collection >>>>>>>>> >>>>>>>>> On 2025/3/27 16:00, yohan.jo...@sk.com wrote: >>>>>>>>>>> From: Chao Yu <c...@kernel.org> >>>>>>>>>>> Sent: Thursday, March 27, 2025 4:30 PM >>>>>>>>>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; Yohan >>>>>>>>>>> Joung <jyh...@gmail.com>; jaeg...@kernel.org; >>>>>>>>>>> daeh...@gmail.com >>>>>>>>>>> Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; >>>>>>>>>>> linux- ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE >>>>>>>>>>> <pilhyun....@sk.com> >>>>>>>>>>> Subject: [External Mail] Re: [External Mail] Re: [PATCH] f2fs: >>>>>>>>>>> prevent the current section from being selected as a victim >>>>>>>>>>> during garbage collection >>>>>>>>>>> >>>>>>>>>>> On 3/27/25 14:43, yohan.jo...@sk.com wrote: >>>>>>>>>>>>> From: Chao Yu <c...@kernel.org> >>>>>>>>>>>>> Sent: Thursday, March 27, 2025 3:02 PM >>>>>>>>>>>>> To: Yohan Joung <jyh...@gmail.com>; jaeg...@kernel.org; >>>>>>>>>>>>> daeh...@gmail.com >>>>>>>>>>>>> Cc: c...@kernel.org; >>>>>>>>>>>>> linux-f2fs-devel@lists.sourceforge.net; >>>>>>>>>>>>> linux- ker...@vger.kernel.org; 정요한(JOUNG YOHAN) Mobile AE >>>>>>>>>>>>> <yohan.jo...@sk.com> >>>>>>>>>>>>> Subject: [External Mail] Re: [PATCH] f2fs: prevent the >>>>>>>>>>>>> current section from being selected as a victim during >>>>>>>>>>>>> garbage collection >>>>>>>>>>>>> >>>>>>>>>>>>> On 3/26/25 22:14, Yohan Joung wrote: >>>>>>>>>>>>>> When selecting a victim using next_victim_seg in a large >>>>>>>>>>>>>> section, the selected section might already have been >>>>>>>>>>>>>> cleared and designated as the new current section, making >>>>>>>>>>>>>> it actively in >>>>>>> use. >>>>>>>>>>>>>> This behavior causes inconsistency between the SIT and SSA. >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, does this fix your issue? >>>>>>>>>>>> >>>>>>>>>>>> This is an issue that arises when dividing a large section >>>>>>>>>>>> into segments for garbage collection. >>>>>>>>>>>> caused by the background GC (garbage collection) thread in >>>>>>>>>>>> large section >>>>>>>>>>>> f2fs_gc(victim_section) -> >>>>>>>>>>>> f2fs_clear_prefree_segments(victim_section)-> >>>>>>>>>>>> cursec(victim_section) -> f2fs_gc(victim_section by >>>>>>>>>>>> next_victim_seg) >>>>>>>>>>> >>>>>>>>>>> I didn't get it, why f2fs_get_victim() will return section >>>>>>>>>>> which is used by curseg? It should be avoided by checking w/ >>>>> sec_usage_check(). >>>>>>>>>>> >>>>>>>>>>> Or we missed to check gcing section which next_victim_seg >>>>>>>>>>> points to during get_new_segment()? >>>>>>>>>>> >>>>>>>>>>> Can this happen? >>>>>>>>>>> >>>>>>>>>>> e.g. >>>>>>>>>>> - bggc selects sec #0 >>>>>>>>>>> - next_victim_seg: seg #0 >>>>>>>>>>> - migrate seg #0 and stop >>>>>>>>>>> - next_victim_seg: seg #1 >>>>>>>>>>> - checkpoint, set sec #0 free if sec #0 has no valid blocks >>>>>>>>>>> - allocate seg #0 in sec #0 for curseg >>>>>>>>>>> - curseg moves to seg #1 after allocation >>>>>>>>>>> - bggc tries to migrate seg #1 >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>> That's correct >>>>>>>>>> In f2fs_get_victim, we use next_victim_seg to directly jump to >>>>>>>>>> got_result, thereby bypassing sec_usage_check What do you >>>>>>>>>> think about this change? >>>>>>>>>> >>>>>>>>>> @@ -850,15 +850,20 @@ int f2fs_get_victim(struct f2fs_sb_info >>>>>>>>>> *sbi, >>>>>>>>> unsigned int *result, >>>>>>>>>> p.min_segno = sbi->next_victim_seg[BG_GC]; >>>>>>>>>> *result = p.min_segno; >>>>>>>>>> sbi->next_victim_seg[BG_GC] = NULL_SEGNO; >>>>>>>>>> - goto got_result; >>>>>>>>>> } >>>>>>>>>> if (gc_type == FG_GC && >>>>>>>>>> sbi->next_victim_seg[FG_GC] >>>>>>>>>> != NULL_SEGNO) >>>>> { >>>>>>>>>> p.min_segno = sbi->next_victim_seg[FG_GC]; >>>>>>>>>> *result = p.min_segno; >>>>>>>>>> sbi->next_victim_seg[FG_GC] = NULL_SEGNO; >>>>>>>>>> - goto got_result; >>>>>>>>>> } >>>>>>>>>> + >>>>>>>>>> + secno = GET_SEC_FROM_SEG(sbi, segno); >>>>>>>>>> + >>>>>>>>>> + if (sec_usage_check(sbi, secno)) >>>>>>>>>> + goto next; >>>>>>>>>> + >>>>>>>>>> + goto got_result; >>>>>>>>>> } >>>>>>>>> >>>>>>>>> But still allocator can assign this segment after >>>>>>>>> sec_usage_check() in race condition, right? >>>>>>>> Since the BG GC using next_victim takes place after the SIT >>>>>>>> update in do_checkpoint, it seems unlikely that a race condition >>>>>>>> with >>>>>>> sec_usage_check will occur. >>>>>>> >>>>>>> I mean this: >>>>>>> >>>>>>> - gc_thread >>>>>>> - f2fs_gc >>>>>>> - f2fs_get_victim >>>>>>> - sec_usage_check --- segno #1 is not used in any cursegs >>>>>>> - f2fs_allocate_data_block >>>>>>> - new_curseg >>>>>>> - get_new_segment find segno >>>>>>> #1 >>>>>>> >>>>>>> - do_garbage_collect >>>>>>> >>>>>>> Thanks, >>>>>> >>>>>> do_checkpoint sec0 free >>>>>> If sec0 is not freed, then >>>>> segno1 within sec0 cannot be >>>>>> allocated >>>>>> - gc_thread >>>>>> - f2fs_gc >>>>>> - f2fs_get_victim >>>>>> - sec_usage_check --- segno #1 is not used in any cursegs >>>>>> (but >>>>>> sec0 >>>>> is already used) >>>>>> - >>>>>> f2fs_allocate_data_block >>>>>> - new_curseg >>>>>> - get_new_segment find >>>>> segno #1 >>>>>> >>>>>> - do_garbage_collect >>>>>> >>>>>> I appreciate your patch, it is under testing. >>>>>> but I'm wondering if there's a risk of a race condition in this >>>>>> situation >>>>> >>>>> Oh, yes, I may missed that get_new_segment can return a free >>>>> segment in partial used section. >>>>> >>>>> So what do you think of this? >>>>> - check CURSEG() in do_garbage_collect() and get_victim() >>>>> - reset next_victim_seg[] in get_new_segment() and >>>>> __set_test_and_free() during checkpoint. >>>>> >>>>> Thanks, >>>> >>>> How about using victim_secmap? >>>> gc_thread >>>> mutex_lock(&DIRTY_I(sbi)->seglist_lock); >>>> __set_test_and_free >>>> check cur section next_victim clear >>>> mutex_unlock(&dirty_i->seglist_lock); >>>> >>>> mutex_lock(&dirty->seglist_lock); >>>> f2fs_get_victim >>>> mutex_unlock(&dirty_i->seglist_lock); >>>> >>>> static inline void __set_test_and_free(struct f2fs_sb_info *sbi, >>>> if (next >= start_segno + usable_segs) { >>>> if (test_and_clear_bit(secno, free_i->free_secmap)) >>>> free_i->free_sections++; >>>> + >>>> + if (test_and_clear_bit(secno, dirty_i- >>victim_secmap)) >>>> + sbi->next_victim_seg[BG_GC] = >>>> + NULL_SEGNO; >>> >>> Can this happen? >>> >>> segs_per_sec=2 >>> >>> - seg#0 and seg#1 are all dirty >>> - all valid blocks are removed in seg#1 >>> - checkpoint -> seg#1 becomes free >>> - gc select this sec and next_victim_seg=seg#0 >>> - migrate seg#0, next_victim_seg=seg#1 >>> - allocator assigns seg#1 to curseg >>> - gc tries to migrate seg#1 > >I meant for above case, below change still can not catch it, right? since >next_victim_seg[] was assigned after checkpoint. > >+ if (test_and_clear_bit(secno, dirty_i->victim_secmap)) >+ sbi->next_victim_seg[BG_GC] = NULL_SEGNO; > >Thanks, > I understood what you said. It seems that without a checkpoint, we won't be able to allocate seg#1 in sec0 because when requesting a segment allocation, if it's not within the same section, it checks in a new secmap. please let me know, if there's anything I've missed Thanks
static int get_new_segment(struct f2fs_sb_info *sbi, unsigned int *newseg, bool new_sec, bool pinning) if (!new_sec && ((*newseg + 1) % SEGS_PER_SEC(sbi))) { segno = find_next_zero_bit(free_i->free_segmap, GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1); if (segno < GET_SEG_FROM_SEC(sbi, hint + 1)) goto got_it; } find_other_zone: secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint); >>> >>> Thanks, >> The detailed scenario >> segs_per_sec=2 >> - seg#0 and seg#1 are all dirty >> - all valid blocks are removed in seg#1 >> - gc select this sec and next_victim_seg=seg#0 >> - migrate seg#0, next_victim_seg=seg#1 >> - checkpoint -> sec(seg#0, seg#1) becomes free >> - allocator assigns sec(seg#0, seg#1) to curseg >> - gc tries to migrate seg#1 >>> >>>> } >>>> } >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> IMO, we can clear next_victim_seg[] once section is free in >>>>>>>>> __set_test_and_free()? something like this: >>>>>>>> I will test it according to your suggestion. >>>>>>>> If there are no issues, can I submit it again with the patch? >>>>>>>> Thanks >>>>>>>>> >>>>>>>>> --- >>>>>>>>> fs/f2fs/segment.h | 13 ++++++++++--- >>>>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index >>>>>>>>> 0465dc00b349..826e37999085 100644 >>>>>>>>> --- a/fs/f2fs/segment.h >>>>>>>>> +++ b/fs/f2fs/segment.h >>>>>>>>> @@ -473,9 +473,16 @@ static inline void >>>>>>>>> __set_test_and_free(struct f2fs_sb_info *sbi, >>>>>>>>> goto skip_free; >>>>>>>>> next = find_next_bit(free_i->free_segmap, >>>>>>>>> start_segno + SEGS_PER_SEC(sbi), >>>>> start_segno); >>>>>>>>> - if (next >= start_segno + usable_segs) { >>>>>>>>> - if (test_and_clear_bit(secno, free_i- >>>> free_secmap)) >>>>>>>>> - free_i->free_sections++; >>>>>>>>> + if ((next >= start_segno + usable_segs) && >>>>>>>>> + test_and_clear_bit(secno, free_i->free_secmap)) >>> { >>>>>>>>> + free_i->free_sections++; >>>>>>>>> + >>>>>>>>> + if >>>>>>>>> (GET_SEC_FROM_SEG(sbi->next_victim_seg[BG_GC]) >>> == >>>>>>>>> + >>>>>>>>> secno) >>>>>>>>> + sbi->next_victim_seg[BG_GC] = >>>>>>>>> NULL_SEGNO; >>>>>>>>> + if >>>>>>>>> (GET_SEC_FROM_SEG(sbi->next_victim_seg[FG_GC]) >>> == >>>>>>>>> + >>>>>>>>> secno) >>>>>>>>> + sbi->next_victim_seg[FG_GC] = >>>>>>>>> NULL_SEGNO; >>>>>>>>> } >>>>>>>>> } >>>>>>>>> skip_free: >>>>>>>>> -- >>>>>>>>> 2.40.1 >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Because the call stack is different, I think that in order >>>>>>>>>>>> to handle everything at once, we need to address it within >>>>>>>>>>>> do_garbage_collect, or otherwise include it on both sides. >>>>>>>>>>>> What do you think? >>>>>>>>>>>> >>>>>>>>>>>> [30146.337471][ T1300] F2FS-fs (dm-54): Inconsistent segment >>>>>>>>>>>> (70961) type [0, 1] in SSA and SIT [30146.346151][ T1300] >>>>>>>>>>>> Call >>>>> trace: >>>>>>>>>>>> [30146.346152][ T1300] dump_backtrace+0xe8/0x10c >>>>>>>>>>>> [30146.346157][ T1300] show_stack+0x18/0x28 [30146.346158][ >>>>>>>>>>>> T1300] dump_stack_lvl+0x50/0x6c [30146.346161][ T1300] >>>>>>>>>>>> dump_stack+0x18/0x28 [30146.346162][ T1300] >>>>>>>>>>>> f2fs_stop_checkpoint+0x1c/0x3c [30146.346165][ T1300] >>>>>>>>>>>> do_garbage_collect+0x41c/0x271c [30146.346167][ T1300] >>>>>>>>>>>> f2fs_gc+0x27c/0x828 [30146.346168][ T1300] >>>>>>>>>>>> gc_thread_func+0x290/0x88c [30146.346169][ T1300] >>>>>>>>>>>> kthread+0x11c/0x164 [30146.346172][ T1300] >>>>>>>>>>>> ret_from_fork+0x10/0x20 >>>>>>>>>>>> >>>>>>>>>>>> struct curseg_info : 0xffffff803f95e800 { >>>>>>>>>>>> segno : 0x11531 : 70961 >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> struct f2fs_sb_info : 0xffffff8811d12000 { >>>>>>>>>>>> next_victim_seg[0] : 0x11531 : 70961 } >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> https://lore.kernel.org/linux-f2fs-devel/20250325080646.329 >>>>>>>>>>>>> 19 >>>>>>>>>>>>> 47 >>>>>>>>>>>>> -2 >>>>>>>>>>>>> - >>>>>>>>>>>>> c...@kernel.org >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Yohan Joung <yohan.jo...@sk.com> >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> fs/f2fs/gc.c | 4 ++++ >>>>>>>>>>>>>> 1 file changed, 4 insertions(+) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index >>>>>>>>>>>>>> 2b8f9239bede..4b5d18e395eb 100644 >>>>>>>>>>>>>> --- a/fs/f2fs/gc.c >>>>>>>>>>>>>> +++ b/fs/f2fs/gc.c >>>>>>>>>>>>>> @@ -1926,6 +1926,10 @@ int f2fs_gc(struct f2fs_sb_info >>>>>>>>>>>>>> *sbi, struct >>>>>>>>>>>>> f2fs_gc_control *gc_control) >>>>>>>>>>>>>> goto stop; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> + if (__is_large_section(sbi) && >>>>>>>>>>>>>> + IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, >segno))) >>>>>>>>>>>>>> + goto stop; >>>>>>>>>>>>>> + >>>>>>>>>>>>>> seg_freed = do_garbage_collect(sbi, segno, &gc_list, >>> gc_type, >>>>>>>>>>>>>> >>>>>>>>>>>>>> gc_control->should_migrate_blocks, >>>>>>>>>>>>>> gc_control->one_time); >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >> >> _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel