On 4/1/25 15:47, yohan.joung wrote: > > >> -----Original Message----- >> From: Chao Yu <c...@kernel.org> >> Sent: Tuesday, April 1, 2025 2:42 PM >> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; linux-f2fs- >> de...@lists.sourceforge.net >> Cc: c...@kernel.org; jaeg...@kernel.org; jyh...@gmail.com; linux- >> ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE <pilhyun....@sk.com> >> Subject: [External Mail] Re: [External Mail] Re: [f2fs-dev] [External Mail] >> Re: [External Mail] Re: [PATCH] f2fs: prevent the current section from >> being selected as a victim during garbage collection >> >> On 4/1/25 09:51, yohan.joung wrote: >>>> From: Chao Yu <c...@kernel.org> >>>> Sent: Monday, March 31, 2025 8:36 PM >>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; linux-f2fs- >>>> de...@lists.sourceforge.net >>>> Cc: c...@kernel.org; jaeg...@kernel.org; jyh...@gmail.com; linux- >>>> ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE >>>> <pilhyun....@sk.com> >>>> Subject: [External Mail] Re: [External Mail] Re: [f2fs-dev] [External >>>> Mail] >>>> Re: [External Mail] Re: [PATCH] f2fs: prevent the current section >>>> from being selected as a victim during garbage collection >>>> >>>> On 3/31/25 13:13, yohan.joung wrote: >>>>>> On 2025/3/28 15:25, yohan.joung wrote: >>>>>>>> On 2025/3/28 11:40, yohan.joung wrote: >>>>>>>>>> From: Chao Yu <c...@kernel.org> >>>>>>>>>> Sent: Thursday, March 27, 2025 10:48 PM >>>>>>>>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; Yohan >>>>>>>>>> Joung <jyh...@gmail.com>; jaeg...@kernel.org; daeh...@gmail.com >>>>>>>>>> Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; >>>>>>>>>> linux- ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE >>>>>>>>>> <pilhyun....@sk.com> >>>>>>>>>> Subject: [External Mail] Re: [External Mail] Re: [External Mail] >> Re: >>>>>>>>>> [PATCH] f2fs: prevent the current section from being selected >>>>>>>>>> as a victim during garbage collection >>>>>>>>>> >>>>>>>>>> On 2025/3/27 16:00, yohan.jo...@sk.com wrote: >>>>>>>>>>>> From: Chao Yu <c...@kernel.org> >>>>>>>>>>>> Sent: Thursday, March 27, 2025 4:30 PM >>>>>>>>>>>> To: 정요한(JOUNG YOHAN) Mobile AE <yohan.jo...@sk.com>; Yohan >>>>>>>>>>>> Joung <jyh...@gmail.com>; jaeg...@kernel.org; >>>>>>>>>>>> daeh...@gmail.com >>>>>>>>>>>> Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; >>>>>>>>>>>> linux- ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE >>>>>>>>>>>> <pilhyun....@sk.com> >>>>>>>>>>>> Subject: [External Mail] Re: [External Mail] Re: [PATCH] f2fs: >>>>>>>>>>>> prevent the current section from being selected as a victim >>>>>>>>>>>> during garbage collection >>>>>>>>>>>> >>>>>>>>>>>> On 3/27/25 14:43, yohan.jo...@sk.com wrote: >>>>>>>>>>>>>> From: Chao Yu <c...@kernel.org> >>>>>>>>>>>>>> Sent: Thursday, March 27, 2025 3:02 PM >>>>>>>>>>>>>> To: Yohan Joung <jyh...@gmail.com>; jaeg...@kernel.org; >>>>>>>>>>>>>> daeh...@gmail.com >>>>>>>>>>>>>> Cc: c...@kernel.org; >>>>>>>>>>>>>> linux-f2fs-devel@lists.sourceforge.net; >>>>>>>>>>>>>> linux- ker...@vger.kernel.org; 정요한(JOUNG YOHAN) Mobile AE >>>>>>>>>>>>>> <yohan.jo...@sk.com> >>>>>>>>>>>>>> Subject: [External Mail] Re: [PATCH] f2fs: prevent the >>>>>>>>>>>>>> current section from being selected as a victim during >>>>>>>>>>>>>> garbage collection >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 3/26/25 22:14, Yohan Joung wrote: >>>>>>>>>>>>>>> When selecting a victim using next_victim_seg in a large >>>>>>>>>>>>>>> section, the selected section might already have been >>>>>>>>>>>>>>> cleared and designated as the new current section, making >>>>>>>>>>>>>>> it actively in >>>>>>>> use. >>>>>>>>>>>>>>> This behavior causes inconsistency between the SIT and SSA. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, does this fix your issue? >>>>>>>>>>>>> >>>>>>>>>>>>> This is an issue that arises when dividing a large section >>>>>>>>>>>>> into segments for garbage collection. >>>>>>>>>>>>> caused by the background GC (garbage collection) thread in >>>>>>>>>>>>> large section >>>>>>>>>>>>> f2fs_gc(victim_section) -> >>>>>>>>>>>>> f2fs_clear_prefree_segments(victim_section)-> >>>>>>>>>>>>> cursec(victim_section) -> f2fs_gc(victim_section by >>>>>>>>>>>>> next_victim_seg) >>>>>>>>>>>> >>>>>>>>>>>> I didn't get it, why f2fs_get_victim() will return section >>>>>>>>>>>> which is used by curseg? It should be avoided by checking w/ >>>>>> sec_usage_check(). >>>>>>>>>>>> >>>>>>>>>>>> Or we missed to check gcing section which next_victim_seg >>>>>>>>>>>> points to during get_new_segment()? >>>>>>>>>>>> >>>>>>>>>>>> Can this happen? >>>>>>>>>>>> >>>>>>>>>>>> e.g. >>>>>>>>>>>> - bggc selects sec #0 >>>>>>>>>>>> - next_victim_seg: seg #0 >>>>>>>>>>>> - migrate seg #0 and stop >>>>>>>>>>>> - next_victim_seg: seg #1 >>>>>>>>>>>> - checkpoint, set sec #0 free if sec #0 has no valid blocks >>>>>>>>>>>> - allocate seg #0 in sec #0 for curseg >>>>>>>>>>>> - curseg moves to seg #1 after allocation >>>>>>>>>>>> - bggc tries to migrate seg #1 >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>> That's correct >>>>>>>>>>> In f2fs_get_victim, we use next_victim_seg to directly jump to >>>>>>>>>>> got_result, thereby bypassing sec_usage_check What do you >>>>>>>>>>> think about this change? >>>>>>>>>>> >>>>>>>>>>> @@ -850,15 +850,20 @@ int f2fs_get_victim(struct f2fs_sb_info >>>>>>>>>>> *sbi, >>>>>>>>>> unsigned int *result, >>>>>>>>>>> p.min_segno = >>>>>>>>>>> sbi->next_victim_seg[BG_GC]; >>>>>>>>>>> *result = p.min_segno; >>>>>>>>>>> sbi->next_victim_seg[BG_GC] = NULL_SEGNO; >>>>>>>>>>> - goto got_result; >>>>>>>>>>> } >>>>>>>>>>> if (gc_type == FG_GC && >>>>>>>>>>> sbi->next_victim_seg[FG_GC] >>>>>>>>>>> != NULL_SEGNO) >>>>>> { >>>>>>>>>>> p.min_segno = >>>>>>>>>>> sbi->next_victim_seg[FG_GC]; >>>>>>>>>>> *result = p.min_segno; >>>>>>>>>>> sbi->next_victim_seg[FG_GC] = NULL_SEGNO; >>>>>>>>>>> - goto got_result; >>>>>>>>>>> } >>>>>>>>>>> + >>>>>>>>>>> + secno = GET_SEC_FROM_SEG(sbi, segno); >>>>>>>>>>> + >>>>>>>>>>> + if (sec_usage_check(sbi, secno)) >>>>>>>>>>> + goto next; >>>>>>>>>>> + >>>>>>>>>>> + goto got_result; >>>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> But still allocator can assign this segment after >>>>>>>>>> sec_usage_check() in race condition, right? >>>>>>>>> Since the BG GC using next_victim takes place after the SIT >>>>>>>>> update in do_checkpoint, it seems unlikely that a race condition >>>>>>>>> with >>>>>>>> sec_usage_check will occur. >>>>>>>> >>>>>>>> I mean this: >>>>>>>> >>>>>>>> - gc_thread >>>>>>>> - f2fs_gc >>>>>>>> - f2fs_get_victim >>>>>>>> - sec_usage_check --- segno #1 is not used in any cursegs >>>>>>>> - f2fs_allocate_data_block >>>>>>>> - new_curseg >>>>>>>> - get_new_segment find segno >>>>>>>> #1 >>>>>>>> >>>>>>>> - do_garbage_collect >>>>>>>> >>>>>>>> Thanks, >>>>>>> >>>>>>> do_checkpoint sec0 >>>>>>> free >>>>>>> If sec0 is not freed, >>>>>>> then >>>>>> segno1 within sec0 cannot be >>>>>>> allocated >>>>>>> - gc_thread >>>>>>> - f2fs_gc >>>>>>> - f2fs_get_victim >>>>>>> - sec_usage_check --- segno #1 is not used in any cursegs >>>>>>> (but >>>>>>> sec0 >>>>>> is already used) >>>>>>> - >>>>>>> f2fs_allocate_data_block >>>>>>> - new_curseg >>>>>>> - >>>>>>> get_new_segment find >>>>>> segno #1 >>>>>>> >>>>>>> - do_garbage_collect >>>>>>> >>>>>>> I appreciate your patch, it is under testing. >>>>>>> but I'm wondering if there's a risk of a race condition in this >>>>>>> situation >>>>>> >>>>>> Oh, yes, I may missed that get_new_segment can return a free >>>>>> segment in partial used section. >>>>>> >>>>>> So what do you think of this? >>>>>> - check CURSEG() in do_garbage_collect() and get_victim() >>>>>> - reset next_victim_seg[] in get_new_segment() and >>>>>> __set_test_and_free() during checkpoint. >>>>>> >>>>>> Thanks, >>>>> >>>>> How about using victim_secmap? >>>>> gc_thread >>>>> mutex_lock(&DIRTY_I(sbi)->seglist_lock); >>>>> __set_test_and_free >>>>> check cur section next_victim clear >>>>> mutex_unlock(&dirty_i->seglist_lock); >>>>> >>>>> mutex_lock(&dirty->seglist_lock); >>>>> f2fs_get_victim >>>>> mutex_unlock(&dirty_i->seglist_lock); >>>>> >>>>> static inline void __set_test_and_free(struct f2fs_sb_info *sbi, >>>>> if (next >= start_segno + usable_segs) { >>>>> if (test_and_clear_bit(secno, >>>>> free_i->free_secmap)) >>>>> free_i->free_sections++; >>>>> + >>>>> + if (test_and_clear_bit(secno, dirty_i- >>> victim_secmap)) >>>>> + sbi->next_victim_seg[BG_GC] = >>>>> + NULL_SEGNO; >>>> >>>> Can this happen? >>>> >>>> segs_per_sec=2 >>>> >>>> - seg#0 and seg#1 are all dirty >>>> - all valid blocks are removed in seg#1 >>>> - checkpoint -> seg#1 becomes free >>>> - gc select this sec and next_victim_seg=seg#0 >>>> - migrate seg#0, next_victim_seg=seg#1 >>>> - allocator assigns seg#1 to curseg >>>> - gc tries to migrate seg#1 >> >> I meant for above case, below change still can not catch it, right? since >> next_victim_seg[] was assigned after checkpoint. >> >> + if (test_and_clear_bit(secno, dirty_i->victim_secmap)) >> + sbi->next_victim_seg[BG_GC] = NULL_SEGNO; >> >> Thanks, >> > I understood what you said. > It seems that without a checkpoint, > we won't be able to allocate seg#1 in sec0 > because when requesting a segment allocation, > if it's not within the same section, it checks in a new secmap. > please let me know, if there's anything I've missed
Oh, I see, it always try to allocate segments sequentially in a large section-size image, thanks for your explanation. :) static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type) { ... if (__is_large_section(sbi)) { ... return curseg->segno; ... } static int new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) { ... segno = __get_next_segno(sbi, type); ret = get_new_segment(sbi, &segno, new_sec, pinning); ... } BTW, I guess it could be more efficient if we can reuse free segment in dirty section for conventional device. > Thanks > > static int get_new_segment(struct f2fs_sb_info *sbi, > unsigned int *newseg, bool new_sec, bool pinning) > > if (!new_sec && ((*newseg + 1) % SEGS_PER_SEC(sbi))) { > segno = find_next_zero_bit(free_i->free_segmap, > GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1); > if (segno < GET_SEG_FROM_SEC(sbi, hint + 1)) > goto got_it; > } > > find_other_zone: > secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint); > >>>> >>>> Thanks, >>> The detailed scenario >>> segs_per_sec=2 >>> - seg#0 and seg#1 are all dirty >>> - all valid blocks are removed in seg#1 >>> - gc select this sec and next_victim_seg=seg#0 >>> - migrate seg#0, next_victim_seg=seg#1 >>> - checkpoint -> sec(seg#0, seg#1) becomes free So, for this case, we can handle it only in checkpoint. >> + if (test_and_clear_bit(secno, dirty_i->victim_secmap)) >> + sbi->next_victim_seg[BG_GC] = NULL_SEGNO; Do we need to handle sbi->next_victim_seg[FG_GC] as well? Thanks, >>> - allocator assigns sec(seg#0, seg#1) to curseg >>> - gc tries to migrate seg#1 >>>> >>>>> } >>>>> } >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>>>> IMO, we can clear next_victim_seg[] once section is free in >>>>>>>>>> __set_test_and_free()? something like this: >>>>>>>>> I will test it according to your suggestion. >>>>>>>>> If there are no issues, can I submit it again with the patch? >>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> fs/f2fs/segment.h | 13 ++++++++++--- >>>>>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index >>>>>>>>>> 0465dc00b349..826e37999085 100644 >>>>>>>>>> --- a/fs/f2fs/segment.h >>>>>>>>>> +++ b/fs/f2fs/segment.h >>>>>>>>>> @@ -473,9 +473,16 @@ static inline void >>>>>>>>>> __set_test_and_free(struct f2fs_sb_info *sbi, >>>>>>>>>> goto skip_free; >>>>>>>>>> next = find_next_bit(free_i->free_segmap, >>>>>>>>>> start_segno + SEGS_PER_SEC(sbi), >>>>>> start_segno); >>>>>>>>>> - if (next >= start_segno + usable_segs) { >>>>>>>>>> - if (test_and_clear_bit(secno, free_i- >>>>> free_secmap)) >>>>>>>>>> - free_i->free_sections++; >>>>>>>>>> + if ((next >= start_segno + usable_segs) && >>>>>>>>>> + test_and_clear_bit(secno, free_i->free_secmap)) >>>> { >>>>>>>>>> + free_i->free_sections++; >>>>>>>>>> + >>>>>>>>>> + if >>>>>>>>>> (GET_SEC_FROM_SEG(sbi->next_victim_seg[BG_GC]) >>>> == >>>>>>>>>> + >>>>>>>>>> secno) >>>>>>>>>> + sbi->next_victim_seg[BG_GC] = >>>>>>>>>> NULL_SEGNO; >>>>>>>>>> + if >>>>>>>>>> (GET_SEC_FROM_SEG(sbi->next_victim_seg[FG_GC]) >>>> == >>>>>>>>>> + >>>>>>>>>> secno) >>>>>>>>>> + sbi->next_victim_seg[FG_GC] = >>>>>>>>>> NULL_SEGNO; >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> skip_free: >>>>>>>>>> -- >>>>>>>>>> 2.40.1 >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Because the call stack is different, I think that in order >>>>>>>>>>>>> to handle everything at once, we need to address it within >>>>>>>>>>>>> do_garbage_collect, or otherwise include it on both sides. >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> [30146.337471][ T1300] F2FS-fs (dm-54): Inconsistent segment >>>>>>>>>>>>> (70961) type [0, 1] in SSA and SIT [30146.346151][ T1300] >>>>>>>>>>>>> Call >>>>>> trace: >>>>>>>>>>>>> [30146.346152][ T1300] dump_backtrace+0xe8/0x10c >>>>>>>>>>>>> [30146.346157][ T1300] show_stack+0x18/0x28 [30146.346158][ >>>>>>>>>>>>> T1300] dump_stack_lvl+0x50/0x6c [30146.346161][ T1300] >>>>>>>>>>>>> dump_stack+0x18/0x28 [30146.346162][ T1300] >>>>>>>>>>>>> f2fs_stop_checkpoint+0x1c/0x3c [30146.346165][ T1300] >>>>>>>>>>>>> do_garbage_collect+0x41c/0x271c [30146.346167][ T1300] >>>>>>>>>>>>> f2fs_gc+0x27c/0x828 [30146.346168][ T1300] >>>>>>>>>>>>> gc_thread_func+0x290/0x88c [30146.346169][ T1300] >>>>>>>>>>>>> kthread+0x11c/0x164 [30146.346172][ T1300] >>>>>>>>>>>>> ret_from_fork+0x10/0x20 >>>>>>>>>>>>> >>>>>>>>>>>>> struct curseg_info : 0xffffff803f95e800 { >>>>>>>>>>>>> segno : 0x11531 : 70961 >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> struct f2fs_sb_info : 0xffffff8811d12000 { >>>>>>>>>>>>> next_victim_seg[0] : 0x11531 : 70961 } >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://lore.kernel.org/linux-f2fs-devel/20250325080646.329 >>>>>>>>>>>>>> 19 >>>>>>>>>>>>>> 47 >>>>>>>>>>>>>> -2 >>>>>>>>>>>>>> - >>>>>>>>>>>>>> c...@kernel.org >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Yohan Joung <yohan.jo...@sk.com> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> fs/f2fs/gc.c | 4 ++++ >>>>>>>>>>>>>>> 1 file changed, 4 insertions(+) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index >>>>>>>>>>>>>>> 2b8f9239bede..4b5d18e395eb 100644 >>>>>>>>>>>>>>> --- a/fs/f2fs/gc.c >>>>>>>>>>>>>>> +++ b/fs/f2fs/gc.c >>>>>>>>>>>>>>> @@ -1926,6 +1926,10 @@ int f2fs_gc(struct f2fs_sb_info >>>>>>>>>>>>>>> *sbi, struct >>>>>>>>>>>>>> f2fs_gc_control *gc_control) >>>>>>>>>>>>>>> goto stop; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + if (__is_large_section(sbi) && >>>>>>>>>>>>>>> + IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, >> segno))) >>>>>>>>>>>>>>> + goto stop; >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> seg_freed = do_garbage_collect(sbi, segno, &gc_list, >>>> gc_type, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> gc_control->should_migrate_blocks, >>>>>>>>>>>>>>> gc_control->one_time); >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>> >>> > > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel