On 12/20, Gaoxiang (OS) wrote:
> Hi Jaegeuk,
> 
> On 2017/12/20 10:31, Jaegeuk Kim wrote:
> > On 12/20, Gaoxiang (OS) wrote:
> >> Hi Jaegeuk,
> >>
> >> On 2017/12/20 7:40, Jaegeuk Kim wrote:
> >>> On 12/17, gaoxiang (P) wrote:
> >>>> clean up some redundant repeat code in build_sit_entries
> >>>> and seperate some logic into new functions:
> >>>>    - build_discard_entries
> >>>>    - build_sec_entries
> >>>
> >>> This patch makes another big loop redundantly?
> >>>
> >> "condition & jump" is the fundamental of "if" and loop.
> >> In the worst case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > 1
> >> are true, and I think, we have:
> >> - the original approach:
> >>    MAIN_SEGS   <-- loop condition
> >>    MAIN_SEGS   <-- f2fs_discard_en(sbi)
> >>    MAIN_SEGS   <-- sbi->segs_per_sec > 1
> >>
> >>    sits_in_cursum(journal) <-- loop_condition
> >>    sits_in_cursum(journal) <-- f2fs_discard_en(sbi)
> >>    sits_in_cursum(journal) <-- sbi->segs_per_sec > 1
> >>      it takes about 3*(MAIN_SEGS + sits_in_cursum(journal)) jumps effort.
> >> - this patch:
> >>    MAIN_SEGS   <-- loop condition
> >>    sits_in_cursum(journal) <-- loop_condition
> >>
> >>    MAIN_SEGS   <-- f2fs_discard_en(sbi)
> >>    MAIN_SEGS   <-- sbi->segs_per_sec > 1
> >>      it takes abourt 3*MAIN_SEGS + sits_in_cursum(journal) jumps effort.
> >>
> >> and in the best case, both f2fs_discard_en(sbi) and sbi->segs_per_sec >
> >> 1 are false, and we have:
> >> - this patch
> >>    MAIN_SEGS   <-- loop condition
> >>
> >>    sits_in_cursum(journal) <-- loop_condition
> >> See
> >> https://stackoverflow.com/questions/1462710/can-c-compilers-optimize-if-statements-inside-for-loops
> > 
> > Interesting!
> > 
> > What I can think of the worst case looks like:
> > 
> > In current code,
> >     for (N) {
> >             do_something;
> >             if (f2fs_discard_en())
> >                     do _something;
> >             if (sbi->segs_per_sec > 1)
> >                     do_something;
> >     }
> >     for (sits_in_cursum()) {
> >             do_something;
> >             if (f2fs_discard_en())
> >                     do _something;
> >             if (sbi->segs_per_sec > 1)
> >                     do_something;
> >     }
> > => O(N + sits_in_cursum())
> 
> In the worst case Its 3(1 -- exit condition < MAIN_SEG, 2 --  
> f2fs_discard_en(), 3 -- 
> sbi->segs_per_sec > 1) *N + 3*sits_in_cursum() if condition & jump 
> instuctions at runtime.
> 
> If we use Big O notion, yes I think O(N + sits_in_cursum())
> 
> > 
> > Once compiler unrolled the loop, we can expect most of jumps could be hit by
> > branch predictor, since the loop does not have many branches.
> 
> The current code, if the compiler decides to unroll the loop, it could unroll 
> it into, I guess
>       for (N) {
>               do_something;
>       }
>       if (f2fs_discard_en()) {
>               for(N) {
>                       do _something;
>               }
>       }
>       if (sbi->segs_per_sec > 1) {
>               for(N) {
>                       do_something;
>               }
>       }
>       ...
>       for (sits_in_cursum()) {
>               do_something;
>       }
>       if (f2fs_discard_en()) {
>               for (N) {
>                       do _something;
>               }
>       }
>       if (sbi->segs_per_sec > 1) {
>               for (N) {
>                       do_something;
>               }
>               }

Unrolled loop would be something like:
        for (N/n) {
                do_something #1;
                if (f2fs_discard_en())
                        do_something;
                if (sbi->segs_per_sec > 1)
                        do_something;
                ...
                do_something #n;
                if (f2fs_discard_en())
                        do_something;
                if (sbi->segs_per_sec > 1)
                        do_something;
        }

And, branch predictor can avoid three branch conditions by BTB.

do_something #1;
  if (f2fs_discard_en())
    do _something;
      if (sbi->segs_per_sec > 1)
        do_something;
          do_something #2;
            if (f2fs_discard_en())
              do _something;
                if (sbi->segs_per_sec > 1)
                  do_something;
                    ...
                      do_something #N;
                        if (f2fs_discard_en())
                          do _something;
                            if (sbi->segs_per_sec > 1)
                              do_something;

Anyway, I don't see huge benefit regarding to performance and code readability,
but do worry about potential patch conflicts when backporting this.

> Taking branch predictor into consideration, I think the unrolled one is more 
> easier to predict
> than the current rolled one. (as you say, the current has more conditions in 
> the loop
> and a exit condition to predict)
> 
> > 
> > In the patch,
> >     for (N)
> >             do_something;
> >     for (sits_in_cursum())
> >             do_something;
> > 
> >     if (f2fs_discard_en()) {
> >             for (N)
> >                     do_something;
> >     }
> >     if (sbi->segs_per_sec > 1) {
> >             for (N)
> >                     do_something;
> >     }
> > => O(3*N + sits_in_cursum())
> 
> If we use Big O notion, I think O(N + sits_in_cursum()) too.
> > 
> > BTW, build_discard_entries(struct f2fs_sb_info *sbi) is more likely to split
> > the loops, as you describe.
> > 
> > In build_discard_entries(),
> >     for (N) {
> >             if (CP_TRIMMED_FLAG)
> >                     do_something;
> >             else
> >                     do_semething_else;
> >     }
> > Thanks,
> > 
> 
> I think so, yet CP_TRIMMED_FLAG comes from a function argument, so I think 
> compiler
> cannot decide whether it is constant.
> 
> I think if we count the runtime condition & jump instruction at runtime. 
> the unrolls above are beneficial, I think.
> 
> Thanks.
> 
> >>
> >> In addtion, this approach is easier to read than the original
> >> one, so splitting the loop would be beneficial.
> >>
> >> Thanks
> >>
> >>>>
> >>>> Note that "if (f2fs_discard_en(sbi))" and "if (sbi->segs_per_sec > 1)"
> >>>> are all taken out of the loops they are unchangable.
> >>>>
> >>>> Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
> >>>> ---
> >>>>    fs/f2fs/segment.c | 80 
> >>>> +++++++++++++++++++++++++------------------------------
> >>>>    1 file changed, 37 insertions(+), 43 deletions(-)
> >>>>
> >>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>>> index 40e1d20..bcd8a40 100644
> >>>> --- a/fs/f2fs/segment.c
> >>>> +++ b/fs/f2fs/segment.c
> >>>> @@ -3476,12 +3476,39 @@ static int build_curseg(struct f2fs_sb_info *sbi)
> >>>>          return restore_curseg_summaries(sbi);
> >>>>    }
> >>>>    
> >>>> +static void build_discard_entries(struct f2fs_sb_info *sbi)
> >>>> +{
> >>>> +        unsigned segno;
> >>>> +
> >>>> +        for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) {
> >>>> +                struct seg_entry *se = get_seg_entry(sbi, segno);
> >>>> +
> >>>> +                if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG))
> >>>> +                        memset(se->discard_map, 0xff, 
> >>>> SIT_VBLOCK_MAP_SIZE);
> >>>> +                else {
> >>>> +                        memcpy(se->discard_map, se->cur_valid_map,
> >>>> +                                SIT_VBLOCK_MAP_SIZE);
> >>>> +
> >>>> +                        sbi->discard_blks += sbi->blocks_per_seg -
> >>>> +                                se->valid_blocks;
> >>>> +                }
> >>>> +        }
> >>>> +}
> >>>> +
> >>>> +static void build_sec_entries(struct f2fs_sb_info *sbi)
> >>>> +{
> >>>> +        unsigned segno;
> >>>> +
> >>>> +        for (segno = 0; segno < MAIN_SEGS(sbi); ++segno)
> >>>> +                get_sec_entry(sbi, segno)->valid_blocks +=
> >>>> +                        get_seg_entry(sbi, segno)->valid_blocks;
> >>>> +}
> >>>> +
> >>>>    static void build_sit_entries(struct f2fs_sb_info *sbi)
> >>>>    {
> >>>>          struct sit_info *sit_i = SIT_I(sbi);
> >>>>          struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
> >>>>          struct f2fs_journal *journal = curseg->journal;
> >>>> -        struct seg_entry *se;
> >>>>          struct f2fs_sit_entry sit;
> >>>>          int sit_blk_cnt = SIT_BLK_CNT(sbi);
> >>>>          unsigned int i, start, end;
> >>>> @@ -3498,67 +3525,34 @@ static void build_sit_entries(struct 
> >>>> f2fs_sb_info *sbi)
> >>>>                          struct f2fs_sit_block *sit_blk;
> >>>>                          struct page *page;
> >>>>    
> >>>> -                        se = &sit_i->sentries[start];
> >>>>                          page = get_current_sit_page(sbi, start);
> >>>>                          sit_blk = (struct f2fs_sit_block 
> >>>> *)page_address(page);
> >>>>                          sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, 
> >>>> start)];
> >>>>                          f2fs_put_page(page, 1);
> >>>>    
> >>>>                          check_block_count(sbi, start, &sit);
> >>>> -                        seg_info_from_raw_sit(se, &sit);
> >>>> -
> >>>> -                        /* build discard map only one time */
> >>>> -                        if (f2fs_discard_en(sbi)) {
> >>>> -                                if (is_set_ckpt_flags(sbi, 
> >>>> CP_TRIMMED_FLAG)) {
> >>>> -                                        memset(se->discard_map, 0xff,
> >>>> -                                                SIT_VBLOCK_MAP_SIZE);
> >>>> -                                } else {
> >>>> -                                        memcpy(se->discard_map,
> >>>> -                                                se->cur_valid_map,
> >>>> -                                                SIT_VBLOCK_MAP_SIZE);
> >>>> -                                        sbi->discard_blks +=
> >>>> -                                                sbi->blocks_per_seg -
> >>>> -                                                se->valid_blocks;
> >>>> -                                }
> >>>> -                        }
> >>>>    
> >>>> -                        if (sbi->segs_per_sec > 1)
> >>>> -                                get_sec_entry(sbi, start)->valid_blocks 
> >>>> +=
> >>>> -                                                        
> >>>> se->valid_blocks;
> >>>> +                        seg_info_from_raw_sit(&sit_i->sentries[start], 
> >>>> &sit);
> >>>>                  }
> >>>>                  start_blk += readed;
> >>>>          } while (start_blk < sit_blk_cnt);
> >>>>    
> >>>>          down_read(&curseg->journal_rwsem);
> >>>>          for (i = 0; i < sits_in_cursum(journal); i++) {
> >>>> -                unsigned int old_valid_blocks;
> >>>> -
> >>>>                  start = le32_to_cpu(segno_in_journal(journal, i));
> >>>> -                se = &sit_i->sentries[start];
> >>>>                  sit = sit_in_journal(journal, i);
> >>>>    
> >>>> -                old_valid_blocks = se->valid_blocks;
> >>>> -
> >>>>                  check_block_count(sbi, start, &sit);
> >>>> -                seg_info_from_raw_sit(se, &sit);
> >>>> -
> >>>> -                if (f2fs_discard_en(sbi)) {
> >>>> -                        if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
> >>>> -                                memset(se->discard_map, 0xff,
> >>>> -                                                        
> >>>> SIT_VBLOCK_MAP_SIZE);
> >>>> -                        } else {
> >>>> -                                memcpy(se->discard_map, 
> >>>> se->cur_valid_map,
> >>>> -                                                        
> >>>> SIT_VBLOCK_MAP_SIZE);
> >>>> -                                sbi->discard_blks += old_valid_blocks -
> >>>> -                                                        
> >>>> se->valid_blocks;
> >>>> -                        }
> >>>> -                }
> >>>> -
> >>>> -                if (sbi->segs_per_sec > 1)
> >>>> -                        get_sec_entry(sbi, start)->valid_blocks +=
> >>>> -                                se->valid_blocks - old_valid_blocks;
> >>>> +                seg_info_from_raw_sit(&sit_i->sentries[start], &sit);
> >>>>          }
> >>>>          up_read(&curseg->journal_rwsem);
> >>>> +
> >>>> +        /* build discard map only one time */
> >>>> +        if (f2fs_discard_en(sbi))
> >>>> +                build_discard_entries(sbi);
> >>>> +
> >>>> +        if (sbi->segs_per_sec > 1)
> >>>> +                build_sec_entries(sbi);
> >>>>    }
> >>>>    
> >>>>    static void init_free_segmap(struct f2fs_sb_info *sbi)
> >>>> -- 
> >>>> 2.1.4

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to