On 08/14, Chao Yu wrote:
> On 2018/8/14 12:04, Jaegeuk Kim wrote:
> > On 08/14, Chao Yu wrote:
> >> On 2018/8/14 4:11, Jaegeuk Kim wrote:
> >>> On 08/13, Chao Yu wrote:
> >>>> Hi Jaegeuk,
> >>>>
> >>>> On 2018/8/11 2:56, Jaegeuk Kim wrote:
> >>>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> >>>>> to fix the drop in sequential read throughput.
> >>>>>
> >>>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> >>>>> device: UFS
> >>>>>
> >>>>> Before -
> >>>>> read throughput: 185 MB/s
> >>>>> total read requests: 85177 (of these ~80000 are 4KB size requests).
> >>>>> total write requests: 2546 (of these ~2208 requests are written in
> >>>>> 512KB).
> >>>>>
> >>>>> After -
> >>>>> read throughput: 758 MB/s
> >>>>> total read requests: 2417 (of these ~2042 are 512KB reads).
> >>>>> total write requests: 2701 (of these ~2034 requests are written in
> >>>>> 512KB).
> >>>>
> >>>> IMO, it only impacts sequential read performance on a large file which
> >>>> may be fragmented during multi-thread writing.
> >>>>
> >>>> In an android environment, mostly, large files should be of cold type,
> >>>> such as apk, mp3, rmvb, jpeg..., so I think we only need to serialize
> >>>> writepages() for the cold data area writer.
> >>>>
> >>>> So how about adding a mount option to serialize writepage() for
> >>>> different types of log, e.g. in android, using serialize=4; by default,
> >>>> using serialize=7
> >>>> HOT_DATA	1
> >>>> WARM_DATA	2
> >>>> COLD_DATA	4
> >>>
> >>> Well, I don't think we need to give too many mount options for this
> >>> fragmented case. How about doing this for large files only, like this?
> >>
> >> Thread A write 512 pages		Thread B write 8 pages
> >>
> >> - writepages()
> >>  - mutex_lock(&sbi->writepages);
> >>   - writepage();
> >>   ...
> >> 					- writepages()
> >> 					 - writepage()
> >> 					 ....
> >>   - writepage();
> >>   ...
> >>  - mutex_unlock(&sbi->writepages);
> >>
> >> The above case will also cause fragmentation, since we don't serialize
> >> all concurrent IO with the lock.
> >>
> >> Do we need to consider such a case?
> >
> > We can simply allow 512 and 8 in the same segment, which would not be a
> > big deal, when considering starvation of Thread B.
> 
> Yeah, but in reality, there would be more threads competing for the same
> log header, so I worry that the effect of defragmenting will not be as
> good as we expect; anyway, for the benchmark, it's enough.
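The interleaving discussed above can be reduced to a small user-space model (a sketch only; `alloc_block()`, `write_interleaved()` and friends are illustrative stand-ins, not f2fs code): when the `sbi->writepages` mutex is not held across a writer's whole batch, two concurrent writers take turns grabbing blocks at the shared log head and both files come out fragmented, whereas serializing each batch keeps every file's blocks contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a shared log head: each writepage() takes the next block. */
static int next_blk;

static int alloc_block(void)
{
	return next_blk++;
}

/* Without serialization, two writers interleave allocations at the log
 * head, so neither file's blocks end up adjacent on disk. */
static void write_interleaved(int *a, size_t na, int *b, size_t nb)
{
	size_t i = 0, j = 0;

	while (i < na || j < nb) {
		if (i < na)
			a[i++] = alloc_block();
		if (j < nb)
			b[j++] = alloc_block();
	}
}

/* With the writepages mutex held across each whole batch, every file
 * gets one contiguous run of blocks. */
static void write_serialized(int *a, size_t na, int *b, size_t nb)
{
	/* mutex_lock(&sbi->writepages) would be held here ... */
	for (size_t i = 0; i < na; i++)
		a[i] = alloc_block();
	/* ... released, then taken by the second writer. */
	for (size_t j = 0; j < nb; j++)
		b[j] = alloc_block();
}

static bool contiguous(const int *blks, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (blks[i] != blks[i - 1] + 1)
			return false;
	return true;
}
```

In this model the serialized variant gives the large file one unbroken run of blocks, which is what later lets the block layer merge its sequential reads into large (512KB) requests instead of thousands of 4KB ones.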
Basically, I think this is not a benchmark issue. :) It just reveals the
issue much more easily. Let me think about three cases:

1) WB_SYNC_NONE & WB_SYNC_NONE
   -> can simply use mutex_lock

2) WB_SYNC_ALL & WB_SYNC_NONE
   -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while
      WB_SYNC_NONE will skip writing blocks

3) WB_SYNC_ALL & WB_SYNC_ALL
   -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to
      avoid starvation.

I've been testing the below.

	if (!S_ISDIR(inode->i_mode) &&
			(wbc->sync_mode != WB_SYNC_ALL ||
			get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) {
		mutex_lock(&sbi->writepages);
		locked = true;
	}

Thanks,

> 
> Thanks,
> 
> >>
> >> Thanks,
> >>
> >>>
> >>> From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
> >>> From: Jaegeuk Kim <jaeg...@kernel.org>
> >>> Date: Thu, 9 Aug 2018 17:53:34 -0700
> >>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
> >>>  sequential read
> >>>
> >>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> >>> to fix the drop in sequential read throughput.
> >>>
> >>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> >>> device: UFS
> >>>
> >>> Before -
> >>> read throughput: 185 MB/s
> >>> total read requests: 85177 (of these ~80000 are 4KB size requests).
> >>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
> >>>
> >>> After -
> >>> read throughput: 758 MB/s
> >>> total read requests: 2417 (of these ~2042 are 512KB reads).
> >>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
> >>>
> >>> Signed-off-by: Sahitya Tummala <stumm...@codeaurora.org>
> >>> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> >>> ---
> >>>  Documentation/ABI/testing/sysfs-fs-f2fs |  8 ++++++++
> >>>  fs/f2fs/data.c                          | 10 ++++++++++
> >>>  fs/f2fs/f2fs.h                          |  2 ++
> >>>  fs/f2fs/segment.c                       |  1 +
> >>>  fs/f2fs/super.c                         |  1 +
> >>>  fs/f2fs/sysfs.c                         |  2 ++
> >>>  6 files changed, 24 insertions(+)
> >>>
> >>> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> >>> index 9b0123388f18..94a24aedcdb2 100644
> >>> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> >>> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> >>> @@ -51,6 +51,14 @@ Description:
> >>>  		Controls the dirty page count condition for the in-place-update
> >>>  		policies.
> >>>
> >>> +What:		/sys/fs/f2fs/<disk>/min_seq_blocks
> >>> +Date:		August 2018
> >>> +Contact:	"Jaegeuk Kim" <jaeg...@kernel.org>
> >>> +Description:
> >>> +		Controls the dirty page count condition for batched sequential
> >>> +		writes in ->writepages.
> >>> +
> >>> +
> >>>  What:		/sys/fs/f2fs/<disk>/min_hot_blocks
> >>>  Date:		March 2017
> >>>  Contact:	"Jaegeuk Kim" <jaeg...@kernel.org>
> >>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> >>> index 45f043ee48bd..f09231b1cc74 100644
> >>> --- a/fs/f2fs/data.c
> >>> +++ b/fs/f2fs/data.c
> >>> @@ -2132,6 +2132,7 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> >>>  	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> >>>  	struct blk_plug plug;
> >>>  	int ret;
> >>> +	bool locked = false;
> >>>
> >>>  	/* deal with chardevs and other special file */
> >>>  	if (!mapping->a_ops->writepage)
> >>> @@ -2162,10 +2163,19 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> >>>  	else if (atomic_read(&sbi->wb_sync_req[DATA]))
> >>>  		goto skip_write;
> >>>
> >>> +	if (!S_ISDIR(inode->i_mode) &&
> >>> +			get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks) {
> >>> +		mutex_lock(&sbi->writepages);
> >>> +		locked = true;
> >>> +	}
> >>> +
> >>>  	blk_start_plug(&plug);
> >>>  	ret = f2fs_write_cache_pages(mapping, wbc, io_type);
> >>>  	blk_finish_plug(&plug);
> >>>
> >>> +	if (locked)
> >>> +		mutex_unlock(&sbi->writepages);
> >>> +
> >>>  	if (wbc->sync_mode == WB_SYNC_ALL)
> >>>  		atomic_dec(&sbi->wb_sync_req[DATA]);
> >>>  	/*
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 375aa9f30cfa..098bdedc28bf 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -913,6 +913,7 @@ struct f2fs_sm_info {
> >>>  	unsigned int ipu_policy;	/* in-place-update policy */
> >>>  	unsigned int min_ipu_util;	/* in-place-update threshold */
> >>>  	unsigned int min_fsync_blocks;	/* threshold for fsync */
> >>> +	unsigned int min_seq_blocks;	/* threshold for sequential blocks */
> >>>  	unsigned int min_hot_blocks;	/* threshold for hot block allocation */
> >>>  	unsigned int min_ssr_sections;	/* threshold to trigger SSR allocation */
> >>>
> >>> @@ -1133,6 +1134,7 @@ struct f2fs_sb_info {
> >>>  	struct rw_semaphore sb_lock;		/* lock for raw super block
> >>> */
> >>>  	int valid_super_block;			/* valid super block no */
> >>>  	unsigned long s_flag;			/* flags for sbi */
> >>> +	struct mutex writepages;		/* mutex for writepages() */
> >>>
> >>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>  	unsigned int blocks_per_blkz;		/* F2FS blocks per zone */
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>> index 63fc647f9ac2..ffea2d1303bd 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -4131,6 +4131,7 @@ int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
> >>>  	sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
> >>>  	sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
> >>>  	sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
> >>> +	sm_info->min_seq_blocks = sbi->blocks_per_seg * sbi->segs_per_sec;
> >>>  	sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
> >>>  	sm_info->min_ssr_sections = reserved_sections(sbi);
> >>>
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>> index be41dbd7b261..53d70b64fea1 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -2842,6 +2842,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  	/* init f2fs-specific super block info */
> >>>  	sbi->valid_super_block = valid_super_block;
> >>>  	mutex_init(&sbi->gc_mutex);
> >>> +	mutex_init(&sbi->writepages);
> >>>  	mutex_init(&sbi->cp_mutex);
> >>>  	init_rwsem(&sbi->node_write);
> >>>  	init_rwsem(&sbi->node_change);
> >>> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> >>> index cd2e030e47b8..81c0e5337443 100644
> >>> --- a/fs/f2fs/sysfs.c
> >>> +++ b/fs/f2fs/sysfs.c
> >>> @@ -397,6 +397,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, batched_trim_sections, trim_sections);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
> >>> +F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_seq_blocks, min_seq_blocks);
> >>>  F2FS_RW_ATTR(SM_INFO,
f2fs_sm_info, min_hot_blocks, min_hot_blocks);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ssr_sections, min_ssr_sections);
> >>>  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
> >>> @@ -449,6 +450,7 @@ static struct attribute *f2fs_attrs[] = {
> >>>  	ATTR_LIST(ipu_policy),
> >>>  	ATTR_LIST(min_ipu_util),
> >>>  	ATTR_LIST(min_fsync_blocks),
> >>> +	ATTR_LIST(min_seq_blocks),
> >>>  	ATTR_LIST(min_hot_blocks),
> >>>  	ATTR_LIST(min_ssr_sections),
> >>>  	ATTR_LIST(max_victim_search),
> >>>
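For reference, the locking condition being tested in the reply above can be modeled as a pure function (user-space sketch; the name `should_take_writepages_lock()` and the flattened parameters are mine, not the kernel's):

```c
#include <stdbool.h>

enum wb_sync { WB_SYNC_NONE, WB_SYNC_ALL };

/* Mirrors the tested condition: take the writepages mutex for every
 * WB_SYNC_NONE writer and for WB_SYNC_ALL writers flushing at most
 * min_seq_blocks dirty pages; a larger WB_SYNC_ALL flush proceeds
 * without the lock. Directories are never serialized. */
static bool should_take_writepages_lock(bool is_dir, enum wb_sync sync_mode,
					unsigned int dirty_pages,
					unsigned int min_seq_blocks)
{
	if (is_dir)
		return false;
	return sync_mode != WB_SYNC_ALL || dirty_pages <= min_seq_blocks;
}
```

The effect is that background writeback and small sync flushes are serialized behind the mutex so their blocks stay batched, while a large WB_SYNC_ALL flush bypasses it and is never starved waiting on smaller writers.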