[PATCH] f2fs: remove redundant comment of unused wio_mutex
Commit 089842de ("f2fs: remove codes of unused wio_mutex") removed the code of
the unused wio_mutex but missed its comment, so delete the comment as well.

Signed-off-by: Yunlong Song
---
 fs/f2fs/f2fs.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7cec897..03e7b37 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1175,7 +1175,6 @@ struct f2fs_sb_info {

 	/* for bio operations */
 	struct f2fs_bio_info *write_io[NR_PAGE_TYPE];	/* for write bios */
-	/* bio ordering for NODE/DATA */
 	/* keep migration IO order for LFS mode */
 	struct rw_semaphore io_order_lock;
 	mempool_t *write_io_dummy;	/* Dummy pages */
--
1.8.5.2
[PATCH v2] f2fs: remove codes of unused wio_mutex
v1->v2: delete the comment in f2fs.h: "/* bio ordering for NODE/DATA */"

Signed-off-by: Yunlong Song
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/f2fs.h  | 2 --
 fs/f2fs/super.c | 5 +----
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1e03197..195850e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1170,8 +1170,6 @@ struct f2fs_sb_info {

 	/* for bio operations */
 	struct f2fs_bio_info *write_io[NR_PAGE_TYPE];	/* for write bios */
-	struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE];
-	/* bio ordering for NODE/DATA */
 	/* keep migration IO order for LFS mode */
 	struct rw_semaphore io_order_lock;
 	mempool_t *write_io_dummy;	/* Dummy pages */
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index af58b2c..2d18de5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2674,7 +2674,7 @@ int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi)
 static void init_sb_info(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_super_block *raw_super = sbi->raw_super;
-	int i, j;
+	int i;

 	sbi->log_sectors_per_block =
 		le32_to_cpu(raw_super->log_sectors_per_block);
@@ -2710,9 +2710,6 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
 	INIT_LIST_HEAD(&sbi->s_list);
 	mutex_init(&sbi->umount_mutex);
-	for (i = 0; i < NR_PAGE_TYPE - 1; i++)
-		for (j = HOT; j < NR_TEMP_TYPE; j++)
-			mutex_init(&sbi->wio_mutex[i][j]);

 	init_rwsem(&sbi->io_order_lock);
 	spin_lock_init(&sbi->cp_lock);
--
1.8.5.2
[PATCH] f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 in LFS mode, take a section:segment
ratio of 1:5 as an example. If the section's prefree_map is:

  ... previous section | current section (1 1 0 1 1) | next section ...

then start = x and end = x + 1; after start = start_segno +
sbi->segs_per_sec, start becomes x + 5, which skips x + 3 and x + 4 while
their prefree bits are still set. This causes a duplicated f2fs_issue_discard
of the same section in the next write_checkpoint:

Round 1:
  section bitmap: 1 1 1 1 1 (all valid), prefree_map: 0 0 0 0 0
  rm a data block -> segment NO.2 becomes invalid, prefree_map: 0 0 1 0 0
  write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0;
  the prefree bit of NO.2 is cleared, and no discard is issued

Round 2:
  rm data blocks -> NO.0, NO.1, NO.3, NO.4 all invalid, but the prefree bit
  of NO.2 was already set and cleared in round 1, so prefree_map: 1 1 0 1 1
  write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1;
  the section has no valid blocks, so a discard is issued, but this time the
  prefree bits of NO.3 and NO.4 are skipped due to
  start = start_segno + sbi->segs_per_sec

Round 3:
  write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map:
  0 0 0 1 1 -> 0 0 0 0 0; the section has no valid blocks, so a discard is
  issued again. This time the prefree bits of NO.3 and NO.4 are cleared, but
  the discard for this section has been sent a second time.

To fix this problem, align the start and end values to the section boundary
for both fstrim and real-time discard, and issue a discard only when the
whole section is invalid. This issues discards aligned to the section size as
much as possible and avoids redundant discards.
Signed-off-by: Yunlong Song
Signed-off-by: Chao Yu
---
 fs/f2fs/segment.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index cfff7cf..ff5de34 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1710,6 +1710,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
 	unsigned int start = 0, end = -1;
 	unsigned int secno, start_segno;
 	bool force = (cpc->reason & CP_DISCARD);
+	bool need_align = test_opt(sbi, LFS) && sbi->segs_per_sec > 1;

 	mutex_lock(&dirty_i->seglist_lock);
@@ -1721,14 +1722,19 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
 		end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi),
 								start + 1);

-		for (i = start; i < end; i++)
-			clear_bit(i, prefree_map);
-
-		dirty_i->nr_dirty[PRE] -= end - start;
-
 		if (!test_opt(sbi, DISCARD))
 			continue;

+		if (need_align) {
+			start = rounddown(start, sbi->segs_per_sec);
+			end = roundup(end, sbi->segs_per_sec);
+		}
+
+		for (i = start; i < end; i++) {
+			if (test_and_clear_bit(i, prefree_map))
+				dirty_i->nr_dirty[PRE]--;
+		}
+
 		if (force && start >= cpc->trim_start &&
 					(end - 1) <= cpc->trim_end)
 			continue;
@@ -2511,6 +2517,7 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
 	struct discard_policy dpolicy;
 	unsigned long long trimmed = 0;
 	int err = 0;
+	bool need_align = test_opt(sbi, LFS) && sbi->segs_per_sec > 1;

 	if (start >= MAX_BLKADDR(sbi) || range->len < sbi->blocksize)
 		return -EINVAL;
@@ -2528,6 +2535,13 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
 	start_segno = (start <= MAIN_BLKADDR(sbi)) ? 0 : GET_SEGNO(sbi, start);
 	end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
 						GET_SEGNO(sbi, end);
+	if (need_align) {
+		start_segno = rounddown(start_segno, sbi->segs_per_sec);
+		end_segno = roundup(end_segno, sbi->segs_per_sec);
+		if (start_segno == end_segno)
+			end_segno += sbi->segs_per_sec;
+		end_segno--;
+	}

 	cpc.reason = CP_DISCARD;
 	cpc.trim_minlen = max_t(__u64, 1, F2FS_BYTES_TO_BLK(range->minlen));
--
1.8.5.2
Re: [f2fs-dev] [PATCH] f2fs-tools: fix overflow bug of start_sector when computing zone_align_start_offset
Just keep it the same as the u_int64_t defined in mkfs/f2fs_format.c.
c.start_sector * c.sector_size may overflow u32, so add a (u_int64_t) cast
before c.start_sector * c.sector_size and change the target value
zone_align_start_offset to u_int64_t as well.

On 2018/5/26 19:27, Junling Zheng wrote:

No need to change zone_align_start_offset to u64, because
zone_align_start_offset is always smaller than zone_size_bytes, which is u32.

Thanks,
Junling

On 2018/5/26 16:09, Yunlong Song wrote:

zone_align_start_offset should be u64, but config.start_sector is u32, so the
computation of zone_align_start_offset may overflow.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fsck/resize.c      | 7 ++++---
 mkfs/f2fs_format.c | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fsck/resize.c b/fsck/resize.c
index d285dd7..8ac7d45 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -11,7 +11,8 @@ static int get_new_sb(struct f2fs_super_block *sb)
 {
-	u_int32_t zone_size_bytes, zone_align_start_offset;
+	u_int32_t zone_size_bytes;
+	u_int64_t zone_align_start_offset;
 	u_int32_t blocks_for_sit, blocks_for_nat, blocks_for_ssa;
 	u_int32_t sit_segments, nat_segments, diff, total_meta_segments;
 	u_int32_t total_valid_blks_available;
@@ -27,10 +28,10 @@ static int get_new_sb(struct f2fs_super_block *sb)
 	zone_size_bytes = segment_size_bytes * segs_per_zone;
 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		((u_int64_t) c.start_sector * c.sector_size +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		(u_int64_t) c.start_sector * c.sector_size;

 	set_sb(segment_count, (c.target_sectors * c.sector_size -
 				zone_align_start_offset) / segment_size_bytes /
diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
index 0a99a77..f045e23 100644
--- a/mkfs/f2fs_format.c
+++ b/mkfs/f2fs_format.c
@@ -212,10 +212,10 @@ static int f2fs_prepare_super_block(void)
 	set_sb(block_count, c.total_sectors >> log_sectors_per_block);

 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		((u_int64_t) c.start_sector * c.sector_size +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		(u_int64_t) c.start_sector * c.sector_size;

 	if (c.start_sector % c.sectors_per_blk) {
 		MSG(1, "\t%s: Align start sector number to the page unit\n",
.

--
Thanks,
Yunlong Song
Re: [PATCH v2] f2fs-tools: fix to match with the start_sector
ping...

On 2018/5/7 10:15, Yunlong Song wrote:

f2fs-tools uses the BLKSSZGET ioctl to get sector_size; however, this ioctl
may return a value larger than 512 (according to
q->limits.logical_block_size), which is inconsistent with start_sector, since
start_sector is obtained from the HDIO_GETGEO ioctl and is always in 512-byte
sector units. To fix this problem, just use the default sector size instead
of c.sector_size when computing with start_sector, and fix sectors_per_blk as
well.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fsck/resize.c      |  4 ++--
 mkfs/f2fs_format.c | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck/resize.c b/fsck/resize.c
index d285dd7..ada2155 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -27,10 +27,10 @@ static int get_new_sb(struct f2fs_super_block *sb)
 	zone_size_bytes = segment_size_bytes * segs_per_zone;
 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		(c.start_sector * DEFAULT_SECTOR_SIZE +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		c.start_sector * DEFAULT_SECTOR_SIZE;

 	set_sb(segment_count, (c.target_sectors * c.sector_size -
 				zone_align_start_offset) / segment_size_bytes /
diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
index 0a99a77..ced5fea 100644
--- a/mkfs/f2fs_format.c
+++ b/mkfs/f2fs_format.c
@@ -212,18 +212,18 @@ static int f2fs_prepare_super_block(void)
 	set_sb(block_count, c.total_sectors >> log_sectors_per_block);

 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		(c.start_sector * DEFAULT_SECTOR_SIZE +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		c.start_sector * DEFAULT_SECTOR_SIZE;

-	if (c.start_sector % c.sectors_per_blk) {
+	if (c.start_sector % DEFAULT_SECTORS_PER_BLOCK) {
 		MSG(1, "\t%s: Align start sector number to the page unit\n",
			c.zoned_mode ? "FAIL" : "WARN");
 		MSG(1, "\ti.e., start sector: %d, ofs:%d (sects/page: %d)\n",
 			c.start_sector,
-			c.start_sector % c.sectors_per_blk,
-			c.sectors_per_blk);
+			c.start_sector % DEFAULT_SECTORS_PER_BLOCK,
+			DEFAULT_SECTORS_PER_BLOCK);
 		if (c.zoned_mode)
 			return -1;
 	}
@@ -235,7 +235,7 @@ static int f2fs_prepare_super_block(void)
 			get_sb(segment0_blkaddr));

 	if (c.zoned_mode && (get_sb(segment0_blkaddr) + c.start_sector /
-					c.sectors_per_blk) % c.zone_blocks) {
+					DEFAULT_SECTORS_PER_BLOCK) % c.zone_blocks) {
 		MSG(1, "\tError: Unaligned segment0 block address %u\n",
 			get_sb(segment0_blkaddr));
 		return -1;

--
Thanks,
Yunlong Song
[PATCH] f2fs-tools: fix overflow bug of start_sector when computing zone_align_start_offset
zone_align_start_offset should be u64, but config.start_sector is u32, so the
computation of zone_align_start_offset may overflow.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fsck/resize.c      | 7 ++++---
 mkfs/f2fs_format.c | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fsck/resize.c b/fsck/resize.c
index d285dd7..8ac7d45 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -11,7 +11,8 @@ static int get_new_sb(struct f2fs_super_block *sb)
 {
-	u_int32_t zone_size_bytes, zone_align_start_offset;
+	u_int32_t zone_size_bytes;
+	u_int64_t zone_align_start_offset;
 	u_int32_t blocks_for_sit, blocks_for_nat, blocks_for_ssa;
 	u_int32_t sit_segments, nat_segments, diff, total_meta_segments;
 	u_int32_t total_valid_blks_available;
@@ -27,10 +28,10 @@ static int get_new_sb(struct f2fs_super_block *sb)
 	zone_size_bytes = segment_size_bytes * segs_per_zone;
 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		((u_int64_t) c.start_sector * c.sector_size +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		(u_int64_t) c.start_sector * c.sector_size;

 	set_sb(segment_count, (c.target_sectors * c.sector_size -
 				zone_align_start_offset) / segment_size_bytes /
diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
index 0a99a77..f045e23 100644
--- a/mkfs/f2fs_format.c
+++ b/mkfs/f2fs_format.c
@@ -212,10 +212,10 @@ static int f2fs_prepare_super_block(void)
 	set_sb(block_count, c.total_sectors >> log_sectors_per_block);

 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		((u_int64_t) c.start_sector * c.sector_size +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		(u_int64_t) c.start_sector * c.sector_size;

 	if (c.start_sector % c.sectors_per_blk) {
 		MSG(1, "\t%s: Align start sector number to the page unit\n",
--
1.8.5.2
[PATCH v2] f2fs-tools: fix to match with the start_sector
f2fs-tools uses the BLKSSZGET ioctl to get sector_size; however, this ioctl
may return a value larger than 512 (according to
q->limits.logical_block_size), which is inconsistent with start_sector, since
start_sector is obtained from the HDIO_GETGEO ioctl and is always in 512-byte
sector units. To fix this problem, just use the default sector size instead
of c.sector_size when computing with start_sector, and fix sectors_per_blk as
well.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fsck/resize.c      |  4 ++--
 mkfs/f2fs_format.c | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck/resize.c b/fsck/resize.c
index d285dd7..ada2155 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -27,10 +27,10 @@ static int get_new_sb(struct f2fs_super_block *sb)
 	zone_size_bytes = segment_size_bytes * segs_per_zone;
 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		(c.start_sector * DEFAULT_SECTOR_SIZE +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		c.start_sector * DEFAULT_SECTOR_SIZE;

 	set_sb(segment_count, (c.target_sectors * c.sector_size -
 				zone_align_start_offset) / segment_size_bytes /
diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
index 0a99a77..ced5fea 100644
--- a/mkfs/f2fs_format.c
+++ b/mkfs/f2fs_format.c
@@ -212,18 +212,18 @@ static int f2fs_prepare_super_block(void)
 	set_sb(block_count, c.total_sectors >> log_sectors_per_block);

 	zone_align_start_offset =
-		(c.start_sector * c.sector_size +
+		(c.start_sector * DEFAULT_SECTOR_SIZE +
 		2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
 		zone_size_bytes * zone_size_bytes -
-		c.start_sector * c.sector_size;
+		c.start_sector * DEFAULT_SECTOR_SIZE;

-	if (c.start_sector % c.sectors_per_blk) {
+	if (c.start_sector % DEFAULT_SECTORS_PER_BLOCK) {
 		MSG(1, "\t%s: Align start sector number to the page unit\n",
			c.zoned_mode ? "FAIL" : "WARN");
 		MSG(1, "\ti.e., start sector: %d, ofs:%d (sects/page: %d)\n",
 			c.start_sector,
-			c.start_sector % c.sectors_per_blk,
-			c.sectors_per_blk);
+			c.start_sector % DEFAULT_SECTORS_PER_BLOCK,
+			DEFAULT_SECTORS_PER_BLOCK);
 		if (c.zoned_mode)
 			return -1;
 	}
@@ -235,7 +235,7 @@ static int f2fs_prepare_super_block(void)
 			get_sb(segment0_blkaddr));

 	if (c.zoned_mode && (get_sb(segment0_blkaddr) + c.start_sector /
-					c.sectors_per_blk) % c.zone_blocks) {
+					DEFAULT_SECTORS_PER_BLOCK) % c.zone_blocks) {
 		MSG(1, "\tError: Unaligned segment0 block address %u\n",
 			get_sb(segment0_blkaddr));
 		return -1;
--
1.8.5.2
[PATCH] f2fs-tools: fix the sector_size to default value
f2fs-tools uses the BLKSSZGET ioctl to get sector_size; however, this ioctl
may return a value larger than 512 (according to
q->limits.logical_block_size), which is inconsistent with start_sector, since
start_sector is obtained from the HDIO_GETGEO ioctl and is always in 512-byte
sector units. To fix this problem, just set sector_size to the default value
of 512.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 lib/libf2fs.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/lib/libf2fs.c b/lib/libf2fs.c
index 102e579..e160f2a 100644
--- a/lib/libf2fs.c
+++ b/lib/libf2fs.c
@@ -768,7 +768,6 @@ void get_kernel_uname_version(__u8 *version)
 int get_device_info(int i)
 {
 	int32_t fd = 0;
-	uint32_t sector_size;
 #ifndef BLKGETSIZE64
 	uint32_t total_sectors;
 #endif
@@ -822,12 +821,6 @@ int get_device_info(int i)
 	} else if (S_ISREG(stat_buf->st_mode)) {
 		dev->total_sectors = stat_buf->st_size / dev->sector_size;
 	} else if (S_ISBLK(stat_buf->st_mode)) {
-#ifdef BLKSSZGET
-		if (ioctl(fd, BLKSSZGET, &sector_size) < 0)
-			MSG(0, "\tError: Using the default sector size\n");
-		else if (dev->sector_size < sector_size)
-			dev->sector_size = sector_size;
-#endif
 #ifdef BLKGETSIZE64
 		if (ioctl(fd, BLKGETSIZE64, &dev->total_sectors) < 0) {
 			MSG(0, "\tError: Cannot get the device size\n");
--
1.8.5.2
Re: [PATCH] f2fs: fix the way to wake up issue_flush thread
Please ignore this patch; I made a mistake.

On 2018/5/3 15:45, Yunlong Song wrote:

Commit 6f890df0 ("f2fs: fix out-of-order execution in f2fs_issue_flush") uses
waitqueue_active to wake up the issue_flush thread, but there is no wait
entry to wake in this queue, so change it back to use the original
fcc->dispatch_list to wake up the issue_flush thread.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fs/f2fs/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 503a98a..5aa5ee4e 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -612,7 +612,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino)

 	/* update issue_list before we wake up issue_flush thread */
 	smp_mb();
-	if (waitqueue_active(&fcc->flush_wait_queue))
+	if (!fcc->dispatch_list)
 		wake_up(&fcc->flush_wait_queue);

 	if (fcc->f2fs_issue_flush) {

--
Thanks,
Yunlong Song
[PATCH] f2fs: fix the way to wake up issue_flush thread
Commit 6f890df0 ("f2fs: fix out-of-order execution in f2fs_issue_flush") uses waitqueue_active to wake up the issue_flush thread, but there is no wait entry to wake in this queue, so change it back to use the original fcc->dispatch_list to wake up the issue_flush thread. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/segment.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 503a98a..5aa5ee4e 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -612,7 +612,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino) /* update issue_list before we wake up issue_flush thread */ smp_mb(); - if (waitqueue_active(&fcc->flush_wait_queue)) + if (!fcc->dispatch_list) wake_up(&fcc->flush_wait_queue); if (fcc->f2fs_issue_flush) { -- 1.8.5.2
[PATCH] f2fs: remove unmatched zero_user_segment when convert inline dentry
Since the layout of a regular dentry block differs from that of an inline dentry block, zero_user_segment starting from MAX_INLINE_DATA(dir) is not correct for a regular dentry block. Besides, the bitmap is already copied and used, so there is no need to zero the page at all; just remove the zero_user_segment call. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/inline.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index 3b77d64..573ec2f 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -367,7 +367,6 @@ static int f2fs_move_inline_dirents(struct inode *dir, struct page *ipage, goto out; f2fs_wait_on_page_writeback(page, DATA, true); - zero_user_segment(page, MAX_INLINE_DATA(dir), PAGE_SIZE); dentry_blk = page_address(page); -- 1.8.5.2
[PATCH] f2fs: make assignment of t->dentry_bitmap more readable
In make_dentry_ptr_block, it is confusing to use "&" for t->dentry_bitmap but not for t->dentry, so delete the "&" to make the code more readable. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/f2fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 191ee57..474b9e9 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -471,7 +471,7 @@ static inline void make_dentry_ptr_block(struct inode *inode, d->inode = inode; d->max = NR_DENTRY_IN_BLOCK; d->nr_bitmap = SIZE_OF_DENTRY_BITMAP; - d->bitmap = &t->dentry_bitmap; + d->bitmap = t->dentry_bitmap; d->dentry = t->dentry; d->filename = t->filename; } -- 1.8.5.2
[PATCH v2] f2fs: no need to initialize zero value for GFP_F2FS_ZERO
Since f2fs_inode_info is allocated with the flag GFP_F2FS_ZERO, we do not need to initialize its members to zero any more. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/super.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 0c1fe9b..42d564c 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -827,7 +827,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) /* Initialize f2fs-specific inode info */ atomic_set(&fi->dirty_pages, 0); fi->i_current_depth = 1; - fi->i_advise = 0; init_rwsem(&fi->i_sem); INIT_LIST_HEAD(&fi->dirty_list); INIT_LIST_HEAD(&fi->gdirty_list); @@ -839,10 +838,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_xattr_sem); -#ifdef CONFIG_QUOTA - memset(&fi->i_dquot, 0, sizeof(fi->i_dquot)); - fi->i_reserved_quota = 0; -#endif /* Will be used by directory only */ fi->i_dir_level = F2FS_SB(sb)->dir_level; -- 1.8.5.2
[PATCH] f2fs: no need to initialize zero value for GFP_F2FS_ZERO
Since f2fs_inode_info is allocated with the flag GFP_F2FS_ZERO, we do not need to initialize its members to zero any more. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/super.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 0c1fe9b..3a7fa03 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -825,9 +825,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) init_once((void *) fi); /* Initialize f2fs-specific inode info */ - atomic_set(&fi->dirty_pages, 0); fi->i_current_depth = 1; - fi->i_advise = 0; init_rwsem(&fi->i_sem); INIT_LIST_HEAD(&fi->dirty_list); INIT_LIST_HEAD(&fi->gdirty_list); @@ -839,10 +837,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_xattr_sem); -#ifdef CONFIG_QUOTA - memset(&fi->i_dquot, 0, sizeof(fi->i_dquot)); - fi->i_reserved_quota = 0; -#endif /* Will be used by directory only */ fi->i_dir_level = F2FS_SB(sb)->dir_level; -- 1.8.5.2
Re: [PATCH] f2fs: don't put dentry page in pagecache into highmem
OK, got it. On 2018/3/19 14:23, Jaegeuk Kim wrote: On 03/19, Yunlong Song wrote: Hi Jaegeuk, I noticed this patch was recently removed from the dev-test branch. Why? Any bugs? Moved into the beginning of the tree for cherry-picking into f2fs-stable. Thanks, On 2018/2/28 20:31, Yunlong Song wrote: Previously, dentry pages used highmem, which will cause a panic on platforms using highmem (such as arm), since the address space of dentry pages from highmem directly goes into the decryption path via the function fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not from highmem, and this then causes a panic, since kmap_high is never called but kunmap_high is triggered at the end. To fix this problem in a simple way, this patch avoids putting dentry pages in the pagecache into highmem. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/dir.c | 23 +-- fs/f2fs/f2fs.h | 6 -- fs/f2fs/inline.c | 3 +-- fs/f2fs/inode.c | 2 +- fs/f2fs/namei.c | 14 +- fs/f2fs/recovery.c | 11 +-- include/linux/f2fs_fs.h | 1 - 7 files changed, 13 insertions(+), 47 deletions(-) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..797eb05 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -94,14 +94,12 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page, struct f2fs_dir_entry *de; struct f2fs_dentry_ptr d; - dentry_blk = (struct f2fs_dentry_block *)kmap(dentry_page); + dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); de = find_target_dentry(fname, namehash, max_slots, &d); if (de) *res_page = dentry_page; - else - kunmap(dentry_page); return de; } @@ -287,7 +285,6 @@ ino_t f2fs_inode_by_name(struct inode *dir, const struct qstr *qstr, de = f2fs_find_entry(dir, qstr, page); if (de) { res = le32_to_cpu(de->ino); - f2fs_dentry_kunmap(dir, *page); f2fs_put_page(*page, 0); } @@ -302,7 +299,6 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, f2fs_wait_on_page_writeback(page, type, true); de->ino = cpu_to_le32(inode->i_ino); set_de_type(de, inode->i_mode); - f2fs_dentry_kunmap(dir, page); set_page_dirty(page); dir->i_mtime = dir->i_ctime = current_time(dir); @@ -350,13 +346,11 @@ static int make_empty_dir(struct inode *inode, if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); do_make_empty_dir(inode, parent, &d); - kunmap_atomic(dentry_blk); - set_page_dirty(dentry_page); f2fs_put_page(dentry_page, 1); return 0; @@ -547,13 +541,12 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap(dentry_page); + dentry_blk = page_address(dentry_page); bit_pos = room_for_filename(&dentry_blk->dentry_bitmap, slots, NR_DENTRY_IN_BLOCK); if (bit_pos < NR_DENTRY_IN_BLOCK) goto add_dentry; - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); } @@ -588,7 +581,6 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, if (inode) up_write(&F2FS_I(inode)->i_sem); - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); return err; @@ -642,7 +634,6 @@ int __f2fs_add_link(struct inode *dir, const struct qstr *name, F2FS_I(dir)->task = NULL; } if (de) { - f2fs_dentry_kunmap(dir, page); f2fs_put_page(page, 0); err = -EEXIST; } else if (IS_ERR(page)) { @@ -730,7 +721,6 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, NR_DENTRY_IN_BLOCK, 0); - kunmap(page); /* kunmap - pair of f2fs_find_entry */ set_page_dirty(page); dir->i_ctime = dir->i_mtime = current_time(dir); @@ -775,7 +765,7 @@ bool f2fs_empty_dir(struct inode *dir) return false; } - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); if (bidx == 0) bit_pos = 2; else @@ -783,7 +773,6 @@ bool f2fs_empty_dir(struct inode *dir) b
Re: [PATCH] f2fs: don't put dentry page in pagecache into highmem
Hi Jaegeuk, I noticed this patch was recently removed from the dev-test branch. Why? Any bugs? On 2018/2/28 20:31, Yunlong Song wrote: Previously, dentry pages used highmem, which will cause a panic on platforms using highmem (such as arm), since the address space of dentry pages from highmem directly goes into the decryption path via the function fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not from highmem, and this then causes a panic, since kmap_high is never called but kunmap_high is triggered at the end. To fix this problem in a simple way, this patch avoids putting dentry pages in the pagecache into highmem. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/dir.c | 23 +-- fs/f2fs/f2fs.h | 6 -- fs/f2fs/inline.c | 3 +-- fs/f2fs/inode.c | 2 +- fs/f2fs/namei.c | 14 +- fs/f2fs/recovery.c | 11 +-- include/linux/f2fs_fs.h | 1 - 7 files changed, 13 insertions(+), 47 deletions(-) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..797eb05 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -94,14 +94,12 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page, struct f2fs_dir_entry *de; struct f2fs_dentry_ptr d; - dentry_blk = (struct f2fs_dentry_block *)kmap(dentry_page); + dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); de = find_target_dentry(fname, namehash, max_slots, &d); if (de) *res_page = dentry_page; - else - kunmap(dentry_page); return de; } @@ -287,7 +285,6 @@ ino_t f2fs_inode_by_name(struct inode *dir, const struct qstr *qstr, de = f2fs_find_entry(dir, qstr, page); if (de) { res = le32_to_cpu(de->ino); - f2fs_dentry_kunmap(dir, *page); f2fs_put_page(*page, 0); } @@ -302,7 +299,6 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, f2fs_wait_on_page_writeback(page, type, true); de->ino = cpu_to_le32(inode->i_ino); set_de_type(de, inode->i_mode); - f2fs_dentry_kunmap(dir, page); set_page_dirty(page); dir->i_mtime = dir->i_ctime = current_time(dir); @@ -350,13 +346,11 @@ static int make_empty_dir(struct inode *inode, if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); do_make_empty_dir(inode, parent, &d); - kunmap_atomic(dentry_blk); - set_page_dirty(dentry_page); f2fs_put_page(dentry_page, 1); return 0; @@ -547,13 +541,12 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap(dentry_page); + dentry_blk = page_address(dentry_page); bit_pos = room_for_filename(&dentry_blk->dentry_bitmap, slots, NR_DENTRY_IN_BLOCK); if (bit_pos < NR_DENTRY_IN_BLOCK) goto add_dentry; - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); } @@ -588,7 +581,6 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, if (inode) up_write(&F2FS_I(inode)->i_sem); - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); return err; @@ -642,7 +634,6 @@ int __f2fs_add_link(struct inode *dir, const struct qstr *name, F2FS_I(dir)->task = NULL; } if (de) { - f2fs_dentry_kunmap(dir, page); f2fs_put_page(page, 0); err = -EEXIST; } else if (IS_ERR(page)) { @@ -730,7 +721,6 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, NR_DENTRY_IN_BLOCK, 0); - kunmap(page); /* kunmap - pair of f2fs_find_entry */ set_page_dirty(page); dir->i_ctime = dir->i_mtime = current_time(dir); @@ -775,7 +765,7 @@ bool f2fs_empty_dir(struct inode *dir) return false; } - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); if (bidx == 0) bit_pos = 2; else @@ -783,7 +773,6 @@ bool f2fs_empty_dir(struct inode *dir) bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, NR_DENTRY_IN_BLOCK, bit_pos); - kunmap_atomic(dentry_blk); f2fs
Re: [PATCH v4] Revert "f2fs crypto: avoid unneeded memory allocation in ->readdir"
On 2018/2/28 13:48, Jaegeuk Kim wrote: Hi Yunlong, As Eric pointed out, what do you think about using nohighmem for directories like ext4 does, which looks more efficient? OK, I have sent out another patch like that. Actually, we don't need to do this in most recent kernels, right? Why? I have hit this panic on arm with a recent kernel.
[PATCH] f2fs: don't put dentry page in pagecache into highmem
Previously, dentry pages used highmem, which will cause a panic on platforms using highmem (such as arm), since the address space of dentry pages from highmem directly goes into the decryption path via the function fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not from highmem, and this then causes a panic, since kmap_high is never called but kunmap_high is triggered at the end. To fix this problem in a simple way, this patch avoids putting dentry pages in the pagecache into highmem. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/dir.c | 23 +-- fs/f2fs/f2fs.h | 6 -- fs/f2fs/inline.c | 3 +-- fs/f2fs/inode.c | 2 +- fs/f2fs/namei.c | 14 +- fs/f2fs/recovery.c | 11 +-- include/linux/f2fs_fs.h | 1 - 7 files changed, 13 insertions(+), 47 deletions(-) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..797eb05 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -94,14 +94,12 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page, struct f2fs_dir_entry *de; struct f2fs_dentry_ptr d; - dentry_blk = (struct f2fs_dentry_block *)kmap(dentry_page); + dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); de = find_target_dentry(fname, namehash, max_slots, &d); if (de) *res_page = dentry_page; - else - kunmap(dentry_page); return de; } @@ -287,7 +285,6 @@ ino_t f2fs_inode_by_name(struct inode *dir, const struct qstr *qstr, de = f2fs_find_entry(dir, qstr, page); if (de) { res = le32_to_cpu(de->ino); - f2fs_dentry_kunmap(dir, *page); f2fs_put_page(*page, 0); } @@ -302,7 +299,6 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, f2fs_wait_on_page_writeback(page, type, true); de->ino = cpu_to_le32(inode->i_ino); set_de_type(de, inode->i_mode); - f2fs_dentry_kunmap(dir, page); set_page_dirty(page); dir->i_mtime = dir->i_ctime = current_time(dir); @@ -350,13 +346,11 @@ static int make_empty_dir(struct inode *inode, if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); do_make_empty_dir(inode, parent, &d); - kunmap_atomic(dentry_blk); - set_page_dirty(dentry_page); f2fs_put_page(dentry_page, 1); return 0; @@ -547,13 +541,12 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap(dentry_page); + dentry_blk = page_address(dentry_page); bit_pos = room_for_filename(&dentry_blk->dentry_bitmap, slots, NR_DENTRY_IN_BLOCK); if (bit_pos < NR_DENTRY_IN_BLOCK) goto add_dentry; - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); } @@ -588,7 +581,6 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, if (inode) up_write(&F2FS_I(inode)->i_sem); - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); return err; @@ -642,7 +634,6 @@ int __f2fs_add_link(struct inode *dir, const struct qstr *name, F2FS_I(dir)->task = NULL; } if (de) { - f2fs_dentry_kunmap(dir, page); f2fs_put_page(page, 0); err = -EEXIST; } else if (IS_ERR(page)) { @@ -730,7 +721,6 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, NR_DENTRY_IN_BLOCK, 0); - kunmap(page); /* kunmap - pair of f2fs_find_entry */ set_page_dirty(page); dir->i_ctime = dir->i_mtime = current_time(dir); @@ -775,7 +765,7 @@ bool f2fs_empty_dir(struct inode *dir) return false; } - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); if (bidx == 0) bit_pos = 2; else @@ -783,7 +773,6 @@ bool f2fs_empty_dir(struct inode *dir) bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, NR_DENTRY_IN_BLOCK, bit_pos); - kunmap_atomic(dentry_blk); f2fs_put_page(dentry_page, 1); @@ -901,19 +890,17 @@ static int f2fs_rea
[PATCH v4] Revert "f2fs crypto: avoid unneeded memory allocation in ->readdir"
This reverts commit e06f86e61d7a67fe6e826010f57aa39c674f4b1b. Conflicts: fs/f2fs/dir.c On some platforms (such as arm), high memory is used, and then decrypting a filename causes a panic; for the reason, see commit 569cf1876a32e574ba8a7fb825cd91bafd003882 ("f2fs crypto: allocate buffer for decrypting filename"): We got dentry pages from high_mem, and its address space directly goes into the decryption path via f2fs_fname_disk_to_usr. But, sg_init_one assumes the address is not from high_mem, so we can get this panic since it doesn't call kmap_high but kunmap_high is triggered at the end. kernel BUG at ../../../../../../kernel/mm/highmem.c:290! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kunmap_high+0xb0/0xb8) from [] (__kunmap_atomic+0xa0/0xa4) (__kunmap_atomic+0xa0/0xa4) from [] (blkcipher_walk_done+0x128/0x1ec) (blkcipher_walk_done+0x128/0x1ec) from [] (crypto_cbc_decrypt+0xc0/0x170) (crypto_cbc_decrypt+0xc0/0x170) from [] (crypto_cts_decrypt+0xc0/0x114) (crypto_cts_decrypt+0xc0/0x114) from [] (async_decrypt+0x40/0x48) (async_decrypt+0x40/0x48) from [] (f2fs_fname_disk_to_usr+0x124/0x304) (f2fs_fname_disk_to_usr+0x124/0x304) from [] (f2fs_fill_dentries+0xac/0x188) (f2fs_fill_dentries+0xac/0x188) from [] (f2fs_readdir+0x1f0/0x300) (f2fs_readdir+0x1f0/0x300) from [] (vfs_readdir+0x90/0xb4) (vfs_readdir+0x90/0xb4) from [] (SyS_getdents64+0x64/0xcc) (SyS_getdents64+0x64/0xcc) from [] (ret_fast_syscall+0x0/0x30) However, a later patch, commit e06f86e61d7a ("f2fs crypto: avoid unneeded memory allocation in ->readdir"), reverted that code, which causes the panic again on arm, so fix it back to the old version. 
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> Reviewed-by: Chao Yu <yuch...@huawei.com> --- fs/f2fs/dir.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..de2e295 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -825,9 +825,16 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, int save_len = fstr->len; int err; + de_name.name = f2fs_kmalloc(sbi, de_name.len, GFP_NOFS); + if (!de_name.name) + return -ENOMEM; + + memcpy(de_name.name, d->filename[bit_pos], de_name.len); + err = fscrypt_fname_disk_to_usr(d->inode, (u32)de->hash_code, 0, &de_name, fstr); + kfree(de_name.name); if (err) return err; -- 1.8.5.2
[PATCH v3] f2fs: allocate buffer for decrypting filename to avoid panic
On some platforms (such as arm), high memory is used, and then decrypting a filename causes a panic; for the reason, see commit 569cf1876a32e574ba8a7fb825cd91bafd003882 ("f2fs crypto: allocate buffer for decrypting filename"): We got dentry pages from high_mem, and its address space directly goes into the decryption path via f2fs_fname_disk_to_usr. But, sg_init_one assumes the address is not from high_mem, so we can get this panic since it doesn't call kmap_high but kunmap_high is triggered at the end. kernel BUG at ../../../../../../kernel/mm/highmem.c:290! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kunmap_high+0xb0/0xb8) from [] (__kunmap_atomic+0xa0/0xa4) (__kunmap_atomic+0xa0/0xa4) from [] (blkcipher_walk_done+0x128/0x1ec) (blkcipher_walk_done+0x128/0x1ec) from [] (crypto_cbc_decrypt+0xc0/0x170) (crypto_cbc_decrypt+0xc0/0x170) from [] (crypto_cts_decrypt+0xc0/0x114) (crypto_cts_decrypt+0xc0/0x114) from [] (async_decrypt+0x40/0x48) (async_decrypt+0x40/0x48) from [] (f2fs_fname_disk_to_usr+0x124/0x304) (f2fs_fname_disk_to_usr+0x124/0x304) from [] (f2fs_fill_dentries+0xac/0x188) (f2fs_fill_dentries+0xac/0x188) from [] (f2fs_readdir+0x1f0/0x300) (f2fs_readdir+0x1f0/0x300) from [] (vfs_readdir+0x90/0xb4) (vfs_readdir+0x90/0xb4) from [] (SyS_getdents64+0x64/0xcc) (SyS_getdents64+0x64/0xcc) from [] (ret_fast_syscall+0x0/0x30) However, a later patch: commit e06f86e61d7a67fe6e826010f57aa39c674f4b1b ("f2fs crypto: avoid unneeded memory allocation in ->readdir") reverted that code, which causes the panic again on arm, so let's add part of the old patch again for the dentry page. 
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/dir.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..de2e295 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -825,9 +825,16 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, int save_len = fstr->len; int err; + de_name.name = f2fs_kmalloc(sbi, de_name.len, GFP_NOFS); + if (!de_name.name) + return -ENOMEM; + + memcpy(de_name.name, d->filename[bit_pos], de_name.len); + err = fscrypt_fname_disk_to_usr(d->inode, (u32)de->hash_code, 0, &de_name, fstr); + kfree(de_name.name); if (err) return err; -- 1.8.5.2
Re: [PATCH] f2fs: allocate buffer for decrypting filename to avoid panic
Hi, Eric, Thanks for your review; I have removed the symlink part and used kmemdup instead. As for restricting directory pages to lowmem, I am not sure whether it is proper for f2fs, since the directory structures of f2fs are different from ext4's. So I just keep the old kmap. On 2018/2/25 2:32, Eric Biggers wrote: Hi Yunlong, On Sat, Feb 24, 2018 at 05:14:58PM +0800, Yunlong Song wrote: On some platforms (such as arm), high memory is used, and then decrypting a filename causes a panic; for the reason, see commit 569cf1876a32e574ba8a7fb825cd91bafd003882 ("f2fs crypto: allocate buffer for decrypting filename"): We got dentry pages from high_mem, and its address space directly goes into the decryption path via f2fs_fname_disk_to_usr. But, sg_init_one assumes the address is not from high_mem, so we can get this panic since it doesn't call kmap_high but kunmap_high is triggered at the end. kernel BUG at ../../../../../../kernel/mm/highmem.c:290! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... 
(kunmap_high+0xb0/0xb8) from [] (__kunmap_atomic+0xa0/0xa4) (__kunmap_atomic+0xa0/0xa4) from [] (blkcipher_walk_done+0x128/0x1ec) (blkcipher_walk_done+0x128/0x1ec) from [] (crypto_cbc_decrypt+0xc0/0x170) (crypto_cbc_decrypt+0xc0/0x170) from [] (crypto_cts_decrypt+0xc0/0x114) (crypto_cts_decrypt+0xc0/0x114) from [] (async_decrypt+0x40/0x48) (async_decrypt+0x40/0x48) from [] (f2fs_fname_disk_to_usr+0x124/0x304) (f2fs_fname_disk_to_usr+0x124/0x304) from [] (f2fs_fill_dentries+0xac/0x188) (f2fs_fill_dentries+0xac/0x188) from [] (f2fs_readdir+0x1f0/0x300) (f2fs_readdir+0x1f0/0x300) from [] (vfs_readdir+0x90/0xb4) (vfs_readdir+0x90/0xb4) from [] (SyS_getdents64+0x64/0xcc) (SyS_getdents64+0x64/0xcc) from [] (ret_fast_syscall+0x0/0x30) However, later patches: commit 922ec355f86365388203672119b5bca346a45085 ("f2fs crypto: avoid unneeded memory allocation when {en/de}crypting symlink") and commit e06f86e61d7a67fe6e826010f57aa39c674f4b1b ("f2fs crypto: avoid unneeded memory allocation in ->readdir") reverted that code, which causes the panic again on arm, so let's add the old patch again. 
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/dir.c | 7 +++ fs/f2fs/namei.c | 10 +- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..c0cf3e7b 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -825,9 +825,16 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, int save_len = fstr->len; int err; + de_name.name = kmalloc(de_name.len, GFP_NOFS); + if (!de_name.name) + return -ENOMEM; + + memcpy(de_name.name, d->filename[bit_pos], de_name.len); + err = fscrypt_fname_disk_to_usr(d->inode, (u32)de->hash_code, 0, &de_name, fstr); + kfree(de_name.name); if (err) return err; diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index c4c94c7..2cb70c1 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -1170,8 +1170,13 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, /* Symlink is encrypted */ sd = (struct fscrypt_symlink_data *)caddr; - cstr.name = sd->encrypted_path; cstr.len = le16_to_cpu(sd->len); + cstr.name = kmalloc(cstr.len, GFP_NOFS); + if (!cstr.name) { + res = -ENOMEM; + goto errout; + } + memcpy(cstr.name, sd->encrypted_path, cstr.len); /* this is broken symlink case */ if (unlikely(cstr.len == 0)) { @@ -1198,6 +1203,8 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, goto errout; } + kfree(cstr.name); + paddr = pstr.name; /* Null-terminate the name */ @@ -1207,6 +1214,7 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, set_delayed_call(done, kfree_link, paddr); return paddr; errout: + kfree(cstr.name); fscrypt_fname_free_buffer(&pstr); put_page(cpage); return ERR_PTR(res); -- 1.8.5.2 The pagecache for symlinks in f2fs already uses lowmem only, so the change to ->get_link() isn't needed. Also that part of the patch doesn't apply to upstream. For directories we may need your fix. Note: kmalloc + memcpy should be replaced with kmemdup. 
But alternatively, directory pages could be restricted to lowmem, which would match ext4. Eric -- Thanks, Yunlong Song
[PATCH v2] f2fs: allocate buffer for decrypting filename to avoid panic
On some platforms (such as arm), high memory is used, and then decrypting a filename causes a panic; for the reason, see commit 569cf1876a32e574ba8a7fb825cd91bafd003882 ("f2fs crypto: allocate buffer for decrypting filename"): We got dentry pages from high_mem, and its address space directly goes into the decryption path via f2fs_fname_disk_to_usr. But, sg_init_one assumes the address is not from high_mem, so we can get this panic since it doesn't call kmap_high but kunmap_high is triggered at the end. kernel BUG at ../../../../../../kernel/mm/highmem.c:290! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kunmap_high+0xb0/0xb8) from [] (__kunmap_atomic+0xa0/0xa4) (__kunmap_atomic+0xa0/0xa4) from [] (blkcipher_walk_done+0x128/0x1ec) (blkcipher_walk_done+0x128/0x1ec) from [] (crypto_cbc_decrypt+0xc0/0x170) (crypto_cbc_decrypt+0xc0/0x170) from [] (crypto_cts_decrypt+0xc0/0x114) (crypto_cts_decrypt+0xc0/0x114) from [] (async_decrypt+0x40/0x48) (async_decrypt+0x40/0x48) from [] (f2fs_fname_disk_to_usr+0x124/0x304) (f2fs_fname_disk_to_usr+0x124/0x304) from [] (f2fs_fill_dentries+0xac/0x188) (f2fs_fill_dentries+0xac/0x188) from [] (f2fs_readdir+0x1f0/0x300) (f2fs_readdir+0x1f0/0x300) from [] (vfs_readdir+0x90/0xb4) (vfs_readdir+0x90/0xb4) from [] (SyS_getdents64+0x64/0xcc) (SyS_getdents64+0x64/0xcc) from [] (ret_fast_syscall+0x0/0x30) However, a later patch: commit e06f86e61d7a67fe6e826010f57aa39c674f4b1b ("f2fs crypto: avoid unneeded memory allocation in ->readdir") reverted that code, which causes the panic again on arm, so let's add part of the old patch again for the dentry page. 
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/dir.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..23fff48 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -825,9 +825,15 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, int save_len = fstr->len; int err; + de_name.name = kmemdup(d->filename[bit_pos], + de_name.len, GFP_NOFS); + if (!de_name.name) + return -ENOMEM; + err = fscrypt_fname_disk_to_usr(d->inode, (u32)de->hash_code, 0, &de_name, fstr); + kfree(de_name.name); if (err) return err; -- 1.8.5.2
[PATCH] f2fs: allocate buffer for decrypting filename to avoid panic
On some platforms (such as arm), high memory is used, and then decrypting a filename causes a panic; for the reason, see commit 569cf1876a32e574ba8a7fb825cd91bafd003882 ("f2fs crypto: allocate buffer for decrypting filename"): We got dentry pages from high_mem, and its address space directly goes into the decryption path via f2fs_fname_disk_to_usr. But, sg_init_one assumes the address is not from high_mem, so we can get this panic since it doesn't call kmap_high but kunmap_high is triggered at the end. kernel BUG at ../../../../../../kernel/mm/highmem.c:290! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kunmap_high+0xb0/0xb8) from [] (__kunmap_atomic+0xa0/0xa4) (__kunmap_atomic+0xa0/0xa4) from [] (blkcipher_walk_done+0x128/0x1ec) (blkcipher_walk_done+0x128/0x1ec) from [] (crypto_cbc_decrypt+0xc0/0x170) (crypto_cbc_decrypt+0xc0/0x170) from [] (crypto_cts_decrypt+0xc0/0x114) (crypto_cts_decrypt+0xc0/0x114) from [] (async_decrypt+0x40/0x48) (async_decrypt+0x40/0x48) from [] (f2fs_fname_disk_to_usr+0x124/0x304) (f2fs_fname_disk_to_usr+0x124/0x304) from [] (f2fs_fill_dentries+0xac/0x188) (f2fs_fill_dentries+0xac/0x188) from [] (f2fs_readdir+0x1f0/0x300) (f2fs_readdir+0x1f0/0x300) from [] (vfs_readdir+0x90/0xb4) (vfs_readdir+0x90/0xb4) from [] (SyS_getdents64+0x64/0xcc) (SyS_getdents64+0x64/0xcc) from [] (ret_fast_syscall+0x0/0x30) However, later patches: commit 922ec355f86365388203672119b5bca346a45085 ("f2fs crypto: avoid unneeded memory allocation when {en/de}crypting symlink") and commit e06f86e61d7a67fe6e826010f57aa39c674f4b1b ("f2fs crypto: avoid unneeded memory allocation in ->readdir") reverted that code, which causes the panic again on arm, so let's add the old patch again. 
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/dir.c | 7 +++ fs/f2fs/namei.c | 10 +- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index f00b5ed..c0cf3e7b 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -825,9 +825,16 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, int save_len = fstr->len; int err; + de_name.name = kmalloc(de_name.len, GFP_NOFS); + if (!de_name.name) + return -ENOMEM; + + memcpy(de_name.name, d->filename[bit_pos], de_name.len); + err = fscrypt_fname_disk_to_usr(d->inode, (u32)de->hash_code, 0, &de_name, fstr); + kfree(de_name.name); if (err) return err; diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index c4c94c7..2cb70c1 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -1170,8 +1170,13 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, /* Symlink is encrypted */ sd = (struct fscrypt_symlink_data *)caddr; - cstr.name = sd->encrypted_path; cstr.len = le16_to_cpu(sd->len); + cstr.name = kmalloc(cstr.len, GFP_NOFS); + if (!cstr.name) { + res = -ENOMEM; + goto errout; + } + memcpy(cstr.name, sd->encrypted_path, cstr.len); /* this is broken symlink case */ if (unlikely(cstr.len == 0)) { @@ -1198,6 +1203,8 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, goto errout; } + kfree(cstr.name); + paddr = pstr.name; /* Null-terminate the name */ @@ -1207,6 +1214,7 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, set_delayed_call(done, kfree_link, paddr); return paddr; errout: + kfree(cstr.name); fscrypt_fname_free_buffer(&pstr); put_page(cpage); return ERR_PTR(res); -- 1.8.5.2
Re: [PATCH] f2fs: set_cold_data in move_data_block
OK, got it. On 2018/2/11 11:50, Chao Yu wrote: On 2018/2/11 11:34, Yunlong Song wrote: Ping... move_data_block misses set_cold_data, so F2FS_WB_CP_DATA will lack the data pages written in move_data_block, and write_checkpoint cannot make sure these pages are committed to the flash. Hmm.. data block migration runs based on the meta inode, so it should be safe, since checkpoint will flush all meta pages, including encrypted pages cached in the meta inode? Thanks, On 2018/2/8 20:33, Yunlong Song wrote: Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/gc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..2095630 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -692,6 +692,7 @@ static void move_data_block(struct inode *inode, block_t bidx, fio.op = REQ_OP_WRITE; fio.op_flags = REQ_SYNC; fio.new_blkaddr = newaddr; + set_cold_data(fio.page); err = f2fs_submit_page_write(&fio); if (err) { if (PageWriteback(fio.encrypted_page)) -- Thanks, Yunlong Song
Re: [PATCH] f2fs: set_cold_data in move_data_block
Ping... move_data_block misses set_cold_data, then the F2FS_WB_CP_DATA will lack these data pages in move_data_block, and write_checkpoint can not make sure these pages are committed to the flash. On 2018/2/8 20:33, Yunlong Song wrote: Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/gc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..2095630 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -692,6 +692,7 @@ static void move_data_block(struct inode *inode, block_t bidx, fio.op = REQ_OP_WRITE; fio.op_flags = REQ_SYNC; fio.new_blkaddr = newaddr; + set_cold_data(fio.page); err = f2fs_submit_page_write(&fio); if (err) { if (PageWriteback(fio.encrypted_page)) -- Thanks, Yunlong Song
Re: [f2fs-dev] [PATCH] f2fs: add fi->commit_lock to protect commit GCed pages
The problem is that you can not find a proper value for the threshold time: when f2fs_gc selects a GCed data page of the atomic file (which has atomic started but not atomic committed yet), then f2fs_gc will run into a loop, and all the f2fs ops will be blocked in f2fs_balance_fs. If the threshold time is set larger (e.g. 5s? Then all the f2fs ops will block for 5s, which will cause an unexpectedly bad user experience). And if the threshold time is set smaller (e.g. 500ms? Then the atomic ops will probably fail frequently). BTW, some more patches are needed to notify the atomic ops itself that it has timed out and should handle the inmem pages. Back to these two patches, why not use them to separate inmem pages and GCed data pages in such a simple way? On 2018/2/9 21:38, Chao Yu wrote: On 2018/2/9 21:29, Yunlong Song wrote: Back to the problem, if we skip out, then the f2fs_gc will go into a dead loop if the apps only atomic start but never atomic That's another issue, which I have suggested to set a threshold time to release atomic/volatile pages by balance_fs_bg. Thanks, commit. The main aim of my two patches is to remove the skip action to avoid the dead loop. On 2018/2/9 21:26, Chao Yu wrote: On 2018/2/9 20:56, Yunlong Song wrote: As I pointed out in the last mail, if the atomic file is not committed yet, gc_data_segment will register_inmem_page the GCed data pages. We will skip GCing that page as below check: - move_data_{page,block} - f2fs_is_atomic_file() skip out; No? Thanks, This will cause these data pages to be written twice, the first write happens in move_data_page->do_write_data_page, and the second write happens in the later __commit_inmem_pages->do_write_data_page. On 2018/2/9 20:44, Chao Yu wrote: On 2018/2/8 11:11, Yunlong Song wrote: Then the GCed data pages are totally mixed with the inmem atomic pages, If we add dio_rwsem, GC flow is excluded from atomic write flow. There will be no race case to mix GCed pages into atomic pages. 
Or you mean: - gc_data_segment - move_data_page - f2fs_is_atomic_file - f2fs_ioc_start_atomic_write - set_inode_flag(inode, FI_ATOMIC_FILE); - f2fs_set_data_page_dirty - register_inmem_page In this case, a GCed page can be mixed into a database transaction, but could it cause any problem except breaking the rule of isolation for the transaction? this will cause the atomic commit ops write the GCed data pages twice (the first write happens in GC). How about using the early two patches to separate the inmem data pages and GCed data pages, and use dio_rwsem instead of this patch to fix the dnode page problem (dnode page committed but data pages are not committed for the GCed page)? Could we fix the race case first, based on that fixing, and then find the place that we can improve? On 2018/2/7 20:16, Chao Yu wrote: On 2018/2/6 11:49, Yunlong Song wrote: This patch adds fi->commit_lock to avoid the case that GCed node pages are committed but GCed data pages are not committed. This can avoid the db file running into an inconsistent state when a sudden power-off happens if data pages of an atomic file are allowed to be GCed before. 
do_fsync: GC: - mutex_lock(&fi->commit_lock); - lock_page() - mutex_lock(&fi->commit_lock); - lock_page() Well, please consider lock dependency & code complexity, IMO, reuse fi->dio_rwsem[WRITE] will be enough as below: --- fs/f2fs/file.c | 3 +++ fs/f2fs/gc.c | 5 - 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 672a542e5464..1bdc11feb8d0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1711,6 +1711,8 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) inode_lock(inode); + down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + if (f2fs_is_volatile_file(inode)) goto err_out; @@ -1729,6 +1731,7 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false); } err_out: + up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd532a9..e49416283563 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,9 +622,6 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) - goto out; Seems that we need this check. - if (f2fs_is_pinned_file(inode)) { f2fs_pin_file_control(inode, true); goto out; @@ -729,8 +726,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno,
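The lock-dependency worry above is the classic ABBA pattern: do_fsync would take commit_lock then the page lock, while GC takes the page lock then commit_lock. A small userspace sketch in Python (hypothetical names, not kernel code) shows the standard fix the reviewers implicitly rely on, which is forcing every path to acquire the two locks in one global order:

```python
import threading

# Stand-ins for fi->commit_lock and a page lock in the thread's ABBA scenario.
commit_lock = threading.Lock()
page_lock = threading.Lock()

# One global acquisition order; both "do_fsync" and "GC" must follow it,
# which removes the commit_lock->page vs page->commit_lock inversion.
ORDER = [commit_lock, page_lock]

def with_ordered_locks(fn):
    for lk in ORDER:
        lk.acquire()
    try:
        return fn()
    finally:
        for lk in reversed(ORDER):
            lk.release()

results = []
t1 = threading.Thread(target=lambda: results.append(with_ordered_locks(lambda: "fsync")))
t2 = threading.Thread(target=lambda: results.append(with_ordered_locks(lambda: "gc")))
t1.start(); t2.start(); t1.join(); t2.join()
assert sorted(results) == ["fsync", "gc"]   # both paths complete; no deadlock
```

Chao's alternative sidesteps the ordering question entirely by reusing a single rw_semaphore (fi->dio_rwsem[WRITE]) instead of adding a second lock.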
Re: [PATCH] f2fs: add fi->commit_lock to protect commit GCed pages
Back to the problem, if we skip out, then the f2fs_gc will go into a dead loop if the apps only atomic start but never atomic commit. The main aim of my two patches is to remove the skip action to avoid the dead loop. On 2018/2/9 21:26, Chao Yu wrote: On 2018/2/9 20:56, Yunlong Song wrote: As I pointed out in the last mail, if the atomic file is not committed yet, gc_data_segment will register_inmem_page the GCed data pages. We will skip GCing that page as below check: - move_data_{page,block} - f2fs_is_atomic_file() skip out; No? Thanks, This will cause these data pages to be written twice, the first write happens in move_data_page->do_write_data_page, and the second write happens in the later __commit_inmem_pages->do_write_data_page. On 2018/2/9 20:44, Chao Yu wrote: On 2018/2/8 11:11, Yunlong Song wrote: Then the GCed data pages are totally mixed with the inmem atomic pages, If we add dio_rwsem, GC flow is excluded from atomic write flow. There will be no race case to mix GCed pages into atomic pages. Or you mean: - gc_data_segment - move_data_page - f2fs_is_atomic_file - f2fs_ioc_start_atomic_write - set_inode_flag(inode, FI_ATOMIC_FILE); - f2fs_set_data_page_dirty - register_inmem_page In this case, a GCed page can be mixed into a database transaction, but could it cause any problem except breaking the rule of isolation for the transaction? this will cause the atomic commit ops write the GCed data pages twice (the first write happens in GC). How about using the early two patches to separate the inmem data pages and GCed data pages, and use dio_rwsem instead of this patch to fix the dnode page problem (dnode page committed but data pages are not committed for the GCed page)? Could we fix the race case first, based on that fixing, and then find the place that we can improve? On 2018/2/7 20:16, Chao Yu wrote: On 2018/2/6 11:49, Yunlong Song wrote: This patch adds fi->commit_lock to avoid the case that GCed node pages are committed but GCed data pages are not committed. 
This can avoid the db file running into an inconsistent state when a sudden power-off happens if data pages of an atomic file are allowed to be GCed before. do_fsync: GC: - mutex_lock(&fi->commit_lock); - lock_page() - mutex_lock(&fi->commit_lock); - lock_page() Well, please consider lock dependency & code complexity, IMO, reuse fi->dio_rwsem[WRITE] will be enough as below: --- fs/f2fs/file.c | 3 +++ fs/f2fs/gc.c | 5 - 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 672a542e5464..1bdc11feb8d0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1711,6 +1711,8 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) inode_lock(inode); + down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + if (f2fs_is_volatile_file(inode)) goto err_out; @@ -1729,6 +1731,7 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false); } err_out: + up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd532a9..e49416283563 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,9 +622,6 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) - goto out; Seems that we need this check. - if (f2fs_is_pinned_file(inode)) { f2fs_pin_file_control(inode, true); goto out; @@ -729,8 +726,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) - goto out; Ditto. Thanks, if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) f2fs_pin_file_control(inode, true); . . -- Thanks, Yunlong Song
Re: [PATCH] f2fs: add fi->commit_lock to protect commit GCed pages
As I pointed out in the last mail, if the atomic file is not committed yet, gc_data_segment will register_inmem_page the GCed data pages. This will cause these data pages to be written twice, the first write happens in move_data_page->do_write_data_page, and the second write happens in the later __commit_inmem_pages->do_write_data_page. On 2018/2/9 20:44, Chao Yu wrote: On 2018/2/8 11:11, Yunlong Song wrote: Then the GCed data pages are totally mixed with the inmem atomic pages, If we add dio_rwsem, GC flow is excluded from atomic write flow. There will be no race case to mix GCed pages into atomic pages. Or you mean: - gc_data_segment - move_data_page - f2fs_is_atomic_file - f2fs_ioc_start_atomic_write - set_inode_flag(inode, FI_ATOMIC_FILE); - f2fs_set_data_page_dirty - register_inmem_page In this case, a GCed page can be mixed into a database transaction, but could it cause any problem except breaking the rule of isolation for the transaction? this will cause the atomic commit ops write the GCed data pages twice (the first write happens in GC). How about using the early two patches to separate the inmem data pages and GCed data pages, and use dio_rwsem instead of this patch to fix the dnode page problem (dnode page committed but data pages are not committed for the GCed page)? Could we fix the race case first, based on that fixing, and then find the place that we can improve? On 2018/2/7 20:16, Chao Yu wrote: On 2018/2/6 11:49, Yunlong Song wrote: This patch adds fi->commit_lock to avoid the case that GCed node pages are committed but GCed data pages are not committed. This can avoid the db file running into an inconsistent state when a sudden power-off happens if data pages of an atomic file are allowed to be GCed before. 
do_fsync: GC: - mutex_lock(&fi->commit_lock); - lock_page() - mutex_lock(&fi->commit_lock); - lock_page() Well, please consider lock dependency & code complexity, IMO, reuse fi->dio_rwsem[WRITE] will be enough as below: --- fs/f2fs/file.c | 3 +++ fs/f2fs/gc.c | 5 - 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 672a542e5464..1bdc11feb8d0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1711,6 +1711,8 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) inode_lock(inode); + down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + if (f2fs_is_volatile_file(inode)) goto err_out; @@ -1729,6 +1731,7 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false); } err_out: + up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd532a9..e49416283563 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,9 +622,6 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) - goto out; Seems that we need this check. - if (f2fs_is_pinned_file(inode)) { f2fs_pin_file_control(inode, true); goto out; @@ -729,8 +726,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) - goto out; Ditto. Thanks, if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) f2fs_pin_file_control(inode, true); . -- Thanks, Yunlong Song
[PATCH] f2fs: set_cold_data in move_data_block
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/gc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..2095630 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -692,6 +692,7 @@ static void move_data_block(struct inode *inode, block_t bidx, fio.op = REQ_OP_WRITE; fio.op_flags = REQ_SYNC; fio.new_blkaddr = newaddr; + set_cold_data(fio.page); err = f2fs_submit_page_write(&fio); if (err) { if (PageWriteback(fio.encrypted_page)) -- 1.8.5.2
Re: [PATCH] f2fs: add fi->commit_lock to protect commit GCed pages
Then the GCed data pages are totally mixed with the inmem atomic pages, this will cause the atomic commit ops write the GCed data pages twice (the first write happens in GC). How about using the early two patches to separate the inmem data pages and GCed data pages, and use dio_rwsem instead of this patch to fix the dnode page problem (dnode page committed but data pages are not committed for the GCed page)? On 2018/2/7 20:16, Chao Yu wrote: On 2018/2/6 11:49, Yunlong Song wrote: This patch adds fi->commit_lock to avoid the case that GCed node pages are committed but GCed data pages are not committed. This can avoid the db file running into an inconsistent state when a sudden power-off happens if data pages of an atomic file are allowed to be GCed before. do_fsync: GC: - mutex_lock(&fi->commit_lock); - lock_page() - mutex_lock(&fi->commit_lock); - lock_page() Well, please consider lock dependency & code complexity, IMO, reuse fi->dio_rwsem[WRITE] will be enough as below: --- fs/f2fs/file.c | 3 +++ fs/f2fs/gc.c | 5 - 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 672a542e5464..1bdc11feb8d0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1711,6 +1711,8 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) inode_lock(inode); + down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + if (f2fs_is_volatile_file(inode)) goto err_out; @@ -1729,6 +1731,7 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false); } err_out: + up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd532a9..e49416283563 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,9 +622,6 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) - goto out; - if (f2fs_is_pinned_file(inode)) { 
f2fs_pin_file_control(inode, true); goto out; @@ -729,8 +726,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) - goto out; if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) f2fs_pin_file_control(inode, true); -- Thanks, Yunlong Song
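Chao's suggested fix above is to make atomic commit and GC mutually exclusive through one rw_semaphore's writer side rather than a new mutex. A hypothetical userspace sketch in Python (a plain lock stands in for down_write/up_write on fi->dio_rwsem[WRITE]; none of these names are kernel code) shows the exclusion property:

```python
import threading

# Writer-side-only use of the semaphore, so a plain mutex models it.
dio_rwsem = threading.Lock()
log = []

def commit_atomic_write():
    # Models f2fs_ioc_commit_atomic_write taking down_write(&fi->dio_rwsem[WRITE]).
    with dio_rwsem:
        log.append("commit:start")
        log.append("commit:end")

def gc_move_data_block():
    # Models GC's migration path taking the same semaphore, so it can never
    # interleave with an in-progress atomic commit.
    with dio_rwsem:
        log.append("gc:start")
        log.append("gc:end")

t = threading.Thread(target=gc_move_data_block)
commit_atomic_write()          # commit finishes before GC is allowed in
t.start(); t.join()
assert log == ["commit:start", "commit:end", "gc:start", "gc:end"]
```

Under this scheme the two critical sections serialize, which is why Chao argues the separate fi->commit_lock (and its lock-ordering questions) is unnecessary.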
[PATCH] f2fs: add fi->commit_lock to protect commit GCed pages
This patch adds fi->commit_lock to avoid the case that GCed node pages are committed but GCed data pages are not committed. This can avoid the db file running into an inconsistent state when sudden power-off happens, if data pages of an atomic file were allowed to be GCed before. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/f2fs.h | 1 + fs/f2fs/file.c | 15 -- fs/f2fs/gc.c| 61 + fs/f2fs/super.c | 1 + 4 files changed, 55 insertions(+), 23 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index dbe87c7..b58a8f2 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -628,6 +628,7 @@ struct f2fs_inode_info { struct list_head inmem_pages; /* inmemory pages managed by f2fs */ struct task_struct *inmem_task; /* store inmemory task */ struct mutex inmem_lock;/* lock for inmemory pages */ + struct mutex commit_lock; /* lock for commit GCed pages */ struct extent_tree *extent_tree;/* cached extent_tree entry */ struct rw_semaphore dio_rwsem[2];/* avoid racing between dio and gc */ struct rw_semaphore i_mmap_sem; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 672a542..7e14724 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -202,6 +202,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, { struct inode *inode = file->f_mapping->host; struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct f2fs_inode_info *fi = F2FS_I(inode); nid_t ino = inode->i_ino; int ret = 0; enum cp_reason_type cp_reason = 0; @@ -219,11 +220,13 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, /* if fdatasync is triggered, let's do in-place-update */ if (datasync || get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks) set_inode_flag(inode, FI_NEED_IPU); + mutex_lock(&fi->commit_lock); ret = file_write_and_wait_range(file, start, end); clear_inode_flag(inode, FI_NEED_IPU); if (ret) { trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret); + mutex_unlock(&fi->commit_lock); return ret; } @@ -244,8 +247,11 @@ static int 
f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, goto go_write; if (is_inode_flag_set(inode, FI_UPDATE_WRITE) || - exist_written_data(sbi, ino, UPDATE_INO)) + exist_written_data(sbi, ino, UPDATE_INO)) { + mutex_unlock(&fi->commit_lock); goto flush_out; + } + mutex_unlock(&fi->commit_lock); goto out; } go_write: @@ -268,16 +274,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, try_to_fix_pino(inode); clear_inode_flag(inode, FI_APPEND_WRITE); clear_inode_flag(inode, FI_UPDATE_WRITE); + mutex_unlock(&fi->commit_lock); goto out; } sync_nodes: ret = fsync_node_pages(sbi, inode, &wbc, atomic); - if (ret) + if (ret) { + mutex_unlock(&fi->commit_lock); goto out; + } /* if cp_error was enabled, we should avoid infinite loop */ if (unlikely(f2fs_cp_error(sbi))) { ret = -EIO; + mutex_unlock(&fi->commit_lock); goto out; } @@ -286,6 +296,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, f2fs_write_inode(inode, NULL); goto sync_nodes; } + mutex_unlock(&fi->commit_lock); /* * If it's atomic_write, it's just fine to keep write ordering. So diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 9d54ddb..b98aff5 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -676,13 +676,20 @@ static void move_data_block(struct inode *inode, block_t bidx, goto put_page_out; } - if (f2fs_is_atomic_file(inode) && - !f2fs_is_commit_atomic_write(inode) && - !IS_GC_WRITTEN_PAGE(fio.encrypted_page)) { - set_page_private(fio.encrypted_page, (unsigned long)GC_WRITTEN_PAGE); - SetPagePrivate(fio.encrypted_page); - } - set_page_dirty(fio.encrypted_page); + if (f2fs_is_atomic_file(inode)) { + struct f2fs_inode_info *fi = F2FS_I(inode); + + mutex_lock(&fi->commit_lock); + if (!f2fs_is_commit_atomic_write(inode) && + !IS_GC_WRITTEN_PAGE(fio.encrypted_page)) { + set_page_private(fio.encrypted_page, + (unsigned long)GC_WRITTEN_PAGE); + SetPagePrivate(fio.encrypted_page);
Re: [PATCH 1/2] f2fs: enable to gc page whose inode already atomic commit
OK, now I got it, thanks for the explanation. Then the point is to avoid set_page_dirty between file_write_and_wait_range and fsync_node_pages, so we can lock before file_write_and_wait_range and unlock after fsync_node_pages, and lock before set_page_dirty and unlock after set_page_dirty. These patches and the locks can make sure the GCed data pages are all committed to nand flash with their nodes. On 2018/2/5 19:10, Chao Yu wrote: On 2018/2/5 17:37, Yunlong Song wrote: OK, details as I explained before: atomic_commit GC - file_write_and_wait_range - move_data_block - f2fs_submit_page_write - f2fs_update_data_blkaddr - set_page_dirty - fsync_node_pages 1. atomic writes data page #1 & update node #1 2. GC data page #2 & update node #2 3. page #1 & node #1 & #2 can be committed into nand flash before page #2 is committed. After a sudden power-cut, the database transaction will be inconsistent. So I think it would be better to make gc/atomic_write exclude each other, with a lock instead of flag checking. I do not understand why this transaction is inconsistent, is it a problem that page #2 is not committed into nand flash? Since normal Yes, node #2 contains the newly updated LBAx of page #2, but if page #2 is not committed to LBAx, after recovery, page #2's block address in node #2 will point to LBAx, which contains random data, resulting in a corrupted db file. gc also has this problem: Suppose that there is db file A, f2fs_gc moves data page #1 of db file A. But if write checkpoint only commits node page #1 and then a sudden f2fs will ensure GCed data being persisted during checkpoint, so migrated page #1 and updated node #1 will both be committed in this checkpoint. Please check the WB_DATA_TYPE macro to see how we define the data type that cp guarantees to writeback. power-cut happens. Data page #1 is not committed to nand flash, but node page #1 is committed. Is the db transaction broken and inconsistent? 
Come back to your example, I think data page 2 of atomic file does not belong to this transaction, so even node page 2 is committed, it is just If node #2 is committed only, it will be harmful to db transaction due to the reason I said above. Thanks, the same problem as what I have listed above(db file A), and it does not break this transaction. Thanks, So how about just using dio_rwsem[WRITE] during atomic committing to exclude GCing data block of atomic opened file? Thanks, Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/data.c | 5 ++--- fs/f2fs/gc.c | 6 -- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 7435830..edafcb6 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1580,14 +1580,13 @@ bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) return true; if (S_ISDIR(inode->i_mode)) return true; - if (f2fs_is_atomic_file(inode)) - return true; if (fio) { if (is_cold_data(fio->page)) return true; if (IS_ATOMIC_WRITTEN_PAGE(fio->page)) return true; - } + } else if (f2fs_is_atomic_file(inode)) + return true; return false; } diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..84ab3ff 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,7 +622,8 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { @@ -729,7 +730,8 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) . . . . -- Thanks, Yunlong Song
Re: [PATCH 1/2] f2fs: enable to gc page whose inode already atomic commit
OK, details as I explained before: atomic_commit GC - file_write_and_wait_range - move_data_block - f2fs_submit_page_write - f2fs_update_data_blkaddr - set_page_dirty - fsync_node_pages 1. atomic writes data page #1 & update node #1 2. GC data page #2 & update node #2 3. page #1 & node #1 & #2 can be committed into nand flash before page #2 is committed. After a sudden power-cut, the database transaction will be inconsistent. So I think it would be better to make gc/atomic_write exclude each other, with a lock instead of flag checking. I do not understand why this transaction is inconsistent, is it a problem that page #2 is not committed into nand flash? Since normal gc also has this problem: Suppose that there is db file A, f2fs_gc moves data page #1 of db file A. But if write checkpoint only commits node page #1 and then a sudden power-cut happens. Data page #1 is not committed to nand flash, but node page #1 is committed. Is the db transaction broken and inconsistent? Coming back to your example, I think data page 2 of the atomic file does not belong to this transaction, so even if node page 2 is committed, it is just the same problem as what I have listed above (db file A), and it does not break this transaction. Thanks, So how about just using dio_rwsem[WRITE] during atomic committing to exclude GCing data block of atomic opened file? 
Thanks, Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/data.c | 5 ++--- fs/f2fs/gc.c | 6 -- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 7435830..edafcb6 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1580,14 +1580,13 @@ bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) return true; if (S_ISDIR(inode->i_mode)) return true; - if (f2fs_is_atomic_file(inode)) - return true; if (fio) { if (is_cold_data(fio->page)) return true; if (IS_ATOMIC_WRITTEN_PAGE(fio->page)) return true; - } + } else if (f2fs_is_atomic_file(inode)) + return true; return false; } diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..84ab3ff 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,7 +622,8 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { @@ -729,7 +730,8 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) . . . -- Thanks, Yunlong Song
Re: [PATCH 1/2] f2fs: enable to gc page whose inode already atomic commit
Is it necessary to make atomic commit fail? What's the problem of this patch (no lock at all, and it does not make atomic commit fail)? These two patches aim to provide the ability to gc old blocks of an opened atomic file, with no effect on the original atomic commit and no mixing with inmem pages. On 2018/2/5 14:29, Chao Yu wrote: On 2018/2/5 10:53, Yunlong Song wrote: Is it necessary to add a lock here? What's the problem of this patch (no lock at all)? Anyway, the problem is expected to be fixed asap, since attackers can easily write an app with only atomic start and no atomic commit, which will cause f2fs to run into loop GC if the disk layout is much fragmented, since f2fs_gc will select the same target victim all the time (e.g. one block of the target victim belongs to the opened atomic file, and it will not be moved and do_garbage_collect will finally return 0, and that victim is selected again next time) and goto gc_more time and time again, which will block all the fs ops (all the fs ops will hang up in f2fs_balance_fs). Hmm.. w/ the original commit log and implementation, I supposed that the patch intended to make atomic write be isolated from other IOs like GC triggered writes... Alright, we have discussed the problem before in the below link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1571951.html I meant, for example: f2fs_ioc_start_atomic_write() inode->atomic_open_time = get_mtime(); f2fs_ioc_commit_atomic_write() inode->atomic_open_time = 0; f2fs_balance_fs_bg() for_each_atomic_open_file() if (inode->atomic_open_time && inode->atomic_open_time > threshold) { drop_inmem_pages(); f2fs_msg(); } threshold = 30s Any thoughts? Thanks, On 2018/2/4 22:56, Chao Yu wrote: On 2018/2/3 10:47, Yunlong Song wrote: If inode has already started to atomic commit, then set_page_dirty will not mix the gc pages with the inmem atomic pages, so the page can be gced safely. Let's avoid Ccing fs mailing list if the patch didn't change vfs common codes. 
As you know, the problem here is mixed dnode block flushing w/o writebacking gced data block, result in making transaction unintegrated. So how about just using dio_rwsem[WRITE] during atomic committing to exclude GCing data block of atomic opened file? Thanks, Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/data.c | 5 ++--- fs/f2fs/gc.c | 6 -- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 7435830..edafcb6 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1580,14 +1580,13 @@ bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) return true; if (S_ISDIR(inode->i_mode)) return true; - if (f2fs_is_atomic_file(inode)) - return true; if (fio) { if (is_cold_data(fio->page)) return true; if (IS_ATOMIC_WRITTEN_PAGE(fio->page)) return true; - } + } else if (f2fs_is_atomic_file(inode)) + return true; return false; } diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..84ab3ff 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,7 +622,8 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { @@ -729,7 +730,8 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) . . -- Thanks, Yunlong Song
Re: [PATCH 1/2] f2fs: enable to gc page whose inode already atomic commit
Is it necessary to add a lock here? What's the problem of this patch (no lock at all)? Anyway, the problem is expected to be fixed asap, since attackers can easily write an app with only atomic start and no atomic commit, which will cause f2fs to run into loop GC if the disk layout is much fragmented, since f2fs_gc will select the same target victim all the time (e.g. one block of the target victim belongs to the opened atomic file, and it will not be moved and do_garbage_collect will finally return 0, and that victim is selected again next time) and goto gc_more time and time again, which will block all the fs ops (all the fs ops will hang up in f2fs_balance_fs). On 2018/2/4 22:56, Chao Yu wrote: On 2018/2/3 10:47, Yunlong Song wrote: If inode has already started to atomic commit, then set_page_dirty will not mix the gc pages with the inmem atomic pages, so the page can be gced safely. Let's avoid Ccing fs mailing list if the patch didn't change vfs common codes. As you know, the problem here is mixed dnode block flushing w/o writebacking the gced data block, resulting in an unintegrated transaction. So how about just using dio_rwsem[WRITE] during atomic committing to exclude GCing data block of atomic opened file? 
Thanks, Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/data.c | 5 ++--- fs/f2fs/gc.c | 6 -- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 7435830..edafcb6 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1580,14 +1580,13 @@ bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) return true; if (S_ISDIR(inode->i_mode)) return true; - if (f2fs_is_atomic_file(inode)) - return true; if (fio) { if (is_cold_data(fio->page)) return true; if (IS_ATOMIC_WRITTEN_PAGE(fio->page)) return true; - } + } else if (f2fs_is_atomic_file(inode)) + return true; return false; } diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..84ab3ff 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,7 +622,8 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { @@ -729,7 +730,8 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) . -- Thanks, Yunlong Song
Re: [PATCH 1/2] f2fs: enable to gc page whose inode already atomic commit
Is it necessary to add a lock here? What's the problem of this patch (no lock at all)? Anyway, the problem is expected to be fixed asap, since attackers can easily write an app with only atomic start and no atomic commit, which will cause f2fs run into loop gc if the disk layout is much fragmented, since f2fs_gc will select the same target victim all the time (e.g. one block of target victim belongs to the opened atomic file, and it will not be moved and do_garbage_collect will finally return 0, and that victim is selected again next time) and goto gc_more time and time again, which will block all the fs ops (all the fs ops will hang up in f2fs_balance_fs). On 2018/2/4 22:56, Chao Yu wrote: On 2018/2/3 10:47, Yunlong Song wrote: If inode has already started to atomic commit, then set_page_dirty will not mix the gc pages with the inmem atomic pages, so the page can be gced safely. Let's avoid Ccing fs mailing list if the patch didn't change vfs common codes. As you know, the problem here is mixed dnode block flushing w/o writebacking gced data block, result in making transaction unintegrated. So how about just using dio_rwsem[WRITE] during atomic committing to exclude GCing data block of atomic opened file? 
[PATCH 2/2] f2fs: add GC_WRITTEN_PAGE to gc atomic file
This patch enables gc of atomic files by adding GC_WRITTEN_PAGE to
identify the gced pages of an atomic file, which avoids
register_inmem_page in set_page_dirty, so the gced pages will not mix
with the inmem pages.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fs/f2fs/data.c    |  7 ++++++-
 fs/f2fs/gc.c      | 25 ++++++++++++++++++-------
 fs/f2fs/segment.h |  3 +++
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index edafcb6..5e1fc5d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -120,6 +120,10 @@ static void f2fs_write_end_io(struct bio *bio)
 		dec_page_count(sbi, type);
 		clear_cold_data(page);
+		if (IS_GC_WRITTEN_PAGE(page)) {
+			set_page_private(page, 0);
+			ClearPagePrivate(page);
+		}
 		end_page_writeback(page);
 	}
 	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
@@ -2418,7 +2422,8 @@ static int f2fs_set_data_page_dirty(struct page *page)
 	if (!PageUptodate(page))
 		SetPageUptodate(page);
 
-	if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) {
+	if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)
+			&& !IS_GC_WRITTEN_PAGE(page)) {
 		if (!IS_ATOMIC_WRITTEN_PAGE(page)) {
 			register_inmem_page(inode, page);
 			return 1;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 84ab3ff..9d54ddb 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -622,10 +622,6 @@ static void move_data_block(struct inode *inode, block_t bidx,
 	if (!check_valid_map(F2FS_I_SB(inode), segno, off))
 		goto out;
 
-	if (f2fs_is_atomic_file(inode) &&
-		!f2fs_is_commit_atomic_write(inode))
-		goto out;
-
 	if (f2fs_is_pinned_file(inode)) {
 		f2fs_pin_file_control(inode, true);
 		goto out;
@@ -680,6 +676,12 @@ static void move_data_block(struct inode *inode, block_t bidx,
 		goto put_page_out;
 	}
 
+	if (f2fs_is_atomic_file(inode) &&
+		!f2fs_is_commit_atomic_write(inode) &&
+		!IS_GC_WRITTEN_PAGE(fio.encrypted_page)) {
+		set_page_private(fio.encrypted_page, (unsigned long)GC_WRITTEN_PAGE);
+		SetPagePrivate(fio.encrypted_page);
+	}
 	set_page_dirty(fio.encrypted_page);
 	f2fs_wait_on_page_writeback(fio.encrypted_page, DATA, true);
 	if (clear_page_dirty_for_io(fio.encrypted_page))
@@ -730,9 +732,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type,
 	if (!check_valid_map(F2FS_I_SB(inode), segno, off))
 		goto out;
 
-	if (f2fs_is_atomic_file(inode) &&
-		!f2fs_is_commit_atomic_write(inode))
-		goto out;
 	if (f2fs_is_pinned_file(inode)) {
 		if (gc_type == FG_GC)
 			f2fs_pin_file_control(inode, true);
@@ -742,6 +741,12 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type,
 	if (gc_type == BG_GC) {
 		if (PageWriteback(page))
 			goto out;
+		if (f2fs_is_atomic_file(inode) &&
+			!f2fs_is_commit_atomic_write(inode) &&
+			!IS_GC_WRITTEN_PAGE(page)) {
+			set_page_private(page, (unsigned long)GC_WRITTEN_PAGE);
+			SetPagePrivate(page);
+		}
 		set_page_dirty(page);
 		set_cold_data(page);
 	} else {
@@ -762,6 +767,12 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type,
 		int err;
 
 retry:
+		if (f2fs_is_atomic_file(inode) &&
+			!f2fs_is_commit_atomic_write(inode) &&
+			!IS_GC_WRITTEN_PAGE(page)) {
+			set_page_private(page, (unsigned long)GC_WRITTEN_PAGE);
+			SetPagePrivate(page);
+		}
 		set_page_dirty(page);
 		f2fs_wait_on_page_writeback(page, DATA, true);
 		if (clear_page_dirty_for_io(page)) {
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index f11c4bc..f0a6432 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -203,11 +203,14 @@ struct segment_allocation {
  */
 #define ATOMIC_WRITTEN_PAGE	((unsigned long)-1)
 #define DUMMY_WRITTEN_PAGE	((unsigned long)-2)
+#define GC_WRITTEN_PAGE		((unsigned long)-3)
 
 #define IS_ATOMIC_WRITTEN_PAGE(page)			\
 	(page_private(page) == (unsigned long)ATOMIC_WRITTEN_PAGE)
 #define IS_DUMMY_WRITTEN_PAGE(page)			\
 	(page_private(page) == (unsigned long)DUMMY_WRITTEN_PAGE)
+#define IS_GC_WRITTEN_PAGE(page)
[PATCH 1/2] f2fs: enable to gc page whose inode already atomic commit
If inode has already started to atomic commit, then set_page_dirty will
not mix the gc pages with the inmem atomic pages, so the page can be
gced safely.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fs/f2fs/data.c | 5 ++---
 fs/f2fs/gc.c   | 6 ++++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7435830..edafcb6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1580,14 +1580,13 @@ bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
 		return true;
 	if (S_ISDIR(inode->i_mode))
 		return true;
-	if (f2fs_is_atomic_file(inode))
-		return true;
 	if (fio) {
 		if (is_cold_data(fio->page))
 			return true;
 		if (IS_ATOMIC_WRITTEN_PAGE(fio->page))
 			return true;
-	}
+	} else if (f2fs_is_atomic_file(inode))
+		return true;
 
 	return false;
 }
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index b9d93fd..84ab3ff 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -622,7 +622,8 @@ static void move_data_block(struct inode *inode, block_t bidx,
 	if (!check_valid_map(F2FS_I_SB(inode), segno, off))
 		goto out;
 
-	if (f2fs_is_atomic_file(inode))
+	if (f2fs_is_atomic_file(inode) &&
+		!f2fs_is_commit_atomic_write(inode))
 		goto out;
 
 	if (f2fs_is_pinned_file(inode)) {
@@ -729,7 +730,8 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type,
 	if (!check_valid_map(F2FS_I_SB(inode), segno, off))
 		goto out;
 
-	if (f2fs_is_atomic_file(inode))
+	if (f2fs_is_atomic_file(inode) &&
+		!f2fs_is_commit_atomic_write(inode))
 		goto out;
 
 	if (f2fs_is_pinned_file(inode)) {
 		if (gc_type == FG_GC)
-- 
1.8.5.2
Re: [PATCH] f2fs: fix heap mode to reset it back
The old commit allocates hot data & nodes in the beginning of the
partition both for heap and noheap mode. But from the commit message,
heap mode should behave like before, i.e., allocate hot data & nodes
from curseg to the left.

On 2018/1/29 16:12, Chao Yu wrote:
> Hi Yunlong,
>
> On 2018/1/29 11:37, Yunlong Song wrote:
>> Commit 7a20b8a61eff81bdb7097a578752a74860e9d142 ("f2fs: allocate node
>> and hot data in the beginning of partition") introduces another mount
>> option, heap, to reset it back. But it does not do anything for heap
>> mode, so fix it.
>
> I think Jaegeuk did three things in that patch:
> a) add missing heap mount option handling in ->show_options.
> b) set noheap by default.
> c) change allocation policy to the one that allocates hot data & nodes
>    in the front of main area intensively.
>
> They could be separated, independent, and I don't see such intention
> that we can only use c) the new introduced allocation policy in noheap
> mode. Anyway, I think Jaegeuk can help to double check that.
>
> Thanks,
>
>> Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
>> ---
>>  fs/f2fs/gc.c      | 5 +++--
>>  fs/f2fs/segment.c | 3 ++-
>>  2 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index aa720cc..b9d93fd 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -191,8 +191,9 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
>>  	if (gc_type != FG_GC && p->max_search > sbi->max_victim_search)
>>  		p->max_search = sbi->max_victim_search;
>>
>> -	/* let's select beginning hot/small space first */
>> -	if (type == CURSEG_HOT_DATA || IS_NODESEG(type))
>> +	/* let's select beginning hot/small space first in no_heap mode*/
>> +	if (test_opt(sbi, NOHEAP) &&
>> +		(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
>>  		p->offset = 0;
>>  	else
>>  		p->offset = SIT_I(sbi)->last_victim[p->gc_mode];
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index e5739ce..77a48c4 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -2167,7 +2167,8 @@ static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
>>  	if (sbi->segs_per_sec != 1)
>>  		return CURSEG_I(sbi, type)->segno;
>>
>> -	if (type == CURSEG_HOT_DATA || IS_NODESEG(type))
>> +	if (test_opt(sbi, NOHEAP) &&
>> +		(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
>>  		return 0;
>>
>>  	if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
.

-- 
Thanks,
Yunlong Song
[PATCH] f2fs: fix heap mode to reset it back
Commit 7a20b8a61eff81bdb7097a578752a74860e9d142 ("f2fs: allocate node
and hot data in the beginning of partition") introduces another mount
option, heap, to reset it back. But it does not do anything for heap
mode, so fix it.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fs/f2fs/gc.c      | 5 +++--
 fs/f2fs/segment.c | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index aa720cc..b9d93fd 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -191,8 +191,9 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
 	if (gc_type != FG_GC && p->max_search > sbi->max_victim_search)
 		p->max_search = sbi->max_victim_search;
 
-	/* let's select beginning hot/small space first */
-	if (type == CURSEG_HOT_DATA || IS_NODESEG(type))
+	/* let's select beginning hot/small space first in no_heap mode*/
+	if (test_opt(sbi, NOHEAP) &&
+		(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
 		p->offset = 0;
 	else
 		p->offset = SIT_I(sbi)->last_victim[p->gc_mode];
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index e5739ce..77a48c4 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2167,7 +2167,8 @@ static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
 	if (sbi->segs_per_sec != 1)
 		return CURSEG_I(sbi, type)->segno;
 
-	if (type == CURSEG_HOT_DATA || IS_NODESEG(type))
+	if (test_opt(sbi, NOHEAP) &&
+		(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
 		return 0;
 
 	if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
-- 
1.8.5.2
Re: [PATCH v2] f2fs: handle newly created page when revoking inmem pages
Should it be "When committing inmem pages is not successful"?

On 2018/1/11 8:17, Daeho Jeong wrote:
> When committing inmem pages is successful, we revoke already committed
> blocks in __revoke_inmem_pages() and finally replace the committed
> ones with the old blocks using f2fs_replace_block(). However, if the
> committed block was a newly created one, the address of the old block
> is NEW_ADDR and __f2fs_replace_block() cannot handle NEW_ADDR as
> new_blkaddr properly, and a kernel panic occurs.
>
> Signed-off-by: Daeho Jeong <daeho.je...@samsung.com>
> Tested-by: Shu Tan <shu@samsung.com>
> Reviewed-by: Chao Yu <yuch...@huawei.com>
> ---
>  fs/f2fs/segment.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index c117e09..0673d08 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -248,7 +248,11 @@ static int __revoke_inmem_pages(struct inode *inode,
>  			goto next;
>  		}
>  		get_node_info(sbi, dn.nid, &ni);
> -		f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
> +		if (cur->old_addr == NEW_ADDR) {
> +			invalidate_blocks(sbi, dn.data_blkaddr);
> +			f2fs_update_data_blkaddr(&dn, NEW_ADDR);
> +		} else
> +			f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
>  				cur->old_addr, ni.version, true, true);
>  		f2fs_put_dnode(&dn);
>  	}

-- 
Thanks,
Yunlong Song
Re: [f2fs-dev] [PATCH 2/2] f2fs: add reserved blocks for root user
On 2018/1/4 2:58, Jaegeuk Kim wrote:
> @@ -1590,11 +1598,17 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
>  	sbi->total_valid_block_count += (block_t)(*count);
>  	avail_user_block_count = sbi->user_block_count -
>  					sbi->current_reserved_blocks;
> +
> +	if (!(test_opt(sbi, RESERVE_ROOT) && capable(CAP_SYS_RESOURCE)))
> +		avail_user_block_count -= sbi->root_reserved_blocks;

Should better be:

+	if (test_opt(sbi, RESERVE_ROOT) && !capable(CAP_SYS_RESOURCE))
+		avail_user_block_count -= sbi->root_reserved_blocks;

> @@ -1783,9 +1797,13 @@ static inline int inc_valid_node_count(struct f2fs_sb_info *sbi,
>  	spin_lock(&sbi->stat_lock);
>
> -	valid_block_count = sbi->total_valid_block_count + 1;
> -	if (unlikely(valid_block_count + sbi->current_reserved_blocks >
> -				sbi->user_block_count)) {
> +	valid_block_count = sbi->total_valid_block_count +
> +				sbi->current_reserved_blocks + 1;
> +
> +	if (!(test_opt(sbi, RESERVE_ROOT) && capable(CAP_SYS_RESOURCE)))
> +		valid_block_count += sbi->root_reserved_blocks;
> +

should better be:

+	if (test_opt(sbi, RESERVE_ROOT) && !capable(CAP_SYS_RESOURCE))
+		valid_block_count += sbi->root_reserved_blocks;

> +	if (unlikely(valid_block_count > sbi->user_block_count)) {
>  		spin_unlock(&sbi->stat_lock);
>  		goto enospc;
>  	}

-- 
Thanks,
Yunlong Song
Re: [f2fs-dev] [PATCH 1/2] f2fs: show precise # of blocks that user/root can use
NACK

man statfs shows:

struct statfs {
	...
	fsblkcnt_t f_bfree;  /* free blocks in fs */
	fsblkcnt_t f_bavail; /* free blocks available to unprivileged user */
	...
}

f_bfree is free blocks in fs, so buf->f_bfree should be:

	buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;

On 2018/1/4 2:58, Jaegeuk Kim wrote:
> Let's show precise # of blocks that user/root can use through bavail
> and bfree respectively.
>
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> ---
>  fs/f2fs/super.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 0a820ba55b10..4c1c99cf54ef 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1005,9 +1005,9 @@ static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	buf->f_bsize = sbi->blocksize;
>
>  	buf->f_blocks = total_count - start_count;
> -	buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
> -	buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
> +	buf->f_bfree = user_block_count - valid_user_blocks(sbi) -
> 						sbi->current_reserved_blocks;
> +	buf->f_bavail = buf->f_bfree;
>
>  	avail_node_count = sbi->total_node_count - sbi->nquota_files -
>  						F2FS_RESERVED_NODE_NUM;

-- 
Thanks,
Yunlong Song
[PATCH v4] f2fs: check segment type in __f2fs_replace_block
In some cases, a data block has a wrong blkaddr whose segment type is
NODE, e.g., a recovery inode misses the xattr flag and the blkaddr is
in the xattr range. Since fsck.f2fs does not check the recovery nodes,
this causes __f2fs_replace_block to change the curseg of node and do
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh. As a
result, when the recovery process writes a checkpoint and syncs nodes,
the next_blkoff of curseg is already used in the segment bitmap, which
triggers f2fs_bug_on. So let's check the segment type in
__f2fs_replace_block.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fs/f2fs/segment.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 890d483..50575d5 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2720,6 +2720,7 @@ void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 	type = se->type;
 
 	down_write(&SM_I(sbi)->curseg_lock);
+	f2fs_bug_on(sbi, se->valid_blocks && !IS_DATASEG(type));
 
 	if (!recover_curseg) {
 		/* for recovery flow */
-- 
1.8.5.2
[PATCH v3] f2fs: check segment type in __f2fs_replace_block
In some cases, a data block has a wrong blkaddr whose segment type is
NODE, e.g., a recovery inode misses the xattr flag and the blkaddr is
in the xattr range. Since fsck.f2fs does not check the recovery nodes,
this causes __f2fs_replace_block to change the curseg of node and do
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh. As a
result, when the recovery process writes a checkpoint and syncs nodes,
the next_blkoff of curseg is already used in the segment bitmap, which
triggers f2fs_bug_on. So let's check the segment type in
__f2fs_replace_block.

Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fs/f2fs/segment.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 890d483..6c6d2dd 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2719,6 +2719,8 @@ void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 	se = get_seg_entry(sbi, segno);
 	type = se->type;
 
+	f2fs_bug_on(sbi, se->valid_blocks && !IS_DATASEG(type));
+
 	down_write(&SM_I(sbi)->curseg_lock);
 
 	if (!recover_curseg) {
-- 
1.8.5.2
[PATCH v2] f2fs: check segment type in __f2fs_replace_block
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/segment.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 890d483..e3bbabf 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2719,6 +2719,8 @@ void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, se = get_seg_entry(sbi, segno); type = se->type; + f2fs_bug_on(sbi, se->valid_blocks && IS_NODESEG(type)); + down_write(&SM_I(sbi)->curseg_lock); if (!recover_curseg) { -- 1.8.5.2
Re: [PATCH] f2fs: check segment type before recover data
On 2018/1/2 14:49, Chao Yu wrote: On 2017/12/30 15:42, Yunlong Song wrote: In some case, the node blocks has wrong blkaddr whose segment type is You mean *data block* has wrong blkaddr whose segment type is NODE? Yes. NODE, e.g., recover inode has missing xattr flag and the blkaddr is in the xattr range. Since fsck.f2fs does not check the recovery nodes, this will cause __f2fs_replace_block change the curseg of node and do the update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a Do you mean the root cause is that __f2fs_replace_block didn't update next_blkoff? No, it's not the root cause. The root cause may be something like DDR flip. result, when recovery process write checkpoint and sync nodes, the next_blkoff of curseg is used in the segment bit map, then it will cause f2fs_bug_on. So let's check the segment type before recover data, and stop recover if it is not in DATA segment. Sorry, I can't catch the whole cause and effect from your description, if possible, could you give an example? For example, the i_inline flag has F2FS_INLINE_XATTR, and the last 50 i_addrs have xattr context. But if DDR flips, the i_inline flag may miss F2FS_INLINE_XATTR, and the last 50 i_addrs are considered as data block addr. If the xattr context is 0x1234, and 0x1234 happens to be a valid block addr, and the block 0x1234 happens to be in a warm node segment. Then do_recover_data will call f2fs_replace_block() with dest = 0x1234, which will change curseg of warm node to 0x1234's segment, and make update_sit_entry(sbi, 0x1234, 1), the curseg->next_blkoff also points to 0x1234's offset in its segment. When recovery process calls write_checkpoint, sync nodes will write to 0x1234's offset of curseg warm node. The update_sit_entry will check that offset and find the bitmap is already set to 1 and then calls f2fs_bug_on.
Thanks, Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/recovery.c | 3 ++- fs/f2fs/segment.h | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index 7d63faf..e8fee4a 100644 --- a/fs/f2fs/recovery.c +++ b/fs/f2fs/recovery.c @@ -478,7 +478,8 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode, } /* dest is valid block, try to recover from src to dest */ - if (is_valid_blkaddr(sbi, dest, META_POR)) { + if (is_valid_blkaddr(sbi, dest, META_POR) && + is_data_blkaddr(sbi, dest)) { if (src == NULL_ADDR) { err = reserve_new_block(&dn); diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 71a2aaa..5c5a215 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -115,6 +115,9 @@ #define SECTOR_TO_BLOCK(sectors) \ ((sectors) >> F2FS_LOG_SECTORS_PER_BLOCK) +#define is_data_blkaddr(sbi, blkaddr) \ + (IS_DATASEG(get_seg_entry(sbi, GET_SEGNO(sbi, blkaddr))->type)) + /* * indicate a block allocation direction: RIGHT and LEFT. * RIGHT means allocating new sections towards the end of volume. -- Thanks, Yunlong Song
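The is_data_blkaddr() macro in this patch boils down to a two-step lookup: map the block address to its segment number, then test the segment's type recorded in the SIT. A toy userspace sketch of that lookup, with a hypothetical in-memory type table standing in for the real SIT and a simplified GET_SEGNO that ignores the main-area offset:

```c
#include <assert.h>
#include <stdbool.h>

#define BLKS_PER_SEG 512u   /* f2fs default: 512 blocks per 2 MB segment */

/* Toy segment-type table standing in for the SIT (contents hypothetical). */
enum seg_type { SEG_DATA, SEG_NODE };

static enum seg_type seg_type_of[4] = { SEG_DATA, SEG_NODE, SEG_DATA, SEG_NODE };

/* Simplified GET_SEGNO: blkaddr -> segment number. */
static unsigned int get_segno(unsigned int blkaddr)
{
    return blkaddr / BLKS_PER_SEG;
}

/* Analogue of the patch's is_data_blkaddr(): blkaddr -> segno -> type. */
static bool is_data_blkaddr(unsigned int blkaddr)
{
    return seg_type_of[get_segno(blkaddr)] == SEG_DATA;
}
```

The point of the extra test in do_recover_data is exactly this: a dest address that passes the META_POR range check can still land in a NODE segment, and the type lookup catches that before the replace goes ahead.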
[PATCH] f2fs: check segment type before recover data
In some cases, a data block can have a wrong blkaddr whose segment type is NODE, e.g., when the recovered inode is missing its xattr flag and the blkaddr falls in the xattr range. Since fsck.f2fs does not check the recovery nodes, this makes __f2fs_replace_block change the node curseg and call update_sit_entry(sbi, new_blkaddr, 1) without refreshing next_blkoff; as a result, when the recovery process writes a checkpoint and syncs nodes, the next_blkoff of the curseg is already used in the segment bitmap, which triggers f2fs_bug_on. So let's check the segment type before recovering data, and stop the recovery if it is not in a DATA segment. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/recovery.c | 3 ++- fs/f2fs/segment.h | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index 7d63faf..e8fee4a 100644 --- a/fs/f2fs/recovery.c +++ b/fs/f2fs/recovery.c @@ -478,7 +478,8 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode, } /* dest is valid block, try to recover from src to dest */ - if (is_valid_blkaddr(sbi, dest, META_POR)) { + if (is_valid_blkaddr(sbi, dest, META_POR) && + is_data_blkaddr(sbi, dest)) { if (src == NULL_ADDR) { err = reserve_new_block(&dn); diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 71a2aaa..5c5a215 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -115,6 +115,9 @@ #define SECTOR_TO_BLOCK(sectors) \ ((sectors) >> F2FS_LOG_SECTORS_PER_BLOCK) +#define is_data_blkaddr(sbi, blkaddr) \ + (IS_DATASEG(get_seg_entry(sbi, GET_SEGNO(sbi, blkaddr))->type)) + /* * indicate a block allocation direction: RIGHT and LEFT. * RIGHT means allocating new sections towards the end of volume. -- 1.8.5.2
Re: [PATCH] f2fs: avoid f2fs_gc dead loop
In this case, f2fs_gc will skip all the victims and return with no dead loop. The atomic file will use SSR to OPU, it's OK. On 2017/12/25 17:45, Chao Yu wrote: On 2017/12/25 14:15, Yunlong Song wrote: What if the application starts atomic write but forgets to commit, e.g. bugs in application or the application is a malicious software itself? I agree we should consider robustness of f2fs in security aspect, but please consider more scenario of these sqlite customized interface usage, it looks just skipping gc is not enough, for example, if there is one large size db in our partition, with random write, its data spreads in each segment, once this db has been atomic opened, foreground gc may loop for ever. How about checking opened time of atomic or volatile file in f2fs_balance_fs, if it exceeds threshold, we can restore the file to normal one to avoid potential security issue. Thanks, On 2017/12/25 11:44, Chao Yu wrote: On 2017/12/23 21:09, Yunlong Song wrote: For some corner case, f2fs_gc selects one target victim but cannot free that victim segment due to some reason (e.g. the segment has some blocks of atomic file which is not committed yet), in this case, the victim File should not be atomic opened for long time since normally sqlite transaction will finish quickly, so we can expect that gc loop could be ended up soon, right? Thanks, segment may probably be selected over and over, and then f2fs_gc will go to dead loop. This patch identifies the dead-loop segment, and skips it in __get_victim next time.
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/f2fs.h | 8 fs/f2fs/gc.c | 34 ++ fs/f2fs/super.c | 3 +++ 3 files changed, 45 insertions(+) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ca6b0c9..b75851b 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -115,6 +115,13 @@ struct f2fs_mount_info { unsigned int opt; }; +struct gc_loop_info { + int count; + unsigned int segno; + unsigned long *segmap; +}; +#define GC_LOOP_MAX 10 + #define F2FS_FEATURE_ENCRYPT 0x0001 #define F2FS_FEATURE_BLKZONED 0x0002 #define F2FS_FEATURE_ATOMIC_WRITE 0x0004 @@ -1125,6 +1132,7 @@ struct f2fs_sb_info { /* threshold for converting bg victims for fg */ u64 fggc_threshold; + struct gc_loop_info gc_loop; /* maximum # of trials to find a victim segment for SSR and GC */ unsigned int max_victim_search; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 5d5bba4..4ee9e1b 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -229,6 +229,10 @@ static unsigned int check_bg_victims(struct f2fs_sb_info *sbi) if (no_fggc_candidate(sbi, secno)) continue; + if (sbi->gc_loop.segmap && + test_bit(GET_SEG_FROM_SEC(sbi, secno), sbi->gc_loop.segmap)) + continue; + clear_bit(secno, dirty_i->victim_secmap); return GET_SEG_FROM_SEC(sbi, secno); } @@ -371,6 +375,9 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (gc_type == FG_GC && p.alloc_mode == LFS && no_fggc_candidate(sbi, secno)) goto next; + if (gc_type == FG_GC && p.alloc_mode == LFS && + sbi->gc_loop.segmap && test_bit(segno, sbi->gc_loop.segmap)) + goto next; cost = get_gc_cost(sbi, segno, &p); @@ -1042,6 +1049,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type); if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec) sec_freed++; + else if (gc_type == FG_GC && seg_freed == 0) { + if (!sbi->gc_loop.segmap) { + sbi->gc_loop.segmap = + kvzalloc(f2fs_bitmap_size(MAIN_SEGS(sbi)), GFP_KERNEL); + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } + if (segno == sbi->gc_loop.segno) { + if (sbi->gc_loop.count > GC_LOOP_MAX) { + f2fs_bug_on(sbi, 1); + set_bit(segno, sbi->gc_loop.segmap); + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } + else + sbi->gc_loop.count++; + } else { + sbi->gc_loop.segno = segno; + sbi->gc_loop.count = 0; + } + } total_freed += seg_freed; if (gc_type == FG_GC) @@ -1075,6 +1103,12 @@ int
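The bookkeeping this patch threads through f2fs_gc() can be modeled on its own: track consecutive failed FG_GC attempts on the same segment, and after GC_LOOP_MAX of them mark the segment in a skip bitmap consulted by the victim selector. A self-contained sketch of that state machine (fixed-size arrays instead of kvzalloc'd bitmaps; names mirror the patch but the helpers are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define GC_LOOP_MAX 10
#define MAX_SEGS    64
#define NULL_SEGNO  (~0u)

/* Userspace model of the patch's gc_loop bookkeeping. */
struct gc_loop_info {
    int count;
    unsigned int segno;
    bool segmap[MAX_SEGS];   /* segments to skip as GC victims */
};

static void gc_loop_init(struct gc_loop_info *gl)
{
    memset(gl, 0, sizeof(*gl));
    gl->segno = NULL_SEGNO;
}

/*
 * Called when FG_GC freed nothing from `segno`. After more than
 * GC_LOOP_MAX consecutive failures on the same segment, mark it so
 * the victim selector skips it next time, and reset the tracker.
 */
static void gc_loop_note_failure(struct gc_loop_info *gl, unsigned int segno)
{
    if (segno == gl->segno) {
        if (gl->count > GC_LOOP_MAX) {
            gl->segmap[segno] = true;
            gl->count = 0;
            gl->segno = NULL_SEGNO;
        } else {
            gl->count++;
        }
    } else {
        /* different victim: restart the consecutive-failure count */
        gl->segno = segno;
        gl->count = 0;
    }
}

static bool gc_loop_should_skip(const struct gc_loop_info *gl, unsigned int segno)
{
    return gl->segmap[segno];
}
```

This also makes Chao Yu's objection concrete: the skip map only breaks the loop on a single stuck victim; if an atomic file's blocks spread across every segment, each segment in turn fails and gets marked, so bounding how long a file may stay atomic-opened is a complementary defense.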
Re: [PATCH] f2fs: avoid f2fs_gc dead loop
What if the application starts atomic write but forgets to commit, e.g. bugs in application or the application is a malicious software itself? On 2017/12/25 11:44, Chao Yu wrote: On 2017/12/23 21:09, Yunlong Song wrote: For some corner case, f2fs_gc selects one target victim but cannot free that victim segment due to some reason (e.g. the segment has some blocks of atomic file which is not committed yet), in this case, the victim File should not be atomic opened for long time since normally sqlite transaction will finish quickly, so we can expect that gc loop could be ended up soon, right? Thanks, segment may probably be selected over and over, and then f2fs_gc will go to dead loop. This patch identifies the dead-loop segment, and skips it in __get_victim next time. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/f2fs.h | 8 fs/f2fs/gc.c | 34 ++ fs/f2fs/super.c | 3 +++ 3 files changed, 45 insertions(+) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ca6b0c9..b75851b 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -115,6 +115,13 @@ struct f2fs_mount_info { unsigned int opt; }; +struct gc_loop_info { + int count; + unsigned int segno; + unsigned long *segmap; +}; +#define GC_LOOP_MAX 10 + #define F2FS_FEATURE_ENCRYPT 0x0001 #define F2FS_FEATURE_BLKZONED 0x0002 #define F2FS_FEATURE_ATOMIC_WRITE 0x0004 @@ -1125,6 +1132,7 @@ struct f2fs_sb_info { /* threshold for converting bg victims for fg */ u64 fggc_threshold; + struct gc_loop_info gc_loop; /* maximum # of trials to find a victim segment for SSR and GC */ unsigned int max_victim_search; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 5d5bba4..4ee9e1b 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -229,6 +229,10 @@ static unsigned int check_bg_victims(struct f2fs_sb_info *sbi) if (no_fggc_candidate(sbi, secno)) continue; + if (sbi->gc_loop.segmap && + test_bit(GET_SEG_FROM_SEC(sbi, secno), sbi->gc_loop.segmap)) + continue; + clear_bit(secno, dirty_i->victim_secmap); return GET_SEG_FROM_SEC(sbi, secno); } @@ -371,6 +375,9 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (gc_type == FG_GC && p.alloc_mode == LFS && no_fggc_candidate(sbi, secno)) goto next; + if (gc_type == FG_GC && p.alloc_mode == LFS && + sbi->gc_loop.segmap && test_bit(segno, sbi->gc_loop.segmap)) + goto next; cost = get_gc_cost(sbi, segno, &p); @@ -1042,6 +1049,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type); if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec) sec_freed++; + else if (gc_type == FG_GC && seg_freed == 0) { + if (!sbi->gc_loop.segmap) { + sbi->gc_loop.segmap = + kvzalloc(f2fs_bitmap_size(MAIN_SEGS(sbi)), GFP_KERNEL); + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } + if (segno == sbi->gc_loop.segno) { + if (sbi->gc_loop.count > GC_LOOP_MAX) { + f2fs_bug_on(sbi, 1); + set_bit(segno, sbi->gc_loop.segmap); + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } + else + sbi->gc_loop.count++; + } else { + sbi->gc_loop.segno = segno; + sbi->gc_loop.count = 0; + } + } total_freed += seg_freed; if (gc_type == FG_GC) @@ -1075,6 +1103,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, if (sync) ret = sec_freed ? 0 : -EAGAIN; + if (sbi->gc_loop.segmap) { + kvfree(sbi->gc_loop.segmap); + sbi->gc_loop.segmap = NULL; + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } return ret; } diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 031cb26..76f0b72 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2562,6 +2562,9 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) sbi->last_valid_block_count = sbi->total_valid_block_count; sbi->reserved_blocks = 0; sbi->current_reserved_blocks = 0; + sbi->gc_loop.segmap
[PATCH] f2fs: avoid f2fs_gc dead loop
For some corner cases, f2fs_gc selects a target victim but cannot free that victim segment for some reason (e.g. the segment holds blocks of an atomic file which is not committed yet); in this case, the victim segment may be selected over and over, and f2fs_gc goes into a dead loop. This patch identifies the dead-loop segment and skips it in __get_victim next time. Signed-off-by: Yunlong Song <yunlong.s...@huawei.com> --- fs/f2fs/f2fs.h | 8 fs/f2fs/gc.c | 34 ++ fs/f2fs/super.c | 3 +++ 3 files changed, 45 insertions(+) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ca6b0c9..b75851b 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -115,6 +115,13 @@ struct f2fs_mount_info { unsigned int opt; }; +struct gc_loop_info { + int count; + unsigned int segno; + unsigned long *segmap; +}; +#define GC_LOOP_MAX 10 + #define F2FS_FEATURE_ENCRYPT 0x0001 #define F2FS_FEATURE_BLKZONED 0x0002 #define F2FS_FEATURE_ATOMIC_WRITE 0x0004 @@ -1125,6 +1132,7 @@ struct f2fs_sb_info { /* threshold for converting bg victims for fg */ u64 fggc_threshold; + struct gc_loop_info gc_loop; /* maximum # of trials to find a victim segment for SSR and GC */ unsigned int max_victim_search; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 5d5bba4..4ee9e1b 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -229,6 +229,10 @@ static unsigned int check_bg_victims(struct f2fs_sb_info *sbi) if (no_fggc_candidate(sbi, secno)) continue; + if (sbi->gc_loop.segmap && + test_bit(GET_SEG_FROM_SEC(sbi, secno), sbi->gc_loop.segmap)) + continue; + clear_bit(secno, dirty_i->victim_secmap); return GET_SEG_FROM_SEC(sbi, secno); } @@ -371,6 +375,9 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (gc_type == FG_GC && p.alloc_mode == LFS && no_fggc_candidate(sbi, secno)) goto next; + if (gc_type == FG_GC && p.alloc_mode == LFS && + sbi->gc_loop.segmap && test_bit(segno, sbi->gc_loop.segmap)) + goto next; cost = get_gc_cost(sbi, segno, &p); @@ -1042,6 +1049,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type); if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec) sec_freed++; + else if (gc_type == FG_GC && seg_freed == 0) { + if (!sbi->gc_loop.segmap) { + sbi->gc_loop.segmap = + kvzalloc(f2fs_bitmap_size(MAIN_SEGS(sbi)), GFP_KERNEL); + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } + if (segno == sbi->gc_loop.segno) { + if (sbi->gc_loop.count > GC_LOOP_MAX) { + f2fs_bug_on(sbi, 1); + set_bit(segno, sbi->gc_loop.segmap); + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } + else + sbi->gc_loop.count++; + } else { + sbi->gc_loop.segno = segno; + sbi->gc_loop.count = 0; + } + } total_freed += seg_freed; if (gc_type == FG_GC) @@ -1075,6 +1103,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, if (sync) ret = sec_freed ? 0 : -EAGAIN; + if (sbi->gc_loop.segmap) { + kvfree(sbi->gc_loop.segmap); + sbi->gc_loop.segmap = NULL; + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; + } return ret; } diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 031cb26..76f0b72 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2562,6 +2562,9 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) sbi->last_valid_block_count = sbi->total_valid_block_count; sbi->reserved_blocks = 0; sbi->current_reserved_blocks = 0; + sbi->gc_loop.segmap = NULL; + sbi->gc_loop.count = 0; + sbi->gc_loop.segno = NULL_SEGNO; for (i = 0; i < NR_INODE_TYPE; i++) { INIT_LIST_HEAD(&sbi->inode_list[i]); -- 1.8.5.2
Re: [PATCH v4] fsck.f2fs: check and fix i_namelen to avoid double free
And there is en[namelen] = '\0', so we should fix namelen to its right value. On 2017/12/23 11:35, Chao Yu wrote: On 2017/12/23 11:19, Yunlong Song wrote: Double free problem: Since a DDR bit jump makes i_namelen a larger value (> 255), when the file is not encrypted, convert_encrypted_name will memcpy out of the range of en[255]; when en is freed, there will be a double free problem. It looks there is only a memcpy overflow problem here. Thanks, On 2017/12/23 11:05, Chao Yu wrote: On 2017/12/18 21:25, Yunlong Song wrote: v1 -> v2: use child_info to pass dentry namelen v2 -> v3: check child != NULL to include the F2FS_FT_ORPHAN file type v3 -> v4: fix the i_namelen problem of dump.f2fs There is no commit log, so what do you mean about "avoid double free"? Other than that, looks good to me. Reviewed-by: Chao Yu <yuch...@huawei.com> Thanks, -- Thanks, Yunlong Song
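The fix discussed here amounts to never trusting the on-disk i_namelen: clamp it to both the format limit and the destination buffer before the memcpy, then NUL-terminate. A hedged sketch of such a defensive copy (copy_inode_name is a hypothetical helper for illustration, not fsck.f2fs's actual function):

```c
#include <assert.h>
#include <string.h>

#define F2FS_NAME_LEN 255   /* max filename length stored in an f2fs inode */

/*
 * Defensively copy an on-disk name whose recorded length may be
 * corrupted (e.g. by a bit flip): clamp to both the format limit and
 * the destination capacity before the memcpy, then NUL-terminate.
 * Returns the number of bytes actually copied.
 */
static size_t copy_inode_name(char *dst, size_t dst_cap,
                              const char *src, size_t recorded_len)
{
    size_t n = recorded_len;

    if (n > F2FS_NAME_LEN)
        n = F2FS_NAME_LEN;        /* corrupted i_namelen gets clamped */
    if (n >= dst_cap)
        n = dst_cap - 1;          /* never overrun the destination */
    memcpy(dst, src, n);
    dst[n] = '\0';                /* the en[namelen] = '\0' step, in range */
    return n;
}
```

With the clamp in place the en[namelen] = '\0' write stays inside the buffer, so neither the memcpy overflow nor the heap corruption that led to the reported double free can occur.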