On 2018/2/12 11:40, Jaegeuk Kim wrote: > On 02/12, guoweichao wrote: >> >> >> On 2018/2/12 10:32, Jaegeuk Kim wrote: >>> On 02/12, guoweichao wrote: >>>> Hi Jaegeuk, >>>> >>>> On 2018/2/12 7:32, Jaegeuk Kim wrote: >>>>> On 02/06, Weichao Guo wrote: >>>>>> There is a potential inconsistent metadata case due to a cp block >>>>>> crc invalid in the latest checkpoint caused by hardware issues: >>>>>> 1) write nodes into segment x; >>>>>> 2) write checkpoint A; >>>>>> 3) remove nodes in segment x; >>>>>> 4) write checkpoint B; >>>>>> 5) issue discard or write datas into segment x; >>>>>> 6) sudden power-cut; >>>>>> 7) use checkpoint A after reboot as checkpoint B is invalid >>>>>> >>>>>> This inconsistency may be found after several reboots long time later >>>>>> and the kernel log about cp block crc invalid has disappeared. This >>>>>> makes the root cause of the inconsistency is hard to locate. Let us >>>>>> separate such other part issues from f2fs logical bugs in debug version. >>>>>> >>>>>> Signed-off-by: Weichao Guo <guoweic...@huawei.com> >>>>>> --- >>>>>> fs/f2fs/checkpoint.c | 8 ++++++-- >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c >>>>>> index 8b0945b..16ba96a 100644 >>>>>> --- a/fs/f2fs/checkpoint.c >>>>>> +++ b/fs/f2fs/checkpoint.c >>>>>> @@ -737,13 +737,17 @@ static int get_checkpoint_version(struct >>>>>> f2fs_sb_info *sbi, block_t cp_addr, >>>>>> crc_offset = le32_to_cpu((*cp_block)->checksum_offset); >>>>>> if (crc_offset > (blk_size - sizeof(__le32))) { >>>>>> f2fs_msg(sbi->sb, KERN_WARNING, >>>>>> - "invalid crc_offset: %zu", crc_offset); >>>>>> + "invalid crc_offset: %zu at blk_addr: 0x%x", >>>>>> + crc_offset, cp_addr); >>>>>> + f2fs_bug_on(sbi, 1); >>>>> >>>>> I don't think we can use bug_on here, since we're easily getting this when >>>>> power-cut happened in the middle of checkpoint pack writes, which is an >>>>> expected >>>>> behavior. Hmm, we need to consider another way to detect that. >>>> We only check CP block crc here. The two CP blocks may have different CP >>>> versions when >>>> power-cut happened, but their crc value should be valid. IMO, this patch >>>> will trigger a >>>> bug_on only when some external issues cause CP block crc invalid as one 4K >>>> page is >>>> persisted atomically. >>> >>> Huh? This checks crc_offset, not crc? Unfortunately, my simple fault >>> injection >>> test gave this bug_on within a day. The below bug_on seems what you're >>> saying >>> about tho. >> oh sorry, I didn't notice the code line carefully. But which fault injection >> trigger >> this bug_on? The crc_offset is also parts of the CP block, it seems >> power-cut happened >> in middle of writing checkpoint should not produce an invalid crc_offset. > > The second cp block can have stale data used by previous part of checkpoint. Yes, but in all CP packs, a cp block crc_offset should always be CHECKSUM_OFFSET which is configured at mkfs, right? I'm still not figure out why crc_offset invalid in the stale data of previous part of checkpoint?
Thanks, > >> >> Thanks, >> >>> >>>> >>>> Thanks, >>>>> >>>>> Thanks, >>>>> >>>>>> return -EINVAL; >>>>>> } >>>>>> >>>>>> crc = cur_cp_crc(*cp_block); >>>>>> if (!f2fs_crc_valid(sbi, crc, *cp_block, crc_offset)) { >>>>>> - f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc value"); >>>>>> + f2fs_msg(sbi->sb, KERN_WARNING, >>>>>> + "invalid crc value at blk_addr: 0x%x", cp_addr); >>>>>> + f2fs_bug_on(sbi, 1); >>>>>> return -EINVAL; >>>>>> } >>>>>> >>>>>> -- >>>>>> 2.10.1 >>>>> >>>>> . >>>>> >>> >>> . >>> > > . > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel