On 2018/2/13 2:03, Jaegeuk Kim wrote: > On 02/12, guoweichao wrote: >> >> >> On 2018/2/12 11:40, Jaegeuk Kim wrote: >>> On 02/12, guoweichao wrote: >>>> >>>> >>>> On 2018/2/12 10:32, Jaegeuk Kim wrote: >>>>> On 02/12, guoweichao wrote: >>>>>> Hi Jaegeuk, >>>>>> >>>>>> On 2018/2/12 7:32, Jaegeuk Kim wrote: >>>>>>> On 02/06, Weichao Guo wrote: >>>>>>>> There is a potential inconsistent metadata case due to a cp block >>>>>>>> crc invalid in the latest checkpoint caused by hardware issues: >>>>>>>> 1) write nodes into segment x; >>>>>>>> 2) write checkpoint A; >>>>>>>> 3) remove nodes in segment x; >>>>>>>> 4) write checkpoint B; >>>>>>>> 5) issue discard or write datas into segment x; >>>>>>>> 6) sudden power-cut; >>>>>>>> 7) use checkpoint A after reboot as checkpoint B is invalid >>>>>>>> >>>>>>>> This inconsistency may be found after several reboots long time later >>>>>>>> and the kernel log about cp block crc invalid has disappeared. This >>>>>>>> makes the root cause of the inconsistency is hard to locate. Let us >>>>>>>> separate such other part issues from f2fs logical bugs in debug >>>>>>>> version. >>>>>>>> >>>>>>>> Signed-off-by: Weichao Guo <guoweic...@huawei.com> >>>>>>>> --- >>>>>>>> fs/f2fs/checkpoint.c | 8 ++++++-- >>>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>>>> >>>>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c >>>>>>>> index 8b0945b..16ba96a 100644 >>>>>>>> --- a/fs/f2fs/checkpoint.c >>>>>>>> +++ b/fs/f2fs/checkpoint.c >>>>>>>> @@ -737,13 +737,17 @@ static int get_checkpoint_version(struct >>>>>>>> f2fs_sb_info *sbi, block_t cp_addr, >>>>>>>> crc_offset = le32_to_cpu((*cp_block)->checksum_offset); >>>>>>>> if (crc_offset > (blk_size - sizeof(__le32))) { >>>>>>>> f2fs_msg(sbi->sb, KERN_WARNING, >>>>>>>> - "invalid crc_offset: %zu", crc_offset); >>>>>>>> + "invalid crc_offset: %zu at blk_addr: 0x%x", >>>>>>>> + crc_offset, cp_addr); >>>>>>>> + f2fs_bug_on(sbi, 1); >>>>>>> >>>>>>> I don't think we can use bug_on here, since we're easily getting this >>>>>>> when >>>>>>> power-cut happened in the middle of checkpoint pack writes, which is an >>>>>>> expected >>>>>>> behavior. Hmm, we need to consider another way to detect that. >>>>>> We only check CP block crc here. The two CP blocks may have different CP >>>>>> versions when >>>>>> power-cut happened, but their crc value should be valid. IMO, this patch >>>>>> will trigger a >>>>>> bug_on only when some external issues cause CP block crc invalid as one >>>>>> 4K page is >>>>>> persisted atomically. >>>>> >>>>> Huh? This checks crc_offset, not crc? Unfortunately, my simple fault >>>>> injection >>>>> test gave this bug_on within a day. The below bug_on seems what you're >>>>> saying >>>>> about tho. >>>> oh sorry, I didn't notice the code line carefully. But which fault >>>> injection trigger >>>> this bug_on? The crc_offset is also parts of the CP block, it seems >>>> power-cut happened >>>> in middle of writing checkpoint should not produce an invalid crc_offset. >>> >>> The second cp block can have stale data used by previous part of checkpoint. >> Yes, but in all CP packs, a cp block crc_offset should always be >> CHECKSUM_OFFSET >> which is configured at mkfs, right? I'm still not figure out why crc_offset >> invalid >> in the stale data of previous part of checkpoint? > > The second block position can be varied by some ckpt flags, so that we could > get > it from previous summary block for example. Got it. How about adding a magic number at a specific position of CP block? In such a way, We can distinguish a broken CP block from other type of blocks in a checkpoint.
Thanks, > > Thanks, > >> >> Thanks, >>> >>>> >>>> Thanks, >>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>>> return -EINVAL; >>>>>>>> } >>>>>>>> >>>>>>>> crc = cur_cp_crc(*cp_block); >>>>>>>> if (!f2fs_crc_valid(sbi, crc, *cp_block, crc_offset)) { >>>>>>>> - f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc value"); >>>>>>>> + f2fs_msg(sbi->sb, KERN_WARNING, >>>>>>>> + "invalid crc value at blk_addr: 0x%x", cp_addr); >>>>>>>> + f2fs_bug_on(sbi, 1); >>>>>>>> return -EINVAL; >>>>>>>> } >>>>>>>> >>>>>>>> -- >>>>>>>> 2.10.1 >>>>>>> >>>>>>> . >>>>>>> >>>>> >>>>> . >>>>> >>> >>> . >>> > > . > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel