On 2018/2/13 2:03, Jaegeuk Kim wrote:
> On 02/12, guoweichao wrote:
>>
>>
>> On 2018/2/12 11:40, Jaegeuk Kim wrote:
>>> On 02/12, guoweichao wrote:
>>>>
>>>>
>>>> On 2018/2/12 10:32, Jaegeuk Kim wrote:
>>>>> On 02/12, guoweichao wrote:
>>>>>> Hi Jaegeuk,
>>>>>>
>>>>>> On 2018/2/12 7:32, Jaegeuk Kim wrote:
>>>>>>> On 02/06, Weichao Guo wrote:
>>>>>>>> There is a potential inconsistent metadata case due to a cp block
>>>>>>>> crc invalid in the latest checkpoint caused by hardware issues:
>>>>>>>> 1) write nodes into segment x;
>>>>>>>> 2) write checkpoint A;
>>>>>>>> 3) remove nodes in segment x;
>>>>>>>> 4) write checkpoint B;
>>>>>>>> 5) issue discard or write datas into segment x;
>>>>>>>> 6) sudden power-cut;
>>>>>>>> 7) use checkpoint A after reboot as checkpoint B is invalid
>>>>>>>>
>>>>>>>> This inconsistency may be found after several reboots long time later
>>>>>>>> and the kernel log about cp block crc invalid has disappeared. This
>>>>>>>> makes the root cause of the inconsistency is hard to locate. Let us
>>>>>>>> separate such other part issues from f2fs logical bugs in debug 
>>>>>>>> version.
>>>>>>>>
>>>>>>>> Signed-off-by: Weichao Guo <guoweic...@huawei.com>
>>>>>>>> ---
>>>>>>>>  fs/f2fs/checkpoint.c | 8 ++++++--
>>>>>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>>>>>> index 8b0945b..16ba96a 100644
>>>>>>>> --- a/fs/f2fs/checkpoint.c
>>>>>>>> +++ b/fs/f2fs/checkpoint.c
>>>>>>>> @@ -737,13 +737,17 @@ static int get_checkpoint_version(struct 
>>>>>>>> f2fs_sb_info *sbi, block_t cp_addr,
>>>>>>>>        crc_offset = le32_to_cpu((*cp_block)->checksum_offset);
>>>>>>>>        if (crc_offset > (blk_size - sizeof(__le32))) {
>>>>>>>>                f2fs_msg(sbi->sb, KERN_WARNING,
>>>>>>>> -                      "invalid crc_offset: %zu", crc_offset);
>>>>>>>> +                      "invalid crc_offset: %zu at blk_addr: 0x%x",
>>>>>>>> +                              crc_offset, cp_addr);
>>>>>>>> +              f2fs_bug_on(sbi, 1);
>>>>>>>
>>>>>>> I don't think we can use bug_on here, since we're easily getting this 
>>>>>>> when
>>>>>>> power-cut happened in the middle of checkpoint pack writes, which is an 
>>>>>>> expected
>>>>>>> behavior. Hmm, we need to consider another way to detect that.
>>>>>> We only check CP block crc here. The two CP blocks may have different CP 
>>>>>> versions when
>>>>>> power-cut happened, but their crc value should be valid. IMO, this patch 
>>>>>> will trigger a
>>>>>> bug_on only when some external issues cause CP block crc invalid as one 
>>>>>> 4K page is
>>>>>> persisted atomically.
>>>>>
>>>>> Huh? This checks crc_offset, not crc? Unfortunately, my simple fault 
>>>>> injection
>>>>> test gave this bug_on within a day. The below bug_on seems what you're 
>>>>> saying
>>>>> about tho.
>>>> oh sorry, I didn't notice the code line carefully. But which fault 
>>>> injection trigger
>>>> this bug_on? The crc_offset is also parts of the CP block, it seems 
>>>> power-cut happened
>>>> in middle of writing checkpoint should not produce an invalid crc_offset.
>>>
>>> The second cp block can have stale data used by previous part of checkpoint.
>> Yes, but in all CP packs, a cp block crc_offset should always be 
>> CHECKSUM_OFFSET
>> which is configured at mkfs, right? I'm still not figure out why crc_offset 
>> invalid
>> in the stale data of previous part of checkpoint?
> 
> The second block position can be varied by some ckpt flags, so that we could 
> get
> it from previous summary block for example.
Got it. How about adding a magic number at a specific position of CP block? In 
such a way,
We can distinguish a broken CP block from other type of blocks in a checkpoint.

Thanks,
> 
> Thanks,
> 
>>
>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>>                return -EINVAL;
>>>>>>>>        }
>>>>>>>>  
>>>>>>>>        crc = cur_cp_crc(*cp_block);
>>>>>>>>        if (!f2fs_crc_valid(sbi, crc, *cp_block, crc_offset)) {
>>>>>>>> -              f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc value");
>>>>>>>> +              f2fs_msg(sbi->sb, KERN_WARNING,
>>>>>>>> +                      "invalid crc value at blk_addr: 0x%x", cp_addr);
>>>>>>>> +              f2fs_bug_on(sbi, 1);
>>>>>>>>                return -EINVAL;
>>>>>>>>        }
>>>>>>>>  
>>>>>>>> -- 
>>>>>>>> 2.10.1
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
> 


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to