Hi Jaegeuk,

On 2018/2/28 12:57, Jaegeuk Kim wrote:
> On 02/26, Weichao Guo wrote:
>> There is a potential inconsistent metadata case due to a cp block
>> crc invalid in the latest checkpoint caused by hardware issues:
>> 1) write nodes into segment x;
>> 2) write checkpoint A;
>> 3) remove nodes in segment x;
>> 4) write checkpoint B;
>> 5) issue discard or write datas into segment x;
>> 6) sudden power-cut;
>> 7) use checkpoint A after reboot as checkpoint B is invalid
>>
>> This inconsistency may be found after several reboots long time
>> later and the kernel log about cp block crc invalid has disappeared.
>> This makes the root cause of the inconsistency is hard to locate.
>> Let us separate such other part issues from f2fs logical bugs in
>> debug version, and try to validate the checkpoint with the backup
>> when cp2 is corrupted.
> 
> How can you guarantee it happened caused by real power-cut during checkpoint?
> Does it really make sense to write addition block at every checkpoint?

I have a short discussion with Weichao off-line, if possible, maybe we can
introduce a new feature to add parity in CP area, use it to check and rebuild
checkpoint once last checkpoint is corrupted, instead of adding one more cp pack
backup, anyway, just a raw thought.

Thanks,

> 
>>
>> Signed-off-by: Weichao Guo <guoweic...@huawei.com>
>> ---
>>  fs/f2fs/checkpoint.c | 27 +++++++++++++++++++++++----
>>  1 file changed, 23 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index 512dca8..81c4acd 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -761,15 +761,29 @@ static struct page *validate_checkpoint(struct 
>> f2fs_sb_info *sbi,
>>  
>>      err = get_checkpoint_version(sbi, cp_addr, &cp_block,
>>                                      &cp_page_1, version);
>> -    if (err)
>> +    if (err) {
>> +            f2fs_msg(sbi->sb, KERN_WARNING,
>> +                    "corrupted cp1 at blk_addr: 0x%x", cp_addr);
>> +            f2fs_bug_on(sbi, 1);
>>              goto invalid_cp1;
>> +    }
>>      pre_version = *version;
>>  
>>      cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1;
>>      err = get_checkpoint_version(sbi, cp_addr, &cp_block,
>>                                      &cp_page_2, version);
>> -    if (err)
>> -            goto invalid_cp2;
>> +    if (err) {
>> +            err = get_checkpoint_version(sbi, cp_addr + 1, &cp_block,
>> +                                                    &cp_page_2, version);
>> +            if (err) {
>> +                    goto invalid_cp2;
>> +            }
>> +            else if (*version == pre_version) {
>> +                    f2fs_msg(sbi->sb, KERN_WARNING,
>> +                            "corrupted cp2 at blk_addr: 0x%x", cp_addr);
>> +                    f2fs_bug_on(sbi, 1);
>> +            }
>> +    }
>>      cur_version = *version;
>>  
>>      if (cur_version == pre_version) {
>> @@ -1302,7 +1316,12 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
>> struct cp_control *cpc)
>>      }
>>  
>>      /* writeout checkpoint block */
>> -    update_meta_page(sbi, ckpt, start_blk);
>> +    update_meta_page(sbi, ckpt, start_blk++);
>> +    /* writeout a backup of cp2 if available */
>> +    if (!enabled_nat_bits(sbi, cpc) ||
>> +            (ckpt->cp_pack_total_block_count + 1 <=
>> +                    sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks))
>> +            update_meta_page(sbi, ckpt, start_blk);
>>  
>>      /* wait for previous submitted node/meta pages writeback */
>>      wait_on_all_pages_writeback(sbi);
>> -- 
>> 2.10.1
> 
> .
> 


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to