On 2019/1/16 下午8:01, Hugo Mills wrote:
>    Hi, Qu,
> 
> On Wed, Jan 16, 2019 at 07:53:08PM +0800, Qu Wenruo wrote:
>> There are at least 2 reports about memory bit flip sneaking into on-disk
>> data.
>>
>> Currently we only have a relaxed check triggered at
>> btrfs_mark_buffer_dirty() time, as it's not mandatory, only for
>> CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build.
>>
>> This patch will address the hole by triggering comprehensive check on
>> tree blocks before writing it back to disk.
>>
>> The timing is set to csum_tree_block() where @verify == 0.
>> At that timing, we're generation csum for tree blocks before submitting
>> the metadata bio, so we could avoid all the unnecessary calls at
>> btrfs_mark_buffer_dirty(), but still catch enough error.
> 
>    I agree wholeheartedly with the idea of this change. Just one
> question:
> 
>    How does this get reported to the user/administrator? As I
> understand it, a detectably corrupt metadata page will generate an I/O
> error from the filesystem before it's written to disk? How will this
> show up in kernel logs?

Well, you caught me.

I haven't try the error case, and in fact if it fails, it fails by
triggering kernel BUG_ON(), thus you may not have a chance to see the
error message from btrfs module.

> Is it distinguishable in any way from a
> similar error that was generated on reading such a corrupt metadata
> node from the disk?

Kind of distinguishable for this patch, when you hit kernel BUG_ON() at
fs/btrfs/extent_io.c:4016 then it's definitely from this patch. :P

> 
>    Basically, I want to be able to distinguish this case (error
> detected when writing) from the existing case (error detected when
> reading) when someone shows up on IRC with a "broken filesystem".

Definitely, I'll address this and the BUG_ON() in next version.

Thanks,
Qu

> 
>    Hugo.
> 
>> Reported-by: Leonard Lausen <[email protected]>
>> Signed-off-by: Qu Wenruo <[email protected]>
>> ---
>>  fs/btrfs/disk-io.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 8da2f380d3c0..45bf6be9e751 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -313,6 +313,16 @@ static int csum_tree_block(struct btrfs_fs_info 
>> *fs_info,
>>                      return -EUCLEAN;
>>              }
>>      } else {
>> +            /*
>> +             * Here we're calculating csum before writing it to disk,
>> +             * do comprenhensive check here to catch memory corruption
>> +             */
>> +            if (btrfs_header_level(buf))
>> +                    err = btrfs_check_node(fs_info, buf);
>> +            else
>> +                    err = btrfs_check_leaf_full(fs_info, buf);
>> +            if (err < 0)
>> +                    return err;
>>              write_extent_buffer(buf, result, 0, csum_size);
>>      }
>>  
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to