On 2019/3/11 8:37 PM, Nikolay Borisov wrote:
>
>
> On 11.03.19 at 14:35, Qu Wenruo wrote:
>>
>>
>> On 2019/3/11 8:26 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 11.03.19 at 3:17, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/3/11 7:09 AM, Chris Murphy wrote:
>>>>> In the case where superblock 0 at 65536 is valid but stale (older than
>>>>> the others):
>>>>
>>>> Then this means either the fs is fuzzed, or the FUA implementation of
>>>> the disk is completely screwed up.
>>>>
>>>> The btrfs kernel code submits superblocks in the following sequence:
>>>> 1) wait for all metadata writes
>>>> 2) flush
>>>> 3) write the primary superblock with FUA
>>>
>>> SATA devices generally do not have FUA support. For example, my evo 850
>>> SSDs do not support it, nor does my evo 860 PRO. IMO not having
>>> functioning FUA seems to be the norm rather than the exception.
>>
>> The kernel block layer will translate FUA to write + flush.
>
> Where exactly does this happen?

block/blk-flush.c

The comment at the beginning of that file says:

 * If the device has writeback cache and doesn't support FUA, REQ_PREFLUSH
 * is translated to PREFLUSH and REQ_FUA to POSTFLUSH.

I still need to dig out exactly which line does this, but I think that
should explain the workflow well enough.
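
To make the cases in that comment concrete, here is a rough Python model of
the translation. It is only a sketch and not the kernel code: Device and
flush_sequence are made-up names for illustration, and the two cases not
quoted above (no writeback cache, and writeback cache with native FUA) are
filled in from the same comment block in block/blk-flush.c as I understand
it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for the queue flags the block layer checks."""
    write_cache: bool  # device has a volatile writeback cache
    fua: bool          # device supports native FUA writes


def flush_sequence(dev, preflush, fua, has_data=True):
    """Return the ordered steps issued for one request (simplified model)."""
    steps = []
    # A pre-flush only matters when there is a volatile cache to drain.
    if dev.write_cache and preflush:
        steps.append("PREFLUSH")
    if has_data:
        # Native FUA is passed down together with the data write.
        if fua and dev.write_cache and dev.fua:
            steps.append("DATA+FUA")
        else:
            steps.append("DATA")
    # Without native FUA, REQ_FUA becomes a flush after the data write.
    if dev.write_cache and fua and not dev.fua:
        steps.append("POSTFLUSH")
    return steps


if __name__ == "__main__":
    sata_ssd = Device(write_cache=True, fua=False)
    print(flush_sequence(sata_ssd, preflush=False, fua=True))
```

For a disk with a writeback cache and no FUA, the model turns a FUA write
into a plain write followed by a flush.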

Thanks,
Qu
>
>> So in that case we will do:
>>
>> 1) wait for all metadata writes
>> 2) flush
>> 3) write first sb, flush
>> 4) write backup sb
>>
>> Emulating FUA as write + flush is less atomic than native FUA, but it
>> should be good enough to be pseudo-atomic.
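
As a sanity check of that ordering, here is a small Python sketch that
enumerates every state a crash could leave behind on a disk without FUA.
It is a toy model under stated assumptions, not btrfs code: commit_steps
and possible_disk_states are hypothetical names, the metadata wait in
step 1 is left out, there is a single backup copy, and any subset of
unflushed cached writes is assumed to possibly reach the media.

```python
from itertools import combinations


def commit_steps(new_gen):
    """Ordered device operations for one commit on a non-FUA disk."""
    return [
        ("flush", None, None),          # step 2: metadata becomes durable
        ("write", "primary", new_gen),  # step 3: write the primary superblock
        ("flush", None, None),          # step 3: the flush that stands in for FUA
        ("write", "backup", new_gen),   # step 4: write the backup superblock
    ]


def possible_disk_states(old_gen, new_gen):
    """Yield each (primary, backup) generation pair a crash could leave."""
    steps = commit_steps(new_gen)
    for crash_at in range(len(steps) + 1):
        durable = {"primary": old_gen, "backup": old_gen}
        cached = {}
        for op, target, gen in steps[:crash_at]:
            if op == "write":
                cached[target] = gen    # sits in the volatile cache
            else:
                durable.update(cached)  # a flush persists everything cached
                cached.clear()
        # Unflushed writes: any subset may or may not have hit the media.
        pending = list(cached.items())
        for k in range(len(pending) + 1):
            for subset in combinations(pending, k):
                state = dict(durable)
                state.update(subset)
                yield state["primary"], state["backup"]


if __name__ == "__main__":
    for primary, backup in sorted(set(possible_disk_states(7, 8))):
        print(f"primary gen {primary}, backup gen {backup}")
```

In this model the backup generation never exceeds the primary one, so a
newer backup would mean the disk did not honor the flush.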
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>>> 4) write the backup superblocks
>>>>
>>>> If a backup is newer than the primary, then the FUA write did not reach
>>>> the disk before the normal writes.
>>>> This means any fs could be corrupted on that disk, not only btrfs.
>>>>
>>>>>
>>>>> 1. btrfs check doesn't complain, the stale super is used for the check
>>>>> 2. when mounting, super 0 is used, no complaints at mount time, fairly
>>>>> quickly the newer supers are overwritten
>>>>
>>>> The reason why the kernel doesn't search backup roots is to avoid
>>>> mounting a stale btrfs.
>>>> In a case like mkfs.btrfs -> write to the btrfs -> mkfs.xfs -> try to
>>>> mount as btrfs again, this would cause problems.
>>>>
>>>> So IMHO always using the primary superblock is the designed behavior.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Is this expected? In particular, in light of the `btrfs rescue super`
>>>>> behavior, which considers super 0 a bad super and offers to fix it
>>>>> from the newer ones; when I answer y, it replaces super 0 with
>>>>> newer information from the other supers.
>>>>>
>>>>> I think the `btrfs rescue` behavior is correct. I would expect that
>>>>> all the supers are read at mount time, and if there's a discrepancy,
>>>>> that either there's code to suspiciously sanity check the latest roots
>>>>> in the newest super, or it flat out fails to mount. Mounting based on
>>>>> stale super data seems risky, doesn't it?
>>>>>
>>>>
>>