On 2019/9/4 下午6:53, Nikolay Borisov wrote:
>
>
> On 4.09.19 г. 11:30 ч., Qu Wenruo wrote:
>>
>>
>> On 2019/9/4 下午3:37, Nikolay Borisov wrote:
>>>
>>>
>>> On 3.09.19 г. 11:24 ч., Qu Wenruo wrote:
>>>> Before this patch, repair_imode_common() can only handle two types of
>>>> inodes:
>>>> - Free space cache inodes
>>>> - ROOT DIR inodes
>>>>
>>>> For inodes in subvolume trees, the problem is how to determine the
>>>> correct imode, thus it was not implemented.
>>>>
>>>> However there are more reports of incorrect imode in subvolume trees, we
>>>> need to support such fix.
>>>>
>>>> So this patch adds a new function, detect_imode(), to detect (or call it
>>>> educated guess) imode for inodes in subvolume trees.
>>>>
>>>> Signed-off-by: Qu Wenruo <w...@suse.com>
>>>> ---
>>>> check/mode-common.c | 96 +++++++++++++++++++++++++++++++++++++++------
>>>> 1 file changed, 83 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/check/mode-common.c b/check/mode-common.c
>>>> index 195b6efaa7aa..807d7daf98a6 100644
>>>> --- a/check/mode-common.c
>>>> +++ b/check/mode-common.c
>>>> @@ -836,6 +836,80 @@ int reset_imode(struct btrfs_trans_handle *trans,
>>>> struct btrfs_root *root,
>>>> return ret;
>>>> }
>>>>
>>>> +static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
>>>> + u32 *imode_ret)
>>>> +{
>>>> + struct btrfs_key key;
>>>> + struct btrfs_inode_item *iitem;
>>>> + const u32 priv = 0700;
>>>> + u64 ino;
>>>> + u32 imode = S_IFREG;
>>>> + int ret = 0;
>>>> +
>>>> + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>>>> + ino = key.objectid;
>>>> + iitem = btrfs_item_ptr(path->nodes[0], path->slots[0],
>>>> + struct btrfs_inode_item);
>>>> +
>>>> + /*
>>>> + * Both CHR and BLK uses rdev, no way to distinguish them, so fall back
>>>> + * to BLK.
>>>> + */
>>>> + if (btrfs_inode_rdev(path->nodes[0], iitem) != 0) {
>>>> + imode = S_IFBLK;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + /* root inode */
>>>> + if (ino == BTRFS_FIRST_FREE_OBJECTID) {
>>>> + imode = S_IFDIR;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + while (1) {
>>>> + ret = btrfs_next_item(root, path);
>>>> + if (ret > 0) {
>>>> + /* No determining result found, falls back to REG */
>>>> + ret = 0;
>>>> + goto out;
>>>> + }
>>>> + if (ret < 0)
>>>> + goto out;
>>>> + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>>>> + if (key.objectid != ino)
>>>> + goto out;
>>>> +
>>>> + /*
>>>> + * We ignore some types to make life easier:
>>>> + * - INODE_REF
>>>> + * We need to do a full tree search, which can fail for
>>>> + * corrupted fs, but not worthy compared to other easier
>>>> + * to determine types.
>>>> + * - XATTR
>>>> + * Both REG and DIR can have xattr, so not useful
>>>> + */
>>>> + switch (key.type) {
>>>> + case BTRFS_DIR_ITEM_KEY:
>>>> + case BTRFS_DIR_INDEX_KEY:
>>>> + imode = S_IFDIR;
>>>> + goto out;
>>>> + case BTRFS_EXTENT_DATA_KEY:
>>>> + /*
>>>> + * Both REG and LINK could have EXTENT_DATA.
>>>> + * We just fall back to REG as user can inspect the
>>>> + * content.
>>>> + */
>>>> + imode = S_IFREG;
>>>> + goto out;
>>>> + }
>>>> + }
>>>> +
>>>> +out:
>>>> + /* Set default value even when something wrong happened */
>>>> + *imode_ret = (imode | priv);
>>>> + return ret;
>>>> +}
>>>> +
>>>> /*
>>>> * Reset the inode mode of the inode specified by @path.
>>>> *
>>>> @@ -852,22 +926,18 @@ int repair_imode_common(struct btrfs_root *root,
>>>> struct btrfs_path *path)
>>>> u32 imode;
>>>> int ret;
>>>>
>>>> - if (root->root_key.objectid != BTRFS_ROOT_TREE_OBJECTID) {
>>>> - error(
>>>> - "repair inode mode outside of root tree is not supported yet");
>>>> - return -ENOTTY;
>>>> - }
>>>> btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>>>> ASSERT(key.type == BTRFS_INODE_ITEM_KEY);
>>>> - if (key.objectid != BTRFS_ROOT_TREE_DIR_OBJECTID &&
>>>> - !is_fstree(key.objectid)) {
>>>> - error("unsupported ino %llu", key.objectid);
>>>> - return -ENOTTY;
>>>> + if (root->objectid == BTRFS_ROOT_TREE_OBJECTID) {
>>>> + /* In root tree we only have two possible imode */
>>>> + if (key.objectid == BTRFS_ROOT_TREE_OBJECTID)
>>>> + imode = S_IFDIR | 0755;
>>>> + else
>>>> + imode = S_IFREG | 0600;
>>>> + } else {
>>>> + detect_imode(root, path, &imode);
>>>> + /* Ignore error returned, just use the default value returned */
>>>
>>> Is this safe enough though?
>>
>> It depends. But I'd say if we failed in detect_imode(), then it doesn't
>> matter whatever the type we're putting here.
>>
>>> What if due to an error a directory is
>>> corrected to be file?
>>
>> If a failure happens, it means btrfs-progs fails to read the next leaf.
>> Then it really doesn't make much sense whatever the type is.
>>
>> But I get your point, indeed we should error out without trying out to
>> fix the inode.
>>
>> I'll change this behavior in next version.
>
> IMO the goal of btrfs-progs should be to make a broken filesystem better
> and not replace one type of breakage with potentially another. In such
> cases we ought to break out gracefully.
Totally agreed.
BTW, I'll also implement the INODE_REF lookup method to provide the most
accurate way to recover.
The planned recovery would be:
- INODE_REF lookup
If everything goes fine, that's it and call it a day.
If it failed, try the rest.
- INODE rdev lookup
For this case, either BLK or CHR, but it doesn't really matter nor
would cause problem.
- DIR_INDEX/ITEM lookup
- EXTENT_DATA lookup
Both REG and LNK would be possible but falls back to REG won't cause
further problem.
Thanks,
Qu
>
>>
>> Thanks,
>> Qu
>>
>>> Let's not forget the context we are oprating here
>>> - a broken fs so it's possible (if not likely) that any of the search
>>> functions inside detect_imode could return negative error value. OTOH
>>> I'd expect the transaction commit to fail if that's the case e.g. faulty
>>> device.
>>>
>>>> }
>>>> - if (key.objectid == BTRFS_ROOT_TREE_DIR_OBJECTID)
>>>> - imode = 040755;
>>>> - else
>>>> - imode = 0100600;
>>>>
>>>> trans = btrfs_start_transaction(root, 1);
>>>> if (IS_ERR(trans)) {
>>>>
>>