On Thu, Feb 18, 2021 at 6:12 PM Daniel Dawson <danielcdaw...@gmail.com> wrote:
>
> On 2/18/21 3:57 PM, Chris Murphy wrote:
> > metadata raid6 as well?
>
> Yes.

Once everything else is figured out, you should consider converting
metadata to raid1c3.

https://lore.kernel.org/linux-btrfs/20200627032414.gx10...@hungrycats.org/
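
If you go that route, the conversion is a single balance. A minimal
sketch, assuming the filesystem is mounted at / as in your replace
command:

    # convert metadata chunks to raid1c3; data stays raid6
    btrfs balance start -mconvert=raid1c3 /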


> > What replacement command(s) are you using?
>
> For this drive, it was "btrfs replace start -r 3 /dev/sda3 /"

OK, replace is good.


> > Do a RAM test for as long as you can tolerate it, or it finds the
> > defect. Sometimes they show up quickly, other times days.
> I didn't think of a flipped bit. Thanks.
> >>         devid    0 size 457.64GiB used 39.53GiB path /dev/sdc3
> >>         devid    1 size 457.64GiB used 39.56GiB path /dev/sda3
> >>         devid    2 size 457.64GiB used 39.56GiB path /dev/sdb3
> >>         devid    4 size 457.64GiB used 39.53GiB path /dev/sdd3
> >
> > This is confusing. devid 3 is claimed to be missing, but fi show isn't
> > showing any missing devices. If none of sd[abcd] are devid 3, then
> > what dev node is devid 3 and where is it?
> It looks to me like btrfs is temporarily assigning devid 0 to the new
> device being used as a replacement. That is what I observed before; once
> the replace operation was complete, it went back to the normal number.
> Since the replacement didn't finish this time, sdc3 is still devid 0.

The new device is devid 0 during the replacement. The drive being
replaced keeps its devid until the end, and then there's a switch:
that device is removed, and the signature on the old drive is wiped.
So... something is still wrong with the above, because there's no
devid 3, yet the kernel and btrfs check messages say devid 3 is
missing.

It doesn't seem likely that /dev/sdc3 is devid 3, because it can't
both be missing and be the mounted dev node.

> [  202.676601] BTRFS warning (device sdc3): devid 3 uuid 911a642e-0a4c-4483-9a1f-cde7b87c5519 is missing

Try a reboot, and use blkid to check that you've got all the devices
+ 1 (the new one from the failed replacement). Also verify all the
supers with 'btrfs rescue super-recover -v', and check that everything
correlates with 'btrfs filesystem show'.
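
For example (a sketch; the device path is a placeholder for whatever
blkid reports):

    blkid -t TYPE=btrfs                      # list all btrfs member devices
    btrfs rescue super-recover -v /dev/sda3  # verify supers, one device at a time
    btrfs filesystem show                    # compare devids with the above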

The replace should resume once the filesystem is mounted normally.
But for that to happen, all the drives + 1 must be available.
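
Once it's mounted, 'btrfs replace status' on the mount point will show
whether the resumed replace is progressing, e.g.:

    btrfs replace status /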

If a damaged tree log prevents mounting, you need to weigh your
options. You can try mounting with ro,nologreplay and freshen backups
of anything you'd rather not lose, just in case things get worse.
Then you can zero the log and see whether that lets you mount the
device normally (i.e. rw and not degraded). But some of this will
depend on what's actually wrong.
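
Roughly, from rescue media (a sketch: /mnt is just an example
mountpoint, /dev/sda3 a stand-in for any member device, and zero-log
throws away the last transactions recorded in the log, so freshen
backups first):

    mount -o ro,nologreplay /dev/sda3 /mnt   # read-only, skip log replay
    # ...freshen backups from /mnt...
    umount /mnt
    btrfs rescue zero-log /dev/sda3          # discard the damaged tree log
    mount /dev/sda3 /mnt                     # then try a normal rw mount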



-- 
Chris Murphy
