On 2021/4/5 at 11:18 PM, Steven Davies wrote:
Kernel: 5.11.8 vanilla, btrfs-progs 5.11.1

I booted a box with a root btrfs raid1 across two devices,
/dev/nvme0n1p2 (devid 2) and /dev/sda2 (devid 3). For whatever reason,
during the initrd stage btrfs device scan was unable to see the NVMe
device, so the init script mounted the rootfs degraded after multiple
retries, as I had designed it to.
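A simplified sketch of that kind of retry-then-degraded fallback (device
names, the retry count and the /sysroot target are illustrative, not the
actual script):

    # Illustrative initrd fragment: retry a normal mount, then fall back
    # to a degraded mount of the remaining device.
    for i in 1 2 3; do
        btrfs device scan
        mount -t btrfs /dev/sda2 /sysroot && break
        sleep 1
    done
    mountpoint -q /sysroot || mount -t btrfs -o degraded /dev/sda2 /sysroot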

It looks more like a problem in your initramfs environment.

The most likely cause is that your initramfs only has drivers for SATA
disks, but no NVMe module.

You may try including the nvme module in your initramfs to see if that
solves the problem.
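How to do that depends on which initramfs generator you use; as a sketch,
assuming dracut or mkinitcpio (a hand-rolled initrd would need the module
copied in manually):

    # dracut: force-include the nvme driver, then rebuild the initramfs
    echo 'add_drivers+=" nvme "' > /etc/dracut.conf.d/nvme.conf
    dracut --force

    # mkinitcpio (Arch): add nvme to MODULES in /etc/mkinitcpio.conf,
    # e.g. MODULES=(nvme), then rebuild all presets:
    mkinitcpio -P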

Thanks,
Qu


Once booted, the kernel was apparently able to see nvme0n1p2 again (with
no intervention from me) and btrfs device usage / btrfs filesystem show
did not report any missing devices. btrfs scrub reported that devid 2
was unwritable, but the scrub completed successfully on devid 3 with no
errors. New block groups for data and metadata were being created as
single on devid 3.
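For reference, the single block groups and per-device error counters show
up in the usual reporting commands (mount point assumed to be /):

    # Overall allocation, including the profile of each chunk type
    btrfs filesystem usage /
    # Per-device allocation breakdown
    btrfs device usage /
    # Per-device I/O and corruption error counters
    btrfs device stats /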

I balanced with -dconvert=single -mconvert=dup, which moved all block
groups to devid 3 and completed successfully; there was nothing
remaining on devid 2, so I removed the device from the filesystem and
re-added it as devid 4. Once I'd balanced the filesystem back to raid1
with -dconvert=raid1 -mconvert=raid1, everything was back to normal.
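In command form, the sequence was roughly the following (a
reconstruction; device path, devid and mount point as described above):

    # Move everything onto the remaining device as single/dup
    btrfs balance start -dconvert=single -mconvert=dup /
    # Drop the stale device and re-add it (it comes back with a new devid)
    btrfs device remove 2 /
    btrfs device add /dev/nvme0n1p2 /
    # Convert back to raid1 across both devices
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /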

My main observation was that it was very hard to notice that there was
an issue. Yes, I'd purposefully mounted degraded, but there was no
indication from the btrfs tools as to why new block groups were only
being created as single on one device: nothing was marked as missing or
unwritable. Is this behaviour expected? How can a device be unwritable
but not marked as missing?

Was my course of action the right way to correct the issue, or is there
a better way to re-sync a raid1 device which has temporarily been removed?
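For example, would a scrub plus a balance restricted to the chunks that
were created while degraded (the soft convert filter) have been enough?
Something along these lines, assuming the device comes back writable and
the filesystem is mounted at /:

    # Repair any stale copies on the returning device
    btrfs scrub start -B /
    # Convert only chunks that are not already raid1 ("soft" filter)
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /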

(Afterwards I realised what caused the issue - missing libraries in the
initrd - and I can reproduce it if necessary.)
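(To check for that, something like the following shows which shared
libraries the btrfs binary links against and what actually ended up in a
dracut-built image; the image path is an assumption for a typical setup:)

    # Libraries the btrfs binary needs, so they can be copied into the initrd
    ldd "$(command -v btrfs)"
    # Inspect the contents of a dracut-built initramfs
    lsinitrd /boot/initramfs-$(uname -r).img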
