Thanks!

On Mon, Dec 23, 2019 at 1:45 PM Chris Murphy <li...@colorremedies.com> wrote:
> On Sun, Dec 22, 2019 at 12:52 AM Javier Perez <pepeb...@gmail.com> wrote:
> >
> > Hi
> > My home partition is on a 2T HDD using btrfs
> >
> > I am reading the material at
> > http://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
> > but still I am not that clear on some items.
> >
> > If I want to add a second 2T drive to work as a mirror (RAID1), it
> > looks like I do not have to invoke mdadm or anything similar; it seems
> > like btrfs will handle it all internally. Am I understanding this right?
>
> Correct.
>
> > Also, before I add a new device, do I have to partition the drive, or
> > does btrfs take over all these duties (partitioning, formatting) when
> > it adds the device to the filesystem?
>
> Partitioning is optional. Drives I dedicate to one task only, I do not
> partition. If I use them for other things, or might use them for other
> things, then I partition them.
>
> The add command formats the new device and resizes the file system:
> # btrfs device add /dev/sdX /mountpoint
>
> The balance command with a convert filter changes the profile for the
> specified block groups, and does the replication:
> # btrfs balance start -dconvert=raid1 -mconvert=raid1 /mountpoint
>
> > What has been the experience like with such a system?
>
> Gotcha 1 (this applies to mdadm and LVM RAID as well as Btrfs): it's
> really common to have a mismatch between the drive's SCT ERC timeout
> and the kernel's SCSI block command timer. That is, there is a drive
> error timeout and a kernel block device error timeout, and the drive's
> timeout must be less than the kernel's, or valuable information is
> lost: self-healing is prevented, bad sectors accumulate, and
> eventually there will be data loss. The thing is, the defaults are
> often wrong: consumer hard drives often have very long SCT ERC -
> typically it's disabled - making for really impressive timeouts in
> excess of 1 minute (some suggest it can be 2 or 3 minutes), whereas
> the kernel command timeout is 30 seconds. Ideally, use 'smartctl -l
> scterc' to set the SCT ERC to something like 7 seconds; this can also
> be set with a udev rule pointed at the device by-id, using its serial
> number or WWN. You want the drive firmware to give up on read errors
> quickly, so that it reports the bad sector's LBA to the kernel, which
> in turn can find a good copy (raid1, 5, 6, or DUP profiles on Btrfs)
> and overwrite the bad sector, thereby fixing it. If the drive doesn't
> support SCT ERC, then you'll need to increase the kernel's command
> timer instead. This is a kernel setting, but it is set per block
> device. Raise the value to something fairly incredible, like 180
> seconds. That means in the worst case, a marginally bad sector
> results in possibly a 3 minute hang until the drive gives up and
> reports a read error - and then it gets fixed up.
>
> It seems esoteric, but really it's pernicious, and it's common in the
> data loss cases reported on linux-raid@, where they have the most
> experience with RAID. It applies just the same to Btrfs.
>
> More info here:
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
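> Concretely, the timeout fix looks something like this (an untested
> sketch - /dev/sdX and the serial number are placeholders, substitute
> your own):
>
> Check whether the drive supports SCT ERC, and its current setting:
> # smartctl -l scterc /dev/sdX
>
> If it's supported, set the read and write recovery timers to 7
> seconds (the unit is 100 ms, so 70 means 7.0 seconds):
> # smartctl -l scterc,70,70 /dev/sdX
>
> The setting doesn't survive a power cycle, so to make it stick, use
> a udev rule, e.g. in /etc/udev/rules.d/60-scterc.rules:
> ACTION=="add", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="WD1234ABCD", RUN+="/usr/sbin/smartctl -l scterc,70,70 /dev/%k"
>
> If the drive doesn't support SCT ERC, raise the kernel's command
> timer for that device instead (also not persistent, so it likewise
> belongs in a udev rule or boot script):
> # echo 180 > /sys/block/sdX/device/timeout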
> Gotcha 2, 3, 4: Device failures mean multiple gotchas all at once, so
> you kinda need a plan for how to deal with this so you aren't
> freaking out if it happens. Panic often leads to user-induced data
> loss. If in doubt, you are best off doing nothing and asking. Both
> the linux-btrfs@ list and #btrfs on IRC freenode.net are approachable
> for this.
>
> Gotcha: If a device dies, you're not likely to see any indication of
> failure unless you're looking at kernel messages and see a ton of
> Btrfs complaints. Like, several scary red warnings *per* lost write.
> If a drive dies, there will quickly be thousands of these. Whether or
> not you notice them, the next time you reboot...
>
> Gotcha: By default, Btrfs fails to mount if it can't find all
> devices. This is because there are consequences to degraded
> operation, and it requires user interaction to make sure it's all
> resolved. But because such mounts fail, there's a udev rule that
> waits for all Btrfs member devices, so that small delays between
> multiple devices appearing don't result in failed mounts. But there's
> no timeout for this udev rule, near as I can tell. This is the rule:
> /usr/lib/udev/rules.d/64-btrfs.rules
>
> So now you're stuck in this startup hang.
>
> If it's just a case of the device accidentally going missing, it's
> safe to reconnect it, and then startup will proceed normally.
>
> Otherwise, you need a way to get unstuck.
>
> I'm improvising here, but what you want to do is remove the suspect
> drive, (temporarily) disable this udev rule so that the system *will*
> try to mount /home, and also change the fstab to add the "degraded"
> option so that the mount attempt won't fail. Now at least you can
> boot and work while degraded until you get a chance to really fix the
> problem. A degraded /home isn't any more risky than a single-device
> /home - the consequences really are all in making sure it's put back
> together correctly.
>
> OK, so how to do all that? Either boot off a Live CD, inhibit the
> udev rule, and change fstab; or boot your system with
> rd.break=cmdline, mount the root file system on /sysroot, and make
> these changes there. Before rebooting, use 'btrfs filesystem show' to
> identify which drive btrfs thinks is missing/bad, and remove it.
>
> You can use 'btrfs replace', or 'btrfs dev add' followed by 'btrfs
> dev rem missing'. The first is preferred, but you need to read the
> man pages on both methods so you're aware of whether or not you need
> to do a file system resize. And use 'btrfs fi us /mountpoint' to
> check usage for any block groups that are not raid1: during degraded
> writes, it's possible some single-copy data block groups get created,
> and those need to be manually converted to raid1 (yes, you can have
> mixed replication levels on btrfs).
>
> And in the case where some degraded writes happen and you then get
> the missing device reconnected, you'll use 'btrfs scrub' to get those
> degraded writes replicated to the formerly missing device. That's not
> automatic either.
>
> A couple more gotchas to be aware of, which might be less bad with
> the latest kernels, but without testing for it I wouldn't assume
> they're fixed:
>
> https://btrfs.wiki.kernel.org/index.php/Gotchas#raid1_volumes_only_mountable_once_RW_if_degraded
>
> https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices
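> To pull the recovery steps above into one place, a rough sketch
> (untested - device names, the devid, and /home are placeholders for
> your setup):
>
> Mask the wait-for-all-devices udev rule (a /dev/null symlink in
> /etc/udev/rules.d overrides the copy in /usr/lib):
> # ln -s /dev/null /etc/udev/rules.d/64-btrfs.rules
>
> Add "degraded" to the /home line in /etc/fstab:
> UUID=xxxx  /home  btrfs  defaults,degraded  0 0
>
> Identify the missing device's devid:
> # btrfs filesystem show /home
>
> Preferred: replace the missing device (devid 2 here) with the new
> drive, resizing afterward if your case calls for it per the man page:
> # btrfs replace start 2 /dev/sdY /home
> # btrfs replace status /home
>
> Or: add the new drive, then delete the missing one:
> # btrfs device add /dev/sdY /home
> # btrfs device remove missing /home
>
> Then convert any single-copy block groups created while degraded back
> to raid1 ("soft" skips block groups that are already raid1):
> # btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /home
>
> Or, if the old device came back instead, scrub to replicate the
> degraded-era writes onto it:
> # btrfs scrub start /home
> # btrfs scrub status /home
>
> Afterward, undo both workarounds: remove the symlink from
> /etc/udev/rules.d and take "degraded" back out of fstab.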
> Otherwise, btrfs raid1 is stable on stable hardware. It automatically
> self-heals if it finds problems during normal operation, and it also
> heals during scrubs. The gotchas only start if there's some kind of
> problem, and then the challenge is to understand the exact nature of
> the problem before taking action. The same goes for mdadm and LVM
> RAID - just with different gotchas and commands.
>
> --
> Chris Murphy

--
------------------------------
 /\_/\
 |O O|  pepeb...@gmail.com
 ~~~~   Javier Perez
 ~~~~   While the night runs
 ~~~~   toward the day...
  m m   Pepebuho watches from his high perch.