Thanks!

On Mon, Dec 23, 2019 at 1:45 PM Chris Murphy <li...@colorremedies.com> wrote:
> On Sun, Dec 22, 2019 at 12:52 AM Javier Perez <pepeb...@gmail.com> wrote:
> >
> > Hi
> > My home partition is on a 2T HDD using btrfs
> >
> > I am reading the material at
> > http://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
> > but still I am not that clear on some items.
> >
> > If I want to add a second 2T drive to work as a mirror (RAID1), it
> > looks like I do not have to invoke mdadm or anything similar; it seems
> > like btrfs will handle it all internally. Am I understanding this right?
>
> Correct.
>
> > Also, before I add a new device, do I have to partition the drive, or
> > does btrfs take over all these duties (partitioning, formatting) when
> > it adds the device to the filesystem?
>
> Partitioning is optional. Drives I dedicate to one task only, I do not
> partition. If I use them for other things, or might use them for other
> things, then I partition them.
>
> The add command formats the new device and resizes the file system:
> # btrfs device add /dev/sdX /mountpoint
>
> The balance command with a convert filter changes the profile for the
> specified block groups, and does the replication:
> # btrfs balance start -dconvert=raid1 -mconvert=raid1 /mountpoint
>
> > What has been the experience like with such a system?
>
> Gotcha 1 (this applies to mdadm and LVM RAID as well as Btrfs): it's
> really common to have a mismatch between the drive's SCT ERC timeout
> and the kernel's SCSI block command timer. That is, there is a drive
> error timeout and a kernel block device error timeout, and the drive's
> timeout must be less than the kernel's, or valuable information is
> lost: self-healing is prevented, bad sectors accumulate, and
> eventually there will be data loss. The thing is, the defaults are
> often wrong: consumer hard drives often have very long SCT ERC -
> typically it's disabled - making for really impressive timeouts in
> excess of 1 minute (some suggest it can be 2 or 3 minutes), whereas
> the kernel command timeout is 30 seconds. Ideally, use 'smartctl -l
> scterc' to set the SCT ERC to something like 7 seconds; this can also
> be set with a udev rule pointed at the device by-id, using its serial
> number or WWN. You want the drive firmware to give up on read errors
> quickly, so that it reports the bad sector's LBA to the kernel, which
> in turn can find a good copy (raid1, 5, 6, or DUP profiles on Btrfs)
> and overwrite the bad sector, thereby fixing it. If the drive doesn't
> support SCT ERC, then you'll need to increase the kernel's command
> timer instead. This is a kernel setting, but it is set per block
> device. Raise the value to something fairly incredible, like 180
> seconds. That means in the worst case, a marginally bad sector
> results in possibly a 3 minute hang until the drive gives up and
> reports a read error - and then it gets fixed up.
>
> It seems esoteric, but really it's pernicious, and it's common in the
> data loss cases reported on linux-raid@, where they have the most
> experience with RAID. It applies just the same to Btrfs.
>
> More info here:
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
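> Concretely, the timeout fix looks something like this (an untested
> sketch - /dev/sdX and the serial number are placeholders, substitute
> your own):
>
> Check whether the drive supports SCT ERC, and its current setting:
> # smartctl -l scterc /dev/sdX
>
> If it's supported, set the read and write recovery timers to 7
> seconds (the unit is 100 ms, so 70 means 7.0 seconds):
> # smartctl -l scterc,70,70 /dev/sdX
>
> The setting doesn't survive a power cycle, so to make it stick, use
> a udev rule, e.g. in /etc/udev/rules.d/60-scterc.rules:
> ACTION=="add", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="WD1234ABCD", RUN+="/usr/sbin/smartctl -l scterc,70,70 /dev/%k"
>
> If the drive doesn't support SCT ERC, raise the kernel's command
> timer for that device instead (also not persistent, so it likewise
> belongs in a udev rule or boot script):
> # echo 180 > /sys/block/sdX/device/timeout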
> Gotcha 2, 3, 4: Device failures mean multiple gotchas all at once, so
> you kinda need a plan for how to deal with this so you aren't
> freaking out if it happens. Panic often leads to user-induced data
> loss. If in doubt, you are best off doing nothing and asking. Both
> the linux-btrfs@ list and #btrfs on IRC freenode.net are approachable
> for this.
>
> Gotcha: If a device dies, you're not likely to see any indication of
> failure unless you're looking at kernel messages and see a ton of
> Btrfs complaints. Like, several scary red warnings *per* lost write.
> If a drive dies, there will quickly be thousands of these. Whether or
> not you notice them, the next time you reboot...
>
> Gotcha: By default, Btrfs fails to mount if it can't find all
> devices. This is because there are consequences to degraded
> operation, and it requires user interaction to make sure it's all
> resolved. But because such mounts fail, there's a udev rule that
> waits for all Btrfs member devices, so that small delays between
> multiple devices appearing don't result in failed mounts. But there's
> no timeout for this udev rule, near as I can tell. This is the rule:
> /usr/lib/udev/rules.d/64-btrfs.rules
>
> So now you're stuck in this startup hang.
>
> If it's just a case of the device accidentally going missing, it's
> safe to reconnect it, and then startup will proceed normally.
>
> Otherwise, you need a way to get unstuck.
>
> I'm improvising here, but what you want to do is remove the suspect
> drive, (temporarily) disable this udev rule so that the system *will*
> try to mount /home, and also change the fstab to add the "degraded"
> option so that the mount attempt won't fail. Now at least you can
> boot and work while degraded until you get a chance to really fix the
> problem. A degraded /home isn't any more risky than a single-device
> /home - the consequences really are all in making sure it's put back
> together correctly.
>
> OK, so how to do all that? Either boot off a Live CD, inhibit the
> udev rule, and change fstab; or boot your system with
> rd.break=cmdline, mount the root file system on /sysroot, and make
> these changes there. Before rebooting, use 'btrfs filesystem show' to
> identify which drive btrfs thinks is missing/bad, and remove it.
>
> You can use 'btrfs replace', or 'btrfs dev add' followed by 'btrfs
> dev rem missing'. The first is preferred, but you need to read the
> man pages on both methods so you're aware of whether or not you need
> to do a file system resize. And use 'btrfs fi us /mountpoint' to
> check usage for any block groups that are not raid1: during degraded
> writes, it's possible some single-copy data block groups get created,
> and those need to be manually converted to raid1 (yes, you can have
> mixed replication levels on btrfs).
>
> And in the case where some degraded writes happen and you then get
> the missing device reconnected, you'll use 'btrfs scrub' to get those
> degraded writes replicated to the formerly missing device. That's not
> automatic either.
>
> A couple more gotchas to be aware of, which might be less bad with
> the latest kernels, but without testing for it I wouldn't assume
> they're fixed:
>
> https://btrfs.wiki.kernel.org/index.php/Gotchas#raid1_volumes_only_mountable_once_RW_if_degraded
>
> https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices
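> To pull the recovery steps above into one place, a rough sketch
> (untested - device names, the devid, and /home are placeholders for
> your setup):
>
> Mask the wait-for-all-devices udev rule (a /dev/null symlink in
> /etc/udev/rules.d overrides the copy in /usr/lib):
> # ln -s /dev/null /etc/udev/rules.d/64-btrfs.rules
>
> Add "degraded" to the /home line in /etc/fstab:
> UUID=xxxx  /home  btrfs  defaults,degraded  0 0
>
> Identify the missing device's devid:
> # btrfs filesystem show /home
>
> Preferred: replace the missing device (devid 2 here) with the new
> drive, resizing afterward if your case calls for it per the man page:
> # btrfs replace start 2 /dev/sdY /home
> # btrfs replace status /home
>
> Or: add the new drive, then delete the missing one:
> # btrfs device add /dev/sdY /home
> # btrfs device remove missing /home
>
> Then convert any single-copy block groups created while degraded back
> to raid1 ("soft" skips block groups that are already raid1):
> # btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /home
>
> Or, if the old device came back instead, scrub to replicate the
> degraded-era writes onto it:
> # btrfs scrub start /home
> # btrfs scrub status /home
>
> Afterward, undo both workarounds: remove the symlink from
> /etc/udev/rules.d and take "degraded" back out of fstab.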
> Otherwise, btrfs raid1 is stable on stable hardware. It automatically
> self-heals if it finds problems during normal operation, and it also
> heals during scrubs. The gotchas only start if there's some kind of
> problem, and then the challenge is to understand the exact nature of
> the problem before taking action. The same goes for mdadm and LVM
> RAID - just with different gotchas and commands.
>
> --
> Chris Murphy

--
------------------------------
 /\_/\
 |O O|  pepeb...@gmail.com
 ~~~~   Javier Perez
 ~~~~   While the night runs
 ~~~~   toward the day...
  m m   Pepebuho watches from his high perch.