On Aug 23, 2013, Chris Murphy <li...@colorremedies.com> wrote:
When replacing a failed disk, I'd like btrfs to compare states between the available drives and know that it needs to catch up the newly added device,
but this doesn't yet happen. It's necessary to call btrfs balance.

I can only test device replacement, not a readd. Upon 'btrfs device delete missing /mnt' there's a delay, and it's doing a balance. I don't know what happens for a readd, if the whole volume needs balancing or if it's able to
just write the changes.
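
For reference, the replacement sequence being discussed is roughly the following (just a sketch of my understanding; the device names are placeholders, assuming a two-disk raid1 mounted at /mnt with /dev/sdc as the replacement):

  # mount writable despite the missing member
  mount -o degraded /dev/sdb /mnt
  # add the replacement, then drop the missing member
  btrfs device add /dev/sdc /mnt
  btrfs device delete missing /mnt
  # per the above, an explicit balance is still what actually
  # catches the new device up
  btrfs balance start /mnt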

Similar to what Duncan described in his response, on a hot-remove (without doing the proper btrfs device delete) there is no opportunity for a rebalance or metadata change on the pulled drive, so I would expect a signature of some sort to be checked for consistency before readding it. At the very least, btrfs shouldn't treat the readded device as an active device when it's really still inconsistent and not being used, even if it carries the same UUID.

And still further, somehow the data profile has reverted to single even
though the mkfs.btrfs was raid1. [SNIP] That is a huge bug. I'll try to come
up with some steps to reproduce.

If I create the filesystem and mount it, but do not copy any data, then upon adding the new device and deleting the missing one, the data profile is changed from raid1 to single. If I've first copied data to the volume prior to the device failure/going missing, this doesn't happen; it remains raid1.
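
Roughly, the reproduction steps are as follows (device names are placeholders; I'll pin down the exact steps when filing the bug):

  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
  mount /dev/sdb /mnt          # mount, but copy no data
  umount /mnt
  # hot-remove /dev/sdc (pull the drive), then remount degraded
  mount -o degraded /dev/sdb /mnt
  btrfs device add /dev/sdd /mnt
  btrfs device delete missing /mnt
  btrfs filesystem df /mnt     # check what the data profile reports here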

And yet the tools indicate that it is still raid1, even though internally it has reverted to single???

Based on my experience with this and Duncan's feedback, I'd like to see the wiki carry some warnings about dealing with multi-device filesystems, especially around the degraded mount option. Specifically, a reasonable practice in the current state seems to be: after a device is removed from a filesystem that then receives any subsequent changes, the removed device should be cleared (dd if=/dev/zero, at least over the superblocks) before being readded, to remove any ambiguity about its state. Adding such a note to the wiki now would communicate the potential pitfalls (especially given how hard the bugs below make it to determine what the actual state is), and it could be updated once things improve in this area.
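
Concretely, the sort of note I have in mind (the device name is only an example, and I'm not certain wipefs alone is enough given the backup superblocks, hence the dd variant):

  # clear the stale btrfs signature before readding the device
  wipefs -a /dev/sdc
  # or, more conservatively, zero the start of the device (the primary
  # btrfs superblock sits at a 64KiB offset)
  dd if=/dev/zero of=/dev/sdc bs=1M count=128
  # only then readd it
  btrfs device add /dev/sdc /mnt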

Looking again at the wiki Gotchas page, it does say
On a multi device btrfs filesystem, mistakenly re-adding
a block device that is already part of the btrfs fs with
btrfs device add results in an error, and brings btrfs in
an inconsistent state. In striping mode, this causes data
loss and kernel oops. The btrfs userland tools need to do
more checking to prevent these easy mistakes.

This seems very closely related; perhaps the workaround above could be added there, along with a clarification that even hotplugged devices may be (re)added and bitten by this.

Should I file a few bugs to capture the related issues? Here are the discrete issues that seem to be present from a user point of view:

1. btrfs filesystem show - shouldn't list devices as present unless they're in use and in a consistent state

If an explicit add is needed to add a new device, don't auto-add, even for devices previously part of the filesystem. Although I would argue that auto-adding and making the device consistent is what's most desired (when it makes sense, i.e. with an existing signature; an empty drive gives no indication of where it should be added), it should be all or nothing, rather than showing the device as added (or at least that's how I interpret its presence in the 'fi show' listing) while it is internally untracked or inconsistent.

As Chris said, "There isn't a readd option in btrfs, which in md parlance is used for readding a device previously part of an array." However, when I hotplugged the drive and it reappeared in the 'fi show' output, I assumed exactly the md semantics had applied: the drive had been readded and made consistent. It didn't take any time, but I hadn't copied any data yet and knew btrfs might only need to sync the used data and metadata blocks.

In other words, I never ran a device add or remove, but still saw what appeared to be consistent behavior.

2. data profile shouldn't revert to single if adding/deleting before copying data

3. degraded mount option should be "allow degraded if needed", allowing non-degraded when it becomes available

The mount option shouldn't force degraded operation, especially after sufficient devices have been added (either manually or automatically) to operate in non-degraded mode. As soon as devices are added, a rebalance should be done to bring the new device(s) into a consistent state.
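
Until that happens automatically, the manual step after adding a replacement appears to be something like the following (the convert filters are my guess at how one would also recover from the profile reversion in issue 2):

  btrfs device add /dev/sdd /mnt
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt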

This then drives the question: how does one check the degraded state of a filesystem, if not via the mount flag? I (quite likely with an md-raid bias) expected the 'filesystem show' output to list the devices along with a status flag such as fully consistent or rebalance in progress. If that's not the correct or intended place, then documentation on how to properly check the consistency and degraded state of a filesystem would help.
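
For comparison, the closest equivalents I've found so far are the following, none of which reports a degraded or consistency state directly (so I may simply be missing the intended interface):

  btrfs filesystem show           # lists member devices, but no per-device state
  btrfs filesystem df /mnt        # shows which chunk profiles are allocated
  btrfs balance status /mnt       # reports a balance in progress, if any
  grep btrfs /proc/mounts         # shows whether 'degraded' was used at mount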

Joel
