On Aug 23, 2013, Chris Murphy <li...@colorremedies.com> wrote:
When replacing a failed disk, I'd like btrfs to compare states between the available drives and know that it needs to catch up the newly added device,
but this doesn't yet happen. It's necessary to call btrfs balance.

I can only test device replacement, not a readd. Upon 'btrfs device delete missing /mnt' there's a delay, and it's doing a balance. I don't know what happens for a readd, if the whole volume needs balancing or if it's able to
just write the changes.
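
For reference, the replacement sequence being discussed is roughly the following (just a sketch of my understanding; the device names are placeholders, assuming a two-disk raid1 mounted at /mnt with /dev/sdc as the replacement):

  # mount writable despite the missing member
  mount -o degraded /dev/sdb /mnt
  # add the replacement, then drop the missing member
  btrfs device add /dev/sdc /mnt
  btrfs device delete missing /mnt
  # per the above, an explicit balance is still what actually
  # catches the new device up
  btrfs balance start /mnt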

Similar to what Duncan described in his response, on a hot-remove (without doing the proper btrfs device delete) there is no opportunity for a rebalance or metadata change on the pulled drive, so I would expect a signature of some sort to be checked for consistency before readding it. At the very least, btrfs shouldn't treat the readded device as an active device when it's really still inconsistent and not being used, even if it carries the same UUID.

And still further, somehow the data profile has reverted to single even
though the mkfs.btrfs was raid1. [SNIP] That is a huge bug. I'll try to come
up with some steps to reproduce.

If I create the filesystem and mount it, but do not copy any data, then upon adding the new device and deleting the missing one, the data profile is changed from raid1 to single. If I've first copied data to the volume prior to the device failure/going missing, this doesn't happen; it remains raid1.
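
Roughly, the reproduction steps are as follows (device names are placeholders; I'll pin down the exact steps when filing the bug):

  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
  mount /dev/sdb /mnt          # mount, but copy no data
  umount /mnt
  # hot-remove /dev/sdc (pull the drive), then remount degraded
  mount -o degraded /dev/sdb /mnt
  btrfs device add /dev/sdd /mnt
  btrfs device delete missing /mnt
  btrfs filesystem df /mnt     # check what the data profile reports here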

And yet the tools indicate that it is still raid1, even though internally it has reverted to single???

Based on my experience with this and Duncan's feedback, I'd like to see the wiki carry some warnings about dealing with multi-device filesystems, especially around the degraded mount option. Specifically, a reasonable practice in the current state seems to be: after a device is removed from a filesystem that then receives any subsequent changes, the removed device should be cleared (dd if=/dev/zero, at least over the superblocks) before being readded, to remove any ambiguity about its state. Adding such a note to the wiki now would communicate the potential pitfalls (especially given how hard the bugs below make it to determine what the actual state is), and it could be updated once things improve in this area.
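
Concretely, the sort of note I have in mind (the device name is only an example, and I'm not certain wipefs alone is enough given the backup superblocks, hence the dd variant):

  # clear the stale btrfs signature before readding the device
  wipefs -a /dev/sdc
  # or, more conservatively, zero the start of the device (the primary
  # btrfs superblock sits at a 64KiB offset)
  dd if=/dev/zero of=/dev/sdc bs=1M count=128
  # only then readd it
  btrfs device add /dev/sdc /mnt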

Looking again at the wiki Gotchas page, it does say
On a multi device btrfs filesystem, mistakenly re-adding
a block device that is already part of the btrfs fs with
btrfs device add results in an error, and brings btrfs in
an inconsistent state. In striping mode, this causes data
loss and kernel oops. The btrfs userland tools need to do
more checking to prevent these easy mistakes.

This seems very closely related; perhaps the workaround above could be added there, along with a clarification that even hotplugged devices may be (re)added and bitten by this.

Should I file a few bugs to capture the related issues? Here are the discrete issues that seem to be present from a user point of view:

1. btrfs filesystem show - shouldn't list devices as present unless they're in use and in a consistent state

If an explicit add is needed to add a new device, don't auto-add, even for devices previously part of the filesystem. Although I would argue that auto-adding and making the device consistent is what's most desired (when it makes sense, i.e. with an existing signature; an empty drive gives no indication of where it should be added), it should be all or nothing, rather than showing the device as added (or at least that's how I interpret its presence in the 'fi show' listing) while it is internally untracked or inconsistent.

As Chris said, "There isn't a readd option in btrfs, which in md parlance is used for readding a device previously part of an array." However, when I hotplugged the drive and it reappeared in the 'fi show' output, I assumed exactly the md semantics had applied: the drive had been readded and made consistent. It didn't take any time, but I hadn't copied any data yet and knew btrfs might only need to sync the used data and metadata blocks.

In other words, I never ran a device add or remove, but still saw what appeared to be consistent behavior.

2. data profile shouldn't revert to single if adding/deleting before copying data

3. degraded mount option should be "allow degraded if needed", allowing non-degraded when it becomes available

The mount option shouldn't force degraded operation, especially after sufficient devices have been added (either manually or automatically) to operate in non-degraded mode. As soon as devices are added, a rebalance should be done to bring the new device(s) into a consistent state.
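
Until that happens automatically, the manual step after adding a replacement appears to be something like the following (the convert filters are my guess at how one would also recover from the profile reversion in issue 2):

  btrfs device add /dev/sdd /mnt
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt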

This then drives the question: how does one check the degraded state of a filesystem, if not via the mount flag? I (quite likely with an md-raid bias) expected the 'filesystem show' output to list the devices along with a status flag such as fully consistent or rebalance in progress. If that's not the correct or intended place, then documentation on how to properly check the consistency and degraded state of a filesystem would help.
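
For comparison, the closest equivalents I've found so far are the following, none of which reports a degraded or consistency state directly (so I may simply be missing the intended interface):

  btrfs filesystem show           # lists member devices, but no per-device state
  btrfs filesystem df /mnt        # shows which chunk profiles are allocated
  btrfs balance status /mnt       # reports a balance in progress, if any
  grep btrfs /proc/mounts         # shows whether 'degraded' was used at mount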

Joel
