On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton <brad...@gmail.com> wrote:
> Thanks for the assist.  To reiterate what I said in private:
>
> a) I am fairly sure I swapped drives by adding the 6TB drive and then
> removing the 2TB drive, which would not have made the 6TB think it was
> only 2TB.  The btrfs statistics commands have shown from the beginning
> the size of the device as 6TB, and that after the remove, it had 4TB
> unallocated.

I agree this seems to be consistent with what's been reported.
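
For the record, a quick way to re-confirm those numbers (assuming the
filesystem is mounted at /mnt; adjust the path):

btrfs filesystem show /mnt    ## per-device size and allocated bytes
btrfs filesystem usage /mnt   ## overall allocation, including unallocated space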


>
> So I am looking for other options, or if people have commands I might
> execute to diagnose this (as it seems to be a flaw in balance), let me know.

What version of btrfs-progs is this? I'm vaguely curious what 'btrfs
check' reports (without --repair). Any version is OK, but it's better
to use something fairly recent, since the check code is still changing
a lot.
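
For example (assuming /dev/sdc is one of the member devices; adjust to
taste, and note that check should be run on the unmounted filesystem):

btrfs --version       ## reports the btrfs-progs version
umount /mnt
btrfs check /dev/sdc  ## read-only by default, no --repair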

Another thing you could try is a newer kernel. Maybe there's a related
bug in 4.2.0. I think it may be more likely this is just an edge case
bug that's always been there, but it's valuable to know if recent
kernels exhibit the problem.

And before proceeding with a change in layout (converting to another
profile), I suggest taking an image of the metadata with btrfs-image;
it might come in handy for a developer.
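
Something like this (device name is just an example; run it against an
unmounted filesystem, and add -s if you want file names sanitized
before sharing the image):

btrfs-image -c9 -t4 /dev/sdc metadata.img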



>
> Some options remaining open to me:
>
> a) I could re-add the 2TB device, which is still there.  Then balance
> again, which hopefully would move a lot of stuff.   Then remove it again
> and hopefully the new stuff would distribute mostly to the large drive.
>  Then I could try balance again.

Yeah, doing this will require -f to wipe the signature info from that
drive when you add it. But I don't think this is a case of needing
more free space; I think it might be due to the odd number of drives,
which are also fairly different in size.

But then what happens when you delete the 2TB drive after the balance?
Do you end up right back in this same situation?
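
If you do try it, the cycle would look roughly like this (assuming the
2TB drive is /dev/sdb and the filesystem is mounted at /mnt; both are
just placeholders):

btrfs device add -f /dev/sdb /mnt   ## -f wipes the stale btrfs signature
btrfs balance start /mnt
btrfs device delete /dev/sdb /mnt   ## migrates its chunks off the device
btrfs balance start /mnt            ## then see how the allocation looks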



>
> b) It was suggested I could (with a good backup) convert the drive to
> non-RAID1 to free up tons of space and then re-convert.  What's the
> precise procedure for that?  Perhaps I can do it with a limit to see how
> it works as an experiment?   Any way to specifically target the blocks
> that have their two copies on the 2 smaller drives for conversion?

btrfs balance start -dconvert=single -mconvert=single -f <mountpoint>
## you have to use -f to force the reduction in redundancy
btrfs balance start -dconvert=raid1 -mconvert=raid1 <mountpoint>

There is the devid= filter, but I'm not sure of the consequences of
limiting the conversion to two of three devices; that's kinda
confusing, and it's sufficiently an edge case that I wonder how many
bugs you're looking to find today? :-)
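
For reference only, the filter syntax would look something like this
(the devid number here is made up; btrfs filesystem show lists the
real ones), though per the above I wouldn't recommend leading with it:

btrfs balance start -dconvert=single,devid=2 -f <mountpoint>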



> c) Finally, I could take a full-full backup (my normal backups don't
> bother with cached stuff and certain other things that you can recover)
> and take the system down for a while to just wipe and restore the
> volumes.  That doesn't find the bug, however.

I'd have the full backup no matter what choice you make. At any time,
for any reason, any filesystem can face-plant without warning.
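
If the volume uses subvolumes, send/receive is one way to get a
complete copy (paths here are placeholders; the snapshot must be
read-only for send to accept it):

btrfs subvolume snapshot -r /mnt /mnt/snap-ro
btrfs send /mnt/snap-ro | btrfs receive /backup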

But yes, this should definitely work, or else you've definitely found
a bug. Finding the bug in your current scenario is harder because the
history of this volume makes it really non-deterministic, whereas if
you start with a 3-disk volume at mkfs time and then reproduce this
problem, it's a bug for sure, and a fairly straightforward one to
reproduce.
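
A sketch of a reproducer with loop devices (sizes are scaled-down
guesses just to mimic three mixed-size drives; adjust to match the
real layout):

truncate -s 3G d1.img && truncate -s 4G d2.img && truncate -s 6G d3.img
losetup -f --show d1.img   ## repeat for d2.img and d3.img, note the loop names
mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2
mount /dev/loop0 /mnt/test
## fill it most of the way up, then: btrfs balance start /mnt/test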

I still recommend a newer kernel and progs though, just because
there's no work being done on 4.2 anymore. I suggest kernel 4.4.6 and
btrfs-progs 4.4.1. And then if you reproduce it, it's not just a bug;
it's a current bug.



-- 
Chris Murphy
