Chris Murphy posted on Mon, 27 Oct 2014 10:51:16 -0600 as excerpted:

> On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan...@gmx.com> wrote:
>>
>> My question is where to go from here? What I'm going to do right now
>> is to copy the most important data to another, separate XFS drive.
>> What I'm planning to do is:
>>
>> 1. Upgrade the kernel
>> 2. Upgrade BTRFS
>> 3. Continue the balancing.
>
> Definitely upgrade the kernel and see how that goes, there's been many
> many changes since 3.13. I would upgrade the user space tools also,
> but that's not as important.

Just emphasizing... Because btrfs is still under heavy development and
not yet fully stable, keeping the kernel in particular updated is
vital, because running an old kernel often means running a kernel with
known btrfs bugs that are already fixed in newer kernels. The userspace
isn't quite as important, since under normal operation it mostly just
tells the kernel what operations to perform, and an older userspace
simply means you might be missing newer features. However, commands
such as btrfs check (the old btrfsck) and btrfs restore do their work
from userspace, so having a current btrfs-progs is important when you
run into trouble and you're trying to fix things.
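To make that concrete, something like the following is what I have in
mind. This is only an illustration; /dev/sdX and the mount points are
placeholders for whatever your setup actually uses:

    uname -r              # kernel version currently running
    btrfs --version       # btrfs-progs (userspace) version

    # mount without letting an interrupted balance resume (skip_balance
    # is the mount option Chris mentions below):
    mount -o skip_balance /dev/sdX /mnt

    # read-only consistency check of an unmounted filesystem; btrfs
    # check doesn't write anything unless you explicitly ask it to
    # repair:
    btrfs check /dev/sdX

    # copy whatever can still be read off an unmountable filesystem to
    # other storage, without writing to the damaged one:
    btrfs restore /dev/sdX /mnt/recovery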
That said, a couple of recent kernel series have known issues. Don't
use the 3.15 series at all, and be sure you're on 3.16.3 or newer for
the 3.16 series. 3.17 introduced another bug, with the fix hopefully in
3.17.2 (it didn't make 3.17.1) and in the 3.18-rcs. So a 3.16.3 or
later stable kernel, or the latest 3.18-rc or live-git kernel, is what
I'd recommend. The other alternative, if you're really conservative, is
the latest long-term stable series kernel, 3.14.x, as it gets critical
bugfixes as well, tho it won't be quite as current as 3.16.x or
3.18-rc. But anything older than the latest 3.14.x stable kernel is old
and outdated in btrfs terms, and is thus not recommended. And 3.15,
3.16 before 3.16.3, and 3.17 before 3.17.2 (hopefully), are blackout
versions due to known btrfs bugs. Avoid them.

Of course with btrfs still not fully stable, the usual sysadmin rule of
thumb applies more than ever: if you don't have a tested backup you
don't have a backup, and if you don't have a backup, by definition you
don't care if you lose the data. If you're on not-yet-fully-stable
btrfs and you don't have backups, by definition you don't care if you
lose that data. There are people having to learn that the hard way, tho
btrfs restore can often recover at least some of what would otherwise
be lost.

> FYI you can mount with skip_balance mount option to inhibit resuming
> balance, sometimes pausing the balance isn't fast enough when there
> are balance problems.

=:^)

>> Could someone please also explain how exactly the raid10 setup works
>> with an ODD number of drives with btrfs?
>> Raid10 should be a stripe of mirrors. Now then this sdf drive is
>> mirrored or striped or what?
>
> I have no idea honestly. Btrfs is very tolerant of adding odd numbers
> and sizes of devices, but things get a bit nutty in actual operation
> sometimes.

In btrfs, raid1, including the raid1 side of raid10, is defined as
exactly two copies of the data, one on each of two different devices.
These copies are allocated by chunk, 1 GiB chunks for data, quarter-GiB
chunks for metadata, and chunks are normally allocated on the device
with the most unallocated space available, provided the other
constraints (such as don't put both copies on the same device) are met.
Btrfs raid0 stripes will be as wide as possible, but again are
allocated a chunk at a time, in sub-chunk-size strips.

While I've not run btrfs raid10 personally and thus (as a sysadmin, not
a dev) can't say for sure, what this implies to me is that, assuming
equal-sized devices, an odd number of devices in raid10 will alternate
which device gets skipped at each chunk allocation. So with a
five-device, same-size btrfs raid10, if I'm not mistaken, btrfs will
allocate chunks on four devices at once, two mirrors, two stripes, with
the fifth one unused for that chunk allocation.

However, at the next chunk allocation, the device skipped in the
previous allocation will now have the most free space and will thus get
the first allocation, with one of the other four devices skipped in
that round instead. After five allocation rounds (assuming all
allocations were 1 GiB data chunks, not quarter-GiB metadata), usage
should thus be balanced across all five devices.
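If it helps, here's a toy shell model of that
allocate-on-the-devices-with-the-most-unallocated-space behavior. To be
clear, this is just my illustration of the reasoning above, not the
actual btrfs allocator; it assumes five equal devices and that each
raid10 data chunk allocation uses exactly four of them (two copies,
two-wide stripe):

    #!/bin/bash
    # Toy model only: five equal devices; each "round" allocates one
    # raid10 data chunk on the four devices with the most unallocated
    # (i.e. least allocated) space.
    alloc=(0 0 0 0 0)       # chunks allocated so far on each device

    for round in 1 2 3 4 5; do
        # sort devices by space already allocated, take the four
        # least-used ones
        pick=$(for i in "${!alloc[@]}"; do echo "${alloc[i]} $i"; done |
               sort -n | head -n4 | cut -d' ' -f2)
        for i in $pick; do alloc[i]=$(( alloc[i] + 1 )); done
        echo "round $round: per-device chunks: ${alloc[*]}"
    done

Run it and you'll see each round skips a different device, and after
five rounds every device holds four chunks, which is the balanced
result described above.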
Of course with six same-size devices, because btrfs raid1 does exactly
two copies, no more, each stripe will be three devices wide.

As for the dataloss question, unlike, say, raid56 mode, which at this
point is known to be effectively little more than expensive raid0,
raid10 should be as reliable as raid1, etc. But I'd refer again to that
sysadmin's rule of thumb above. If you don't have tested backups, you
don't have backups, and if you don't have backups, the data is by
definition not valuable enough to be worth the hassle of backing it up;
the calculated risk cost of data loss is lower than the time required
to make, test, and keep current the backups. After that, it's your
decision whether you value that data more than the time required to
make and maintain those backups, or not, given the risk factors,
including the fact that btrfs is still under heavy development and is
not yet fully stable.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman