Rich Freeman <[email protected]> writes:

> On Fri, Jan 1, 2016 at 5:42 AM, lee <[email protected]> wrote:
>> "Stefan G. Weichinger" <[email protected]> writes:
>>
>>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>>
>>> The general recommendation now is to stay at level-1 for now. That fits
>>> your 2-disk-situation.
>>
>> Well, what shows better performance?  No btrfs-raid on hardware raid or
>> btrfs raid on JBOD?
>
> I would run btrfs on bare partitions and use btrfs's raid1
> capabilities.  You're almost certainly going to get better
> performance, and you get more data integrity features.

That would require me to set up software raid with mdadm as well, just
for the swap partition.
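
For what it's worth, the suggested layout boils down to something like
this (just a sketch; device names and partition numbers are made up):

  # btrfs raid1 across the two bare data partitions
  mkfs.btrfs -L data -m raid1 -d raid1 /dev/sda3 /dev/sdb3

  # plus a separate mdadm raid1 just for swap
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  mkswap /dev/md0
  swapon /dev/md0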

> If you have a silent corruption with mdadm doing the raid1 then btrfs
> will happily warn you of your problem and you're going to have a
> really hard time fixing it,

BTW, what do you do when you have silent corruption on a swap partition?
Is that possible, or does swapping use its own checksums?

> [...]
>
>>>
>>> I would avoid converting and stuff.
>>>
>>> Why not try a fresh install on the new disks with btrfs?
>>
>> Why would I want to spend another year to get back to where I'm now?
>
> I wouldn't do a fresh install.  I'd just set up btrfs on the new disks
> and copy your data over (preserving attributes/etc).

That was the idea.
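
Something along these lines is what I had in mind (a sketch; the mount
points are made up):

  # copy everything over, preserving hard links, ACLs and xattrs,
  # and staying within the one filesystem
  rsync -aHAXx --numeric-ids /mnt/oldroot/ /mnt/newroot/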

> I wouldn't do an in-place ext4->btrfs conversion.  I know that there
> were some regressions in that feature recently and I'm not sure where
> it stands right now.

That adds to the uncertainty of btrfs.
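
For reference, the in-place conversion in question is done with
btrfs-convert, roughly like this (the device name is made up):

  # convert an unmounted ext4 filesystem in place
  btrfs-convert /dev/sdc1
  # roll back to ext4, as long as the saved image subvolume is untouched
  btrfs-convert -r /dev/sdc1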


> [...]
>>
>> There you go, you end up with an odd setup.  I don't like /boot
>> partitions.  As well as swap partitions, they need to be on raid.  So
>> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
>> perhaps ext4, /and/ multiple partitions.
>
> [...]
> There isn't really anything painful about that setup though.

It's still odd.  I already have two different file systems and the
overhead of one kind of software raid while I would rather stick to one
file system.  With btrfs, I'd still have two different file systems ---
plus mdadm and the overhead of three different kinds of software raid.

How would it be so much better to triple the software raids while still
ending up with the same number of file systems?

>> When you use hardware raid, it
>> can be disadvantageous compared to btrfs-raid --- and when you use it
>> anyway, things are suddenly much more straightforward because everything
>> is on raid to begin with.
>
> I'd stick with mdadm.  You're never going to run mixed
> btrfs/hardware-raid on a single drive,

A single disk doesn't make for a raid.

> and the only time I'd consider
> hardware raid is with a high quality raid card.  You'd still have to
> convince me not to use mdadm even if I had one of those lying around.

From my own experience, I can tell you that mdadm already has
significant overhead when you run a raid1 of two disks and a raid5 of
three disks.  Some of that overhead may be due to the SATA controller
being less capable than one would expect --- yet that doesn't matter,
because what you're looking at, besides reliability, is the overall
performance.  And the overall performance increased very noticeably
when I migrated from mdadm raids to hardware raids, with the same disks
and the same hardware, the only change being the added raid card.

And that was with only 5 disks.  I also know that the performance of a
two-disk ZFS mirror was disappointingly poor.  Those disks aren't
exactly fast, but still.  I haven't yet tested whether that changed
after adding 4 mirrored disks to the pool.  And I know that the
performance of another hardware raid5 with 6 disks was very good.

Thus I'm not convinced that software raid is the way to go.  I wish they
would make hardware ZFS (or btrfs, if it ever becomes reliable)
controllers.

Now consider:


+ the candidates for hardware raid are two small disks (72GB each)
+ the data on them is either mostly read, or temporary/cache-like
+ this setup has been working without any issues for over a year now
+ using btrfs would triple the number of software raids in use
+ btrfs is an uncertainty, and its reliability is questionable
+ mdadm would have to be added as yet another layer of complexity
+ the disks are SAS disks, genuinely made to be run in a hardware raid
+ the setup with hardware raid is straightforward and simple; the setup
  with btrfs is anything but


The relevant advantage of btrfs is being able to make snapshots.  Is
that worth all the (potential) trouble?  Snapshots are worthless when
the file system destroys them with the rest of the data.
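
Granted, the snapshot handling itself is simple enough (a sketch; the
paths are made up):

  # take a read-only snapshot of a subvolume
  btrfs subvolume snapshot -r /mnt/data /mnt/data/.snapshots/2016-01-01
  # and get rid of it again
  btrfs subvolume delete /mnt/data/.snapshots/2016-01-01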

> [...]
>> How's btrfs's performance when you use swap files instead of swap
>> partitions to avoid the need for mdadm?
>
> btrfs does not support swap files at present.

What happens when you try it?

> When it does you'll need to disable COW for them (using chattr)
> otherwise they'll be fragmented until your system grinds to a halt.  A
> swap file is about the worst case scenario for any COW filesystem -
> I'm not sure how ZFS handles them.

Well, then they need to make special provisions for swap files in btrfs
so that we can finally get rid of the swap partitions.
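
The usual recipe for a COW-free swap file would presumably look like
this, once btrfs actually supports it (a sketch; the path is made up):

  # the file has to be empty when chattr +C is applied
  touch /swapfile
  chattr +C /swapfile
  # allocate it without holes, then set it up as swap
  dd if=/dev/zero of=/swapfile bs=1M count=4096
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile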


> [...]
>>> As mentioned here several times I am using btrfs on >6 of my systems for
>>> years now. And I don't look back so far.
>>
>> And has it always been reliable?
>>
>
> I've never had an episode that resulted in actual data loss.  I HAVE
> had an episode or two which resulted in downtime.
>
> When I've had btrfs issues I can generally mount the filesystem
> read-only just fine.  The problem was that cleanup threads were
> causing kernel BUGs which cause the filesystem to stop syncing (not a
> full panic, but when all your filesystems are effectively read-only
> there isn't much difference in many cases).  If I rebooted the system
> would BUG within a few minutes.  In one case I was able to boot from a
> more recent kernel on a rescue disk and fix things by just mounting
> the drive and letting it sit for 20min to finish cleaning things up
> while the disk was otherwise idle (some kind of locking issue most
> likely) - maybe I had to run btrfsck on it.  In the other case it was
> being really fussy and I ended up just restoring from a backup since
> that was the path of least resistance.  I could have probably
> eventually fixed the problem, and the drive was mountable read-only
> the entire time so given sufficient space I could have copied all the
> data over to a new filesystem with no loss at all.

That's exactly what I don't want to have to deal with.  It would defeat
the most important purpose of using raid.

> Things have been pretty quiet for the last six months though, and I
> think it is largely due to a change in strategy around kernel
> versions.  Right now I'm running 3.18.  I'm starting to consider a
> move to 4.1, but there is a backlog of btrfs fixes for stable that I'm
> waiting for Greg to catch up on and maybe I'll wait for a version
> after that to see if things settle down.  Around the time of
> 3.14->3.18 btrfs maturity seemed to settle in a bit, and at this point
> I think newer kernels are more likely to introduce regressions than
> fix problems.  The pace of btrfs patching seems to have increased as
> well in the last year (which is good in the long-term - most are
> bugfixes - but in the short term even bugfixes can introduce bugs).
> Unless I have a reason not to at this point I plan to run only
> longterm kernels, and move to them when they're about six months
> mature.

That's another thing making it difficult to use btrfs.

> If I had done that in the past I think I would have completely avoided
> that issue that required me to restore from backups.  That happened in
> the 3.15/3.16 timeframe and I'd have never even run those kernels.
> They were stable kernels at the time, and a few versions in when I
> switched to them (I was probably just following gentoo-sources stable
> keywords back then), but they still had regressions (fixes were
> eventually backported).

How do you know that an old kernel you pick, because you think its
btrfs part works well enough, is actually the right pick?  You can
either run into a bug that has since been fixed or into a regression
that hasn't been discovered/fixed yet.  Either way, you can't win.

> I think btrfs is certainly usable today, though I'd be hesitant to run
> it on production servers depending on the use case (I'd be looking for
> a use case that actually has a significant benefit from using btrfs,
> and which somehow mitigates the risks).

There you go: it's usable, yet the risk of using it is too high.

> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
> Gentoo repo) backup of my btrfs filesystems on ext4.  I occasionally
> debate whether I still need it, but I sleep better knowing I have it.
> This is in addition to my daily duplicity cloud backups of my most
> important data (so, /etc and /home are in the cloud, and mythtv's
> /var/video is just on a local rsync backup).

I wouldn't let my data out of my own hands.
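
The local part of that is easy enough to have without any cloud,
though; an rsnapshot setup along those lines is roughly (a sketch;
paths are made up, and the fields in rsnapshot.conf are tab-separated):

  # /etc/rsnapshot.conf (excerpt)
  snapshot_root   /backup/rsnapshot/
  retain          daily   7
  backup          /etc/           localhost/
  backup          /home/          localhost/

  # run once a day from cron
  rsnapshot daily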

> Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
> raid5/6 is fine, but you lose the data integrity features).  I
> wouldn't go anywhere near that for at least a year, and probably
> longer.

It might take another 5 or 10 years before btrfs isn't questionable
anymore, if it ever gets there.

> Overall I'm very happy with btrfs though.  Snapshots and reflinks are
> very handy - I can update containers and nfs roots after snapshotting
> them and it gives me a trivial rollback solution, and while I don't
> use snapper I do manually rotate through snapshots weekly.  If you do
> run snapper I'd probably avoid generating large numbers of snapshots -
> one of my BUG problems happened as a result of snapper deleting a few
> hundred snapshots at once.

Snapper?  I've never heard of that ...

> Btrfs's deferred processing of the log/btrees can cause the kinds of
> performance issues associated with garbage collection (or BUGs due to
> thundering herd problems).  I use ionice to try to prioritize my IO so
> that stuff like mythtv recordings will block less realtime activities,
> and in the past that hasn't always worked with btrfs.  The problem is
> that btrfs would accept too much data into its log, and then it would
> block all writes while it tried to catch up.  I haven't seen that as
> much recently, so maybe they're getting better about that.  As with
> any other scheduling problem it only works if you correctly block
> writes into the start of the pipeline (I've heard of similar problems
> with TCP QoS and such if you don't ensure that the bottleneck is the
> first router along the route - you can let in too much low-priority
> traffic and then at that point you're stuck dealing with it).

Queuing up data when there's more of it than the system can deal with
only works when the system later has sufficient time to catch up with
the queue.  Otherwise, you have to block something at some point, or
you have to drop data.  And at that point, it doesn't matter how you
arrange the contents within the queue.
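
For the record, the ionice approach mentioned above amounts to
something like this (the process is just an example):

  # start the recorder at the lowest best-effort IO priority
  ionice -c 2 -n 7 mythbackend
  # or demote an already running process by PID
  ionice -c 2 -n 7 -p 12345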

> I'd suggest looking at the btrfs mailing list to get a survey for what
> people are dealing with.  Just ignore all the threads marked as
> patches and look at the discussion threads.
>
> If you're getting the impression that btrfs isn't quite
> fire-and-forget, you're getting the right impression.  Neither is
> Gentoo, so I wouldn't let that alone scare you off.  But, I see no
> reason to not give you fair warning.

Gentoo /is/ fire-and-forget in that it simply works.  Btrfs is not, in
that it may or may not work.
