Rich Freeman <rich0 <at> gentoo.org> writes:

> >> You can turn off COW and go single on btrfs to speed it up but bugs in
> >> ceph and btrfs lose data real fast!

> So, btrfs and ceph solve an overlapping set of problems in an
> overlapping set of ways.  In general adding data security often comes
> at the cost of performance, and obviously adding it at multiple layers
> can come at the cost of additional performance.  I think the right
> solution is going to depend on the circumstances.

Raid 1 with btrfs can protect not only the ceph fs files but the gentoo
node installation itself.  I'm not so worried about performance, because
my main (end result) goal is to throttle codes so they run almost
exclusively in ram (in memory), as designed by AMPLab.  Spark plus
Tachyon is a work in progress, for sure.  The DFS will be used in lieu
of HDFS for distributed/cluster types of apps, hence ceph.  Btrfs +
raid 1 is a failsafe not only for the node installations but also for
all data.  I only intend to write data out once a job/run is finished;
granted, that is very experimental right now and will evolve over time.
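
For the record, the mirroring I have in mind is btrfs' native raid1
profile across two disks; something like this sketch (device names and
mount point are just examples):

    # mirror both data and metadata across two example drives
    mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
    mount /dev/sda /mnt/osd

    # scrub periodically so bad copies get repaired from the good mirror
    btrfs scrub start /mnt/osd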


> 
> if ceph provided that protection against bitrot I'd probably avoid a
> COW filesystem entirely.  It isn't going to add any additional value,
> and they do have a performance cost.  If I had mirroring at the ceph
> level I'd probably just run them on ext4 on lvm with no
> mdadm/btrfs/whatever below that.  Availability is already ensured by
> ceph - if you lose a drive then other nodes will pick up the load.  If
> I didn't have robust mirroring at the ceph level then having mirroring
> of some kind at the individual node level would improve availability.

I've read that btrfs and ceph are a very suitable, yet very immature,
match for local-distributed file system needs.


> On the other hand, ceph currently has some gaps, so having it on top
> of zfs/btrfs could provide protection against bitrot.  However, right
> now there is no way to turn off COW while leaving checksumming
> enabled.  It would be nice if you could leave the checksumming on.
> Then if there was bitrot btrfs would just return an error when you
> tried to read the file, and then ceph would handle it like any other
> disk error and use a mirrored copy on another node.  The problem with
> ceph+ext4 is that if there is bitrot neither layer will detect it.

Good points; hence a flexible configuration, where ceph can be
reconfigured and recovered as warranted over this long-term set of
experiments.
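
For reference, the two ways I know of to disable COW today, both of
which also drop the checksums (exactly the gap you describe):

    # per-directory: files created here afterwards are NOCOW,
    # and therefore also unchecksummed
    chattr +C /var/lib/ceph/osd

    # or filesystem-wide at mount time (implies nodatasum)
    mount -o nodatacow /dev/sda /var/lib/ceph/osd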

> Does btrfs+ceph really have a performance hit that is larger than
> btrfs without ceph?  I fully expect it to be slower than ext4+ceph.
> Btrfs in general performs fairly poorly right now - that is expected
> to improve in the future, but I doubt that it will ever outperform
> ext4 other than for specific operations that benefit from it (like
> reflink copies).  It will always be faster to just overwrite one block
> in the middle of a file than to write the block out to unallocated
> space and update all the metadata.
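
Agreed that reflinks are a win; for anyone following along, a reflink
copy on btrfs is a one-liner (paths here are just examples):

    # share extents instead of duplicating the data; COW-fs only
    cp --reflink=always /data/input.bin /data/input-copy.bin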

I fully expect the combination of btrfs+ceph to mature and become
competitive.  This is not critical data, but a long-term experiment;
any critical data will be backed up off the 3-node cluster.  I hope to
use ansible for recovery, for configuration changes, and for bringing
up and managing additional nodes.  That is only a concept at the
moment, but from googling around it does seem to be a popular approach.
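
Even ansible's ad-hoc layer should get me a long way before any real
playbooks exist; a first sketch (the inventory file and group name are
made-up examples):

    # ping all three nodes listed under [cluster] in a simple inventory
    ansible -i hosts.ini cluster -m ping

    # run an arbitrary command across the cluster
    ansible -i hosts.ini cluster -m command -a "btrfs filesystem show"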

As always your insight and advice is warmly received.


James

