On 10 August 2018 at 07:09, Janne Johansson <icepic...@gmail.com> wrote:
> As opposed to the previous setup, this will add some inter-host traffic
> as well: each write to the primary PG will in turn cause that host to
> replicate it over the network to X other hosts to form the required number
> of replicas, so where RAID/ZFS/LVM did the copying internally on the same
> host, it will now go once or twice over the net. Not a huge problem, but
> worth noticing. When repairs happen, like if a host dies, the amount of
> traffic will be quite large, given the 80-130T described above.
Yeah, we've been doing the inter-host replication effectively "by hand."
It's time-intensive and complex (trying to sort out a best fit for blocks of
data of a dozen TB or more). I'm counting on Ceph to take over that work
and save us storage space in the process (3x cross-chassis copies is far
better than 3x on-chassis copies times 2x cross-chassis copies). Right now
the network takes that hit in large, concentrated chunks as we sync data
from chassis to chassis. Moving to something like Ceph will smooth that out.
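The space saving mentioned above is simple multiplication of copy factors. A minimal sketch; the 3x/2x factors come from the message, while the 100 TB dataset size is a hypothetical example.

```python
# Raw copies per usable byte = product of nested copy factors.

def raw_per_usable(*copy_factors: int) -> int:
    """Multiply nested copy factors into total raw copies per usable byte."""
    total = 1
    for factor in copy_factors:
        total *= factor
    return total


old_layout = raw_per_usable(3, 2)  # 3x on-chassis, then 2x cross-chassis
ceph_layout = raw_per_usable(3)    # 3 replicas, one per chassis/host

usable_tb = 100  # hypothetical dataset size
print("old: ", usable_tb * old_layout, "TB raw")
print("ceph:", usable_tb * ceph_layout, "TB raw")
print("raw saved:", 1 - ceph_layout / old_layout)
```

Going from 6 raw copies to 3 halves the raw capacity needed for the same data, while still keeping three copies on separate chassis.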
> I still think Ceph can be a decent solution for your problem, but rolling
> maintenance on a cluster is easier if the cluster is made up of many
> smaller hosts, so that the loss is smaller when a host is gone. So, while
> the above situation is better than what you came from, it would not be an
> optimal Ceph setup, but then again, who has an optimal setup anyhow?
> Everyone always has some part of the cluster they would fix if money and
> time were endless.
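To illustrate why many smaller hosts make maintenance easier: a minimal sketch computing the share of raw capacity that goes offline when the largest host is down. The host sizes are hypothetical, loosely based on the 80-130T range mentioned above, with both fleets totalling the same 420 TB.

```python
from typing import Sequence


def largest_host_share(host_tb: Sequence[float]) -> float:
    """Fraction of total raw capacity held by the single largest host."""
    total = sum(host_tb)
    if total <= 0:
        raise ValueError("need at least one host with capacity")
    return max(host_tb) / total


few_big = [130.0, 130.0, 80.0, 80.0]  # hypothetical: a few large file servers
many_small = [20.0] * 21              # hypothetical: same total, more hosts

print(f"few big hosts:    {largest_host_share(few_big):.1%}")
print(f"many small hosts: {largest_host_share(many_small):.1%}")
```

Roughly 31% versus under 5% of the cluster disappearing at once is the difference between a major rebalance and a routine one when a host is rebooted or fails.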
Yep. Initially we'll still have to make sure we've got at least the
largest file server's worth of space free in the cluster to cover for a
hardware failure, but that situation will improve as we swap out the big
file servers for more, smaller ones. The interim period will require a lot
of careful attention, but in a few years it gets us to a much better space.
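The "largest file server's worth of free space" rule can be checked mechanically. A minimal sketch, assuming a host failure domain (so at least `pool_size` hosts must survive), placement roughly proportional to capacity, and a hypothetical `fill_ceiling` safety margin that is not a Ceph setting; all host sizes and usage figures are made up.

```python
from typing import Sequence


def survives_largest_host_loss(host_tb: Sequence[float], used_raw_tb: float,
                               pool_size: int = 3,
                               fill_ceiling: float = 0.85) -> bool:
    """Rough check that data can be re-replicated after losing the largest host.

    Re-replication restores the full replica count, so the same raw usage
    (all replicas) must fit on the surviving hosts below fill_ceiling.
    """
    if len(host_tb) - 1 < pool_size:
        return False  # not enough hosts left to place every replica
    remaining_tb = sum(host_tb) - max(host_tb)
    return used_raw_tb <= remaining_tb * fill_ceiling


hosts = [130.0, 130.0, 80.0, 80.0, 20.0, 20.0]  # hypothetical mixed fleet
print(survives_largest_host_loss(hosts, used_raw_tb=250.0))  # fits
print(survives_largest_host_loss(hosts, used_raw_tb=300.0))  # too full
```

CRUSH placement is never perfectly even, so in practice the margin should be larger than this estimate suggests; as the big servers are replaced, `max(host_tb)` shrinks and the check passes with far less reserved space.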
Thanks very much for your help!
ceph-users mailing list