Hi list,
I am thinking about adding some primitives to CRUSH to support the
following user stories:
A. "Same host", "Same rack"
To balance availability against performance, one may want a rule such as:
3 replicas, with Replica 1 and Replica 2 in the same rack and Replica 3 in
another rack. This is common because a typical datacenter deployment has
much less uplink bandwidth than backbone bandwidth.
More aggressive users may even want "same host", since the most common
failure is a disk failure, and several disks (which also means several OSDs)
reside in the same physical machine. If we could place Replicas 1 and 2 on
the same host and Replica 3 somewhere else, it would not only reduce
replication traffic but also save a lot of time and bandwidth when a disk
fails and recovery takes place.
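
To make story A concrete, here is a minimal Python sketch of the "same rack"
placement. It is only a toy model, not CRUSH itself: the topology, the
hash-based ranking, and the names TOPOLOGY, ranked() and place_same_rack()
are all assumptions made up for illustration.

```python
import hashlib

# Hypothetical toy topology: rack -> host -> list of OSD ids.
TOPOLOGY = {
    "rack1": {"host1": [0, 1], "host2": [2, 3]},
    "rack2": {"host3": [4, 5], "host4": [6, 7]},
    "rack3": {"host5": [8, 9], "host6": [10, 11]},
}


def ranked(items, pg_id, salt):
    """Order items deterministically by a hash of (pg_id, salt, item).

    This stands in for a pseudo-random selection; it is not the CRUSH hash.
    """
    def score(item):
        key = f"{pg_id}:{salt}:{item}".encode()
        return hashlib.sha256(key).hexdigest()

    return sorted(items, key=score)


def place_same_rack(pg_id, topology=TOPOLOGY):
    """Return three (rack, host, osd) tuples for one placement group.

    Replicas 1 and 2 share a rack but sit on different hosts;
    replica 3 is placed in a different rack.
    """
    racks = ranked(topology, pg_id, "rack")
    near_rack, far_rack = racks[0], racks[1]

    near_hosts = ranked(topology[near_rack], pg_id, "near-host")[:2]
    far_host = ranked(topology[far_rack], pg_id, "far-host")[0]

    targets = [
        (near_rack, near_hosts[0]),
        (near_rack, near_hosts[1]),
        (far_rack, far_host),
    ]
    placement = []
    for rack, host in targets:
        # Pick one OSD on the chosen host.
        osd = ranked(topology[rack][host], pg_id, "osd")[0]
        placement.append((rack, host, osd))
    return placement


if __name__ == "__main__":
    for pg in range(4):
        print(pg, place_same_rack(pg))
```

The "same host" variant would be the same idea one level down: pick two
OSDs on one host for Replicas 1 and 2, then a different host for Replica 3.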
B. "Local"
Although we cannot mount RBD volumes on a host where an OSD is running,
QEMU can be used. This scenario is really common in cloud computing: we have
a large number of compute nodes, and we just plug in some disks and reuse
the machines for the Ceph cluster. To reduce network traffic and latency, it
would help to have some placement groups, maybe 3 PGs, per compute node, and
to define rules like: the primary copy of the PG should (if possible) reside
on the local host, and the other replicas should go to different places.
By doing this, a significant amount of network bandwidth and one RTT can be
saved. What's more, since reads always go to the primary, they would benefit
a lot from such a mechanism.
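
Story B could be sketched in Python along the following lines. Again this is
only a toy under stated assumptions: HOSTS, ranked() and
place_local_primary() are hypothetical names, the hash ranking is not the
CRUSH hash, and the fallback behaviour when the local host has no OSDs is
my guess at what "if possible" should mean.

```python
import hashlib

# Hypothetical toy cluster: compute node -> OSD ids on that node.
HOSTS = {
    "node1": [0, 1],
    "node2": [2, 3],
    "node3": [4, 5],
    "node4": [6, 7],
}


def ranked(items, pg_id, salt):
    """Order items deterministically by a hash of (pg_id, salt, item)."""
    def score(item):
        key = f"{pg_id}:{salt}:{item}".encode()
        return hashlib.sha256(key).hexdigest()

    return sorted(items, key=score)


def place_local_primary(pg_id, local_host, hosts=HOSTS, replicas=3):
    """Return a list of (host, osd) tuples, primary first.

    If local_host runs at least one OSD, the primary is placed there.
    Otherwise the primary falls back to the normal hash-based order.
    The remaining replicas always go to other hosts.
    """
    order = ranked(hosts, pg_id, "host")
    if hosts.get(local_host):
        # Move the local host to the front so it holds the primary copy.
        order.remove(local_host)
        order.insert(0, local_host)

    chosen = order[:replicas]
    return [(host, ranked(hosts[host], pg_id, "osd")[0]) for host in chosen]


if __name__ == "__main__":
    print(place_local_primary(7, "node2"))    # primary on node2
    print(place_local_primary(7, "elsewhere"))  # no local OSD: plain order
```

The hard part this sketch hides is that the placement now depends on who is
asking (local_host), while every client and OSD has to compute the same
mapping for a given PG.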
It looks to me that A is simpler, while B seems much more complex. I am
hoping for input.
Xiaoxi