On Tue, Apr 1, 2014 at 7:34 PM, Shang Wu <[email protected]> wrote:
> Hi all,
>
> I have some questions about the Ceph multi-site implementation.
>
> I am thinking to have Ceph as the storage solution for across three internal
> site. I think, with a good internet connection, using the Multi-site object
> storage with RADOS (or RGW) might be a good use here. Thus, each site will
> have a MON node and many OSDs and replicate data between each other. With
> this implementation, I hope it will allow user to READ/WRITE from/to the
> local office and Ceph will take care the replication.
>
> So my question is:
>
> 1. How does Ceph know how to retrieve data from the nearest location? (As
> Ceph usually calculate where the data is through CRUSH rather than the
> nearest location for the user.) Will the data be distributed evenly
> throughout the three sites? If not, how can we let user to access the _local
> copy_ ?

The idea is to keep separate ceph clusters, one for each zone. You'll
have rgw running for each zone, configured to contact the local ceph
cluster.

> 2. Is " Multi-site object storage with RADOS" a good fit for their
> implementation? i.e. to READ/Write data To/From their local site? If not,
> what is the best way to approach this?

It's definitely one approach. I'm not sure I know enough about the
requirements to say whether it's a good fit. Specifically with this
approach the replication is not bi-directional within each region so
there are some limitations.

> 3. Does Ceph use the same ID (object name?) for all its replica? Can we
> access(read/write) these replica directly?

Yes. You can access the replicas directly (of course, depending on the
configuration), and replicas are generally independent. But writes
should not go into the replicas as there's no bi-directional sync
process. You can set it up so that reads could go to the replicas
though.

> 4. From this multi-site scenario, when a user write data to Ceph, will it
> find the nearest OSD to put the data? When a user read data, does it always
> respond from the primary data set (doesn't matter the location) or respond
> from the nearest replica copy?
>

This is a bit moot, as with my proposed solution you'll have a cluster
per location and not a single cluster overall. In any case, at this
moment the rgw accesses the primary wherever it is.

Yehuda
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
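
For illustration, the access pattern described in the reply (writes go only
to the primary zone, reads may go to the local replica) could be sketched on
the client side roughly as below. This is only a sketch under stated
assumptions: the zone names, the endpoint URLs, and the pick_endpoint helper
are all hypothetical and are not part of rgw or any Ceph tool.

```python
# Hypothetical sketch: zone names and endpoint URLs are invented for
# illustration; they do not come from any real deployment or Ceph API.
MASTER_ZONE = "site-a"  # assumed: the one zone that accepts writes

RGW_ENDPOINTS = {
    "site-a": "http://rgw.site-a.example.com",
    "site-b": "http://rgw.site-b.example.com",
    "site-c": "http://rgw.site-c.example.com",
}

# HTTP methods that modify data and therefore must not hit a replica.
WRITE_METHODS = {"PUT", "POST", "DELETE"}


def pick_endpoint(method: str, local_zone: str) -> str:
    """Return the rgw endpoint a client in local_zone should talk to."""
    if local_zone not in RGW_ENDPOINTS:
        raise ValueError(f"unknown zone: {local_zone}")
    if method.upper() in WRITE_METHODS:
        # No bi-directional sync, so every write goes to the master zone.
        return RGW_ENDPOINTS[MASTER_ZONE]
    # Reads can be served by the rgw in front of the local cluster.
    return RGW_ENDPOINTS[local_zone]


if __name__ == "__main__":
    print(pick_endpoint("GET", "site-b"))
    print(pick_endpoint("PUT", "site-b"))
```

With this kind of routing, a read served by a local replica may be behind
the primary until the sync has caught up, so it only suits workloads that
can tolerate slightly stale reads.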
