Sergio Esteves created CASSANDRA-7453:
-----------------------------------------

             Summary: Geo-replication in Cassandra
                 Key: CASSANDRA-7453
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7453
             Project: Cassandra
          Issue Type: Wish
            Reporter: Sergio Esteves
            Priority: Minor


Currently, a Cassandra cluster that spans multiple datacenters replicates 
all data to all datacenters when an update is performed. This limits the 
scalability of Cassandra as the number of datacenters increases.

It would be desirable to make Cassandra aware of where data requests 
originate, so that it could place replicas close to users and avoid 
replicating to distant datacenters.

To this end, we are considering implementing a new replication strategy. Some 
possible approaches are:
1) Using a byte from every row key to identify the location of the primary 
datacenter where data should be stored (i.e., where it is likely to be 
accessed).
2) Using an additional CF for every row to specify the origin of the data.
3) Replicating only to the two datacenters closest to the user (for 
reliability reasons) on a write. For reads, a user would try to fetch data 
from the two closest datacenters; if the data is not available there, it would 
try the remaining datacenters. If data fails to be retrieved locally too many 
times, it means that the client has moved to another part of the planet, and 
thus the data should be migrated accordingly. We could have some problems here, 
such as having the same rows but with different CFs in different DCs (i.e., if 
users perform updates to the same rows from different remote places).
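
To make options 1 and 3 more concrete, here is a minimal sketch of how they 
might be combined: the first byte of the row key names the primary 
datacenter, and replicas go to that datacenter plus the next closest one. This 
is not Cassandra code. The tag-to-datacenter table, the latency figures, and 
the function names are all assumptions made up for illustration, and the 
sketch measures "closest" from the primary datacenter rather than from the 
user.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from the one-byte location tag (option 1)
# to a datacenter name. These names are illustrative only.
DC_BY_TAG: Dict[int, str] = {
    0x01: "us-east",
    0x02: "eu-west",
    0x03: "ap-south",
}

# Hypothetical round-trip latencies between datacenters, in milliseconds.
LATENCY_MS: Dict[Tuple[str, str], int] = {
    ("us-east", "eu-west"): 80,
    ("us-east", "ap-south"): 200,
    ("eu-west", "ap-south"): 120,
}


def latency(a: str, b: str) -> int:
    """Return the symmetric latency between two datacenters."""
    if a == b:
        return 0
    if (a, b) in LATENCY_MS:
        return LATENCY_MS[(a, b)]
    return LATENCY_MS[(b, a)]


def primary_dc(row_key: bytes) -> str:
    """Option 1: read the primary datacenter from the first key byte."""
    if not row_key:
        raise ValueError("row key must not be empty")
    return DC_BY_TAG[row_key[0]]


def replica_dcs(row_key: bytes, count: int = 2) -> List[str]:
    """Option 3: pick the `count` datacenters closest to the primary one.

    The primary datacenter is always first because its latency to itself
    is zero. Ties are broken by name so the result is deterministic.
    """
    home = primary_dc(row_key)
    ranked = sorted(DC_BY_TAG.values(), key=lambda dc: (latency(home, dc), dc))
    return ranked[:count]


if __name__ == "__main__":
    # A key tagged 0x02 is homed in eu-west and also replicated to the
    # nearest other datacenter according to the table above.
    print(replica_dcs(b"\x02user42"))
```

A key-embedded tag keeps placement decisions stateless, but it also fixes the 
primary datacenter for the lifetime of the key, so the migration case in 
option 3 would need a separate mechanism.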

What would be the best way to do this?

Thanks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
