We have a pilot project running in which all our historical data
worldwide would be stored in Cassandra.  So far, we have been
successful at getting the write and read throughput we need.  In fact,
we are coming in at more than 27% above our needed capacity and well
beyond what we were able to achieve with MySQL, which is very
impressive.

However, one thing that escapes me is how we should organize access
across our different data centers.

The scenario is the following:

- We have data centers in North America, London, Tokyo and so on.
- The relative cost of data centers is very different, e.g., the TCO
of one server in Tokyo is about the same as that of 5 such servers in
New York.
- We want to have access to all the data from North America, hence we
would run Hadoop/Pig queries from the New York/North America data
center only.

The problem is this: we would like the historical data from Tokyo to
stay in Tokyo and only be replicated to New York, the data from London
to stay in London and only be replicated to New York, and so on for
all data centers.
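
To make the layout we are after concrete, here is a small Python
sketch of the intended replica placement.  It only models the
requirement; the data center names and the replica_sites helper are
made up for illustration and are not part of any Cassandra API.

```python
# Hypothetical model of the desired placement, not Cassandra configuration.
HUB = "NewYork"  # assumed name for the North America data center
DATA_CENTERS = ["NewYork", "London", "Tokyo"]  # illustrative subset


def replica_sites(origin):
    """Return the set of data centers that should hold data written in `origin`."""
    # Data stays in the data center where it was written and is
    # additionally copied to the hub, and nowhere else.
    return {origin, HUB}


if __name__ == "__main__":
    for dc in DATA_CENTERS:
        print(dc, "->", sorted(replica_sites(dc)))
```

In this model, data written in Tokyo never lands in London and vice
versa, while New York ends up holding a copy of everything.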

Is this currently possible with Cassandra?  I believe we would need to
run multiple clusters and migrate data manually from each data center
to North America to achieve this.  Any suggestions would be welcome.
