We have a pilot project running in which all our historical data worldwide would be stored in Cassandra. So far, we have been successful at getting the write and read throughput we need. In fact, we are coming in more than 27% above our required capacity and well beyond what we were able to achieve with MySQL, which is very impressive.
However, one thing that escapes me is how we should organize access across different data centers. The scenario is the following:

- We have data centers in North America, London, Tokyo and so on.
- Costs differ widely between data centers, e.g., the TCO for one server in Tokyo is about the same as for 5 such servers in New York.
- We want to have access to all the data from North America, hence we would run Hadoop/Pig queries from the New York/North America data center only.

The problem is this: we would like the historical data from Tokyo to stay in Tokyo and be replicated only to New York, the data from London to stay in London and be replicated only to New York, and so on for all data centers.

Is this currently possible with Cassandra? I believe we would need to run multiple clusters and manually migrate data from each data center to North America to achieve this. Any suggestions would be welcome.
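To make the layout we are after concrete, here is a minimal Python sketch of the placement described above. It is not Cassandra configuration: the data center names, the replica counts and both helper functions are hypothetical and only illustrate the requirement. If Cassandra lets replication be set per keyspace and per data center (the part I am unsure about), each entry would presumably correspond to one keyspace's replication settings.

```python
# Hypothetical data center names, for illustration only.
CENTRAL_DC = "NewYork"
ORIGIN_DCS = ["Tokyo", "London", "NewYork"]


def desired_placement(origin_dcs, central_dc, local_copies=1, central_copies=1):
    """Map each origin data center to the replica count wanted per data center.

    The replica counts are placeholders, not a recommendation.
    """
    placement = {}
    for origin in origin_dcs:
        replicas = {origin: local_copies}
        if origin != central_dc:
            # Remote data is copied to the central data center and nowhere else.
            replicas[central_dc] = central_copies
        placement[origin] = replicas
    return placement


def violations(placement, central_dc):
    """List (origin, dc) pairs where data would land outside its origin or the central DC."""
    return [
        (origin, dc)
        for origin, replicas in placement.items()
        for dc, count in replicas.items()
        if count > 0 and dc not in (origin, central_dc)
    ]


placement = desired_placement(ORIGIN_DCS, CENTRAL_DC)
for origin, replicas in placement.items():
    print(origin, "->", replicas)
```

The point of the `violations` check is the constraint itself: Tokyo data must never show up in London (or vice versa), while New York holds a copy of everything.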