This reminds me of the “shadow writes” described in [1]. During data migration, the coordinator forwards to the new node a copy of every write request that touches tokens being transferred.
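To make the idea concrete, here is a minimal sketch of shadow writes as I understand them. It is a toy in-memory model, not the implementation from [1] and not Cassandra code: `Node`, `Coordinator`, and `start_migration` are hypothetical names, and tokens are plain integers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a storage node."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, token: int, value: str) -> None:
        self.data[token] = value


class Coordinator:
    """Routes writes to the current owner and shadows writes for migrating tokens."""

    def __init__(self, owner: Node):
        self.owner = owner
        self.migrating = []  # (token range, target node) pairs under migration

    def start_migration(self, tokens: range, target: Node) -> None:
        self.migrating.append((tokens, target))

    def write(self, token: int, value: str) -> None:
        # The current owner always receives the write.
        self.owner.write(token, value)
        # A copy goes to the new node only if the token is being transferred.
        for tokens, target in self.migrating:
            if token in tokens:
                target.write(token, value)


old, new = Node("old"), Node("new")
coord = Coordinator(old)
coord.start_migration(range(100, 200), new)
coord.write(150, "a")  # token is migrating: written to both nodes
coord.write(50, "b")   # token is not migrating: written to the old node only
```

The point is that mirroring is scoped to the token ranges in flight, so the new node sees every write that arrives while its data is still streaming in.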
[1] Incremental Elasticity for NoSQL Data Stores, SRDS’17, https://ieeexplore.ieee.org/document/8069080

> On 18 Oct 2018, at 18:53, Carl Mueller <carl.muel...@smartthings.com.INVALID> wrote:
>
> tl;dr: a generic trigger on TABLES that will mirror all writes to
> facilitate data migrations between clusters or systems. What is necessary
> to ensure full write mirroring/coherency?
>
> When cassandra clusters have several "apps" aka keyspaces serving
> applications colocated on them, but the app/keyspace bandwidth and size
> demands begin impacting other keyspaces/apps, then one strategy is to
> migrate the keyspace to its own dedicated cluster.
>
> With backups/sstableloading, this will entail a delay and therefore a
> "coherency" shortfall between the clusters. So typically one would employ a
> "double write, read once":
>
> - all updates are mirrored to both clusters
> - writes come from the current most coherent.
>
> Often two sstable loads are done:
>
> 1) first load
> 2) turn on double writes/write mirroring
> 3) a second load is done to finalize coherency
> 4) switch the app to point to the new cluster now that it is coherent
>
> The double writes and read is the sticking point. We could do it at the app
> layer, but if the app wasn't written with that, it is a lot of testing and
> customization specific to the framework.
>
> We could theoretically do some sort of proxying of the java-driver somehow,
> but all the async structures and complex interfaces/apis would be difficult
> to proxy. Maybe there is a lower level in the java-driver that is possible.
> This also would only apply to the java-driver, and not
> python/go/javascript/other drivers.
>
> Finally, I suppose we could do a trigger on the tables. It would be really
> nice if we could add to the cassandra toolbox the basics of a write
> mirroring trigger that could be activated "fairly easily"... now I know
> there are the complexities of inter-cluster access, and if we are even
> using cassandra as the target mirror system (for example there is an
> article on triggers write-mirroring to kafka:
> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
>
> And this starts to get into the complexities of hinted handoff as well. But
> fundamentally this seems something that would be a very nice feature
> (especially when you NEED it) to have in the core of cassandra.
>
> Finally, is the mutation hook in triggers sufficient to track all incoming
> mutations (outside of "shudder" other triggers generating data)
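
For what it's worth, the load / mirror / load / switch sequence quoted above can be modelled in a few lines. This is only a sketch under loud assumptions: each "cluster" is a dict, `MirroredStore`, `merge`, and `bulk_load` are hypothetical names, and a logical counter with last-write-wins merging stands in for Cassandra's timestamp-based reconciliation. It is not trigger or driver code.

```python
import itertools

_clock = itertools.count(1)  # logical write timestamps (assumption)


def merge(target: dict, key, cell) -> None:
    """Last-write-wins: keep the (timestamp, value) cell with the higher timestamp."""
    current = target.get(key)
    if current is None or cell[0] > current[0]:
        target[key] = cell


def bulk_load(source: dict, target: dict) -> None:
    """Stand-in for a backup/sstable load: merge every source cell into target."""
    for key, cell in list(source.items()):
        merge(target, key, cell)


class MirroredStore:
    """Double write, read once: writes can go to both clusters, reads to one."""

    def __init__(self, old: dict, new: dict):
        self.old, self.new = old, new
        self.mirroring = False
        self.read_from = old

    def write(self, key, value) -> None:
        cell = (next(_clock), value)
        merge(self.old, key, cell)
        if self.mirroring:
            merge(self.new, key, cell)

    def read(self, key):
        cell = self.read_from.get(key)
        return None if cell is None else cell[1]


old, new = {}, {}
store = MirroredStore(old, new)
store.write("k1", "v1")
bulk_load(old, new)        # 1) first load
store.write("k1", "v2")    # lands in the gap before mirroring is on
store.mirroring = True     # 2) turn on double writes
store.write("k2", "v3")
bulk_load(old, new)        # 3) second load closes the gap
store.read_from = new      # 4) switch reads to the new cluster
```

The second load only repairs the gap because merging is timestamp-based; a load that blindly overwrote rows could clobber newer mirrored writes on the target.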