Hi, we are about to open source our tooling for comparing two Cassandra 
clusters and would like some feedback on where to push it. I think the options 
are (name bike-shedding welcome):

1. create repos/asf/cassandra-diff.git
2. create a generic repos/asf/cassandra-contrib.git where we can add more 
contributed tools in the future

Temporary location: https://github.com/krummas/cassandra-diff

Cassandra-diff is a Spark job that compares the data in two clusters: it pages 
through all partitions and reads all rows for those partitions in both clusters 
to make sure they are identical. Depending on the configuration variable 
“reverse_read_probability”, the rows are read either forward or in reverse order.
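
To make the idea concrete, here is a rough sketch of the comparison loop. It is 
not the actual cassandra-diff code: the two "clusters" are plain in-memory dicts, 
the names read_partition and diff_clusters are made up for illustration, and 
choosing the read direction once per partition is an assumption.

```python
import random


def read_partition(cluster, key, reverse):
    """Return the rows of one partition, forward or in reverse order.

    `cluster` is a hypothetical stand-in: a dict mapping partition key
    to a list of rows already sorted in clustering order.
    """
    rows = cluster.get(key, [])
    return list(reversed(rows)) if reverse else list(rows)


def diff_clusters(source, target, reverse_read_probability=0.5, rng=None):
    """Compare every partition in both clusters; return mismatching keys."""
    rng = rng or random.Random()
    mismatches = []
    # Walk the union of keys so a partition missing on one side is reported.
    for key in sorted(set(source) | set(target)):
        # Assumption: the direction is picked once per partition and the
        # same direction is used on both sides, so results are comparable.
        reverse = rng.random() < reverse_read_probability
        left = read_partition(source, key, reverse)
        right = read_partition(target, key, reverse)
        if left != right:
            mismatches.append(key)
    return mismatches


if __name__ == "__main__":
    old = {"p1": ["r1", "r2"], "p2": ["r3"]}
    new = {"p1": ["r1", "r2"], "p2": ["r3-changed"]}
    print(diff_clusters(old, new, reverse_read_probability=0.5))
```

In this toy version the read direction cannot change the outcome. Against real 
clusters, forward and reverse reads go through different code paths on the 
server, which is why mixing both directions is useful.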

Our main use case for cassandra-diff has been to set up two identical clusters, 
transfer a snapshot from the cluster we want to test to both of them, and 
upgrade one side. When that is done we run this tool to make sure that 2.1 and 
3.0 give the same results. A few examples of the bugs we have found using this 
tool:

* CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple 
index blocks create invalid bound sequences on 3.0+
* CASSANDRA-14803: Rows that cross index block boundaries can cause incomplete 
reverse reads in some cases
* CASSANDRA-15178: Skipping illegal legacy cells can break reverse iteration of 
indexed partitions

/Marcus
