Hi, we are about to open source our tooling for comparing two Cassandra clusters and would like some feedback on where to push it. I think the options are (name bike-shedding welcome):
1. Create repos/asf/cassandra-diff.git
2. Create a generic repos/asf/cassandra-contrib.git where we can add more contributed tools in the future

Temporary location: https://github.com/krummas/cassandra-diff

Cassandra-diff is a Spark job that compares the data in two clusters: it pages through all partitions and reads all rows of those partitions from both clusters to make sure they are identical. The configuration variable “reverse_read_probability” controls whether the rows are read in forward or in reverse order.

Our main use case for cassandra-diff has been to set up two identical clusters, transfer a snapshot from the cluster we want to test to both of them, and upgrade one side. Once that is done, we run this tool to make sure that 2.1 and 3.0 give the same results.

A few examples of the bugs we have found using this tool:

* CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple index blocks create invalid bound sequences on 3.0+
* CASSANDRA-14803: Rows that cross index block boundaries can cause incomplete reverse reads in some cases
* CASSANDRA-15178: Skipping illegal legacy cells can break reverse iteration of indexed partitions

/Marcus
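The per-partition comparison described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the tool's actual Spark code: each cluster is modelled as a hypothetical reader function `(partition_key, reverse) -> rows`, and `diff_clusters` and `make_reader` are illustrative names. It also assumes that reverse_read_probability is the chance of reading a given partition in reverse. The direction is picked once per partition and used on both sides, so a bug that only affects reverse reads on one cluster shows up as a mismatch.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple
# Hypothetical reader interface: (partition_key, reverse) -> all rows of that
# partition, in the requested clustering order, as returned by one cluster.
Reader = Callable[[str, bool], List[Row]]


def diff_clusters(keys: Iterable[str], source: Reader, target: Reader,
                  reverse_read_probability: float, seed: int = 0) -> List[str]:
    """Return the partition keys whose rows differ between the two clusters."""
    rng = random.Random(seed)
    mismatched = []
    for key in keys:
        # Assumed semantics: read this partition in reverse with the given probability,
        # using the same direction on both clusters.
        reverse = rng.random() < reverse_read_probability
        if source(key, reverse) != target(key, reverse):
            mismatched.append(key)
    return mismatched


def make_reader(partitions: Dict[str, List[Row]], reverse_bug: bool = False) -> Reader:
    """Hypothetical in-memory cluster; reverse_bug simulates an incomplete reverse read."""
    def read(key: str, reverse: bool) -> List[Row]:
        rows = list(partitions.get(key, []))
        if reverse:
            rows.reverse()
            if reverse_bug:
                rows = rows[:-1]  # drop the last row of a reverse read
        return rows
    return read


# Usage: identical data, but the "upgraded" side loses a row on reverse reads.
data = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "c")]}
old = make_reader(data)
new = make_reader(data, reverse_bug=True)
forward_only = diff_clusters(sorted(data), old, new, reverse_read_probability=0.0)
always_reverse = diff_clusters(sorted(data), old, new, reverse_read_probability=1.0)
```

With reverse_read_probability set to 0.0 both sides are always read forward and agree (`forward_only` is empty). With 1.0 every partition is read in reverse and the simulated incomplete reverse read is flagged for both `p1` and `p2`. This is why mixing in reverse reads matters for bugs like CASSANDRA-14803.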