It’s great to see such a tool contributed. The change between 2.x and 3.0 brought a translation layer from Thrift to CQL that is hard to validate on real clusters without something like this. Thank you.
As for naming, perhaps cassandra-compare might be clearer as diff is an overloaded word but that’s a bikeshed sort of argument.

> On Aug 22, 2019, at 12:32 AM, Vinay Chella <vinaykumar...@gmail.com> wrote:
>
> This is a great addition to our Cassandra validation framework/tools. I can
> see many teams in the community get benefited from tooling like this.
>
> I like the idea of the generic repo (repos/asf/cassandra-contrib.git
> or *whatever the name is*) for tools like this, for the following 2 main reasons.
>
> 1. Easily accessible/ reachable/ searchable
> 2. Welcomes community in Cassandra ecosystem to contribute more easily
>
> Thanks,
> Vinay Chella
>
>> On Wed, Aug 21, 2019 at 11:39 PM Marcus Eriksson <marc...@apache.org> wrote:
>>
>> Hi, we are about to open source our tooling for comparing two cassandra
>> clusters and want to get some feedback where to push it. I think the
>> options are: (name bike-shedding welcome)
>>
>> 1. create repos/asf/cassandra-diff.git
>> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
>> contributed tools in the future
>>
>> Temporary location: https://github.com/krummas/cassandra-diff
>>
>> Cassandra-diff is a spark job that compares the data in two clusters - it
>> pages through all partitions and reads all rows for those partitions in
>> both clusters to make sure they are identical. Based on the configuration
>> variable “reverse_read_probability” the rows are either read forward or in
>> reverse order.
>>
>> Our main use case for cassandra-diff has been to set up two identical
>> clusters, transfer a snapshot from the cluster we want to test to these
>> clusters and upgrade one side. When that is done we run this tool to make
>> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
>> have found using this tool:
>>
>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
>> index blocks create invalid bound sequences on 3.0+
>> * CASSANDRA-14803: Rows that cross index block boundaries can cause
>> incomplete reverse reads in some cases
>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
>> iteration of indexed partitions
>>
>> /Marcus
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
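
For anyone skimming the thread, the comparison described in the quoted announcement (walk every partition, read its rows from both clusters, and pick forward or reverse order based on “reverse_read_probability”) might look roughly like the sketch below. This is only an illustration and not code from cassandra-diff: the real tool is a Spark job talking to live clusters, whereas here the two clusters are plain in-memory dicts, and the names `read_partition` and `diff_clusters`, the row shape, and the way the probability is applied are all assumptions made for the example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical shapes, chosen only for this sketch.
Row = Tuple[int, str]            # (clustering key, value)
Cluster = Dict[str, List[Row]]   # partition key -> rows in clustering order


def read_partition(cluster: Cluster, key: str, reverse: bool) -> List[Row]:
    """Read all rows of one partition, forward or reversed.

    Stands in for a query against a real cluster; a missing partition
    reads as an empty list.
    """
    rows = cluster.get(key, [])
    return list(reversed(rows)) if reverse else list(rows)


def diff_clusters(source: Cluster, target: Cluster,
                  reverse_read_probability: float,
                  rng: random.Random) -> List[str]:
    """Return the partition keys whose rows differ between the two clusters."""
    mismatched = []
    # Visit every partition that exists on either side.
    for key in sorted(set(source) | set(target)):
        # Assumption: one random draw per partition picks the read direction,
        # and both clusters are read in that same direction.
        reverse = rng.random() < reverse_read_probability
        if read_partition(source, key, reverse) != read_partition(target, key, reverse):
            mismatched.append(key)
    return mismatched


if __name__ == "__main__":
    old = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "x")]}
    new = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "y")]}
    print(diff_clusters(old, new, 0.5, random.Random(0)))  # prints ['p2']
```

Reading both sides in the same randomly chosen direction is what lets a comparison like this exercise reverse-iteration paths, which is where two of the three bugs listed in the announcement were found.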