It’s great to see a tool like this contributed. The change from 2.x to 3.0
introduced a translation layer from Thrift to CQL that is hard to validate on
real clusters without something like this. Thank you.

As for naming, perhaps cassandra-compare might be clearer, since “diff” is an
overloaded word, but that’s a bikeshed sort of argument.

> On Aug 22, 2019, at 12:32 AM, Vinay Chella <vinaykumar...@gmail.com> wrote:
> 
> This is a great addition to our Cassandra validation framework/tools. I can
> see many teams in the community benefiting from tooling like this.
> 
> I like the idea of the generic repo (repos/asf/cassandra-contrib.git or
> *whatever the name is*) for tools like this, for the following 2 main
> reasons.
> 
>   1. Easily accessible/reachable/searchable
>   2. Welcomes the community in the Cassandra ecosystem to contribute more
>      easily
> 
> 
> 
> Thanks,
> Vinay Chella
> 
> 
>> On Wed, Aug 21, 2019 at 11:39 PM Marcus Eriksson <marc...@apache.org> wrote:
>> 
>> Hi, we are about to open source our tooling for comparing two Cassandra
>> clusters and want to get some feedback on where to push it. I think the
>> options are: (name bike-shedding welcome)
>> 
>> 1. create repos/asf/cassandra-diff.git
>> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
>> contributed tools in the future
>> 
>> Temporary location: https://github.com/krummas/cassandra-diff
>> 
>> Cassandra-diff is a Spark job that compares the data in two clusters - it
>> pages through all partitions and reads all rows for those partitions in
>> both clusters to make sure they are identical. Based on the configuration
>> variable “reverse_read_probability” the rows are either read forward or in
>> reverse order.
>> 
>> Our main use case for cassandra-diff has been to set up two identical
>> clusters, transfer a snapshot from the cluster we want to test to these
>> clusters and upgrade one side. When that is done we run this tool to make
>> sure that 2.1 and 3.0 give the same results. A few examples of the bugs we
>> have found using this tool:
>> 
>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
>> index blocks create invalid bound sequences on 3.0+
>> * CASSANDRA-14803: Rows that cross index block boundaries can cause
>> incomplete reverse reads in some cases
>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
>> iteration of indexed partitions
>> 
>> /Marcus
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 
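
For readers skimming the thread, the comparison Marcus describes (walk every
partition, read all of its rows from both clusters, and pick forward or
reverse order based on reverse_read_probability) can be sketched roughly as
below. This is only an illustrative in-memory model, not the actual
cassandra-diff code, which is a Spark job; the dict-based "clusters" and the
names read_partition and diff_clusters are hypothetical stand-ins.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[int, str]            # (clustering key, value); hypothetical row shape
Cluster = Dict[str, List[Row]]   # partition key -> rows in clustering order


def read_partition(cluster: Cluster, key: str, reverse: bool) -> List[Row]:
    """Read all rows of one partition, forward or in reverse clustering order."""
    rows = cluster.get(key, [])
    return list(reversed(rows)) if reverse else list(rows)


def diff_clusters(source: Cluster, target: Cluster,
                  reverse_read_probability: float = 0.5,
                  seed: int = 0) -> List[str]:
    """Return the partition keys whose rows differ between the two clusters."""
    rng = random.Random(seed)    # seeded so a run can be repeated
    mismatched = []
    for key in sorted(set(source) | set(target)):
        # Choose the read direction once per partition and use it on both sides.
        reverse = rng.random() < reverse_read_probability
        if read_partition(source, key, reverse) != read_partition(target, key, reverse):
            mismatched.append(key)
    return mismatched


old = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "x")]}
new = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "y")]}
print(diff_clusters(old, new))   # prints ['p2']
```

In this toy model the read direction cannot change the outcome, because both
sides reverse the same in-memory list. Against real clusters the forward and
reverse reads exercise different server-side read paths, which is presumably
why mixing the two directions surfaced the reverse-read bugs listed above.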
