Very powerful tool indeed, thanks for sharing!

I believe it is best to keep tools like this in different repos since different 
tools will probably have different life cycles and tool chains. Yes, that could 
be handled in a single repo, but with different repos we'd get natural 
boundaries.

-----Original Message-----
From: Sumanth Pasupuleti <spasupul...@netflix.com.INVALID> 
Sent: 22 August 2019 14:40
To: dev@cassandra.apache.org
Subject: Re: Contributing cassandra-diff

No hard preference on the repo, but just excited about this tool! Looking 
forward to employing this for upgrade testing (very timely :))

On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe <s...@beobal.com> wrote:

> My own weak preference would be for a dedicated repo in the first 
> instance. If/when additional tools are contributed we should look at 
> co-locating common stuff, but rushing toward a monorepo would be a 
> mistake IMO.
>
> > On 22 Aug 2019, at 11:10, Jeff Jirsa <jji...@gmail.com> wrote:
> >
> > I weakly prefer contrib.
> >
> >
> > On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson <marc...@apache.org> wrote:
> >
> >> Hi, we are about to open source our tooling for comparing two
> >> Cassandra clusters and want to get some feedback on where to push it.
> >> I think the options are (name bike-shedding welcome):
> >>
> >> 1. create repos/asf/cassandra-diff.git
> >> 2. create a generic repos/asf/cassandra-contrib.git where we can add
> >>    more contributed tools in the future
> >>
> >> Temporary location: https://github.com/krummas/cassandra-diff
> >>
> >> Cassandra-diff is a Spark job that compares the data in two clusters:
> >> it pages through all partitions and reads all rows for those
> >> partitions in both clusters to make sure they are identical. Based on
> >> the configuration variable “reverse_read_probability” the rows are
> >> either read forward or in reverse order.
> >>
> >> Our main use case for cassandra-diff has been to set up two identical
> >> clusters, transfer a snapshot from the cluster we want to test to
> >> these clusters and upgrade one side. When that is done we run this
> >> tool to make sure that 2.1 and 3.0 give the same results. A few
> >> examples of the bugs we have found using this tool:
> >>
> >> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
> >>   multiple index blocks create invalid bound sequences on 3.0+
> >> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> >>   incomplete reverse reads in some cases
> >> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> >>   iteration of indexed partitions
> >>
> >> /Marcus
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>
