Very powerful tool indeed, thanks for sharing! I believe it is best to keep tools like this in separate repos, since different tools will probably have different life cycles and toolchains. Yes, that could be handled in a single repo, but with separate repos we'd get natural boundaries.
-----Original Message-----
From: Sumanth Pasupuleti <spasupul...@netflix.com.INVALID>
Sent: 22 August 2019 14:40
To: dev@cassandra.apache.org
Subject: Re: Contributing cassandra-diff

No hard preference on the repo, but just excited about this tool! Looking forward to employing this for upgrade testing (very timely :))

On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe <s...@beobal.com> wrote:

> My own weak preference would be for a dedicated repo in the first
> instance. If/when additional tools are contributed we should look at
> co-locating common stuff, but rushing toward a monorepo would be a
> mistake IMO.
>
> > On 22 Aug 2019, at 11:10, Jeff Jirsa <jji...@gmail.com> wrote:
> >
> > I weakly prefer contrib.
> >
> > On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson <marc...@apache.org> wrote:
> >
> >> Hi, we are about to open source our tooling for comparing two
> >> cassandra clusters and want to get some feedback where to push it.
> >> I think the options are: (name bike-shedding welcome)
> >>
> >> 1. create repos/asf/cassandra-diff.git
> >> 2. create a generic repos/asf/cassandra-contrib.git where we can
> >>    add more contributed tools in the future
> >>
> >> Temporary location:
> >> https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2Fcassandra-diff
> >>
> >> Cassandra-diff is a spark job that compares the data in two
> >> clusters - it pages through all partitions and reads all rows for
> >> those partitions in both clusters to make sure they are identical.
> >> Based on the configuration variable “reverse_read_probability” the
> >> rows are either read forward or in reverse order.
> >>
> >> Our main use case for cassandra-diff has been to set up two
> >> identical clusters, transfer a snapshot from the cluster we want to
> >> test to these clusters and upgrade one side. When that is done we
> >> run this tool to make sure that 2.1 and 3.0 give the same results.
> >> A few examples of the bugs we have found using this tool:
> >>
> >> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
> >>   multiple index blocks create invalid bound sequences on 3.0+
> >> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> >>   incomplete reverse reads in some cases
> >> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> >>   iteration of indexed partitions
> >>
> >> /Marcus
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org