Re: Contributing cassandra-diff

2019-08-27 Thread Marcus Eriksson
I just created a dedicated repository and pushed the code there.

Since there was no outrage over the name, I opted to save myself some painful
renaming work and keep the cassandra-diff name.

https://gitbox.apache.org/repos/asf?p=cassandra-diff.git
https://github.com/apache/cassandra-diff

Pull requests welcome!

/Marcus

On Thu, Aug 22, 2019 at 08:38:55AM +0200, Marcus Eriksson wrote:
> Hi, we are about to open source our tooling for comparing two Cassandra 
> clusters and want to get some feedback on where to push it. I think the 
> options are: (name bike-shedding welcome)
> 
> 1. create repos/asf/cassandra-diff.git
> 2. create a generic repos/asf/cassandra-contrib.git where we can add more 
> contributed tools in the future
> 
> Temporary location: https://github.com/krummas/cassandra-diff
> 
> Cassandra-diff is a Spark job that compares the data in two clusters: it 
> pages through all partitions and reads all rows for those partitions in both 
> clusters to make sure they are identical. Depending on the configuration 
> variable “reverse_read_probability”, the rows are read either forward or in 
> reverse order.
> 
> Our main use case for cassandra-diff has been to set up two identical 
> clusters, transfer a snapshot from the cluster we want to test to these 
> clusters, and upgrade one side. When that is done, we run this tool to make 
> sure that 2.1 and 3.0 give the same results. A few examples of the bugs we 
> have found using this tool:
> 
> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple 
> index blocks create invalid bound sequences on 3.0+
> * CASSANDRA-14803: Rows that cross index block boundaries can cause 
> incomplete reverse reads in some cases
> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse iteration 
> of indexed partitions
> 
> /Marcus
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
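
For anyone skimming the thread, the comparison described in the quoted message can be sketched roughly as below. This is not cassandra-diff's code: the two clusters are stood in for by plain dicts, the names `read_partition` and `diff_clusters` are made up for illustration, and the single coin flip per partition is an assumption about how "reverse_read_probability" is applied.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[int, str]            # (clustering key, value): a stand-in for a CQL row
Cluster = Dict[str, List[Row]]   # partition key -> rows in clustering order


def read_partition(cluster: Cluster, key: str, reverse: bool) -> List[Row]:
    """Read every row of one partition, forward or in reverse clustering order."""
    rows = cluster.get(key, [])
    return list(reversed(rows)) if reverse else list(rows)


def diff_clusters(source: Cluster, target: Cluster,
                  reverse_read_probability: float,
                  rng: random.Random) -> List[str]:
    """Return the partition keys whose rows differ between the two clusters."""
    mismatches = []
    for key in sorted(set(source) | set(target)):
        # Assumption: one coin flip per partition, and both sides are read
        # in the same direction so the results are directly comparable.
        reverse = rng.random() < reverse_read_probability
        left = read_partition(source, key, reverse)
        right = read_partition(target, key, reverse)
        if left != right:
            mismatches.append(key)
    return mismatches


old = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "x")]}
new = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "x"), (2, "y")]}
print(diff_clusters(old, new, 0.5, random.Random(42)))  # ['p2']
```

With in-memory lists the read direction cannot change the outcome; against real clusters the point of mixing in reverse reads is that they exercise a different server-side read path, which is where the bugs listed above surfaced.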




Re: Contributing cassandra-diff

2019-08-25 Thread Laxmikant Upadhyay
+1 for separate repo.

On Fri, Aug 23, 2019 at 9:29 AM Yifan Cai  wrote:

> Great addition to the tool set!
>
> A separate repo would be better.
>
> Grouping repos together only to make them easier to index does not seem to
> be a strong reason. Just my 2 cents.
>
> - Yifan
>
> 
> From: Dinesh Joshi 
> Sent: Thursday, August 22, 2019 11:42 AM
> To: dev
> Subject: Re: Contributing cassandra-diff
>
> +1 on a discrete repo.
>
> Dinesh
>
> > On Aug 22, 2019, at 9:14 AM, Michael Shuler 
> wrote:
> >
> > CI git polling for changes on a separate repository (if/when CI is
> needed) is probably a better way to go. I don't believe there are any
> issues with INFRA on us having discrete repos, and creating them with the
> self-help web tool is quick and easy.
> >
> > Thanks for the neat looking utility!
> >
> > Michael
> >
> > On 8/22/19 10:33 AM, Sankalp Kohli wrote:
> >> A different repo will be better
> >>> On Aug 22, 2019, at 6:16 AM, Per Otterström <
> per.otterst...@ericsson.com> wrote:
> >>>
> >>> Very powerful tool indeed, thanks for sharing!
> >>>
> >>> I believe it is best to keep tools like this in different repos since
> different tools will probably have different life cycles and tool chains.
> Yes, that could be handled in a single repo, but with different repos we'd
> get natural boundaries.
> >>>
> >>> -Original Message-
> >>> From: Sumanth Pasupuleti 
> >>> Sent: den 22 augusti 2019 14:40
> >>> To: dev@cassandra.apache.org
> >>> Subject: Re: Contributing cassandra-diff
> >>>
> >>> No hard preference on the repo, but just excited about this tool!
> Looking forward to employing this for upgrade testing (very timely :))
> >>>
> >>>> On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe 
> wrote:
> >>>>
> >>>> My own weak preference would be for a dedicated repo in the first
> >>>> instance. If/when additional tools are contributed we should look at
> >>>> co-locating common stuff, but rushing toward a monorepo would be a
> >>>> mistake IMO.
> >>>>
> >>>>>> On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
> >>>>>
> >>>>> I weakly prefer contrib.
> >>>>>
> >>>>>
> >>>>> On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson
> >>>>> 
> >>>> wrote:
> >>>>>
> >>>>>> Hi, we are about to open source our tooling for comparing two
> >>>>>> cassandra clusters and want to get some feedback where to push it.
> >>>>>> I think the options are: (name bike-shedding welcome)
> >>>>>>
> >>>>>> 1. create repos/asf/cassandra-diff.git
> >>>>>> 2. create a generic repos/asf/cassandra-contrib.git where we can add
> >>>>>> more contributed tools in the future
> >>>>>>
> >>>>>> Temporary location:
> >>>>>> https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2Fcassandra-diff
> >>>>>>
> >>>>>> Cassandra-diff is a spark job that compares the data in two
> >>>>>> clusters - it pages through all partitions and reads all rows for
> >>>>>> those partitions in both clusters to make sure they are identical.
> >>>>>> Based on the configuration variable “reverse_read_probability” the
> >>>>>> rows are either read forward or in reverse order.
> >>>>>>
> >>>>>> Our main use case for cassandra-diff has been to set up two
> >>>>>> identical clusters, transfer a snapshot from the cluster we want to
> >>>>>> test to these clusters and upgrade one side. When that is done we
> >>>>>> run this tool to make sure that 2.1 and 3.0 gives the same results.
> >>>>>> A few examples of the bugs we have found using this tool:
> >>>>>

Re: Contributing cassandra-diff

2019-08-22 Thread Yifan Cai
Great addition to the tool set!

A separate repo would be better.

Grouping repos together only to make them easier to index does not seem to be 
a strong reason. Just my 2 cents.

- Yifan


From: Dinesh Joshi 
Sent: Thursday, August 22, 2019 11:42 AM
To: dev
Subject: Re: Contributing cassandra-diff

+1 on a discrete repo.

Dinesh

> On Aug 22, 2019, at 9:14 AM, Michael Shuler  wrote:
>
> CI git polling for changes on a separate repository (if/when CI is needed) is 
> probably a better way to go. I don't believe there are any issues with INFRA 
> on us having discrete repos, and creating them with the self-help web tool is 
> quick and easy.
>
> Thanks for the neat looking utility!
>
> Michael
>
> On 8/22/19 10:33 AM, Sankalp Kohli wrote:
>> A different repo will be better
>>> On Aug 22, 2019, at 6:16 AM, Per Otterström  
>>> wrote:
>>>
>>> Very powerful tool indeed, thanks for sharing!
>>>
>>> I believe it is best to keep tools like this in different repos since 
>>> different tools will probably have different life cycles and tool chains. 
>>> Yes, that could be handled in a single repo, but with different repos we'd 
>>> get natural boundaries.
>>>
>>> -Original Message-
>>> From: Sumanth Pasupuleti 
>>> Sent: den 22 augusti 2019 14:40
>>> To: dev@cassandra.apache.org
>>> Subject: Re: Contributing cassandra-diff
>>>
>>> No hard preference on the repo, but just excited about this tool! Looking 
>>> forward to employing this for upgrade testing (very timely :))
>>>
>>>> On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe  wrote:
>>>>
>>>> My own weak preference would be for a dedicated repo in the first
>>>> instance. If/when additional tools are contributed we should look at
>>>> co-locating common stuff, but rushing toward a monorepo would be a
>>>> mistake IMO.
>>>>
>>>>>> On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
>>>>>
>>>>> I weakly prefer contrib.
>>>>>
>>>>>
>>>>> On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson
>>>>> 
>>>> wrote:
>>>>>
>>>>>> Hi, we are about to open source our tooling for comparing two
>>>>>> cassandra clusters and want to get some feedback where to push it.
>>>>>> I think the options are: (name bike-shedding welcome)
>>>>>>
>>>>>> 1. create repos/asf/cassandra-diff.git 2. create a generic
>>>>>> repos/asf/cassandra-contrib.git where we can add
>>>> more
>>>>>> contributed tools in the future
>>>>>>
>>>>>> Temporary location:
>>>>>> https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717
>>>>>> 581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2
>>>>>> Fcassandra-diff
>>>>>>
>>>>>> Cassandra-diff is a spark job that compares the data in two
>>>>>> clusters -
>>>> it
>>>>>> pages through all partitions and reads all rows for those
>>>>>> partitions in both clusters to make sure they are identical. Based
>>>>>> on the
>>>> configuration
>>>>>> variable “reverse_read_probability” the rows are either read
>>>>>> forward or
>>>> in
>>>>>> reverse order.
>>>>>>
>>>>>> Our main use case for cassandra-diff has been to set up two
>>>>>> identical clusters, transfer a snapshot from the cluster we want to
>>>>>> test to these clusters and upgrade one side. When that is done we
>>>>>> run this tool to
>>>> make
>>>>>> sure that 2.1 and 3.0 gives the same results. A few examples of the
>>>> bugs we
>>>>>> have found using this tool:
>>>>>>
>>>>>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
>>>> multiple
>>>>>> index blocks create invalid bound sequences on 3.0+
>>>>>> * CASSANDRA-14803: Rows that cross index block boundaries can cause
>>>>>> incomplete reverse reads in some cases
>>>>>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
>>>>>> iteration of indexed partitions
>>>>>>
>>>>>

Re: Contributing cassandra-diff

2019-08-22 Thread Dinesh Joshi
+1 on a discrete repo.

Dinesh

> On Aug 22, 2019, at 9:14 AM, Michael Shuler  wrote:
> 
> CI git polling for changes on a separate repository (if/when CI is needed) is 
> probably a better way to go. I don't believe there are any issues with INFRA 
> on us having discrete repos, and creating them with the self-help web tool is 
> quick and easy.
> 
> Thanks for the neat looking utility!
> 
> Michael
> 
> On 8/22/19 10:33 AM, Sankalp Kohli wrote:
>> A different repo will be better
>>> On Aug 22, 2019, at 6:16 AM, Per Otterström  
>>> wrote:
>>> 
>>> Very powerful tool indeed, thanks for sharing!
>>> 
>>> I believe it is best to keep tools like this in different repos since 
>>> different tools will probably have different life cycles and tool chains. 
>>> Yes, that could be handled in a single repo, but with different repos we'd 
>>> get natural boundaries.
>>> 
>>> -Original Message-
>>> From: Sumanth Pasupuleti 
>>> Sent: den 22 augusti 2019 14:40
>>> To: dev@cassandra.apache.org
>>> Subject: Re: Contributing cassandra-diff
>>> 
>>> No hard preference on the repo, but just excited about this tool! Looking 
>>> forward to employing this for upgrade testing (very timely :))
>>> 
>>>> On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe  wrote:
>>>> 
>>>> My own weak preference would be for a dedicated repo in the first
>>>> instance. If/when additional tools are contributed we should look at
>>>> co-locating common stuff, but rushing toward a monorepo would be a
>>>> mistake IMO.
>>>> 
>>>>>> On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
>>>>> 
>>>>> I weakly prefer contrib.
>>>>> 
>>>>> 
>>>>> On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson
>>>>> 
>>>> wrote:
>>>>> 
>>>>>> Hi, we are about to open source our tooling for comparing two
>>>>>> cassandra clusters and want to get some feedback where to push it.
>>>>>> I think the options are: (name bike-shedding welcome)
>>>>>> 
>>>>>> 1. create repos/asf/cassandra-diff.git 2. create a generic
>>>>>> repos/asf/cassandra-contrib.git where we can add
>>>> more
>>>>>> contributed tools in the future
>>>>>> 
>>>>>> Temporary location:
>>>>>> https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717
>>>>>> 581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2
>>>>>> Fcassandra-diff
>>>>>> 
>>>>>> Cassandra-diff is a spark job that compares the data in two
>>>>>> clusters -
>>>> it
>>>>>> pages through all partitions and reads all rows for those
>>>>>> partitions in both clusters to make sure they are identical. Based
>>>>>> on the
>>>> configuration
>>>>>> variable “reverse_read_probability” the rows are either read
>>>>>> forward or
>>>> in
>>>>>> reverse order.
>>>>>> 
>>>>>> Our main use case for cassandra-diff has been to set up two
>>>>>> identical clusters, transfer a snapshot from the cluster we want to
>>>>>> test to these clusters and upgrade one side. When that is done we
>>>>>> run this tool to
>>>> make
>>>>>> sure that 2.1 and 3.0 gives the same results. A few examples of the
>>>> bugs we
>>>>>> have found using this tool:
>>>>>> 
>>>>>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
>>>> multiple
>>>>>> index blocks create invalid bound sequences on 3.0+
>>>>>> * CASSANDRA-14803: Rows that cross index block boundaries can cause
>>>>>> incomplete reverse reads in some cases
>>>>>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
>>>>>> iteration of indexed partitions
>>>>>> 
>>>>>> /Marcus



Re: Contributing cassandra-diff

2019-08-22 Thread Roopa Tangirala
Marcus,

This is great and very helpful for anyone running Cassandra in production
who wants peace of mind when rolling out upgrades. Thank you!

Regards,
Roopa Tangirala
Director, Cloud Data Engineering



On Thu, Aug 22, 2019 at 9:14 AM Michael Shuler 
wrote:

> CI git polling for changes on a separate repository (if/when CI is
> needed) is probably a better way to go. I don't believe there are any
> issues with INFRA on us having discrete repos, and creating them with
> the self-help web tool is quick and easy.
>
> Thanks for the neat looking utility!
>
> Michael
>
> On 8/22/19 10:33 AM, Sankalp Kohli wrote:
> > A different repo will be better
> >
> >> On Aug 22, 2019, at 6:16 AM, Per Otterström <
> per.otterst...@ericsson.com> wrote:
> >>
> >> Very powerful tool indeed, thanks for sharing!
> >>
> >> I believe it is best to keep tools like this in different repos since
> different tools will probably have different life cycles and tool chains.
> Yes, that could be handled in a single repo, but with different repos we'd
> get natural boundaries.
> >>
> >> -Original Message-
> >> From: Sumanth Pasupuleti 
> >> Sent: den 22 augusti 2019 14:40
> >> To: dev@cassandra.apache.org
> >> Subject: Re: Contributing cassandra-diff
> >>
> >> No hard preference on the repo, but just excited about this tool!
> Looking forward to employing this for upgrade testing (very timely :))
> >>
> >>> On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe 
> wrote:
> >>>
> >>> My own weak preference would be for a dedicated repo in the first
> >>> instance. If/when additional tools are contributed we should look at
> >>> co-locating common stuff, but rushing toward a monorepo would be a
> >>> mistake IMO.
> >>>
> >>>>> On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
> >>>>
> >>>> I weakly prefer contrib.
> >>>>
> >>>>
> >>>> On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson
> >>>> 
> >>> wrote:
> >>>>
> >>>>> Hi, we are about to open source our tooling for comparing two
> >>>>> cassandra clusters and want to get some feedback where to push it.
> >>>>> I think the options are: (name bike-shedding welcome)
> >>>>>
> >>>>> 1. create repos/asf/cassandra-diff.git 2. create a generic
> >>>>> repos/asf/cassandra-contrib.git where we can add
> >>> more
> >>>>> contributed tools in the future
> >>>>>
> >>>>> Temporary location:
> >>>>> https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717
> >>>>> 581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2
> >>>>> Fcassandra-diff
> >>>>>
> >>>>> Cassandra-diff is a spark job that compares the data in two
> >>>>> clusters -
> >>> it
> >>>>> pages through all partitions and reads all rows for those
> >>>>> partitions in both clusters to make sure they are identical. Based
> >>>>> on the
> >>> configuration
> >>>>> variable “reverse_read_probability” the rows are either read
> >>>>> forward or
> >>> in
> >>>>> reverse order.
> >>>>>
> >>>>> Our main use case for cassandra-diff has been to set up two
> >>>>> identical clusters, transfer a snapshot from the cluster we want to
> >>>>> test to these clusters and upgrade one side. When that is done we
> >>>>> run this tool to
> >>> make
> >>>>> sure that 2.1 and 3.0 gives the same results. A few examples of the
> >>> bugs we
> >>>>> have found using this tool:
> >>>>>
> >>>>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
> >>> multiple
> >>>>> index blocks create invalid bound sequences on 3.0+
> >>>>> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> >>>>> incomplete reverse reads in some cases
> >>>>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> >>>>> iteration of indexed partitions
> >>>>>
> >>>>> /Marcus


Re: Contributing cassandra-diff

2019-08-22 Thread Michael Shuler
CI git polling for changes on a separate repository (if/when CI is 
needed) is probably a better way to go. I don't believe there are any 
issues with INFRA on us having discrete repos, and creating them with 
the self-help web tool is quick and easy.


Thanks for the neat looking utility!

Michael

On 8/22/19 10:33 AM, Sankalp Kohli wrote:

A different repo will be better


On Aug 22, 2019, at 6:16 AM, Per Otterström  wrote:

Very powerful tool indeed, thanks for sharing!

I believe it is best to keep tools like this in different repos since different 
tools will probably have different life cycles and tool chains. Yes, that could 
be handled in a single repo, but with different repos we'd get natural 
boundaries.

-Original Message-
From: Sumanth Pasupuleti 
Sent: den 22 augusti 2019 14:40
To: dev@cassandra.apache.org
Subject: Re: Contributing cassandra-diff

No hard preference on the repo, but just excited about this tool! Looking 
forward to employing this for upgrade testing (very timely :))


On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe  wrote:

My own weak preference would be for a dedicated repo in the first
instance. If/when additional tools are contributed we should look at
co-locating common stuff, but rushing toward a monorepo would be a
mistake IMO.


On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:


I weakly prefer contrib.


On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson


wrote:



Hi, we are about to open source our tooling for comparing two
cassandra clusters and want to get some feedback where to push it.
I think the options are: (name bike-shedding welcome)

1. create repos/asf/cassandra-diff.git
2. create a generic repos/asf/cassandra-contrib.git where we can add
more contributed tools in the future

Temporary location:
https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2Fcassandra-diff

Cassandra-diff is a spark job that compares the data in two
clusters - it pages through all partitions and reads all rows for
those partitions in both clusters to make sure they are identical.
Based on the configuration variable “reverse_read_probability” the
rows are either read forward or in reverse order.

Our main use case for cassandra-diff has been to set up two
identical clusters, transfer a snapshot from the cluster we want to
test to these clusters and upgrade one side. When that is done we
run this tool to make sure that 2.1 and 3.0 gives the same results.
A few examples of the bugs we have found using this tool:

* CASSANDRA-14823: Legacy sstables with range tombstones spanning
multiple index blocks create invalid bound sequences on 3.0+
* CASSANDRA-14803: Rows that cross index block boundaries can cause
incomplete reverse reads in some cases
* CASSANDRA-15178: Skipping illegal legacy cells can break reverse
iteration of indexed partitions

/Marcus




Re: Contributing cassandra-diff

2019-08-22 Thread Sankalp Kohli
A different repo will be better 

> On Aug 22, 2019, at 6:16 AM, Per Otterström  
> wrote:
> 
> Very powerful tool indeed, thanks for sharing!
> 
> I believe it is best to keep tools like this in different repos since 
> different tools will probably have different life cycles and tool chains. 
> Yes, that could be handled in a single repo, but with different repos we'd 
> get natural boundaries.
> 
> -Original Message-
> From: Sumanth Pasupuleti  
> Sent: den 22 augusti 2019 14:40
> To: dev@cassandra.apache.org
> Subject: Re: Contributing cassandra-diff
> 
> No hard preference on the repo, but just excited about this tool! Looking 
> forward to employing this for upgrade testing (very timely :))
> 
>> On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe  wrote:
>> 
>> My own weak preference would be for a dedicated repo in the first 
>> instance. If/when additional tools are contributed we should look at 
>> co-locating common stuff, but rushing toward a monorepo would be a 
>> mistake IMO.
>> 
>>>> On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
>>> 
>>> I weakly prefer contrib.
>>> 
>>> 
>>> On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson 
>>> 
>> wrote:
>>> 
>>>> Hi, we are about to open source our tooling for comparing two 
>>>> cassandra clusters and want to get some feedback where to push it. 
>>>> I think the options are: (name bike-shedding welcome)
>>>> 
>>>> 1. create repos/asf/cassandra-diff.git 2. create a generic 
>>>> repos/asf/cassandra-contrib.git where we can add
>> more
>>>> contributed tools in the future
>>>> 
>>>> Temporary location: 
>>>> https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717
>>>> 581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2
>>>> Fcassandra-diff
>>>> 
>>>> Cassandra-diff is a spark job that compares the data in two 
>>>> clusters -
>> it
>>>> pages through all partitions and reads all rows for those 
>>>> partitions in both clusters to make sure they are identical. Based 
>>>> on the
>> configuration
>>>> variable “reverse_read_probability” the rows are either read 
>>>> forward or
>> in
>>>> reverse order.
>>>> 
>>>> Our main use case for cassandra-diff has been to set up two 
>>>> identical clusters, transfer a snapshot from the cluster we want to 
>>>> test to these clusters and upgrade one side. When that is done we 
>>>> run this tool to
>> make
>>>> sure that 2.1 and 3.0 gives the same results. A few examples of the
>> bugs we
>>>> have found using this tool:
>>>> 
>>>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
>> multiple
>>>> index blocks create invalid bound sequences on 3.0+
>>>> * CASSANDRA-14803: Rows that cross index block boundaries can cause 
>>>> incomplete reverse reads in some cases
>>>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse 
>>>> iteration of indexed partitions
>>>> 
>>>> /Marcus



RE: Contributing cassandra-diff

2019-08-22 Thread Per Otterström
Very powerful tool indeed, thanks for sharing!

I believe it is best to keep tools like this in different repos since different 
tools will probably have different life cycles and tool chains. Yes, that could 
be handled in a single repo, but with different repos we'd get natural 
boundaries.

-Original Message-
From: Sumanth Pasupuleti  
Sent: den 22 augusti 2019 14:40
To: dev@cassandra.apache.org
Subject: Re: Contributing cassandra-diff

No hard preference on the repo, but just excited about this tool! Looking 
forward to employing this for upgrade testing (very timely :))

On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe  wrote:

> My own weak preference would be for a dedicated repo in the first 
> instance. If/when additional tools are contributed we should look at 
> co-locating common stuff, but rushing toward a monorepo would be a 
> mistake IMO.
>
> > On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
> >
> > I weakly prefer contrib.
> >
> >
> > On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson 
> > 
> wrote:
> >
> >> Hi, we are about to open source our tooling for comparing two 
> >> cassandra clusters and want to get some feedback where to push it. 
> >> I think the options are: (name bike-shedding welcome)
> >>
> >> 1. create repos/asf/cassandra-diff.git 2. create a generic 
> >> repos/asf/cassandra-contrib.git where we can add
> more
> >> contributed tools in the future
> >>
> >> Temporary location: 
> >> https://protect2.fireeye.com/url?k=e8982d07-b412e678-e8986d9c-86717
> >> 581b0b5-292bc820a13b7138&q=1&u=https%3A%2F%2Fgithub.com%2Fkrummas%2
> >> Fcassandra-diff
> >>
> >> Cassandra-diff is a spark job that compares the data in two 
> >> clusters -
> it
> >> pages through all partitions and reads all rows for those 
> >> partitions in both clusters to make sure they are identical. Based 
> >> on the
> configuration
> >> variable “reverse_read_probability” the rows are either read 
> >> forward or
> in
> >> reverse order.
> >>
> >> Our main use case for cassandra-diff has been to set up two 
> >> identical clusters, transfer a snapshot from the cluster we want to 
> >> test to these clusters and upgrade one side. When that is done we 
> >> run this tool to
> make
> >> sure that 2.1 and 3.0 gives the same results. A few examples of the
> bugs we
> >> have found using this tool:
> >>
> >> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
> multiple
> >> index blocks create invalid bound sequences on 3.0+
> >> * CASSANDRA-14803: Rows that cross index block boundaries can cause 
> >> incomplete reverse reads in some cases
> >> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse 
> >> iteration of indexed partitions
> >>
> >> /Marcus


Re: Contributing cassandra-diff

2019-08-22 Thread Sumanth Pasupuleti
No hard preference on the repo, but just excited about this tool! Looking
forward to employing this for upgrade testing (very timely :))

On Thu, Aug 22, 2019 at 3:38 AM Sam Tunnicliffe  wrote:

> My own weak preference would be for a dedicated repo in the first
> instance. If/when additional tools are contributed we should look at
> co-locating common stuff, but rushing toward a monorepo would be a mistake
> IMO.
>
> > On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
> >
> > I weakly prefer contrib.
> >
> >
> > On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson 
> wrote:
> >
> >> Hi, we are about to open source our tooling for comparing two cassandra
> >> clusters and want to get some feedback where to push it. I think the
> >> options are: (name bike-shedding welcome)
> >>
> >> 1. create repos/asf/cassandra-diff.git
> >> 2. create a generic repos/asf/cassandra-contrib.git where we can add
> more
> >> contributed tools in the future
> >>
> >> Temporary location: https://github.com/krummas/cassandra-diff
> >>
> >> Cassandra-diff is a spark job that compares the data in two clusters -
> it
> >> pages through all partitions and reads all rows for those partitions in
> >> both clusters to make sure they are identical. Based on the
> configuration
> >> variable “reverse_read_probability” the rows are either read forward or
> in
> >> reverse order.
> >>
> >> Our main use case for cassandra-diff has been to set up two identical
> >> clusters, transfer a snapshot from the cluster we want to test to these
> >> clusters and upgrade one side. When that is done we run this tool to
> make
> >> sure that 2.1 and 3.0 gives the same results. A few examples of the
> bugs we
> >> have found using this tool:
> >>
> >> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
> multiple
> >> index blocks create invalid bound sequences on 3.0+
> >> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> >> incomplete reverse reads in some cases
> >> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> >> iteration of indexed partitions
> >>
> >> /Marcus


Re: Contributing cassandra-diff

2019-08-22 Thread Sam Tunnicliffe
My own weak preference would be for a dedicated repo in the first instance. 
If/when additional tools are contributed we should look at co-locating common 
stuff, but rushing toward a monorepo would be a mistake IMO.

> On 22 Aug 2019, at 11:10, Jeff Jirsa  wrote:
> 
> I weakly prefer contrib.
> 
> 
> On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson  wrote:
> 
>> Hi, we are about to open source our tooling for comparing two cassandra
>> clusters and want to get some feedback where to push it. I think the
>> options are: (name bike-shedding welcome)
>> 
>> 1. create repos/asf/cassandra-diff.git
>> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
>> contributed tools in the future
>> 
>> Temporary location: https://github.com/krummas/cassandra-diff
>> 
>> Cassandra-diff is a spark job that compares the data in two clusters - it
>> pages through all partitions and reads all rows for those partitions in
>> both clusters to make sure they are identical. Based on the configuration
>> variable “reverse_read_probability” the rows are either read forward or in
>> reverse order.
>> 
>> Our main use case for cassandra-diff has been to set up two identical
>> clusters, transfer a snapshot from the cluster we want to test to these
>> clusters and upgrade one side. When that is done we run this tool to make
>> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
>> have found using this tool:
>> 
>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
>> index blocks create invalid bound sequences on 3.0+
>> * CASSANDRA-14803: Rows that cross index block boundaries can cause
>> incomplete reverse reads in some cases
>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
>> iteration of indexed partitions
>> 
>> /Marcus
>> 



Re: Contributing cassandra-diff

2019-08-22 Thread Jeff Jirsa
I weakly prefer contrib.


On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson wrote:

> Hi, we are about to open source our tooling for comparing two cassandra
> clusters and want to get some feedback where to push it. I think the
> options are: (name bike-shedding welcome)
>
> 1. create repos/asf/cassandra-diff.git
> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
> contributed tools in the future
>
> Temporary location: https://github.com/krummas/cassandra-diff
>
> Cassandra-diff is a spark job that compares the data in two clusters - it
> pages through all partitions and reads all rows for those partitions in
> both clusters to make sure they are identical. Based on the configuration
> variable “reverse_read_probability” the rows are either read forward or in
> reverse order.
>
> Our main use case for cassandra-diff has been to set up two identical
> clusters, transfer a snapshot from the cluster we want to test to these
> clusters and upgrade one side. When that is done we run this tool to make
> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
> have found using this tool:
>
> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
> index blocks create invalid bound sequences on 3.0+
> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> incomplete reverse reads in some cases
> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> iteration of indexed partitions
>
> /Marcus
>


Re: Contributing cassandra-diff

2019-08-22 Thread Ahmed Eljami
Great addition! Thanks Marcus.

+1 for cassandra-compare, as Jeremy suggested.

We could also think about other features, such as:

- Comparing just the counts between two tables. In some cases, that will be
enough to confirm that our copy is OK.

- Diffing only a subset of partitions ==> this avoids comparing the full data
set for large volumes, when a subset is enough to be confident in our copy.
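
To make the two ideas above concrete, here is a minimal sketch in Python. It is
not cassandra-diff code and uses no Cassandra or Spark API: the `Table` type,
`counts_match` and `diff_sample` are hypothetical names, and plain dicts stand
in for the two clusters.

```python
import random
from typing import Dict, Hashable, List, Optional

# Hypothetical in-memory stand-in for a table: partition key -> list of rows.
Table = Dict[Hashable, List[tuple]]


def counts_match(source: Table, target: Table) -> bool:
    """Cheap check: same number of partitions and same total number of rows."""
    if len(source) != len(target):
        return False
    source_rows = sum(len(rows) for rows in source.values())
    target_rows = sum(len(rows) for rows in target.values())
    return source_rows == target_rows


def diff_sample(source: Table, target: Table, fraction: float,
                seed: Optional[int] = None) -> List[Hashable]:
    """Compare a random sample of source partitions; return mismatching keys."""
    rng = random.Random(seed)
    keys = sorted(source, key=repr)  # stable order so a seed is reproducible
    if not keys:
        return []
    # Always sample at least one partition, never more than exist.
    sample_size = min(len(keys), max(1, round(len(keys) * fraction)))
    sampled = rng.sample(keys, sample_size)
    # A partition missing from the target counts as a mismatch.
    return [key for key in sampled if source[key] != target.get(key)]
```

Note that matching counts say nothing about the content of the rows, so the
count check can only ever rule a copy out, and a sampled diff only gives
confidence proportional to the fraction of partitions it covers.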

Thanks

On Thu, Aug 22, 2019 at 09:49, Jeremy Hanna wrote:

> It’s great to contribute such a tool. The change between 2.x and 3.0
> brought a translation layer from thrift to cql that is hard to validate on
> real clusters without something like this. Thank you.
>
> As for naming, perhaps cassandra-compare might be clearer as diff is an
> overloaded word but that’s a bikeshed sort of argument.
>
> > On Aug 22, 2019, at 12:32 AM, Vinay Chella wrote:
> >
> > This is a great addition to our Cassandra validation framework/tools. I can
> > see many teams in the community get benefited from tooling like this.
> >
> > I like the idea of the generic repo (repos/asf/cassandra-contrib.git
> > or *whatever the name is*) for tools like this, for the following 2 main
> > reasons.
> >
> >   1. Easily accessible/ reachable/ searchable
> >   2. Welcomes community in Cassandra ecosystem to contribute more easily
> >
> >
> >
> > Thanks,
> > Vinay Chella
> >
> >
> >> On Wed, Aug 21, 2019 at 11:39 PM Marcus Eriksson wrote:
> >>
> >> Hi, we are about to open source our tooling for comparing two cassandra
> >> clusters and want to get some feedback where to push it. I think the
> >> options are: (name bike-shedding welcome)
> >>
> >> 1. create repos/asf/cassandra-diff.git
> >> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
> >> contributed tools in the future
> >>
> >> Temporary location: https://github.com/krummas/cassandra-diff
> >>
> >> Cassandra-diff is a spark job that compares the data in two clusters - it
> >> pages through all partitions and reads all rows for those partitions in
> >> both clusters to make sure they are identical. Based on the configuration
> >> variable “reverse_read_probability” the rows are either read forward or in
> >> reverse order.
> >>
> >> Our main use case for cassandra-diff has been to set up two identical
> >> clusters, transfer a snapshot from the cluster we want to test to these
> >> clusters and upgrade one side. When that is done we run this tool to make
> >> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
> >> have found using this tool:
> >>
> >> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
> >> index blocks create invalid bound sequences on 3.0+
> >> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> >> incomplete reverse reads in some cases
> >> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> >> iteration of indexed partitions
> >>
> >> /Marcus
> >>

-- 
Regards,

Ahmed ELJAMI


Re: Contributing cassandra-diff

2019-08-22 Thread Jeremy Hanna
It’s great to contribute such a tool. The change between 2.x and 3.0 brought a
translation layer from Thrift to CQL that is hard to validate on real clusters
without something like this. Thank you.

As for naming, perhaps cassandra-compare might be clearer as diff is an 
overloaded word but that’s a bikeshed sort of argument.

> On Aug 22, 2019, at 12:32 AM, Vinay Chella wrote:
> 
> This is a great addition to our Cassandra validation framework/tools. I can
> see many teams in the community get benefited from tooling like this.
> 
> I like the idea of the generic repo (repos/asf/cassandra-contrib.git
> or *whatever the name is*) for tools like this, for the following 2 main
> reasons.
> 
>   1. Easily accessible/ reachable/ searchable
>   2. Welcomes community in Cassandra ecosystem to contribute more easily
> 
> 
> 
> Thanks,
> Vinay Chella
> 
> 
>> On Wed, Aug 21, 2019 at 11:39 PM Marcus Eriksson wrote:
>> 
>> Hi, we are about to open source our tooling for comparing two cassandra
>> clusters and want to get some feedback where to push it. I think the
>> options are: (name bike-shedding welcome)
>> 
>> 1. create repos/asf/cassandra-diff.git
>> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
>> contributed tools in the future
>> 
>> Temporary location: https://github.com/krummas/cassandra-diff
>> 
>> Cassandra-diff is a spark job that compares the data in two clusters - it
>> pages through all partitions and reads all rows for those partitions in
>> both clusters to make sure they are identical. Based on the configuration
>> variable “reverse_read_probability” the rows are either read forward or in
>> reverse order.
>> 
>> Our main use case for cassandra-diff has been to set up two identical
>> clusters, transfer a snapshot from the cluster we want to test to these
>> clusters and upgrade one side. When that is done we run this tool to make
>> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
>> have found using this tool:
>> 
>> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
>> index blocks create invalid bound sequences on 3.0+
>> * CASSANDRA-14803: Rows that cross index block boundaries can cause
>> incomplete reverse reads in some cases
>> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
>> iteration of indexed partitions
>> 
>> /Marcus
>> 



Re: Contributing cassandra-diff

2019-08-22 Thread Vinay Chella
This is a great addition to our Cassandra validation framework/tools. I can
see many teams in the community benefiting from tooling like this.

I like the idea of the generic repo (repos/asf/cassandra-contrib.git or
*whatever the name is*) for tools like this, for the following two main
reasons.

   1. Easily accessible, reachable, and searchable
   2. Makes it easier for the Cassandra ecosystem community to contribute



Thanks,
Vinay Chella


On Wed, Aug 21, 2019 at 11:39 PM Marcus Eriksson wrote:

> Hi, we are about to open source our tooling for comparing two cassandra
> clusters and want to get some feedback where to push it. I think the
> options are: (name bike-shedding welcome)
>
> 1. create repos/asf/cassandra-diff.git
> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
> contributed tools in the future
>
> Temporary location: https://github.com/krummas/cassandra-diff
>
> Cassandra-diff is a spark job that compares the data in two clusters - it
> pages through all partitions and reads all rows for those partitions in
> both clusters to make sure they are identical. Based on the configuration
> variable “reverse_read_probability” the rows are either read forward or in
> reverse order.
>
> Our main use case for cassandra-diff has been to set up two identical
> clusters, transfer a snapshot from the cluster we want to test to these
> clusters and upgrade one side. When that is done we run this tool to make
> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
> have found using this tool:
>
> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
> index blocks create invalid bound sequences on 3.0+
> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> incomplete reverse reads in some cases
> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> iteration of indexed partitions
>
> /Marcus
>