Re: Roadmap for 4.0

2018-04-04 Thread Blake Eggleston
+1

On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:

Earlier than I’d have personally picked, but I’m +1 too



-- 
Jeff Jirsa


> On Apr 4, 2018, at 5:06 PM, Nate McCall  wrote:
> 
> Top-posting as I think this summary is on point - thanks, Scott! (And
> great to have you back, btw).
> 
> It feels to me like we are coalescing on two points:
> 1. June 1 as a freeze for alpha
> 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> implied by such before a GA)
> 
> How do folks feel about the above points?
> 
> 
>> Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:
>>
>> –––
>> Jeff:
>>>> A hard date for a feature freeze makes sense, a hard date for a release
>>>> does not.
>>
>> Josh:
>>> Strongly agree. We should also collectively define what "Done" looks like
>>> post freeze so we don't end up in bike-shedding hell like we have in the
>>> past.
>> –––
>>
>> Another way of saying this: ensuring that the 4.0 release is of high quality
>> is more important than cutting the release on a specific date.
>>
>> If we adopt Sylvain's suggestion of freezing features on a "feature
>> complete" date (modulo a "definition of done" as Josh suggested), that will
>> help us align toward the polish, performance work, and dog-fooding needed to
>> feel great about shipping 4.0. It's a good time to start thinking about the
>> approaches to testing, profiling, and dog-fooding various contributors will
>> want to take on before release.
>> 
>> I love how Ben put it:
>> 
>>> An "exciting" 4.0 release to me is one that is stable and usable
>>> with no perf regressions on day 1 and includes some of the big
>>> internal changes mentioned previously.
>>> 
>>> This will set the community up well for some awesome and exciting
>>> stuff that will still be in the pipeline if it doesn't make it to 4.0.
>> 
>> That sounds great to me, too.
>> 
>> – Scott
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

Re: Roadmap for 4.0

2018-04-04 Thread Alex Lourie
+1 and seriously hoping stuff marked "Patch available" will at least get a
chance of cutting in.

On Thu, 5 Apr 2018 at 12:43 kurt greaves  wrote:

> >
> > Earlier than I’d have personally picked, but I’m +1 too
>
> This but +1.
>
> On 5 April 2018 at 03:04, J. D. Jordan  wrote:
>
> > +1
> >
> > > On Apr 4, 2018, at 5:06 PM, Nate McCall  wrote:
> > >
> > > Top-posting as I think this summary is on point - thanks, Scott! (And
> > > great to have you back, btw).
> > >
> > > It feels to me like we are coalescing on two points:
> > > 1. June 1 as a freeze for alpha
> > > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > > implied by such before a GA)
> > >
> > > How do folks feel about the above points?
> > >
> > >
> > >> Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:
> > >>
> > >> –––
> > >> Jeff:
> > >>>> A hard date for a feature freeze makes sense, a hard date for a release
> > >>>> does not.
> > >>
> > >> Josh:
> > >>> Strongly agree. We should also collectively define what "Done" looks like
> > >>> post freeze so we don't end up in bike-shedding hell like we have in the
> > >>> past.
> > >> –––
> > >>
> > >> Another way of saying this: ensuring that the 4.0 release is of high quality
> > >> is more important than cutting the release on a specific date.
> > >>
> > >> If we adopt Sylvain's suggestion of freezing features on a "feature
> > >> complete" date (modulo a "definition of done" as Josh suggested), that will
> > >> help us align toward the polish, performance work, and dog-fooding needed to
> > >> feel great about shipping 4.0. It's a good time to start thinking about the
> > >> approaches to testing, profiling, and dog-fooding various contributors will
> > >> want to take on before release.
> > >>
> > >> I love how Ben put it:
> > >>
> > >>> An "exciting" 4.0 release to me is one that is stable and usable
> > >>> with no perf regressions on day 1 and includes some of the big
> > >>> internal changes mentioned previously.
> > >>>
> > >>> This will set the community up well for some awesome and exciting
> > >>> stuff that will still be in the pipeline if it doesn't make it to 4.0.
> > >>
> > >> That sounds great to me, too.
> > >>
> > >> – Scott
> > >
> >
>
-- 


*Alex Lourie*
*Software Engineer*
+61 423177059


   




Re: Roadmap for 4.0

2018-04-04 Thread kurt greaves
>
> Earlier than I’d have personally picked, but I’m +1 too

This but +1.

On 5 April 2018 at 03:04, J. D. Jordan  wrote:

> +1
>
> > On Apr 4, 2018, at 5:06 PM, Nate McCall  wrote:
> >
> > Top-posting as I think this summary is on point - thanks, Scott! (And
> > great to have you back, btw).
> >
> > It feels to me like we are coalescing on two points:
> > 1. June 1 as a freeze for alpha
> > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > implied by such before a GA)
> >
> > How do folks feel about the above points?
> >
> >
> >> Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:
> >>
> >> –––
> >> Jeff:
> >>>> A hard date for a feature freeze makes sense, a hard date for a release
> >>>> does not.
> >>
> >> Josh:
> >>> Strongly agree. We should also collectively define what "Done" looks like
> >>> post freeze so we don't end up in bike-shedding hell like we have in the
> >>> past.
> >> –––
> >>
> >> Another way of saying this: ensuring that the 4.0 release is of high quality
> >> is more important than cutting the release on a specific date.
> >>
> >> If we adopt Sylvain's suggestion of freezing features on a "feature
> >> complete" date (modulo a "definition of done" as Josh suggested), that will
> >> help us align toward the polish, performance work, and dog-fooding needed to
> >> feel great about shipping 4.0. It's a good time to start thinking about the
> >> approaches to testing, profiling, and dog-fooding various contributors will
> >> want to take on before release.
> >>
> >> I love how Ben put it:
> >>
> >>> An "exciting" 4.0 release to me is one that is stable and usable
> >>> with no perf regressions on day 1 and includes some of the big
> >>> internal changes mentioned previously.
> >>>
> >>> This will set the community up well for some awesome and exciting
> >>> stuff that will still be in the pipeline if it doesn't make it to 4.0.
> >>
> >> That sounds great to me, too.
> >>
> >> – Scott
> >
>


Re: Roadmap for 4.0

2018-04-04 Thread J. D. Jordan
+1

> On Apr 4, 2018, at 5:06 PM, Nate McCall  wrote:
> 
> Top-posting as I think this summary is on point - thanks, Scott! (And
> great to have you back, btw).
> 
> It feels to me like we are coalescing on two points:
> 1. June 1 as a freeze for alpha
> 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> implied by such before a GA)
> 
> How do folks feel about the above points?
> 
> 
>> Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:
>> 
>> –––
>> Jeff:
>>>> A hard date for a feature freeze makes sense, a hard date for a release
>>>> does not.
>> 
>> Josh:
>>> Strongly agree. We should also collectively define what "Done" looks like
>>> post freeze so we don't end up in bike-shedding hell like we have in the
>>> past.
>> –––
>> 
>> Another way of saying this: ensuring that the 4.0 release is of high quality 
>> is more important than cutting the release on a specific date.
>> 
>> If we adopt Sylvain's suggestion of freezing features on a "feature 
>> complete" date (modulo a "definition of done" as Josh suggested), that will 
>> help us align toward the polish, performance work, and dog-fooding needed to 
>> feel great about shipping 4.0. It's a good time to start thinking about the 
>> approaches to testing, profiling, and dog-fooding various contributors will 
>> want to take on before release.
>> 
>> I love how Ben put it:
>> 
>>> An "exciting" 4.0 release to me is one that is stable and usable
>>> with no perf regressions on day 1 and includes some of the big
>>> internal changes mentioned previously.
>>> 
>>> This will set the community up well for some awesome and exciting
>>> stuff that will still be in the pipeline if it doesn't make it to 4.0.
>> 
>> That sounds great to me, too.
>> 
>> – Scott
> 



Re: Roadmap for 4.0

2018-04-04 Thread Jeremy Hanna
On Wed, Apr 4, 2018 at 7:50 PM, Michael Shuler wrote:

> On 04/04/2018 07:06 PM, Nate McCall wrote:
> >
> > It feels to me like we are coalescing on two points:
> > 1. June 1 as a freeze for alpha
> > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > implied by such before a GA)
> >
> > How do folks feel about the above points?
>
> +1
> +1
>

+1 though we may need some additional work on counters ;)


> :)
> Michael
>
>
>


Re: Roadmap for 4.0

2018-04-04 Thread Ben Bromhead
+1

On Wed, Apr 4, 2018 at 8:50 PM Michael Shuler wrote:

> On 04/04/2018 07:06 PM, Nate McCall wrote:
> >
> > It feels to me like we are coalescing on two points:
> > 1. June 1 as a freeze for alpha
> > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > implied by such before a GA)
> >
> > How do folks feel about the above points?
>
> +1
> +1
>
> :)
> Michael
>
>
--
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer


Re: Roadmap for 4.0

2018-04-04 Thread Michael Shuler
On 04/04/2018 07:06 PM, Nate McCall wrote:
> 
> It feels to me like we are coalescing on two points:
> 1. June 1 as a freeze for alpha
> 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> implied by such before a GA)
> 
> How do folks feel about the above points?

+1
+1

:)
Michael




Re: Roadmap for 4.0

2018-04-04 Thread Jeff Jirsa
Earlier than I’d have personally picked, but I’m +1 too



-- 
Jeff Jirsa


> On Apr 4, 2018, at 5:06 PM, Nate McCall  wrote:
> 
> Top-posting as I think this summary is on point - thanks, Scott! (And
> great to have you back, btw).
> 
> It feels to me like we are coalescing on two points:
> 1. June 1 as a freeze for alpha
> 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> implied by such before a GA)
> 
> How do folks feel about the above points?
> 
> 
>> Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:
>> 
>> –––
>> Jeff:
>>>> A hard date for a feature freeze makes sense, a hard date for a release
>>>> does not.
>> 
>> Josh:
>>> Strongly agree. We should also collectively define what "Done" looks like
>>> post freeze so we don't end up in bike-shedding hell like we have in the
>>> past.
>> –––
>> 
>> Another way of saying this: ensuring that the 4.0 release is of high quality 
>> is more important than cutting the release on a specific date.
>> 
>> If we adopt Sylvain's suggestion of freezing features on a "feature 
>> complete" date (modulo a "definition of done" as Josh suggested), that will 
>> help us align toward the polish, performance work, and dog-fooding needed to 
>> feel great about shipping 4.0. It's a good time to start thinking about the 
>> approaches to testing, profiling, and dog-fooding various contributors will 
>> want to take on before release.
>> 
>> I love how Ben put it:
>> 
>>> An "exciting" 4.0 release to me is one that is stable and usable
>>> with no perf regressions on day 1 and includes some of the big
>>> internal changes mentioned previously.
>>> 
>>> This will set the community up well for some awesome and exciting
>>> stuff that will still be in the pipeline if it doesn't make it to 4.0.
>> 
>> That sounds great to me, too.
>> 
>> – Scott
> 



Re: Roadmap for 4.0

2018-04-04 Thread Jon Haddad
+1, well said Scott.

> On Apr 4, 2018, at 5:13 PM, Jonathan Ellis  wrote:
> 
> On Wed, Apr 4, 2018, 7:06 PM Nate McCall  wrote:
> 
>> Top-posting as I think this summary is on point - thanks, Scott! (And
>> great to have you back, btw).
>> 
>> It feels to me like we are coalescing on two points:
>> 1. June 1 as a freeze for alpha
>> 2. "Stable" is the new "Exciting" (and the testing and dogfooding
>> implied by such before a GA)
>> 
>> How do folks feel about the above points?
>> 
> 
> +1
> 
>> 





Re: Roadmap for 4.0

2018-04-04 Thread Jonathan Ellis
On Wed, Apr 4, 2018, 7:06 PM Nate McCall  wrote:

> Top-posting as I think this summary is on point - thanks, Scott! (And
> great to have you back, btw).
>
> It feels to me like we are coalescing on two points:
> 1. June 1 as a freeze for alpha
> 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> implied by such before a GA)
>
> How do folks feel about the above points?
>

+1

>


RE: Roadmap for 4.0

2018-04-04 Thread Kenneth Brotman
So in a vibrant community like ours, it is reasonable to expect that some new
features will be ready each year.  We probably can't predict far in advance
which ones.  So each year, whatever is ready is included in that year's major
release (using a 12-month cycle as an example).  Then no one is rushing to
meet a deadline because they are holding up a version; a feature that isn't
ready will simply go in the next version.  No pressure, and no energy spent
discussing how we will approach each version.  Right now that discussion
consumes a lot of high-caliber talent.

Kenneth Brotman

-----Original Message-----
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, April 04, 2018 4:51 PM
To: dev@cassandra.apache.org
Subject: RE: Roadmap for 4.0

The group seems to be trying to find a set of features that will define
version 4.0.  I'm saying that makes things way too complicated.  You'll
drift, time will go by, and there will be no release because of this or that.
I'm saying instead, accept that you can't really know how long it will take
to properly develop and test a feature.  You won't want to release it until
it's ready.  So take the pressure and complication out of it.

Every once in a while a nice set of features will be ready and should not be
delayed when they are.  That's when you have a new major release.  Let it
happen more naturally instead of forcing it.

Kenneth Brotman

-----Original Message-----
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
Sent: Wednesday, April 04, 2018 4:23 PM
To: dev@cassandra.apache.org
Subject: RE: Roadmap for 4.0

I wouldn't want to add anything to a release that isn't ready.  Whatever
isn't ready can go in a future release. 

-----Original Message-----
From: Scott Andreas [mailto:sc...@paradoxica.net]
Sent: Wednesday, April 04, 2018 4:18 PM
To: dev@cassandra.apache.org
Subject: Re: Roadmap for 4.0

Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:

---
Jeff:
>> A hard date for a feature freeze makes sense, a hard date for a 
>> release does not.

Josh:
> Strongly agree. We should also collectively define what "Done" looks 
> like post freeze so we don't end up in bike-shedding hell like we have 
> in the past.
---

Another way of saying this: ensuring that the 4.0 release is of high quality
is more important than cutting the release on a specific date.

If we adopt Sylvain's suggestion of freezing features on a "feature
complete" date (modulo a "definition of done" as Josh suggested), that will
help us align toward the polish, performance work, and dog-fooding needed to
feel great about shipping 4.0. It's a good time to start thinking about the
approaches to testing, profiling, and dog-fooding various contributors will
want to take on before release.

I love how Ben put it:

> An "exciting" 4.0 release to me is one that is stable and usable with 
> no perf regressions on day 1 and includes some of the big internal 
> changes mentioned previously.
>
> This will set the community up well for some awesome and exciting 
> stuff that will still be in the pipeline if it doesn't make it to 4.0.

That sounds great to me, too.

- Scott


From: Kenneth Brotman 
Sent: Wednesday, April 4, 2018 2:20:59 PM
To: dev@cassandra.apache.org
Subject: RE: Roadmap for 4.0

Focusing on the 4.0 release then, let's agree on a date next year. Whatever is
ready for release by that date is what will be in that release.

Kenneth Brotman

-----Original Message-----
From: Nate McCall [mailto:zznat...@gmail.com]
Sent: Wednesday, April 04, 2018 12:59 PM
To: dev
Subject: Re: Roadmap for 4.0

On Thu, Apr 5, 2018 at 3:26 AM, Kenneth Brotman wrote:
> Can I suggest a way of defining the next few progressions as a way of
> approaching this?
>
> How about something like this:
> Version 4.0: a major release of as many improvements to the code as
> can be ready for release on a date sometime next year, to be decided on by
> us this month.
> Versions 4.x: minor releases about every three months starting after
> a major release, with improvements to the code that can be ready for release,
> and bug fixes as needed done in between.
> Version 5.0: a major release of whatever significant improvements
> are ready for release one year after the release of 4.0.
> Versions 5.x: minor releases about every three months with
> improvements, with bug fixes as needed done in between.
> And so on:
> A major release every 12 months of whatever can be ready for
> release in that major version.
> A minor release every 3 months of whatever can be ready for
> release in that minor version.
> Bug fixes as needed.
>
> The folks working on code could then get an idea of when their code would be
> ready for release version-wise.
>
> Kenneth Brotman

Hi Kenneth,
Appreciate the input, but this is quite a 

Re: Roadmap for 4.0

2018-04-04 Thread Nate McCall
Top-posting as I think this summary is on point - thanks, Scott! (And
great to have you back, btw).

It feels to me like we are coalescing on two points:
1. June 1 as a freeze for alpha
2. "Stable" is the new "Exciting" (and the testing and dogfooding
implied by such before a GA)

How do folks feel about the above points?


> Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:
>
> –––
> Jeff:
>>> A hard date for a feature freeze makes sense, a hard date for a release
>>> does not.
>
> Josh:
>> Strongly agree. We should also collectively define what "Done" looks like
>> post freeze so we don't end up in bike-shedding hell like we have in the
>> past.
> –––
>
> Another way of saying this: ensuring that the 4.0 release is of high quality 
> is more important than cutting the release on a specific date.
>
> If we adopt Sylvain's suggestion of freezing features on a "feature complete" 
> date (modulo a "definition of done" as Josh suggested), that will help us 
> align toward the polish, performance work, and dog-fooding needed to feel 
> great about shipping 4.0. It's a good time to start thinking about the 
> approaches to testing, profiling, and dog-fooding various contributors will 
> want to take on before release.
>
> I love how Ben put it:
>
>> An "exciting" 4.0 release to me is one that is stable and usable
>> with no perf regressions on day 1 and includes some of the big
>> internal changes mentioned previously.
>>
>> This will set the community up well for some awesome and exciting
>> stuff that will still be in the pipeline if it doesn't make it to 4.0.
>
> That sounds great to me, too.
>
> – Scott




RE: Roadmap for 4.0

2018-04-04 Thread Kenneth Brotman
The group seems to be trying to find a set of features that will define
version 4.0.  I'm saying that makes things way too complicated.  You'll
drift, time will go by, and there will be no release because of this or that.
I'm saying instead, accept that you can't really know how long it will take
to properly develop and test a feature.  You won't want to release it until
it's ready.  So take the pressure and complication out of it.

Every once in a while a nice set of features will be ready and should not be
delayed when they are.  That's when you have a new major release.  Let it
happen more naturally instead of forcing it.

Kenneth Brotman

-----Original Message-----
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, April 04, 2018 4:23 PM
To: dev@cassandra.apache.org
Subject: RE: Roadmap for 4.0

I wouldn't want to add anything to a release that isn't ready.  Whatever
isn't ready can go in a future release. 

-----Original Message-----
From: Scott Andreas [mailto:sc...@paradoxica.net]
Sent: Wednesday, April 04, 2018 4:18 PM
To: dev@cassandra.apache.org
Subject: Re: Roadmap for 4.0

Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:

---
Jeff:
>> A hard date for a feature freeze makes sense, a hard date for a 
>> release does not.

Josh:
> Strongly agree. We should also collectively define what "Done" looks 
> like post freeze so we don't end up in bike-shedding hell like we have 
> in the past.
---

Another way of saying this: ensuring that the 4.0 release is of high quality
is more important than cutting the release on a specific date.

If we adopt Sylvain's suggestion of freezing features on a "feature
complete" date (modulo a "definition of done" as Josh suggested), that will
help us align toward the polish, performance work, and dog-fooding needed to
feel great about shipping 4.0. It's a good time to start thinking about the
approaches to testing, profiling, and dog-fooding various contributors will
want to take on before release.

I love how Ben put it:

> An "exciting" 4.0 release to me is one that is stable and usable with 
> no perf regressions on day 1 and includes some of the big internal 
> changes mentioned previously.
>
> This will set the community up well for some awesome and exciting 
> stuff that will still be in the pipeline if it doesn't make it to 4.0.

That sounds great to me, too.

- Scott


From: Kenneth Brotman 
Sent: Wednesday, April 4, 2018 2:20:59 PM
To: dev@cassandra.apache.org
Subject: RE: Roadmap for 4.0

Focusing on the 4.0 release then, let's agree on a date next year. Whatever is
ready for release by that date is what will be in that release.

Kenneth Brotman

-----Original Message-----
From: Nate McCall [mailto:zznat...@gmail.com]
Sent: Wednesday, April 04, 2018 12:59 PM
To: dev
Subject: Re: Roadmap for 4.0

On Thu, Apr 5, 2018 at 3:26 AM, Kenneth Brotman wrote:
> Can I suggest a way of defining the next few progressions as a way of
> approaching this?
>
> How about something like this:
> Version 4.0: a major release of as many improvements to the code as
> can be ready for release on a date sometime next year, to be decided on by
> us this month.
> Versions 4.x: minor releases about every three months starting after
> a major release, with improvements to the code that can be ready for release,
> and bug fixes as needed done in between.
> Version 5.0: a major release of whatever significant improvements
> are ready for release one year after the release of 4.0.
> Versions 5.x: minor releases about every three months with
> improvements, with bug fixes as needed done in between.
> And so on:
> A major release every 12 months of whatever can be ready for
> release in that major version.
> A minor release every 3 months of whatever can be ready for
> release in that minor version.
> Bug fixes as needed.
>
> The folks working on code could then get an idea of when their code would be
> ready for release version-wise.
>
> Kenneth Brotman

Hi Kenneth,
Appreciate the input, but this is quite a well-trodden path of discussion.
Please see the following two (lengthy) threads from last year for
background:

https://lists.apache.org/thread.html/f7e1fa12ea2fb9c3eb366a04dfd7cab5d0d64eb9f4057ad65bd62ace@%3Cdev.cassandra.apache.org%3E
https://lists.apache.org/thread.html/684b559bf27b9deca0be0dd9629e6cd1fff5644598180f950ff4f478@%3Cdev.cassandra.apache.org%3E

Let's focus this thread on 4.0 release.


RE: Roadmap for 4.0

2018-04-04 Thread Kenneth Brotman
I wouldn't want to add anything to a release that isn't ready.  Whatever
isn't ready can go in a future release. 

-----Original Message-----
From: Scott Andreas [mailto:sc...@paradoxica.net] 
Sent: Wednesday, April 04, 2018 4:18 PM
To: dev@cassandra.apache.org
Subject: Re: Roadmap for 4.0

Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:

---
Jeff:
>> A hard date for a feature freeze makes sense, a hard date for a 
>> release does not.

Josh:
> Strongly agree. We should also collectively define what "Done" looks 
> like post freeze so we don't end up in bike-shedding hell like we have 
> in the past.
---

Another way of saying this: ensuring that the 4.0 release is of high quality
is more important than cutting the release on a specific date.

If we adopt Sylvain's suggestion of freezing features on a "feature
complete" date (modulo a "definition of done" as Josh suggested), that will
help us align toward the polish, performance work, and dog-fooding needed to
feel great about shipping 4.0. It's a good time to start thinking about the
approaches to testing, profiling, and dog-fooding various contributors will
want to take on before release.

I love how Ben put it:

> An "exciting" 4.0 release to me is one that is stable and usable with 
> no perf regressions on day 1 and includes some of the big internal 
> changes mentioned previously.
>
> This will set the community up well for some awesome and exciting 
> stuff that will still be in the pipeline if it doesn't make it to 4.0.

That sounds great to me, too.

- Scott


From: Kenneth Brotman 
Sent: Wednesday, April 4, 2018 2:20:59 PM
To: dev@cassandra.apache.org
Subject: RE: Roadmap for 4.0

Focusing on the 4.0 release then, let's agree on a date next year. Whatever is
ready for release by that date is what will be in that release.

Kenneth Brotman

-----Original Message-----
From: Nate McCall [mailto:zznat...@gmail.com]
Sent: Wednesday, April 04, 2018 12:59 PM
To: dev
Subject: Re: Roadmap for 4.0

On Thu, Apr 5, 2018 at 3:26 AM, Kenneth Brotman wrote:
> Can I suggest a way of defining the next few progressions as a way of
> approaching this?
>
> How about something like this:
> Version 4.0: a major release of as many improvements to the code as
> can be ready for release on a date sometime next year, to be decided on by
> us this month.
> Versions 4.x: minor releases about every three months starting after
> a major release, with improvements to the code that can be ready for release,
> and bug fixes as needed done in between.
> Version 5.0: a major release of whatever significant improvements
> are ready for release one year after the release of 4.0.
> Versions 5.x: minor releases about every three months with
> improvements, with bug fixes as needed done in between.
> And so on:
> A major release every 12 months of whatever can be ready for
> release in that major version.
> A minor release every 3 months of whatever can be ready for
> release in that minor version.
> Bug fixes as needed.
>
> The folks working on code could then get an idea of when their code would be
> ready for release version-wise.
>
> Kenneth Brotman

Hi Kenneth,
Appreciate the input, but this is quite a well-trodden path of discussion.
Please see the following two (lengthy) threads from last year for
background:

https://lists.apache.org/thread.html/f7e1fa12ea2fb9c3eb366a04dfd7cab5d0d64eb9f4057ad65bd62ace@%3Cdev.cassandra.apache.org%3E
https://lists.apache.org/thread.html/684b559bf27b9deca0be0dd9629e6cd1fff5644598180f950ff4f478@%3Cdev.cassandra.apache.org%3E

Let's focus this thread on 4.0 release.




Re: Roadmap for 4.0

2018-04-04 Thread Scott Andreas
Re-raising a point made earlier in the thread by Jeff and affirmed by Josh:

–––
Jeff:
>> A hard date for a feature freeze makes sense, a hard date for a release
>> does not.

Josh:
> Strongly agree. We should also collectively define what "Done" looks like
> post freeze so we don't end up in bike-shedding hell like we have in the
> past.
–––

Another way of saying this: ensuring that the 4.0 release is of high quality is 
more important than cutting the release on a specific date.

If we adopt Sylvain's suggestion of freezing features on a "feature complete" 
date (modulo a "definition of done" as Josh suggested), that will help us align 
toward the polish, performance work, and dog-fooding needed to feel great about 
shipping 4.0. It's a good time to start thinking about the approaches to 
testing, profiling, and dog-fooding various contributors will want to take on 
before release.

I love how Ben put it:

> An "exciting" 4.0 release to me is one that is stable and usable
> with no perf regressions on day 1 and includes some of the big
> internal changes mentioned previously.
>
> This will set the community up well for some awesome and exciting
> stuff that will still be in the pipeline if it doesn't make it to 4.0.

That sounds great to me, too.

– Scott


From: Kenneth Brotman 
Sent: Wednesday, April 4, 2018 2:20:59 PM
To: dev@cassandra.apache.org
Subject: RE: Roadmap for 4.0

Focusing on the 4.0 release then, let's agree on a date next year. Whatever is
ready for release by that date is what will be in that release.

Kenneth Brotman

-----Original Message-----
From: Nate McCall [mailto:zznat...@gmail.com]
Sent: Wednesday, April 04, 2018 12:59 PM
To: dev
Subject: Re: Roadmap for 4.0

On Thu, Apr 5, 2018 at 3:26 AM, Kenneth Brotman  
wrote:
> Can I suggest a way of defining the next few progressions as a way of 
> approaching this?
>
> How about something like this:
> Version 4.0:  A major release of as many improvements to the code as 
> can be ready for a release on a date sometime next year ;to be decided on by 
> us this month.
> Versions 4.x: minor releases about every three months starting after 
> a major release with improvements to the code that can be ready for release, 
> with bug fixes as needed done in between.
> Version: 5.0: a major release of whatever significant improvements 
> are ready for release one year after the release of 4.0
> Versions 5.x: minor releases about every three months with 
> improvements, with bug fixes as needed done in between,
> And so on:
> A Major release every 12 months of whatever can be ready for 
> release in that major version,
> A minor release every 3 months of whatever can be ready for 
> release in that minor version.
> Bug fixes as needed.
>
> The folks working on code could then get an idea of when their code would be 
> ready for release version-wise.
>
> Kenneth Brotman

Hi Kenneth,
Appreciate the input, but this is quite a well-trodden path of discussion. 
Please see the following two (lengthy) threads from last year for background:

https://lists.apache.org/thread.html/f7e1fa12ea2fb9c3eb366a04dfd7cab5d0d64eb9f4057ad65bd62ace@%3Cdev.cassandra.apache.org%3E
https://lists.apache.org/thread.html/684b559bf27b9deca0be0dd9629e6cd1fff5644598180f950ff4f478@%3Cdev.cassandra.apache.org%3E

Let's focus this thread on 4.0 release.




RE: Roadmap for 4.0

2018-04-04 Thread Kenneth Brotman
Focusing on the 4.0 release then, let's agree on a date next year. Whatever is 
ready for release by that date is what will be in that release.

Kenneth Brotman

-----Original Message-----
From: Nate McCall [mailto:zznat...@gmail.com] 
Sent: Wednesday, April 04, 2018 12:59 PM
To: dev
Subject: Re: Roadmap for 4.0

On Thu, Apr 5, 2018 at 3:26 AM, Kenneth Brotman  
wrote:
> Can I suggest a way of defining the next few progressions as a way of 
> approaching this?
>
> How about something like this:
> Version 4.0:  A major release of as many improvements to the code as 
> can be ready for a release on a date sometime next year ;to be decided on by 
> us this month.
> Versions 4.x: minor releases about every three months starting after 
> a major release with improvements to the code that can be ready for release, 
> with bug fixes as needed done in between.
> Version: 5.0: a major release of whatever significant improvements 
> are ready for release one year after the release of 4.0
> Versions 5.x: minor releases about every three months with 
> improvements, with bug fixes as needed done in between,
> And so on:
> A Major release every 12 months of whatever can be ready for 
> release in that major version,
> A minor release every 3 months of whatever can be ready for 
> release in that minor version.
> Bug fixes as needed.
>
> The folks working on code could then get an idea of when their code would be 
> ready for release version-wise.
>
> Kenneth Brotman

Hi Kenneth,
Appreciate the input, but this is quite a well-trodden path of discussion. 
Please see the following two (lengthy) threads from last year for background:

https://lists.apache.org/thread.html/f7e1fa12ea2fb9c3eb366a04dfd7cab5d0d64eb9f4057ad65bd62ace@%3Cdev.cassandra.apache.org%3E
https://lists.apache.org/thread.html/684b559bf27b9deca0be0dd9629e6cd1fff5644598180f950ff4f478@%3Cdev.cassandra.apache.org%3E

Let's focus this thread on 4.0 release.




Re: Roadmap for 4.0

2018-04-04 Thread Nate McCall
On Thu, Apr 5, 2018 at 3:26 AM, Kenneth Brotman
 wrote:
> Can I suggest a way of defining the next few progressions as a way of 
> approaching this?
>
> How about something like this:
> Version 4.0:  A major release of as many improvements to the code as 
> can be ready for a release on a date sometime next year ;to be decided on by 
> us this month.
> Versions 4.x: minor releases about every three months starting after 
> a major release with improvements to the code that can be ready for release, 
> with bug fixes as needed done in between.
> Version: 5.0: a major release of whatever significant improvements 
> are ready for release one year after the release of 4.0
> Versions 5.x: minor releases about every three months with 
> improvements, with bug fixes as needed done in between,
> And so on:
> A Major release every 12 months of whatever can be ready for 
> release in that major version,
> A minor release every 3 months of whatever can be ready for 
> release in that minor version.
> Bug fixes as needed.
>
> The folks working on code could then get an idea of when their code would be 
> ready for release version-wise.
>
> Kenneth Brotman

Hi Kenneth,
Appreciate the input, but this is quite a well-trodden path of
discussion. Please see the following two (lengthy) threads from last
year for background:

https://lists.apache.org/thread.html/f7e1fa12ea2fb9c3eb366a04dfd7cab5d0d64eb9f4057ad65bd62ace@%3Cdev.cassandra.apache.org%3E
https://lists.apache.org/thread.html/684b559bf27b9deca0be0dd9629e6cd1fff5644598180f950ff4f478@%3Cdev.cassandra.apache.org%3E

Let's focus this thread on 4.0 release.




RE: Roadmap for 4.0

2018-04-04 Thread Kenneth Brotman
Can I suggest a way of defining the next few progressions as a way of 
approaching this? 

How about something like this:
Version 4.0: a major release of as many improvements to the code as can be 
ready for release on a date sometime next year, to be decided on by us this 
month.
Versions 4.x: minor releases about every three months, starting after the 
major release, with improvements to the code that can be ready for release, 
and with bug fixes as needed done in between.
Version 5.0: a major release of whatever significant improvements are ready 
for release one year after the release of 4.0.
Versions 5.x: minor releases about every three months with improvements, 
and with bug fixes as needed done in between.
And so on:
    A major release every 12 months of whatever can be ready for release in 
    that major version.
    A minor release every 3 months of whatever can be ready for release in 
    that minor version.
    Bug fixes as needed.

The folks working on code could then get an idea of when their code would be 
ready for release version-wise.

Kenneth Brotman

-----Original Message-----
From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Wednesday, April 04, 2018 8:00 AM
To: dev@cassandra.apache.org
Subject: Re: Roadmap for 4.0

Agreed with Josh.  There’s nothing set in stone after we release 4.0, trying to 
extrapolate what we do here for the rest of humanity’s timeline isn’t going to 
be a useful exercise.

Regarding building a big list - it’s of no value.  In fact, if we’re already 
talking about releasing 4.0 we should really only be merging in small features 
that enhance user experience like improving nodetool output or reasonable 
optimizations.  Merging in big features at the very end of the merge window is 
a really great idea to have dozens of follow up bug fix releases that nobody 
considers stable, where the Coli conjecture always wins.  IMO, it would be 
better / more responsible to merge them into trunk *after* we branch for 4.0. 
Yes, that makes the next release less exciting, but I really don’t think 
“exciting” is what we’re shooting for.  I’m more in favor of stable.

Regarding supporting 3.0 / 3.11, since we’re talking about feature freezing 4.0 
2 months from now, and releasing it *sometime* after that, then add 6 months, 
we’re talking about close to an extra year of 3.0 support.  People are, of 
course, free to continue patching 3.0, back porting fixes, etc, but I am 
completely OK with saying there’s only 9 more months of support starting today.

I’m also in the go straight to 3.11 camp.  I see no reason to upgrade to only 
3.0 if you’re on 2.x.  

Jon

> On Apr 4, 2018, at 6:29 AM, Josh McKenzie  wrote:
> 
>> 
>> This discussion was always about the release strategy. There is no 
>> separation between the release strategy for 4.0 and the release 
>> strategy for the project, they are the same thing and what is 
>> intended to be discussed here.
> 
> Not trying to be pedantic here, but the email thread is titled 
> "Roadmap for 4.0" and has been concerned with how we get 4.0 out the 
> door. I don't think it's implicit that whatever strategy we settle on 
> for 4.0 is intended to apply to subsequent releases, since the 3.0.X 
> to 3.X to 4.0 relationship/delta is different than a 4.0 to 5.0 can be 
> expected to be.
> 
> 
>> sidenote: 3.10 was released in January 2017, and while the changes 
>> list for
>> 4.0 is getting quite large there's not much there that's going to win 
>> over users. It's mostly refactorings and improvements that affect 
>> developers more so than users.
> 
> If you assume most 3. users are on 3.10, this argument makes sense. I 
> believe a majority are on 3.0.X or 2.1/2.2, which leaves a minority 
> looking at the small delta from 3.10 to 4.0 in the current form.
> 
> 
> 
> On Wed, Apr 4, 2018 at 8:25 AM, kurt greaves  wrote:
> 
>>> 
>>> I'm also a bit sad that we seem to be getting back to our old demons 
>>> of trying to shove as much as we possibly can in the next major as 
>>> if having a feature miss it means it will never happen.
>> 
>> That wasn't the intention of this thread, but that's the response I got.
>> Thought I made it pretty clear that this was about compiling a list 
>> of things that people are currently working on and can commit to 
>> getting finished soon (which should be a relatively small list 
>> considering the limited number of full time contributors).
>> 
>> Of course, we should probably (re-(re-(re-)))start a discussion on 
>> release
>>> "strategy" in parallel because it doesn't seem we have one right 
>>> now, but that's imo a discussion we should keep separate.
>> 
>> This discussion was always about the release strategy. There is no 
>> separation between the release strategy for 4.0 and the release 
>> strategy for the project, they are the same thing and what is 
>> intended to 

Re: Roadmap for 4.0

2018-04-04 Thread Ben Bromhead
+1 to what Jon and Josh said.

At this point in time, an "exciting" 4.0 release to me is one that is
stable and usable with no perf regressions on day 1 and includes some of
the big internal changes mentioned previously.

This will set the community up well for some awesome and exciting stuff
that will still be in the pipeline if it doesn't make it to 4.0.

On Wed, Apr 4, 2018 at 11:00 AM Jon Haddad  wrote:

> Agreed with Josh.  There’s nothing set in stone after we release 4.0,
> trying to extrapolate what we do here for the rest of humanity’s timeline
> isn’t going to be a useful exercise.
>
> Regarding building a big list - it’s of no value.  In fact, if we’re
> already talking about releasing 4.0 we should really only be merging in
> small features that enhance user experience like improving nodetool output
> or reasonable optimizations.  Merging in big features at the very end of
> the merge window is a really great idea to have dozens of follow up bug fix
> releases that nobody considers stable, where the Coli conjecture always
> wins.  IMO, it would be better / more responsible to merge them into trunk
> *after* we branch for 4.0. Yes, that makes the next release less exciting,
> but I really don’t think “exciting” is what we’re shooting for.  I’m more
> in favor of stable.
>
> Regarding supporting 3.0 / 3.11, since we’re talking about feature
> freezing 4.0 2 months from now, and releasing it *sometime* after that,
> then add 6 months, we’re talking about close to an extra year of 3.0
> support.  People are, of course, free to continue patching 3.0, back
> porting fixes, etc, but I am completely OK with saying there’s only 9 more
> months of support starting today.
>
> I’m also in the go straight to 3.11 camp.  I see no reason to upgrade to
> only 3.0 if you’re on 2.x.
>
> Jon
>
> > On Apr 4, 2018, at 6:29 AM, Josh McKenzie  wrote:
> >
> >>
> >> This discussion was always about the release strategy. There is no
> >> separation between the release strategy for 4.0 and the release strategy
> >> for the project, they are the same thing and what is intended to be
> >> discussed here.
> >
> > Not trying to be pedantic here, but the email thread is titled "Roadmap
> for
> > 4.0" and has been concerned with how we get 4.0 out the door. I don't
> think
> > it's implicit that whatever strategy we settle on for 4.0 is intended to
> > apply to subsequent releases, since the 3.0.X to 3.X to 4.0
> > relationship/delta is different than a 4.0 to 5.0 can be expected to be.
> >
> >
> >> sidenote: 3.10 was released in January 2017, and while the changes list
> for
> >> 4.0 is getting quite large there's not much there that's going to win
> over
> >> users. It's mostly refactorings and improvements that affect developers
> >> more so than users.
> >
> > If you assume most 3. users are on 3.10, this argument makes sense. I
> > believe a majority are on 3.0.X or 2.1/2.2, which leaves a minority
> looking
> > at the small delta from 3.10 to 4.0 in the current form.
> >
> >
> >
> > On Wed, Apr 4, 2018 at 8:25 AM, kurt greaves 
> wrote:
> >
> >>>
> >>> I'm also a bit sad that we seem to be getting back to our old demons of
> >>> trying
> >>> to shove as much as we possibly can in the next major as if having a
> >>> feature
> >>> miss it means it will never happen.
> >>
> >> That wasn't the intention of this thread, but that's the response I got.
> >> Thought I made it pretty clear that this was about compiling a list of
> >> things that people are currently working on and can commit to getting
> >> finished soon (which should be a relatively small list considering the
> >> limited number of full time contributors).
> >>
> >> Of course, we should probably (re-(re-(re-)))start a discussion on
> release
> >>> "strategy" in parallel because it doesn't seem we have one right now,
> but
> >>> that's imo a discussion we should keep separate.
> >>
> >> This discussion was always about the release strategy. There is no
> >> separation between the release strategy for 4.0 and the release strategy
> >> for the project, they are the same thing and what is intended to be
> >> discussed here. I don't think it's possible to have a separate
> discussion
> >> on these two things as the release strategy has a pretty big influence
> on
> >> how 4.0 is released.
> >>
> >> I'm all for a feature freeze and KISS, but I feel that this really
> needs a
> >> bit more thought before we just jump in and set another precedent for
> >> future releases. IMO the Cassandra project has had a seriously bad track
> >> record of releasing major versions in the past, and we should probably
> work
> >> at resolving that properly, rather than just continuing the current
> "let's
> >> just try something new every time without really thinking about it".
> >>
> >> Some points:
> >>
> >>   1.  This strategy means that we don't care about what improvements
> >>   actually make it into any given major 

Re: Repair scheduling tools

2018-04-04 Thread Ben Bromhead
+1 to including the implementation in Cassandra itself. Makes managed
repair a first-class citizen, it nicely rounds out Cassandra's consistency
story and makes it 1000x more likely that repairs will get run.




On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad  wrote:

> Implementation details aside, I’m firmly in the “it would be nice if C*
> could take care of it” camp.  Reaper is pretty damn easy to use and people
> *still* don’t put it in prod.
>
>
> > On Apr 4, 2018, at 4:16 AM, Rahul Singh 
> wrote:
> >
> > I understand the merits of both approaches. In working with other DBs In
> the “old country” of SQL, we often had to write indexing sequences manually
> for important tables. It was “built into the product” but in order to
> leverage the maximum benefits of indices we had to have different indices
> other than the clustered (physical index). The process still sucked. It’s
> never perfect.
> >
> > The JVM is already fraught with GC issues and putting another process
> being managed in the same heapspace is what I’m worried about. Technically
> the process could be in the same binary but started as a side Car or in the
> same main process.
> >
> > Consider a process called “cassandra-agent” that’s sitting around with a
> scheduler based on config or a Cassandra table. Distributed in the same
> release. Shell / service scripts would start it. The end user knows it only
> by examining the .sh files. This opens possibilities of including a GUI
> hosted in the same process without cluttering the core coolness of
> Cassandra.
> >
> > Best,
> >
> > --
> > Rahul Singh
> > rahul.si...@anant.us
> >
> > Anant Corporation
> >
> > On Apr 4, 2018, 2:50 AM -0400, Dor Laor , wrote:
> >> We at Scylla, implemented repair in a similar way to the Cassandra
> reaper.
> >> We do
> >> that using an external application, written in go that manages repair
> for
> >> multiple clusters
> >> and saves the data in an external Scylla cluster. The logic resembles
> the
> >> reaper one with
> >> some specific internal sharding optimizations and uses the Scylla rest
> api.
> >>
> >> However, I have doubts it's the ideal way. After playing a bit with
> >> CockroachDB, I realized
> >> it's super nice to have a single binary that repairs itself, provides a
> GUI
> >> and is the core DB.
> >>
> >> Even while distributed, you can elect a leader node to manage the
> repair in
> >> a consistent
> >> way so the complexity can be reduced to a minimum. Repair can write its
> >> status to the
> >> system tables and provide an API for progress, rate control, etc.
> >>
> >> The big advantage of embedding repair in the core is that there is no
> >> need to expose
> >> internal state to the repair logic. So an external program doesn't need
> to
> >> deal with different
> >> versions of Cassandra, different repair capabilities of the core (such as
> >> incremental on/off)
> >> and so forth. A good database should schedule its own repair: it knows
> >> whether the threshold
> >> of hinted handoff was crossed or not, it knows whether nodes were
> replaced,
> >> etc.
> >>
> >> My 2 cents. Dor
> >>
> >> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <
> >> dinesh.jo...@yahoo.com.invalid> wrote:
> >>
> >>> Simon,
> >>> You could still do load aware repair outside of the main process by
> >>> reading Cassandra's metrics.
> >>> In general, I don't think the maintenance tasks necessarily need to
> live
> >>> in the main process. They could negatively impact the read / write
> path.
> >>> Unless strictly required by the serving path, it could live in a
> sidecar
> >>> process. There are multiple benefits including isolation, faster
> iteration,
> >>> loose coupling. For example - this would mean that the maintenance
> tasks
> >>> can have a different gc profile than the main process and it would be
> ok.
> >>> Today that is not the case.
> >>> The only issue I see is that the project does not provide an official
> >>> sidecar. Perhaps there should be one. We probably would've not had to
> have
> >>> this discussion ;)
> >>> Dinesh
> >>>
> >>> On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <
> >>> zhouqing...@gmail.com> wrote:
> >>>
> >>> Repair has been a problem for us at Uber. In general I'm in favor of
> >>> including the scheduling logic in Cassandra daemon. It has the benefit
> of
> >>> introducing something like load-aware repair, eg, only schedule repair
> >>> while no ongoing compaction or traffic is low, etc. As proposed by
> others,
> >>> we can expose keyspace/table-level configurations so that users can
> opt-in.
> >>> Regarding the risk, yes there will be problems at the beginning but in
> the
> >>> long run, users will appreciate that repair works out of the box, just
> like
> >>> compaction. We have large Cassandra deployments and can work with
> Netflix
> >>> folks for intensive testing to boost user confidence.
> >>>
> >>> On the other hand, have we looked into 

Re: Roadmap for 4.0

2018-04-04 Thread Jon Haddad
Agreed with Josh.  There’s nothing set in stone after we release 4.0; trying to 
extrapolate what we do here for the rest of humanity’s timeline isn’t going to 
be a useful exercise.

Regarding building a big list - it’s of no value.  In fact, if we’re already 
talking about releasing 4.0 we should really only be merging in small features 
that enhance user experience like improving nodetool output or reasonable 
optimizations.  Merging in big features at the very end of the merge window is 
a really great idea to have dozens of follow up bug fix releases that nobody 
considers stable, where the Coli conjecture always wins.  IMO, it would be 
better / more responsible to merge them into trunk *after* we branch for 4.0. 
Yes, that makes the next release less exciting, but I really don’t think 
“exciting” is what we’re shooting for.  I’m more in favor of stable.

Regarding supporting 3.0 / 3.11, since we’re talking about feature freezing 4.0 
2 months from now, and releasing it *sometime* after that, then add 6 months, 
we’re talking about close to an extra year of 3.0 support.  People are, of 
course, free to continue patching 3.0, back porting fixes, etc, but I am 
completely OK with saying there’s only 9 more months of support starting today.

I’m also in the go straight to 3.11 camp.  I see no reason to upgrade to only 
3.0 if you’re on 2.x.  

Jon

> On Apr 4, 2018, at 6:29 AM, Josh McKenzie  wrote:
> 
>> 
>> This discussion was always about the release strategy. There is no
>> separation between the release strategy for 4.0 and the release strategy
>> for the project, they are the same thing and what is intended to be
>> discussed here.
> 
> Not trying to be pedantic here, but the email thread is titled "Roadmap for
> 4.0" and has been concerned with how we get 4.0 out the door. I don't think
> it's implicit that whatever strategy we settle on for 4.0 is intended to
> apply to subsequent releases, since the 3.0.X to 3.X to 4.0
> relationship/delta is different than a 4.0 to 5.0 can be expected to be.
> 
> 
>> sidenote: 3.10 was released in January 2017, and while the changes list for
>> 4.0 is getting quite large there's not much there that's going to win over
>> users. It's mostly refactorings and improvements that affect developers
>> more so than users.
> 
> If you assume most 3. users are on 3.10, this argument makes sense. I
> believe a majority are on 3.0.X or 2.1/2.2, which leaves a minority looking
> at the small delta from 3.10 to 4.0 in the current form.
> 
> 
> 
> On Wed, Apr 4, 2018 at 8:25 AM, kurt greaves  wrote:
> 
>>> 
>>> I'm also a bit sad that we seem to be getting back to our old demons of
>>> trying
>>> to shove as much as we possibly can in the next major as if having a
>>> feature
>>> miss it means it will never happen.
>> 
>> That wasn't the intention of this thread, but that's the response I got.
>> Thought I made it pretty clear that this was about compiling a list of
>> things that people are currently working on and can commit to getting
>> finished soon (which should be a relatively small list considering the
>> limited number of full time contributors).
>> 
>> Of course, we should probably (re-(re-(re-)))start a discussion on release
>>> "strategy" in parallel because it doesn't seem we have one right now, but
>>> that's imo a discussion we should keep separate.
>> 
>> This discussion was always about the release strategy. There is no
>> separation between the release strategy for 4.0 and the release strategy
>> for the project, they are the same thing and what is intended to be
>> discussed here. I don't think it's possible to have a separate discussion
>> on these two things as the release strategy has a pretty big influence on
>> how 4.0 is released.
>> 
>> I'm all for a feature freeze and KISS, but I feel that this really needs a
>> bit more thought before we just jump in and set another precedent for
>> future releases. IMO the Cassandra project has had a seriously bad track
>> record of releasing major versions in the past, and we should probably work
>> at resolving that properly, rather than just continuing the current "let's
>> just try something new every time without really thinking about it".
>> 
>> Some points:
>> 
>>   1.  This strategy means that we don't care about what improvements
>>   actually make it into any given major version. This means that we will
>> have
>>   major releases with nothing/very little desirable for users, and thus
>>   little reason to upgrade other than to stay on a supported version (from
>>   experience this isn't terribly important to users of a database). I
>> think
>>   this inevitably leads to supporting more versions than necessary, and in
>>   general a pretty poor experience for users as we spend more time
>> fighting
>>   bugs in production rather than before we do a release (purely because of
>>   increased frequency of releases).
>>   2. We'll always be driven by feature 

Re: Repair scheduling tools

2018-04-04 Thread Jon Haddad
Implementation details aside, I’m firmly in the “it would be nice if C* could 
take care of it” camp.  Reaper is pretty damn easy to use and people *still* 
don’t put it in prod.


> On Apr 4, 2018, at 4:16 AM, Rahul Singh  wrote:
> 
> I understand the merits of both approaches. In working with other DBs in the 
> “old country” of SQL, we often had to write indexing sequences manually for 
> important tables. It was “built into the product”, but in order to get 
> the maximum benefit from indices we had to have different indices other than 
> the clustered (physical) index. The process still sucked. It’s never perfect.
> 
> The JVM is already fraught with GC issues, and putting another process being 
> managed in the same heap space is what I’m worried about. Technically the 
> process could be in the same binary but started either as a sidecar or in the 
> same main process.
> 
> Consider a process called “cassandra-agent” that’s sitting around with a 
> scheduler based on config or a Cassandra table. Distributed in the same 
> release. Shell / service scripts would start it. The end user knows it only 
> by examining the .sh files. This opens possibilities of including a GUI 
> hosted in the same process without cluttering the core coolness of Cassandra.
> 
> Best,
> 
> --
> Rahul Singh
> rahul.si...@anant.us
> 
> Anant Corporation
> 
> On Apr 4, 2018, 2:50 AM -0400, Dor Laor , wrote:
>> We at Scylla implemented repair in a similar way to the Cassandra Reaper.
>> We do
>> that using an external application, written in Go, that manages repair for
>> multiple clusters
>> and saves the data in an external Scylla cluster. The logic resembles the
>> Reaper one with
>> some specific internal sharding optimizations and uses the Scylla REST API.
>> 
>> However, I have doubts it's the ideal way. After playing a bit with
>> CockroachDB, I realized
>> it's super nice to have a single binary that repairs itself, provides a GUI
>> and is the core DB.
>> 
>> Even while distributed, you can elect a leader node to manage the repair in
>> a consistent
>> way so the complexity can be reduced to a minimum. Repair can write its
>> status to the
>> system tables and provide an API for progress, rate control, etc.
>> 
>> The big advantage of embedding repair in the core is that there is no
>> need to expose
>> internal state to the repair logic. So an external program doesn't need to
>> deal with different
>> versions of Cassandra, different repair capabilities of the core (such as
>> incremental on/off)
>> and so forth. A good database should schedule its own repair: it knows
>> whether the threshold
>> of hinted handoff was crossed or not, it knows whether nodes were replaced,
>> etc.
>> 
>> My 2 cents. Dor
>> 
>> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <
>> dinesh.jo...@yahoo.com.invalid> wrote:
>> 
>>> Simon,
>>> You could still do load aware repair outside of the main process by
>>> reading Cassandra's metrics.
>>> In general, I don't think the maintenance tasks necessarily need to live
>>> in the main process. They could negatively impact the read / write path.
>>> Unless strictly required by the serving path, it could live in a sidecar
>>> process. There are multiple benefits including isolation, faster iteration,
>>> loose coupling. For example - this would mean that the maintenance tasks
>>> can have a different gc profile than the main process and it would be ok.
>>> Today that is not the case.
>>> The only issue I see is that the project does not provide an official
>>> sidecar. Perhaps there should be one. We probably would've not had to have
>>> this discussion ;)
>>> Dinesh
>>> 
>>> On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <
>>> zhouqing...@gmail.com> wrote:
>>> 
>>> Repair has been a problem for us at Uber. In general I'm in favor of
>>> including the scheduling logic in Cassandra daemon. It has the benefit of
>>> introducing something like load-aware repair, eg, only schedule repair
>>> while no ongoing compaction or traffic is low, etc. As proposed by others,
>>> we can expose keyspace/table-level configurations so that users can opt-in.
>>> Regarding the risk, yes there will be problems at the beginning but in the
>>> long run, users will appreciate that repair works out of the box, just like
>>> compaction. We have large Cassandra deployments and can work with Netflix
>>> folks for intensive testing to boost user confidence.
>>> 
>>> On the other hand, have we looked into how other NoSQL databases do repair?
>>> Is there a side car process?
>>> 
>>> 
>>> On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli >> wrote:
>>> 
>>>> Repair is critical for running C* and I agree with Roopa that it needs to
>>>> be part of the offering. I think we should make it easy for new users to
>>>> run C*.
>>>>
>>>> Can we have a side car process which we can add to Apache Cassandra
>>>> offering and we can put this repair there? I am 

Re: Roadmap for 4.0

2018-04-04 Thread Josh McKenzie
>
> This discussion was always about the release strategy. There is no
> separation between the release strategy for 4.0 and the release strategy
> for the project, they are the same thing and what is intended to be
> discussed here.

Not trying to be pedantic here, but the email thread is titled "Roadmap for
4.0" and has been concerned with how we get 4.0 out the door. I don't think
it's implicit that whatever strategy we settle on for 4.0 is intended to
apply to subsequent releases, since the 3.0.X to 3.X to 4.0
relationship/delta is different than a 4.0 to 5.0 can be expected to be.


> sidenote: 3.10 was released in January 2017, and while the changes list for
> 4.0 is getting quite large there's not much there that's going to win over
> users. It's mostly refactorings and improvements that affect developers
> more so than users.

If you assume most 3.X users are on 3.10, this argument makes sense. I
believe a majority are on 3.0.X or 2.1/2.2, which leaves a minority looking
at the small delta from 3.10 to 4.0 in the current form.



On Wed, Apr 4, 2018 at 8:25 AM, kurt greaves  wrote:

> >
> > I'm also a bit sad that we seem to be getting back to our old demons of
> > trying
> > to shove as much as we possibly can in the next major as if having a
> > feature
> > miss it means it will never happen.
>
> That wasn't the intention of this thread, but that's the response I got.
> Thought I made it pretty clear that this was about compiling a list of
> things that people are currently working on and can commit to getting
> finished soon (which should be a relatively small list considering the
> limited number of full time contributors).
>
> > Of course, we should probably (re-(re-(re-)))start a discussion on release
> > "strategy" in parallel because it doesn't seem we have one right now, but
> > that's imo a discussion we should keep separate.
>
> This discussion was always about the release strategy. There is no
> separation between the release strategy for 4.0 and the release strategy
> for the project, they are the same thing and what is intended to be
> discussed here. I don't think it's possible to have a separate discussion
> on these two things as the release strategy has a pretty big influence on
> how 4.0 is released.
>
> I'm all for a feature freeze and KISS, but I feel that this really needs a
> bit more thought before we just jump in and set another precedent for
> future releases. IMO the Cassandra project has had a seriously bad track
> record of releasing major versions in the past, and we should probably work
> at resolving that properly, rather than just continuing the current "let's
> just try something new every time without really thinking about it".
>
> Some points:
>
>1.  This strategy means that we don't care about what improvements
>actually make it into any given major version. This means that we will
> have
>major releases with nothing/very little desirable for users, and thus
>little reason to upgrade other than to stay on a supported version (from
>experience this isn't terribly important to users of a database). I
> think
>this inevitably leads to supporting more versions than necessary, and in
>general a pretty poor experience for users as we spend more time
> fighting
>bugs in production rather than before we do a release (purely because of
>increased frequency of releases).
>2. We'll always be driven by feature deadlines, which for the most part
>is fine, as long as we handle verification/quality assurance/release
>candidates appropriately. The main problem here though is that we don't
>really know what's going to be in a certain release until we hit the
>freeze, and what's in it may not really make sense at that point in
> time.
>3. We'll pump out major versions fairly regularly and end up with even
>more users that are on EOL versions with complex upgrade paths to get
> to a
>supported version or a version with a feature they need (think all those
>people still out there on 1.2).
>4. This strategy has the positive effect of allowing developers to see
>their changes in production faster, but OTOH if no one really uses the
> new
>versions this doesn't really happen anyway.
>
> I'd also note that if people hadn't noticed, users tend to be pretty
> reluctant to upgrade their databases (hello everyone still running 2.1).
> This tends to be the nature of a database to some extent (if it works on
> version x, why upgrade to y?). IMO it would make more sense to support fewer
> versions but for a longer period of time. I'm sure most users would
> appreciate 2 years of bug fixes for only 2 branches with a new major
> approximately every 2 years. Databases don't move that fast; there's not
> much desirable in a feature release every year for users.
>
> sidenote: 3.10 was released in January 2017, and while the changes list for
> 4.0 is getting quite large there's not much there 

Re: Roadmap for 4.0

2018-04-04 Thread kurt greaves
>
> I'm also a bit sad that we seem to be getting back to our old demons of
> trying
> to shove as much as we possibly can in the next major as if having a
> feature
> miss it means it will never happen.

That wasn't the intention of this thread, but that's the response I got.
Thought I made it pretty clear that this was about compiling a list of
things that people are currently working on and can commit to getting
finished soon (which should be a relatively small list considering the
limited number of full time contributors).

> Of course, we should probably (re-(re-(re-)))start a discussion on release
> "strategy" in parallel because it doesn't seem we have one right now, but
> that's imo a discussion we should keep separate.

This discussion was always about the release strategy. There is no
separation between the release strategy for 4.0 and the release strategy
for the project; they are the same thing, and that is what is intended to be
discussed here. I don't think it's possible to have a separate discussion
on these two things as the release strategy has a pretty big influence on
how 4.0 is released.

I'm all for a feature freeze and KISS, but I feel that this really needs a
bit more thought before we just jump in and set another precedent for
future releases. IMO the Cassandra project has had a seriously bad track
record of releasing major versions in the past, and we should probably work
at resolving that properly, rather than just continuing the current "let's
just try something new every time without really thinking about it".

Some points:

   1.  This strategy means that we don't care about what improvements
   actually make it into any given major version. This means that we will have
   major releases with nothing/very little desirable for users, and thus
   little reason to upgrade other than to stay on a supported version (from
   experience this isn't terribly important to users of a database). I think
   this inevitably leads to supporting more versions than necessary, and in
   general a pretty poor experience for users as we spend more time fighting
   bugs in production rather than before we do a release (purely because of
   increased frequency of releases).
   2. We'll always be driven by feature deadlines, which for the most part
   is fine, as long as we handle verification/quality assurance/release
   candidates appropriately. The main problem here though is that we don't
   really know what's going to be in a certain release until we hit the
   freeze, and what's in it may not really make sense at that point in time.
   3. We'll pump out major versions fairly regularly and end up with even
   more users that are on EOL versions with complex upgrade paths to get to a
   supported version or a version with a feature they need (think all those
   people still out there on 1.2).
   4. This strategy has the positive effect of allowing developers to see
   their changes in production faster, but OTOH if no one really uses the new
   versions this doesn't really happen anyway.

I'd also note that if people hadn't noticed, users tend to be pretty
reluctant to upgrade their databases (hello everyone still running 2.1).
This tends to be the nature of a database to some extent (if it works on
version x, why upgrade to y?). IMO it would make more sense to support fewer
versions but for a longer period of time. I'm sure most users would
appreciate 2 years of bug fixes for only 2 branches with a new major
approximately every 2 years. Databases don't move that fast; there's not
much desirable in a feature release every year for users.

sidenote: 3.10 was released in January 2017, and while the changes list for
4.0 is getting quite large there's not much there that's going to win over
users. It's mostly refactorings and improvements that affect developers
more so than users. I'm really interested in why people believe there is an
actual benefit in pumping out feature releases on a yearly basis. Who
exactly does that benefit? From what I know, the majority of "major" users
are still backporting stuff they want to 2.1, so why rush releasing more
versions?

> Regardless of whatever plan we do end up following it would still be
> valuable to have a list of tickets for 4.0 which is the overall goal of
> this email - so let's not get too worked up on the details just yet (save
> that for after I summarise/follow up).
>
 lol. dreaming.

On 4 April 2018 at 10:38, Aleksey Yeshchenko  wrote:

> 3.0 will be the most popular release for probably at least another couple
> years - I see no good reason to cap its support window. We aren’t Oracle.
>
> —
> AY
>
> On 3 April 2018 at 22:29:29, Michael Shuler (mich...@pbandjelly.org)
> wrote:
>
> Apache Cassandra 3.0 is supported until 6 months after 4.0 release (date
> TBD).
>


RE: Roadmap for 4.0

2018-04-04 Thread Steinmaurer, Thomas
With https://issues.apache.org/jira/browse/CASSANDRA-12269 in mind, 3.0 was a
noticeable step back in write throughput compared to what we have been used to
with 2.1 on the same infrastructure. For us, 3.0.x is therefore a no-go, and we
are planning to move towards the 3.11.x series in pre-production stages this
year before deploying into production. Production-wise, we are still "stuck"
on, and (more or less) happy with, 2.1.

Thomas

-----Original Message-----
From: alek...@apple.com [mailto:alek...@apple.com]
Sent: Wednesday, 04 April 2018 12:38
To: dev@cassandra.apache.org
Subject: Re: Roadmap for 4.0

3.0 will be the most popular release for probably at least another couple years 
- I see no good reason to cap its support window. We aren’t Oracle.

—
AY

On 3 April 2018 at 22:29:29, Michael Shuler (mich...@pbandjelly.org) wrote:

Apache Cassandra 3.0 is supported until 6 months after 4.0 release (date TBD).



Re: Repair scheduling tools

2018-04-04 Thread Rahul Singh
I understand the merits of both approaches. In working with other DBs in the
“old country” of SQL, we often had to write indexing sequences manually for
important tables. It was “built into the product”, but to get the maximum
benefit from indices we had to maintain indices other than the clustered
(physical) index. The process still sucked. It’s never perfect.

The JVM is already fraught with GC issues, and having another process managed
in the same heap space is what I’m worried about. Technically, the process
could ship in the same binary but be started either as a sidecar or inside the
main process.

Consider a process called “cassandra-agent” that’s sitting around with a
scheduler based on config or a Cassandra table. It would be distributed in the
same release, and shell / service scripts would start it. The end user would
know about it only by examining the .sh files. This opens up the possibility of
including a GUI hosted in the same process without cluttering the core
coolness of Cassandra.
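
To make the config-driven scheduler idea above concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration: the
RepairPolicy fields, the tables_due helper and the "keyspace.table" naming are
hypothetical and are not options or APIs of Cassandra or of any existing agent.
It only shows the decision such an agent would make on each tick: which tables
have gone longer than their configured interval without a repair.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class RepairPolicy:
    """Hypothetical per-table setting; not an actual Cassandra option."""
    keyspace: str
    table: str
    interval: timedelta  # maximum time allowed between repairs


def tables_due(policies: List[RepairPolicy],
               last_repaired: Dict[str, datetime],
               now: datetime) -> List[str]:
    """Return 'keyspace.table' names whose repair interval has elapsed."""
    due = []
    for policy in policies:
        name = f"{policy.keyspace}.{policy.table}"
        last = last_repaired.get(name)
        # A table with no recorded repair is always considered due.
        if last is None or now - last >= policy.interval:
            due.append(name)
    return due
```

The policies could be loaded from a config file or read from a table, as
suggested above; keeping the decision a pure function of (policies, history,
clock) makes it easy to test outside the database process.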

Best,

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 4, 2018, 2:50 AM -0400, Dor Laor , wrote:
> We at Scylla implemented repair in a similar way to the Cassandra Reaper. We
> do that using an external application, written in Go, that manages repair
> for multiple clusters and saves the data in an external Scylla cluster. The
> logic resembles the Reaper one, with some specific internal sharding
> optimizations, and uses the Scylla REST API.
>
> However, I have doubts it's the ideal way. After playing a bit with
> CockroachDB, I realized it's super nice to have a single binary that repairs
> itself, provides a GUI and is the core DB.
>
> Even while distributed, you can elect a leader node to manage the repair in
> a consistent way, so the complexity can be reduced to a minimum. Repair can
> write its status to the system tables and provide an API for progress, rate
> control, etc.
>
> The big advantage of embedding repair in the core is that there is no need
> to expose internal state to the repair logic. So an external program doesn't
> need to deal with different versions of Cassandra, different repair
> capabilities of the core (such as incremental on/off) and so forth. A good
> database should schedule its own repair: it knows whether the hinted handoff
> threshold was crossed or not, it knows whether nodes were replaced, etc.
>
> My 2 cents. Dor
>
> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <
> dinesh.jo...@yahoo.com.invalid> wrote:
>
> > Simon,
> > You could still do load aware repair outside of the main process by
> > reading Cassandra's metrics.
> > In general, I don't think the maintenance tasks necessarily need to live
> > in the main process. They could negatively impact the read / write path.
> > Unless strictly required by the serving path, it could live in a sidecar
> > process. There are multiple benefits including isolation, faster iteration,
> > loose coupling. For example - this would mean that the maintenance tasks
> > can have a different gc profile than the main process and it would be ok.
> > Today that is not the case.
> > The only issue I see is that the project does not provide an official
> > sidecar. Perhaps there should be one. We probably would've not had to have
> > this discussion ;)
> > Dinesh
> >
> > On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <
> > zhouqing...@gmail.com> wrote:
> >
> > Repair has been a problem for us at Uber. In general I'm in favor of
> > including the scheduling logic in Cassandra daemon. It has the benefit of
> > introducing something like load-aware repair, eg, only schedule repair
> > while no ongoing compaction or traffic is low, etc. As proposed by others,
> > we can expose keyspace/table-level configurations so that users can opt-in.
> > Regarding the risk, yes there will be problems at the beginning but in the
> > long run, users will appreciate that repair works out of the box, just like
> > compaction. We have large Cassandra deployments and can work with Netflix
> > folks for intensive testing to boost user confidence.
> >
> > On the other hand, have we looked into how other NoSQL databases do repair?
> > Is there a side car process?
> >
> >
> > On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli  > wrote:
> >
> > > Repair is critical for running C* and I agree with Roopa that it needs to
> > > be part of the offering. I think we should make it easy for new users to
> > > run C*.
> > >
> > > Can we have a side car process which we can add to Apache Cassandra
> > > offering and we can put this repair there? I am also fine putting it in
> > C*
> > > if side car is more long term.
> > >
> > > On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <
> > > rtangir...@netflix.com.invalid> wrote:
> > >
> > > > In seeing so many companies grapple with running repairs successfully
> > in
> > > > production, and seeing the success of distributed scheduled repair here
> > > at
> > > 

Re: Roadmap for 4.0

2018-04-04 Thread Aleksey Yeshchenko
3.0 will be the most popular release for probably at least another couple years 
- I see no good reason to cap its support window. We aren’t Oracle.

—
AY

On 3 April 2018 at 22:29:29, Michael Shuler (mich...@pbandjelly.org) wrote:

Apache Cassandra 3.0 is supported until 6 months after 4.0 release (date 
TBD). 

Re: Repair scheduling tools

2018-04-04 Thread Dor Laor
We at Scylla implemented repair in a similar way to the Cassandra Reaper. We
do that using an external application, written in Go, that manages repair for
multiple clusters and saves the data in an external Scylla cluster. The logic
resembles the Reaper one, with some specific internal sharding optimizations,
and uses the Scylla REST API.

However, I have doubts it's the ideal way. After playing a bit with
CockroachDB, I realized it's super nice to have a single binary that repairs
itself, provides a GUI and is the core DB.

Even while distributed, you can elect a leader node to manage the repair in a
consistent way, so the complexity can be reduced to a minimum. Repair can
write its status to the system tables and provide an API for progress, rate
control, etc.

The big advantage of embedding repair in the core is that there is no need to
expose internal state to the repair logic. So an external program doesn't need
to deal with different versions of Cassandra, different repair capabilities of
the core (such as incremental on/off) and so forth. A good database should
schedule its own repair: it knows whether the hinted handoff threshold was
crossed or not, it knows whether nodes were replaced, etc.
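
A rough sketch of the two ideas above (one node coordinating, and the database
deciding for itself when repair is needed), in Python. The NodeState fields and
both helpers are hypothetical names made up for illustration, not Scylla or
Cassandra internals. The "election" here is only a deterministic pick that
works if every node already agrees on the membership view; a real
implementation would need a proper consensus or lease mechanism.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeState:
    """Hypothetical view of one node as seen by the cluster."""
    host_id: str
    alive: bool
    hint_window_exceeded: bool  # node was down longer than hints were kept
    replaced: bool              # node was replaced since the last repair


def elect_leader(nodes: Iterable[NodeState]) -> Optional[str]:
    """Pick the live node with the smallest host_id, or None if none is alive."""
    live = sorted(n.host_id for n in nodes if n.alive)
    return live[0] if live else None


def needs_repair(nodes: Iterable[NodeState]) -> bool:
    """Repair is needed if any node missed hints or was replaced."""
    return any(n.hint_window_exceeded or n.replaced for n in nodes)
```

The point of the sketch is that both inputs (hint window, replacement) are
state the core already has, which is the argument for not exposing them to an
external tool.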

My 2 cents. Dor

On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <
dinesh.jo...@yahoo.com.invalid> wrote:

> Simon,
> You could still do load aware repair outside of the main process by
> reading Cassandra's metrics.
> In general, I don't think the maintenance tasks necessarily need to live
> in the main process. They could negatively impact the read / write path.
> Unless strictly required by the serving path, it could live in a sidecar
> process. There are multiple benefits including isolation, faster iteration,
> loose coupling. For example - this would mean that the maintenance tasks
> can have a different gc profile than the main process and it would be ok.
> Today that is not the case.
> The only issue I see is that the project does not provide an official
> sidecar. Perhaps there should be one. We probably would've not had to have
> this discussion ;)
> Dinesh
>
> On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <
> zhouqing...@gmail.com> wrote:
>
>  Repair has been a problem for us at Uber. In general I'm in favor of
> including the scheduling logic in Cassandra daemon. It has the benefit of
> introducing something like load-aware repair, eg, only schedule repair
> while no ongoing compaction or traffic is low, etc. As proposed by others,
> we can expose keyspace/table-level configurations so that users can opt-in.
> Regarding the risk, yes there will be problems at the beginning but in the
> long run, users will appreciate that repair works out of the box, just like
> compaction. We have large Cassandra deployments and can work with Netflix
> folks for intensive testing to boost user confidence.
>
> On the other hand, have we looked into how other NoSQL databases do repair?
> Is there a side car process?
>
>
> On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli 
> wrote:
>
> > Repair is critical for running C* and I agree with Roopa that it needs to
> > be part of the offering. I think we should make it easy for new users to
> > run C*.
> >
> > Can we have a side car process which we can add to Apache Cassandra
> > offering and we can put this repair there? I am also fine putting it in
> C*
> > if side car is more long term.
> >
> > On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <
> > rtangir...@netflix.com.invalid> wrote:
> >
> > > In seeing so many companies grapple with running repairs successfully
> in
> > > production, and seeing the success of distributed scheduled repair here
> > at
> > > Netflix, I strongly believe that adding this to Cassandra would be a
> > great
> > > addition to the database.  I am hoping, we as a community will make it
> > easy
> > > for teams to operate and run Cassandra by enhancing the core product,
> and
> > > making the maintenances like repairs and compactions part of the
> database
> > > without external tooling. We can have an experimental flag for the
> > feature
> > > and only teams who are confident with the service can enable them,
> while
> > > others can fall back to default repairs.
> > >
> > >
> > > *Regards,*
> > >
> > > *Roopa Tangirala*
> > >
> > > Engineering Manager CDE
> > >
> > > *(408) 438-3156 - mobile*
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <
> > > kenbrot...@yahoo.com.invalid> wrote:
> > >
> > > > Why not make it configurable?
> > > >auto_manage_repair_consistancy: true (default: false)
> > > >
> > > > Then users can use the built in auto repair function that would be
> > > created
> > > > or continue to handle it as now.  Default behavior would be "false"
> so
> > > > nothing changes on its own.  Just wondering why not have that option?
> > It
> > > > might accelerate progress as others have already suggested.
> > > >
> > > > Kenneth Brotman
> > > >
> > > > 

Re: Repair scheduling tools

2018-04-04 Thread Dinesh Joshi
Simon,
You could still do load-aware repair outside of the main process by reading 
Cassandra's metrics.
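
As a minimal sketch of what such a check could look like in a sidecar, in
Python: the LoadSnapshot fields and the may_start_repair helper are
hypothetical, and how the numbers are obtained from Cassandra's metrics is
deliberately left out. This version requires both conditions (no pending
compactions and low traffic); an operator might reasonably choose a looser
rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSnapshot:
    """Hypothetical metrics sample collected by the sidecar."""
    pending_compactions: int
    requests_per_second: float


def may_start_repair(load: LoadSnapshot, max_rps: float) -> bool:
    """Allow repair only when nothing is compacting and traffic is at or below max_rps."""
    return load.pending_compactions == 0 and load.requests_per_second <= max_rps
```

For example, may_start_repair(LoadSnapshot(0, 120.0), max_rps=500.0) allows a
repair to start, while any pending compaction or a request rate above the
limit defers it.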
In general, I don't think the maintenance tasks necessarily need to live in the 
main process. They could negatively impact the read / write path. Unless 
strictly required by the serving path, it could live in a sidecar process. 
There are multiple benefits including isolation, faster iteration, loose 
coupling. For example - this would mean that the maintenance tasks can have a 
different gc profile than the main process and it would be ok. Today that is 
not the case.
The only issue I see is that the project does not provide an official sidecar. 
Perhaps there should be one. We probably wouldn't have had to have this 
discussion ;)
Dinesh 

On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou 
 wrote:  
 
 Repair has been a problem for us at Uber. In general I'm in favor of
including the scheduling logic in Cassandra daemon. It has the benefit of
introducing something like load-aware repair, eg, only schedule repair
while no ongoing compaction or traffic is low, etc. As proposed by others,
we can expose keyspace/table-level configurations so that users can opt-in.
Regarding the risk, yes there will be problems at the beginning but in the
long run, users will appreciate that repair works out of the box, just like
compaction. We have large Cassandra deployments and can work with Netflix
folks for intensive testing to boost user confidence.

On the other hand, have we looked into how other NoSQL databases do repair?
Is there a side car process?


On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli 
wrote:

> Repair is critical for running C* and I agree with Roopa that it needs to
> be part of the offering. I think we should make it easy for new users to
> run C*.
>
> Can we have a side car process which we can add to Apache Cassandra
> offering and we can put this repair there? I am also fine putting it in C*
> if side car is more long term.
>
> On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <
> rtangir...@netflix.com.invalid> wrote:
>
> > In seeing so many companies grapple with running repairs successfully in
> > production, and seeing the success of distributed scheduled repair here
> at
> > Netflix, I strongly believe that adding this to Cassandra would be a
> great
> > addition to the database.  I am hoping, we as a community will make it
> easy
> > for teams to operate and run Cassandra by enhancing the core product, and
> > making the maintenances like repairs and compactions part of the database
> > without external tooling. We can have an experimental flag for the
> feature
> > and only teams who are confident with the service can enable them, while
> > others can fall back to default repairs.
> >
> >
> > *Regards,*
> >
> > *Roopa Tangirala*
> >
> > Engineering Manager CDE
> >
> > *(408) 438-3156 - mobile*
> >
> >
> >
> >
> >
> > On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <
> > kenbrot...@yahoo.com.invalid> wrote:
> >
> > > Why not make it configurable?
> > >        auto_manage_repair_consistancy: true (default: false)
> > >
> > > Then users can use the built in auto repair function that would be
> > created
> > > or continue to handle it as now.  Default behavior would be "false" so
> > > nothing changes on its own.  Just wondering why not have that option?
> It
> > > might accelerate progress as others have already suggested.
> > >
> > > Kenneth Brotman
> > >
> > > -----Original Message-----
> > > From: Nate McCall [mailto:zznat...@gmail.com]
> > > Sent: Tuesday, April 03, 2018 1:37 PM
> > > To: dev
> > > Subject: Re: Repair scheduling tools
> > >
> > > This document does a really good job of listing out some of the issues
> of
> > > coordinating scheduling repair. Regardless of which camp you fall into,
> > it
> > > is certainly worth a read.
> > >
> > > On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch 
> > > wrote:
> > > > I just want to say I think it would be great for our users if we
> moved
> > > > repair scheduling into Cassandra itself. The team here at Netflix has
> > > > opened the ticket
> > > > 
> > > > and have written a detailed design document
> > > >  t45rz7H3xs9G
> > > > bFSEyGzEtM/edit#heading=h.iasguic42ger>
> > > > that includes problem discussion and prior art if anyone wants to
> > > > contribute to that. We tried to fairly discuss existing solutions,
> > > > what their drawbacks are, and a proposed solution.
> > > >
> > > > If we were to put this as part of the main Cassandra daemon, I think
> > > > it should probably be marked experimental and of course be something
> > > > that users opt into (table by table or cluster by cluster) with the
> > > > understanding that it might not fully work out of the box the first
> > > > time we ship it. We have to be willing to take risks but we