Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread kurt greaves
To add to the above, hackathons would make sense in the lead-up to the
feature freeze IMO, to get things through the door, but not necessarily
afterwards (debatable). They can also be flexible; I suspect once people
start reviewing something they'll be much more likely to see it through
to completion. It's unlikely we'd end up with many commits after 72 hours,
purely because any feedback may take longer than that to be actioned, but
that doesn't mean our time is wasted.

On 6 April 2018 at 05:09, kurt greaves  wrote:

> I like the idea and we would be willing to take part (no committers here
> but I'm sure we can help).
>
> I think it better to pick some JIRAs per 2-3 weeks and have people review
>> them. In my experience, it is hard to synchronize all people across
>> companies during one 72 hour slot.
>
>
> It is hard, but there are benefits, mostly having many people actively
> reviewing at the "same" time, so communication will be much easier and
> more natural. I think a plan like ^ should be done as well as a hackathon,
> but as a continuous thing that we always do, e.g., each week someone is
> responsible for picking a few tickets that should be reviewed, and we hunt
> for volunteers to start the review work.
>
>
>>
>
>
>>
>> On Thu, Apr 5, 2018 at 9:48 PM, Nate McCall  wrote:
>>
>> > Per Kurt's point in our release thread, we have a lot to do here.
>> >
>> > What do folks feel about setting aside a 72hr period at some point soon
>> > where we get some allotment from our employers to spend a window or two
>> of
>> > time therein reviewing patches?
>> >
>> > I have seen a couple of other ASF communities do this type of thing
>> > effectively in recent months. If we pull this off I think it could set
>> an
>> > excellent precedent for swarming on other things in the future.
>> >
>>
>
>


Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread kurt greaves
I like the idea and we would be willing to take part (no committers here
but I'm sure we can help).

I think it better to pick some JIRAs per 2-3 weeks and have people review
> them. In my experience, it is hard to synchronize all people across
> companies during one 72 hour slot.


It is hard, but there are benefits, mostly having many people actively
reviewing at the "same" time, so communication will be much easier and
more natural. I think a plan like ^ should be done as well as a hackathon,
but as a continuous thing that we always do, e.g., each week someone is
responsible for picking a few tickets that should be reviewed, and we hunt
for volunteers to start the review work.


>


>
> On Thu, Apr 5, 2018 at 9:48 PM, Nate McCall  wrote:
>
> > Per Kurt's point in our release thread, we have a lot to do here.
> >
> > What do folks feel about setting aside a 72hr period at some point soon
> > where we get some allotment from our employers to spend a window or two
> of
> > time therein reviewing patches?
> >
> > I have seen a couple of other ASF communities do this type of thing
> > effectively in recent months. If we pull this off I think it could set an
> > excellent precedent for swarming on other things in the future.
> >
>


Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread Nate McCall
That could work as well. My goal is that we figure out how to resource and
focus on this for a bit.

On Fri, Apr 6, 2018, 5:02 PM sankalp kohli  wrote:

> I think it better to pick some JIRAs per 2-3 weeks and have people review
> them. In my experience, it is hard to synchronize all people across
> companies during one 72 hour slot.
>
>
>
> On Thu, Apr 5, 2018 at 9:48 PM, Nate McCall  wrote:
>
> > Per Kurt's point in our release thread, we have a lot to do here.
> >
> > What do folks feel about setting aside a 72hr period at some point soon
> > where we get some allotment from our employers to spend a window or two
> of
> > time therein reviewing patches?
> >
> > I have seen a couple of other ASF communities do this type of thing
> > effectively in recent months. If we pull this off I think it could set an
> > excellent precedent for swarming on other things in the future.
> >
>


Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread sankalp kohli
I think it better to pick some JIRAs per 2-3 weeks and have people review
them. In my experience, it is hard to synchronize all people across
companies during one 72 hour slot.



On Thu, Apr 5, 2018 at 9:48 PM, Nate McCall  wrote:

> Per Kurt's point in our release thread, we have a lot to do here.
>
> What do folks feel about setting aside a 72hr period at some point soon
> where we get some allotment from our employers to spend a window or two of
> time therein reviewing patches?
>
> I have seen a couple of other ASF communities do this type of thing
> effectively in recent months. If we pull this off I think it could set an
> excellent precedent for swarming on other things in the future.
>


[Discuss] patch review virtual hackathon

2018-04-05 Thread Nate McCall
Per Kurt's point in our release thread, we have a lot to do here.

What do folks feel about setting aside a 72hr period at some point soon
where we get some allotment from our employers to spend a window or two of
time therein reviewing patches?

I have seen a couple of other ASF communities do this type of thing
effectively in recent months. If we pull this off I think it could set an
excellent precedent for swarming on other things in the future.


Re: Roadmap for 4.0

2018-04-05 Thread kurt greaves
>
> Lay our cards on the table about what we want included in 4.0 and work to
> get those in

Are you saying we're back to where we started?  

For those wanting to delay, are we just dancing around inclusion of
> some pet features? This is fine, I just think we need to communicate
> what we are after if so.


Mostly. There are numerous large tickets that have been worked on *for
a long time*, have been rumoured to be nearing completion for some
time, and would be beneficial for everyone. They aren't my pet tickets,
but I'd sure like to see them finally land (see: Apple tickets).

But there's more to it than that. We've also got tickets we'd like to get
committed, and it's an incredibly slow process to get anyone to review (and
commit) your tickets if you're not in the club, so to speak. There are 148
tickets that are Patch Available at the moment, 89 of which have no reviewer.
I think it's highly unlikely that many of these will get committed before
June 1st, and I don't think that's fair on the people who have been trying to
contribute their patches for months, potentially years in some cases, but
have been stuck waiting on feedback/reviewers/committers.

Sure it's not the end of the world, but I know from experience that it's
damn discouraging. Even missing a minor release for a bug fix because of
this is annoying as hell.

I do want to remind everyone though that each new feature is at odds
> with our stability goals for 4.0.

With all the refactoring that's already gone into 4.0 and our current lack
of testing I think we're fighting an uphill battle here anyway. Adding a
few more metres is the least of our worries IMO. The
alpha/verification/testing period is already going to be a very long one.


On 6 April 2018 at 04:01, Nate McCall  wrote:

> >>
> >> So long as non-user-visible improvements, including big ones, can still
> go
> >> in 4.0 at that stage, I’m all for it.
> >
> >
> > My understanding is that after June 1st the 4.0 branch would be created
> and
> > would be bugfix only. It's not really a feature freeze if you allow
> > improvements after that, which is why they'd instead go to trunk.
> >
> > I'm also on the "too soon" train so pushing it back to August or so is
> > desirable to me.
> >
>
> For those wanting to delay, are we just dancing around inclusion of
> some pet features? This is fine, I just think we need to communicate
> what we are after if so.
>
> We can do two things:
> 1. Lay our cards on the table about what we want included in 4.0 and
> work to get those in
> 2. Agree to keep June 1 follow up a lot quicker with a 4.1
>
> I do want to remind everyone though that each new feature is at odds
> with our stability goals for 4.0.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Repair scheduling tools

2018-04-05 Thread kurt greaves
Vnodes are related, and because we made them the default, lots of people are
using them. Repairing a cluster with vnodes is a catastrophe (even a small one
is often problematic), but we have to deal with it if we build in repair
scheduling.

Repair scheduling is very important and we should definitely include it
with C* (a sidecar makes the most sense to me long term, but only if we look at
moving other background ops to the sidecar as well), but I'm positive it's not
going to work well with vnodes in their current state. Having said that, it
should still support scheduling repairs on vnode clusters, but the
vnode+repair problem should be fixed separately (and probably with more
attention than we've given it) because it's a major issue.

FWIW I know of 256 vnode clusters with > 100 nodes, yet I'd be surprised if
any of them are currently successfully repairing.

On 6 April 2018 at 03:03, Nate McCall  wrote:

> I think a takeaway here is that we can't assume a level of operational
> maturity will coincide automatically with scale. To make our core
> features robust, we have to account for less-experienced users.
>
> A lot of folks on this thread have *really* strong ops and OpsViz
> stories. Let's not forget that most of our users don't.
> ((Un)fortunately, as a consulting firm, we tend to see the worst of
> this).
>
> On Fri, Apr 6, 2018 at 2:52 PM, Jonathan Haddad  wrote:
> > Off the top of my head I can remember clusters with 600 or 700 nodes with
> > 256 tokens.
> >
> > Not the best situation, but it’s real. 256 has been the default for
> better
> > or worse.
> >
> > On Thu, Apr 5, 2018 at 7:41 PM Joseph Lynch 
> wrote:
> >
> >> >
> >> > We see this in larger clusters regularly. Usually folks have just
> >> > 'grown into it' because it was the default.
> >> >
> >>
> >> I could understand a few dozen nodes with 256 vnodes, but hundreds is
> >> surprising. I have a whitepaper draft lying around showing how vnodes
> >> decrease availability in large clusters by orders of magnitude, I'll
> polish
> >> it up and send it out to the list when I get a second.
> >>
> >> In the meantime, sorry for de-railing a conversation about repair
> >> scheduling to talk about vnodes, let's chat about that in a different
> >> thread :-)
> >>
> >> -Joey
> >>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Roadmap for 4.0

2018-04-05 Thread Nate McCall
>>
>> So long as non-user-visible improvements, including big ones, can still go
>> in 4.0 at that stage, I’m all for it.
>
>
> My understanding is that after June 1st the 4.0 branch would be created and
> would be bugfix only. It's not really a feature freeze if you allow
> improvements after that, which is why they'd instead go to trunk.
>
> I'm also on the "too soon" train so pushing it back to August or so is
> desirable to me.
>

For those wanting to delay, are we just dancing around inclusion of
some pet features? This is fine, I just think we need to communicate
what we are after if so.

We can do two things:
1. Lay our cards on the table about what we want included in 4.0 and
work to get those in
2. Agree to keep June 1 follow up a lot quicker with a 4.1

I do want to remind everyone though that each new feature is at odds
with our stability goals for 4.0.




Re: Roadmap for 4.0

2018-04-05 Thread kurt greaves
>
> So long as non-user-visible improvements, including big ones, can still go
> in 4.0 at that stage, I’m all for it.


My understanding is that after June 1st the 4.0 branch would be created and
would be bugfix only. It's not really a feature freeze if you allow
improvements after that, which is why they'd instead go to trunk.

I'm also on the "too soon" train so pushing it back to August or so is
desirable to me.


On 5 April 2018 at 21:06, Aleksey Yeshchenko  wrote:

> So long as non-user-visible improvements, including big ones, can still go
> in 4.0 at that stage, I’m all for it.
>
> —
> AY
>
> On 5 April 2018 at 21:14:03, Nate McCall (zznat...@gmail.com) wrote:
>
> >>> My understanding, from Nate's summary, was June 1 is the freeze date for
> >>> features. I expect we would go for at least 4 months (if not longer)
> >>> testing, fixing bugs, early dogfooding, and so on. I also equated June 1
> >>> with the date on which we would create a 'cassandra-4.0' branch, and thus
> >>> the merge order becomes: 3.0->3.11->4.0->trunk.
>
> This^ (apologies - 'freeze for alpha' was a bit open for interpretation :)
>
> The idea of making this point in time the 4.0 branch date and merge
> order switch is a good one.
>
> Can we move our gelling consensus here towards this goal?
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Repair scheduling tools

2018-04-05 Thread Nate McCall
I think a takeaway here is that we can't assume a level of operational
maturity will coincide automatically with scale. To make our core
features robust, we have to account for less-experienced users.

A lot of folks on this thread have *really* strong ops and OpsViz
stories. Let's not forget that most of our users don't.
((Un)fortunately, as a consulting firm, we tend to see the worst of
this).

On Fri, Apr 6, 2018 at 2:52 PM, Jonathan Haddad  wrote:
> Off the top of my head I can remember clusters with 600 or 700 nodes with
> 256 tokens.
>
> Not the best situation, but it’s real. 256 has been the default for better
> or worse.
>
> On Thu, Apr 5, 2018 at 7:41 PM Joseph Lynch  wrote:
>
>> >
>> > We see this in larger clusters regularly. Usually folks have just
>> > 'grown into it' because it was the default.
>> >
>>
>> I could understand a few dozen nodes with 256 vnodes, but hundreds is
>> surprising. I have a whitepaper draft lying around showing how vnodes
>> decrease availability in large clusters by orders of magnitude, I'll polish
>> it up and send it out to the list when I get a second.
>>
>> In the meantime, sorry for de-railing a conversation about repair
>> scheduling to talk about vnodes, let's chat about that in a different
>> thread :-)
>>
>> -Joey
>>




Re: Repair scheduling tools

2018-04-05 Thread Jonathan Haddad
Off the top of my head I can remember clusters with 600 or 700 nodes with
256 tokens.

Not the best situation, but it’s real. 256 has been the default for better
or worse.

On Thu, Apr 5, 2018 at 7:41 PM Joseph Lynch  wrote:

> >
> > We see this in larger clusters regularly. Usually folks have just
> > 'grown into it' because it was the default.
> >
>
> I could understand a few dozen nodes with 256 vnodes, but hundreds is
> surprising. I have a whitepaper draft lying around showing how vnodes
> decrease availability in large clusters by orders of magnitude, I'll polish
> it up and send it out to the list when I get a second.
>
> In the meantime, sorry for de-railing a conversation about repair
> scheduling to talk about vnodes, let's chat about that in a different
> thread :-)
>
> -Joey
>


Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
>
> We see this in larger clusters regularly. Usually folks have just
> 'grown into it' because it was the default.
>

I could understand a few dozen nodes with 256 vnodes, but hundreds is
surprising. I have a whitepaper draft lying around showing how vnodes
decrease availability in large clusters by orders of magnitude; I'll polish
it up and send it out to the list when I get a second.
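
(Editorial sketch, not taken from the whitepaper mentioned above: a quick
Monte Carlo in Java estimating how often a pair of simultaneous node failures
breaks quorum for some token range, assuming RF=3 and SimpleStrategy-style
placement. The node, failure, and trial counts are illustrative assumptions.)

    import java.util.*;

    public final class VnodeQuorumLossSim {
        // P(some range loses quorum | `down` nodes fail), RF=3, SimpleStrategy-style
        // placement: a range's replicas are the owner of its token plus the next
        // two distinct nodes clockwise on the ring.
        public static void main(String[] args) {
            int nodes = 100, rf = 3, down = 2, trials = 1000;   // illustrative figures only
            for (int vnodes : new int[] {1, 16, 256}) {
                System.out.printf("num_tokens=%-4d P(quorum loss)=%.3f%n",
                        vnodes, estimate(nodes, vnodes, rf, down, trials));
            }
        }

        static double estimate(int nodes, int vnodes, int rf, int down, int trials) {
            Random rnd = new Random(42);
            int total = nodes * vnodes, hits = 0;
            for (int t = 0; t < trials; t++) {
                long[] tokens = new long[total];                 // random token ring
                Integer[] order = new Integer[total];
                for (int i = 0; i < total; i++) { tokens[i] = rnd.nextLong(); order[i] = i; }
                Arrays.sort(order, Comparator.comparingLong(i -> tokens[i]));
                Set<Integer> failed = new HashSet<>();
                while (failed.size() < down) failed.add(rnd.nextInt(nodes));
                boolean lost = false;
                for (int p = 0; p < total && !lost; p++) {
                    Set<Integer> replicas = new LinkedHashSet<>();
                    for (int s = 0; replicas.size() < rf && s < total; s++) {
                        replicas.add(order[(p + s) % total] / vnodes);  // token index -> node id
                    }
                    int downReplicas = 0;
                    for (int r : replicas) if (failed.contains(r)) downReplicas++;
                    if (downReplicas >= 2) lost = true;          // 2 of 3 replicas down: no quorum
                }
                if (lost) hits++;
            }
            return (double) hits / trials;
        }
    }

With one token per node only failures of near-adjacent ring neighbours overlap
a replica set, so the probability stays small; with 256 vnodes per node nearly
any pair of failures shares some range, which is roughly the orders-of-magnitude
gap being referred to.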

In the meantime, sorry for de-railing a conversation about repair
scheduling to talk about vnodes, let's chat about that in a different
thread :-)

-Joey


Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
Sorry sent early.

To explain further, the scheduler is entirely decentralized in the proposed
design, and no node holds all the information you're talking about in heap
at once (in fact no one node would ever hold that information). Each node
is responsible only for the tokens for which it is the "primary" replica. Each
token range is then split by table, and each table range is individually
split into subranges, at most a few hundred range splits at a time (typically
one or two; you don't want too many, otherwise you'll end up with too many
small sstables). This is all at most megabytes of data, and I really do
believe it would not cause significant, if any, heap pressure. The repairs
*themselves* certainly would create heap pressure, but that happens
regardless of the scheduler.
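
(Illustrative sketch only of the per-table subrange splitting described above;
the names, the cap on splits, and the simplification that ranges don't wrap
around the ring are assumptions, not part of the actual proposal.)

    import java.util.*;

    public final class SubrangeSplitter {
        // Split one primary (left, right] token range into at most maxSplits contiguous
        // subranges -- the unit a scheduler could hand to repair a few at a time.
        // Assumes Murmur3-style long tokens and no wraparound across the ring's ends.
        static List<long[]> split(long left, long right, int maxSplits) {
            long span = right - left;
            int splits = (int) Math.min(maxSplits, Math.max(1, span));
            List<long[]> out = new ArrayList<>(splits);
            long lo = left;
            for (int i = 1; i <= splits; i++) {
                long hi = left + span * i / splits;
                out.add(new long[] {lo, hi});
                lo = hi;
            }
            return out;
        }

        public static void main(String[] args) {
            for (long[] r : split(0, 1L << 40, 4)) {    // e.g. 4 subranges of one primary range
                System.out.println("(" + r[0] + ", " + r[1] + "]");
            }
        }
    }

Bounding the number of in-flight splits per table is what keeps the
scheduler's bookkeeping to megabytes rather than the full cluster range set.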

-Joey

On Thu, Apr 5, 2018 at 7:25 PM, Joseph Lynch  wrote:

> I wouldn't trivialize it, scheduling can end up dealing with more than a
>> single repair. If there are 1000 keyspaces/tables, with 400 nodes and 256
>> vnodes on each, that's a lot of repairs to plan out and keep track of and can
>> easily cause heap allocation spikes if opted in.
>>
>> Chris
>
> The current proposal never keeps track of more than a few hundred range
> splits for a single table at a time, and nothing ever keeps state for the
> entire 400 node cluster. Compared to the load generated by actually repairing the
> data, I actually do think it is trivial heap pressure.
>
>
> Somewhat beside the point, I wasn't aware there were any 100+ node
> clusters running with vnodes; if my math is correct they would be
> excessively vulnerable to outages with that many vnodes and that many
> nodes. Most of the large clusters I've heard of (100 nodes plus) are
> running with a single token or at most 4 tokens per node.
>


Re: Repair scheduling tools

2018-04-05 Thread Nate McCall
>
> Somewhat beside the point, I wasn't aware there were any 100 node +
> clusters running with vnodes, if my math is correct they would be
> excessively vulnerable to outages with that many vnodes and that many
> nodes. Most of the large clusters I've heard of (100 nodes plus) are
> running with single or at most 4 tokens per node.

We see this in larger clusters regularly. Usually folks have just
'grown into it' because it was the default.




Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
>
> I wouldn't trivialize it, scheduling can end up dealing with more than a
> single repair. If there are 1000 keyspaces/tables, with 400 nodes and 256
> vnodes on each, that's a lot of repairs to plan out and keep track of and can
> easily cause heap allocation spikes if opted in.
>
> Chris

The current proposal never keeps track of more than a few hundred range
splits for a single table at a time, and nothing ever keeps state for the
entire 400 node cluster. Compared to the load generated by actually repairing the
data, I actually do think it is trivial heap pressure.


Somewhat beside the point, I wasn't aware there were any 100+ node
clusters running with vnodes; if my math is correct they would be
excessively vulnerable to outages with that many vnodes and that many
nodes. Most of the large clusters I've heard of (100 nodes plus) are
running with a single token or at most 4 tokens per node.


Re: Repair scheduling tools

2018-04-05 Thread Chris Lohfink

> I do have a hard time buying that an opt-in repair *scheduling* is going to
> cause heap problems or impact the daemon significantly; the scheduler
> literally reads a few bytes out of a Cassandra table and makes a function
> call or two, and then sleeps for 2 minutes.

I wouldn't trivialize it; scheduling can end up dealing with more than a single
repair. If there are 1000 keyspaces/tables, with 400 nodes and 256 vnodes on each,
that's a lot of repairs to plan out and keep track of, and it can easily cause heap
allocation spikes if opted in.
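
(Back-of-envelope arithmetic for the example figures above, just to put a
number on "a lot of repairs to plan out":)

    // Assumed example figures: 400 nodes, 256 vnodes each, 1000 tables.
    public final class RepairUnitCount {
        public static void main(String[] args) {
            long nodes = 400, vnodesPerNode = 256, tables = 1000;
            long perNode = vnodesPerNode * tables;     // 256,000 (vnode range, table) pairs per node
            long clusterWide = nodes * perNode;        // 102,400,000 pairs across the cluster
            System.out.printf("per node: %,d   cluster-wide: %,d%n", perNode, clusterWide);
        }
    }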

Chris


Re: Roadmap for 4.0

2018-04-05 Thread Aleksey Yeshchenko
So long as non-user-visible improvements, including big ones, can still go in 
4.0 at that stage, I’m all for it.

—
AY

On 5 April 2018 at 21:14:03, Nate McCall (zznat...@gmail.com) wrote:

>>> My understanding, from Nate's summary, was June 1 is the freeze date for  
>>> features. I expect we would go for at least 4 months (if not longer)  
>>> testing, fixing bugs, early dogfooding, and so on. I also equated June 1  
>>> with the date on which we would create a 'cassandra-4.0' branch, and thus the
>>> merge order becomes: 3.0->3.11->4.0->trunk.

This^ (apologies - 'freeze for alpha' was a bit open for interpretation :)  

The idea of making this point in time the 4.0 branch date and merge  
order switch is a good one.  

Can we move our gelling consensus here towards this goal?  

-  
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
For additional commands, e-mail: dev-h...@cassandra.apache.org  



Re: Roadmap for 4.0

2018-04-05 Thread Nate McCall
>>> My understanding, from Nate's summary, was June 1 is the freeze date for
>>> features. I expect we would go for at least 4 months (if not longer)
>>> testing, fixing bugs, early dogfooding, and so on. I also equated June 1
>>> with the date on which we would create a 'cassandra-4.0' branch, and thus the
>>> merge order becomes: 3.0->3.11->4.0->trunk.

This^ (apologies - 'freeze for alpha' was a bit open for interpretation :)

The idea of making this point in time the 4.0 branch date and merge
order switch is a good one.

Can we move our gelling consensus here towards this goal?




Re: Repair scheduling tools

2018-04-05 Thread Rahul Singh
A simpler scheduler is never simple. I agree in principle — à la a "Cassandra-Agent"
that could manage any order of tasks, schedules, etc. needed to prune and
manage the C* engine. Cassandra has enough thread pools it needs to manage already.

On Apr 5, 2018, 3:09 PM -0400, Joseph Lynch , wrote:
> I think that getting into the various repair strategies in this discussion
> is perhaps orthogonal to how we schedule repair.
>
> Whether we end up with incremental, full, tickers (read @ALL), continuous
> repair, mutation based repair, etc... something still needs to schedule them
> for all tables and give good introspection into when they ran, how long they
> took to run, etc. If we're able to get a simple scheduler into Cassandra I
> think we can always add additional repair types
> and configuration options, we could even make them an interface so that
> users can plug in their own repair strategy.
>
> For example if we added a "read-repair" repair type, we could drift that
> pretty effortlessly.
>
> -Joey
>
> On Thu, Apr 5, 2018 at 11:48 AM, benjamin roth  wrote:
>
> > I don't say reaper is the problem. I don't want to do wrong to Reaper but
> > in the end it is "just" an instrumentation for CS's built in repairs that
> > slices and schedules, right?
> > The problem I see is that the built in repairs are rather inefficient (for
> > many, maybe not all use cases) due to many reasons. To name some of them:
> >
> > - Overstreaming as only whole partitions are repaired, not single mutations
> > - Race conditions in merkle tree calculation on nodes taking part in a
> > repair session
> > - Every stream creates a SSTable, needing to be compacted
> > - Possible SSTable creation floods can even kill a node due to "too many
> > open files" - yes we had that
> > - Incremental repairs have issues
> >
> > Today we had a super simple case where I first ran 'nodetool repair' on a
> > super small system keyspace and then ran a 'scrape-repair':
> > - nodetool took 4 minutes on a single node
> > - scraping took 1 sec repairing all nodes together
> >
> > In the beginning I was twisting my brain how this could be optimized in CS
> > - in the end going with scraping solved every problem we had.
> >
> > 2018-04-05 20:32 GMT+02:00 Jonathan Haddad :
> >
> > > To be fair, reaper in 2016 only worked with 2.0 and was just sitting
> > > around, more or less.
> > >
> > > Since then we've had 401 commits changing tens of thousands of lines of
> > > code, dealing with fault tolerance, repair retries, scalability, etc.
> > > We've had 1 reaper node managing repairs across dozens of clusters and
> > > thousands of nodes. It's a totally different situation today.
> > >
> > >
> > > On Thu, Apr 5, 2018 at 11:17 AM benjamin roth  wrote:
> > >
> > > > That would be totally awesome!
> > > >
> > > > Not sure if it helps here but for completeness:
> > > > We completely "dumped" regular repairs - no matter if 'nodetool repair'
> > > or
> > > > reaper - and run our own tool that does simply CL_ALL scraping over the
> > > > whole cluster.
> > > > It runs now for over a year in production and the only problem we
> > > > encountered was that we got timeouts when scraping (too) large /
> > > tombstoned
> > > > partitions. It turned out that the large partitions weren't even
> > readable
> > > > with CQL / cqlsh / DevCenter. So that wasn't a problem of the repair.
> > It
> > > > was rather a design problem. Storing data that can't be read doesn't
> > make
> > > > sense anyway.
> > > >
> > > > What I can tell from our experience:
> > > > - It works much more reliable than what we had before - also more
> > > reliable
> > > > than reaper (state of 2016)
> > > > - It runs totally smooth and much faster than regular repairs as it
> > only
> > > > streams what needs to be streamed
> > > > - It's easily manageable, interruptible, resumable on a very
> > fine-grained
> > > > level. The only thing you need to do is to store state (KS/CF/Last
> > Token)
> > > > in a simple storage like redis
> > > > - It works even pretty well when populating a empty node e.g. when
> > > changing
> > > > RFs / bootstrapping DCs
> > > > - You can easily control the cluster-load by tuning the concurrency of
> > > the
> > > > scrape process
> > > >
> > > > I don't see a reason for us to ever go back to built-in repairs if they
> > > > don't improve immensely. In many cases (especially with MVs) they are
> > > true
> > > > resource killers.
> > > >
> > > > Just my 2 cent and experience.
> > > >
> > > > 2018-04-04 17:00 GMT+02:00 Ben Bromhead :
> > > >
> > > 

Re: Roadmap for 4.0

2018-04-05 Thread Josh McKenzie
I'm in line w/your thinking here Jason.

On Thu, Apr 5, 2018 at 3:25 PM, Jonathan Haddad  wrote:
> That’s exactly what I was thinking too.
>
> There’s also nothing preventing features from being merged into trunk after
> we create the 4.0 branch, which in my opinion is a better approach than
> trying to jam everything in right before the release.
> On Thu, Apr 5, 2018 at 12:06 PM Jason Brown  wrote:
>
>> My understanding, from Nate's summary, was June 1 is the freeze date for
>> features. I expect we would go for at least 4 months (if not longer)
>> testing, fixing bugs, early dogfooding, and so on. I also equated June 1
>> with the date on which we would create a 'cassandra-4.0' branch, and thus the
>> merge order becomes: 3.0->3.11->4.0->trunk.
>>
>> Is this different from what others are thinking? I'm open to shifting the
>> actual date, but what about the rest?
>>
>>
>> On Thu, Apr 5, 2018 at 11:39 AM, Aleksey Yeshchenko 
>> wrote:
>>
>> > June feels a bit too early to me as well.
>> >
>> > I personally would go prefer end of August / beginning of September.
>> >
>> > +1 to the idea of having a fixed date, though, just not this one.
>> >
>> > —
>> > AY
>> >
>> > On 5 April 2018 at 19:20:12, Stefan Podkowinski (s...@apache.org) wrote:
>> >
>> > June is too early.
>> >
>> >
>> > On 05.04.18 19:32, Josh McKenzie wrote:
>> > > Just as a matter of perspective, I'm personally mentally diffing from
>> > > when 3.0 hit, not 3.10.
>> > >
>> > >> commit 96f407bce56b98cd824d18e32ee012dbb99a0286
>> > >> Author: T Jake Luciani 
>> > >> Date: Fri Nov 6 14:38:34 2015 -0500
>> > >> 3.0 release versions
>> > > While June feels close to today relative to momentum for a release
>> > > before this discussion, it's certainly long enough from when the
>> > > previous traditional major released that it doesn't feel "too soon" to
>> > > me.
>> > >
>> > > On Thu, Apr 5, 2018 at 12:46 PM, sankalp kohli > >
>> > wrote:
>> > >> We can take a look on 1st June how things are then decide if we want
>> to
>> > >> freeze it and whats in and whats out.
>> > >>
>> > >> On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg 
>> > wrote:
>> > >>
>> > >>> Hi,
>> > >>>
>> > >>> +1 to having a feature freeze date. June 1st is earlier than I would
>> > have
>> > >>> picked.
>> > >>>
>> > >>> Ariel
>> > >>>
>> > >>> On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
>> >  +1 here for June 1.
>> > 
>> >  On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown 
>> > >>> wrote:
>> > > +1
>> > >
>> > > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston <
>> > beggles...@apple.com>
>> > > wrote:
>> > >
>> > >> +1
>> > >>
>> > >> On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
>> > >>
>> > >> Earlier than I’d have personally picked, but I’m +1 too
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Jeff Jirsa
>> > >>
>> > >>
>> > >> > On Apr 4, 2018, at 5:06 PM, Nate McCall 
>> > > wrote:
>> > >> >
>> > >> > Top-posting as I think this summary is on point - thanks,
>> > >>> Scott!
>> > > (And
>> > >> > great to have you back, btw).
>> > >> >
>> > >> > It feels to me like we are coalescing on two points:
>> > >> > 1. June 1 as a freeze for alpha
>> > >> > 2. "Stable" is the new "Exciting" (and the testing and
>> > >>> dogfooding
>> > >> > implied by such before a GA)
>> > >> >
>> > >> > How do folks feel about the above points?
>> > >> >
>> > >> >
>> > >> >> Re-raising a point made earlier in the thread by Jeff and
>> > >>> affirmed
>> > >> by Josh:
>> > >> >>
>> > >> >> –––
>> > >> >> Jeff:
>> > >>  A hard date for a feature freeze makes sense, a hard date
>> > >>> for a
>> > >> release
>> > >>  does not.
>> > >> >>
>> > >> >> Josh:
>> > >> >>> Strongly agree. We should also collectively define what
>> > >>> "Done"
>> > >> looks like
>> > >> >>> post freeze so we don't end up in bike-shedding hell like we
>> > >>> have
>> > >> in the
>> > >> >>> past.
>> > >> >> –––
>> > >> >>
>> > >> >> Another way of saying this: ensuring that the 4.0 release is
>> > >>> of
>> > >> high quality is more important than cutting the release on a
>> > specific
>> > > date.
>> > >> >>
>> > >> >> If we adopt Sylvain's suggestion of freezing features on a
>> > > "feature
>> > >> complete" date (modulo a "definition of done" as Josh suggested),
>> > >>> that
>> > > will
>> > >> help us align toward the polish, performance work, and dog-fooding
>> > >>> needed
>> > >> to feel great about shipping 4.0. It's a good time to start
>> thinking
>> > > about
>> > >> the approaches to testing, profiling, and dog-fooding various
>> > > contributors
>> > >> will want to 

Re: Roadmap for 4.0

2018-04-05 Thread Jonathan Haddad
That’s exactly what I was thinking too.

There’s also nothing preventing features from being merged into trunk after
we create the 4.0 branch, which in my opinion is a better approach than
trying to jam everything in right before the release.
On Thu, Apr 5, 2018 at 12:06 PM Jason Brown  wrote:

> My understanding, from Nate's summary, was June 1 is the freeze date for
> features. I expect we would go for at least 4 months (if not longer)
> testing, fixing bugs, early dogfooding, and so on. I also equated June 1
> with the date on which we would create a 'cassandra-4.0' branch, and thus the
> merge order becomes: 3.0->3.11->4.0->trunk.
>
> Is this different from what others are thinking? I'm open to shifting the
> actual date, but what about the rest?
>
>
> On Thu, Apr 5, 2018 at 11:39 AM, Aleksey Yeshchenko 
> wrote:
>
> > June feels a bit too early to me as well.
> >
> > I personally would prefer end of August / beginning of September.
> >
> > +1 to the idea of having a fixed date, though, just not this one.
> >
> > —
> > AY
> >
> > On 5 April 2018 at 19:20:12, Stefan Podkowinski (s...@apache.org) wrote:
> >
> > June is too early.
> >
> >
> > On 05.04.18 19:32, Josh McKenzie wrote:
> > > Just as a matter of perspective, I'm personally mentally diffing from
> > > when 3.0 hit, not 3.10.
> > >
> > >> commit 96f407bce56b98cd824d18e32ee012dbb99a0286
> > >> Author: T Jake Luciani 
> > >> Date: Fri Nov 6 14:38:34 2015 -0500
> > >> 3.0 release versions
> > > While June feels close to today relative to momentum for a release
> > > before this discussion, it's certainly long enough from when the
> > > previous traditional major released that it doesn't feel "too soon" to
> > > me.
> > >
> > > On Thu, Apr 5, 2018 at 12:46 PM, sankalp kohli  >
> > wrote:
> > >> We can take a look on 1st June how things are then decide if we want
> to
> > >> freeze it and whats in and whats out.
> > >>
> > >> On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg 
> > wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> +1 to having a feature freeze date. June 1st is earlier than I would
> > have
> > >>> picked.
> > >>>
> > >>> Ariel
> > >>>
> > >>> On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
> >  +1 here for June 1.
> > 
> >  On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown 
> > >>> wrote:
> > > +1
> > >
> > > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston <
> > beggles...@apple.com>
> > > wrote:
> > >
> > >> +1
> > >>
> > >> On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
> > >>
> > >> Earlier than I’d have personally picked, but I’m +1 too
> > >>
> > >>
> > >>
> > >> --
> > >> Jeff Jirsa
> > >>
> > >>
> > >> > On Apr 4, 2018, at 5:06 PM, Nate McCall 
> > > wrote:
> > >> >
> > >> > Top-posting as I think this summary is on point - thanks,
> > >>> Scott!
> > > (And
> > >> > great to have you back, btw).
> > >> >
> > >> > It feels to me like we are coalescing on two points:
> > >> > 1. June 1 as a freeze for alpha
> > >> > 2. "Stable" is the new "Exciting" (and the testing and
> > >>> dogfooding
> > >> > implied by such before a GA)
> > >> >
> > >> > How do folks feel about the above points?
> > >> >
> > >> >
> > >> >> Re-raising a point made earlier in the thread by Jeff and
> > >>> affirmed
> > >> by Josh:
> > >> >>
> > >> >> –––
> > >> >> Jeff:
> > >>  A hard date for a feature freeze makes sense, a hard date
> > >>> for a
> > >> release
> > >>  does not.
> > >> >>
> > >> >> Josh:
> > >> >>> Strongly agree. We should also collectively define what
> > >>> "Done"
> > >> looks like
> > >> >>> post freeze so we don't end up in bike-shedding hell like we
> > >>> have
> > >> in the
> > >> >>> past.
> > >> >> –––
> > >> >>
> > >> >> Another way of saying this: ensuring that the 4.0 release is
> > >>> of
> > >> high quality is more important than cutting the release on a
> > specific
> > > date.
> > >> >>
> > >> >> If we adopt Sylvain's suggestion of freezing features on a
> > > "feature
> > >> complete" date (modulo a "definition of done" as Josh suggested),
> > >>> that
> > > will
> > >> help us align toward the polish, performance work, and dog-fooding
> > >>> needed
> > >> to feel great about shipping 4.0. It's a good time to start
> thinking
> > > about
> > >> the approaches to testing, profiling, and dog-fooding various
> > > contributors
> > >> will want to take on before release.
> > >> >>
> > >> >> I love how Ben put it:
> > >> >>
> > >> >>> An "exciting" 4.0 release to me is one that is stable and
> > >>> usable
> > >> >>> with no perf regressions on day 1 and includes some of the
> > >>> big
> > 

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
I think that getting into the various repair strategies in this discussion
is perhaps orthogonal to how we schedule repair.

Whether we end up with incremental, full, tickers (read @ALL), continuous
repair, mutation based repair, etc... something still needs to schedule them
for all tables and give good introspection into when they ran, how long they
took to run, etc. If we're able to get a simple scheduler into Cassandra I
think we can always add additional repair types
and configuration options, we could even make them an interface so that
users can plug in their own repair strategy.

For example if we added a "read-repair" repair type, we could drift that
pretty effortlessly.
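
(A purely hypothetical sketch of what such a pluggable strategy interface
might look like; the names and the TokenRange helper are illustrative, not an
existing Cassandra API.)

    import java.util.List;

    public interface RepairStrategy {
        // Hypothetical helper type: a (left, right] slice of the token ring.
        final class TokenRange {
            public final long left, right;
            public TokenRange(long left, long right) { this.left = left; this.right = right; }
        }

        // The next ranges this node should repair for the given table, in priority order.
        List<TokenRange> nextRanges(String keyspace, String table, int maxRanges);

        // Called when a scheduled range repair finishes, so the strategy can record
        // progress, back off, or reschedule.
        void onRangeRepaired(String keyspace, String table, TokenRange range, boolean success);
    }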

-Joey

On Thu, Apr 5, 2018 at 11:48 AM, benjamin roth  wrote:

> I don't say reaper is the problem. I don't want to do wrong to Reaper but
> in the end it is "just" an instrumentation for CS's built in repairs that
> slices and schedules, right?
> The problem I see is that the built in repairs are rather inefficient (for
> many, maybe not all use cases) due to many reasons. To name some of them:
>
> - Overstreaming as only whole partitions are repaired, not single mutations
> - Race conditions in merkle tree calculation on nodes taking part in a
> repair session
> - Every stream creates a SSTable, needing to be compacted
> - Possible SSTable creation floods can even kill a node due to "too many
> open files" - yes we had that
> - Incremental repairs have issues
>
> Today we had a super simple case where I first ran 'nodetool repair' on a
> super small system keyspace and then ran a 'scrape-repair':
> - nodetool took 4 minutes on a single node
> - scraping took 1 sec repairing all nodes together
>
> In the beginning I was twisting my brain how this could be optimized in CS
> - in the end going with scraping solved every problem we had.
>
> 2018-04-05 20:32 GMT+02:00 Jonathan Haddad :
>
> > To be fair, reaper in 2016 only worked with 2.0 and was just sitting
> > around, more or less.
> >
> > Since then we've had 401 commits changing tens of thousands of lines of
> > code, dealing with fault tolerance, repair retries, scalability, etc.
> > We've had 1 reaper node managing repairs across dozens of clusters and
> > thousands of nodes.  It's a totally different situation today.
> >
> >
> > On Thu, Apr 5, 2018 at 11:17 AM benjamin roth  wrote:
> >
> > > That would be totally awesome!
> > >
> > > Not sure if it helps here but for completeness:
> > > We completely "dumped" regular repairs - no matter if 'nodetool repair'
> > or
> > > reaper - and run our own tool that does simply CL_ALL scraping over the
> > > whole cluster.
> > > It runs now for over a year in production and the only problem we
> > > encountered was that we got timeouts when scraping (too) large /
> > tombstoned
> > > partitions. It turned out that the large partitions weren't even
> readable
> > > with CQL / cqlsh / DevCenter. So that wasn't a problem of the repair.
> It
> > > was rather a design problem. Storing data that can't be read doesn't
> make
> > > sense anyway.
> > >
> > > What I can tell from our experience:
> > > - It works much more reliable than what we had before - also more
> > reliable
> > > than reaper (state of 2016)
> > > - It runs totally smooth and much faster than regular repairs as it
> only
> > > streams what needs to be streamed
> > > - It's easily manageable, interruptible, resumable on a very
> fine-grained
> > > level. The only thing you need to do is to store state (KS/CF/Last
> Token)
> > > in a simple storage like redis
> > > - It works even pretty well when populating a empty node e.g. when
> > changing
> > > RFs / bootstrapping DCs
> > > - You can easily control the cluster-load by tuning the concurrency of
> > the
> > > scrape process
> > >
> > > I don't see a reason for us to ever go back to built-in repairs if they
> > > don't improve immensely. In many cases (especially with MVs) they are
> > true
> > > resource killers.
> > >
> > > Just my 2 cent and experience.
> > >
> > > 2018-04-04 17:00 GMT+02:00 Ben Bromhead :
> > >
> > > > +1 to including the implementation in Cassandra itself. Makes managed
> > > > repair a first-class citizen, it nicely rounds out Cassandra's
> > > consistency
> > > > story and makes it 1000x more likely that repairs will get run.
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad 
> wrote:
> > > >
> > > > > Implementation details aside, I’m firmly in the “it would be nice
> of
> > C*
> > > > > could take care of it” camp.  Reaper is pretty damn easy 

Re: Roadmap for 4.0

2018-04-05 Thread Jason Brown
My understanding, from Nate's summary, was June 1 is the freeze date for
features. I expect we would go for at least 4 months (if not longer)
testing, fixing bugs, early dogfooding, and so on. I also equated June 1
with the date on which we would create a 'cassandra-4.0' branch, and thus the
merge order becomes: 3.0->3.11->4.0->trunk.

Is this different from what others are thinking? I'm open to shifting the
actual date, but what about the rest?


On Thu, Apr 5, 2018 at 11:39 AM, Aleksey Yeshchenko 
wrote:

> June feels a bit too early to me as well.
>
> I personally would prefer end of August / beginning of September.
>
> +1 to the idea of having a fixed date, though, just not this one.
>
> —
> AY
>
> On 5 April 2018 at 19:20:12, Stefan Podkowinski (s...@apache.org) wrote:
>
> June is too early.
>
>
> On 05.04.18 19:32, Josh McKenzie wrote:
> > Just as a matter of perspective, I'm personally mentally diffing from
> > when 3.0 hit, not 3.10.
> >
> >> commit 96f407bce56b98cd824d18e32ee012dbb99a0286
> >> Author: T Jake Luciani 
> >> Date: Fri Nov 6 14:38:34 2015 -0500
> >> 3.0 release versions
> > While June feels close to today relative to momentum for a release
> > before this discussion, it's certainly long enough from when the
> > previous traditional major released that it doesn't feel "too soon" to
> > me.
> >
> > On Thu, Apr 5, 2018 at 12:46 PM, sankalp kohli 
> wrote:
> >> We can take a look on 1st June how things are then decide if we want to
> >> freeze it and whats in and whats out.
> >>
> >> On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> +1 to having a feature freeze date. June 1st is earlier than I would
> have
> >>> picked.
> >>>
> >>> Ariel
> >>>
> >>> On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
>  +1 here for June 1.
> 
>  On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown 
> >>> wrote:
> > +1
> >
> > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston <
> beggles...@apple.com>
> > wrote:
> >
> >> +1
> >>
> >> On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
> >>
> >> Earlier than I’d have personally picked, but I’m +1 too
> >>
> >>
> >>
> >> --
> >> Jeff Jirsa
> >>
> >>
> >> > On Apr 4, 2018, at 5:06 PM, Nate McCall 
> > wrote:
> >> >
> >> > Top-posting as I think this summary is on point - thanks,
> >>> Scott!
> > (And
> >> > great to have you back, btw).
> >> >
> >> > It feels to me like we are coalescing on two points:
> >> > 1. June 1 as a freeze for alpha
> >> > 2. "Stable" is the new "Exciting" (and the testing and
> >>> dogfooding
> >> > implied by such before a GA)
> >> >
> >> > How do folks feel about the above points?
> >> >
> >> >
> >> >> Re-raising a point made earlier in the thread by Jeff and
> >>> affirmed
> >> by Josh:
> >> >>
> >> >> –––
> >> >> Jeff:
> >>  A hard date for a feature freeze makes sense, a hard date
> >>> for a
> >> release
> >>  does not.
> >> >>
> >> >> Josh:
> >> >>> Strongly agree. We should also collectively define what
> >>> "Done"
> >> looks like
> >> >>> post freeze so we don't end up in bike-shedding hell like we
> >>> have
> >> in the
> >> >>> past.
> >> >> –––
> >> >>
> >> >> Another way of saying this: ensuring that the 4.0 release is
> >>> of
> >> high quality is more important than cutting the release on a
> specific
> > date.
> >> >>
> >> >> If we adopt Sylvain's suggestion of freezing features on a
> > "feature
> >> complete" date (modulo a "definition of done" as Josh suggested),
> >>> that
> > will
> >> help us align toward the polish, performance work, and dog-fooding
> >>> needed
> >> to feel great about shipping 4.0. It's a good time to start thinking
> > about
> >> the approaches to testing, profiling, and dog-fooding various
> > contributors
> >> will want to take on before release.
> >> >>
> >> >> I love how Ben put it:
> >> >>
> >> >>> An "exciting" 4.0 release to me is one that is stable and
> >>> usable
> >> >>> with no perf regressions on day 1 and includes some of the
> >>> big
> >> >>> internal changes mentioned previously.
> >> >>>
> >> >>> This will set the community up well for some awesome and
> >>> exciting
> >> >>> stuff that will still be in the pipeline if it doesn't make
> >>> it to
> >> 4.0.
> >> >>
> >> >> That sounds great to me, too.
> >> >>
> >> >> – Scott
> >> >
> >> > 
> >> -
> >> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >> >
> >>
> >> 

Re: Repair scheduling tools

2018-04-05 Thread benjamin roth
I don't say Reaper is the problem. I don't want to do Reaper an injustice, but
in the end it is "just" an instrumentation for C*'s built-in repairs that
slices and schedules, right?
The problem I see is that the built-in repairs are rather inefficient (for
many, maybe not all, use cases) for many reasons. To name some of them:

- Overstreaming as only whole partitions are repaired, not single mutations
- Race conditions in merkle tree calculation on nodes taking part in a
repair session
- Every stream creates a SSTable, needing to be compacted
- Possible SSTable creation floods can even kill a node due to "too many
open files" - yes we had that
- Incremental repairs have issues

Today we had a super simple case where I first ran 'nodetool repair' on a
super small system keyspace and then ran a 'scrape-repair':
- nodetool took 4 minutes on a single node
- scraping took 1 sec repairing all nodes together

In the beginning I was twisting my brain over how this could be optimized in
C*; in the end, going with scraping solved every problem we had.
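
(A minimal editorial sketch of the CL=ALL scrape idea described above, under
assumptions of my own: DataStax Java driver 3.x, Jedis for the KS/CF/last-token
checkpoint, and a hypothetical table my_ks.my_table whose single partition-key
column pk maps to Murmur3 long tokens. A real tool would select whole rows so
every column gets compared and read-repaired, and would throttle concurrency to
control cluster load.)

    import com.datastax.driver.core.*;
    import redis.clients.jedis.Jedis;

    public final class ScrapeRepair {
        public static void main(String[] args) {
            String ks = "my_ks", table = "my_table", pk = "pk";       // hypothetical schema
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect();
                 Jedis state = new Jedis("localhost")) {
                String key = "scrape:" + ks + ":" + table;             // KS/CF/last-token state
                String saved = state.get(key);
                long from = saved == null ? Long.MIN_VALUE : Long.parseLong(saved);
                String cql = String.format(
                    "SELECT token(%s) FROM %s.%s WHERE token(%s) > ? LIMIT 1000", pk, ks, table, pk);
                while (true) {
                    Statement stmt = new SimpleStatement(cql, from)
                            .setConsistencyLevel(ConsistencyLevel.ALL); // all replicas answer; mismatches get read-repaired
                    long last = from;
                    for (Row row : session.execute(stmt)) {
                        last = row.getLong(0);
                    }
                    if (last == from) break;                            // walked the whole ring
                    from = last;
                    state.set(key, Long.toString(from));                // resumable checkpoint
                }
            }
        }
    }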

2018-04-05 20:32 GMT+02:00 Jonathan Haddad :

> To be fair, reaper in 2016 only worked with 2.0 and was just sitting
> around, more or less.
>
> Since then we've had 401 commits changing tens of thousands of lines of
> code, dealing with fault tolerance, repair retries, scalability, etc.
> We've had 1 reaper node managing repairs across dozens of clusters and
> thousands of nodes.  It's a totally different situation today.
>
>
> On Thu, Apr 5, 2018 at 11:17 AM benjamin roth  wrote:
>
> > That would be totally awesome!
> >
> > Not sure if it helps here but for completeness:
> > We completely "dumped" regular repairs - no matter if 'nodetool repair'
> or
> > reaper - and run our own tool that does simply CL_ALL scraping over the
> > whole cluster.
> > It runs now for over a year in production and the only problem we
> > encountered was that we got timeouts when scraping (too) large /
> tombstoned
> > partitions. It turned out that the large partitions weren't even readable
> > with CQL / cqlsh / DevCenter. So that wasn't a problem of the repair. It
> > was rather a design problem. Storing data that can't be read doesn't make
> > sense anyway.
> >
> > What I can tell from our experience:
> > - It works much more reliable than what we had before - also more
> reliable
> > than reaper (state of 2016)
> > - It runs totally smooth and much faster than regular repairs as it only
> > streams what needs to be streamed
> > - It's easily manageable, interruptible, resumable on a very fine-grained
> > level. The only thing you need to do is to store state (KS/CF/Last Token)
> > in a simple storage like redis
> > - It works even pretty well when populating a empty node e.g. when
> changing
> > RFs / bootstrapping DCs
> > - You can easily control the cluster-load by tuning the concurrency of
> the
> > scrape process
> >
> > I don't see a reason for us to ever go back to built-in repairs if they
> > don't improve immensely. In many cases (especially with MVs) they are
> true
> > resource killers.
> >
> > Just my 2 cent and experience.
> >
> > 2018-04-04 17:00 GMT+02:00 Ben Bromhead :
> >
> > > +1 to including the implementation in Cassandra itself. Makes managed
> > > repair a first-class citizen, it nicely rounds out Cassandra's
> > consistency
> > > story and makes it 1000x more likely that repairs will get run.
> > >
> > >
> > >
> > >
> > > On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad  wrote:
> > >
> > > > Implementation details aside, I’m firmly in the “it would be nice of
> C*
> > > > could take care of it” camp.  Reaper is pretty damn easy to use and
> > > people
> > > > *still* don’t put it in prod.
> > > >
> > > >
> > > > > On Apr 4, 2018, at 4:16 AM, Rahul Singh <
> > rahul.xavier.si...@gmail.com>
> > > > wrote:
> > > > >
> > > > > I understand the merits of both approaches. In working with other
> DBs
> > > In
> > > > the “old country” of SQL, we often had to write indexing sequences
> > > manually
> > > > for important tables. It was “built into the product” but in order to
> > > > leverage the maximum benefits of indices we had to have different
> > indices
> > > > other than the clustered (physical index). The process still sucked.
> > It’s
> > > > never perfect.
> > > > >
> > > > > The JVM is already fraught with GC issues and putting another
> process
> > > > being managed in the same heapspace is what I’m worried about.
> > > Technically
> > > > the process could be in the same binary but started as a side Car or
> in
> > > the
> > > > same main process.
> > > > >
> > > > > Consider a process called “cassandra-agent” that’s sitting around
> > with
> > > a
> > > > scheduler based on config or a Cassandra table. Distributed in the
> same
> > > > release. Shell / service scripts would start it. The end user knows
> it
> > > only
> > > > by examining the .sh files. This opens possibilities of 

Re: Roadmap for 4.0

2018-04-05 Thread Aleksey Yeshchenko
June feels a bit too early to me as well.

I personally would prefer end of August / beginning of September.

+1 to the idea of having a fixed date, though, just not this one.

—
AY

On 5 April 2018 at 19:20:12, Stefan Podkowinski (s...@apache.org) wrote:

June is too early.  


On 05.04.18 19:32, Josh McKenzie wrote:  
> Just as a matter of perspective, I'm personally mentally diffing from  
> when 3.0 hit, not 3.10.  
>  
>> commit 96f407bce56b98cd824d18e32ee012dbb99a0286  
>> Author: T Jake Luciani   
>> Date: Fri Nov 6 14:38:34 2015 -0500  
>> 3.0 release versions  
> While June feels close to today relative to momentum for a release  
> before this discussion, it's certainly long enough from when the  
> previous traditional major released that it doesn't feel "too soon" to  
> me.  
>  
> On Thu, Apr 5, 2018 at 12:46 PM, sankalp kohli  
> wrote:  
>> We can take a look on 1st June how things are then decide if we want to  
>> freeze it and whats in and whats out.  
>>  
>> On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg  wrote:  
>>  
>>> Hi,  
>>>  
>>> +1 to having a feature freeze date. June 1st is earlier than I would have  
>>> picked.  
>>>  
>>> Ariel  
>>>  
>>> On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:  
 +1 here for June 1.  
  
 On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown   
>>> wrote:  
> +1  
>  
> On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston   
> wrote:  
>  
>> +1  
>>  
>> On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:  
>>  
>> Earlier than I’d have personally picked, but I’m +1 too  
>>  
>>  
>>  
>> --  
>> Jeff Jirsa  
>>  
>>  
>> > On Apr 4, 2018, at 5:06 PM, Nate McCall   
> wrote:  
>> >  
>> > Top-posting as I think this summary is on point - thanks,  
>>> Scott!  
> (And  
>> > great to have you back, btw).  
>> >  
>> > It feels to me like we are coalescing on two points:  
>> > 1. June 1 as a freeze for alpha  
>> > 2. "Stable" is the new "Exciting" (and the testing and  
>>> dogfooding  
>> > implied by such before a GA)  
>> >  
>> > How do folks feel about the above points?  
>> >  
>> >  
>> >> Re-raising a point made earlier in the thread by Jeff and  
>>> affirmed  
>> by Josh:  
>> >>  
>> >> –––  
>> >> Jeff:  
>>  A hard date for a feature freeze makes sense, a hard date  
>>> for a  
>> release  
>>  does not.  
>> >>  
>> >> Josh:  
>> >>> Strongly agree. We should also collectively define what  
>>> "Done"  
>> looks like  
>> >>> post freeze so we don't end up in bike-shedding hell like we  
>>> have  
>> in the  
>> >>> past.  
>> >> –––  
>> >>  
>> >> Another way of saying this: ensuring that the 4.0 release is  
>>> of  
>> high quality is more important than cutting the release on a specific  
> date.  
>> >>  
>> >> If we adopt Sylvain's suggestion of freezing features on a  
> "feature  
>> complete" date (modulo a "definition of done" as Josh suggested),  
>>> that  
> will  
>> help us align toward the polish, performance work, and dog-fooding  
>>> needed  
>> to feel great about shipping 4.0. It's a good time to start thinking  
> about  
>> the approaches to testing, profiling, and dog-fooding various  
> contributors  
>> will want to take on before release.  
>> >>  
>> >> I love how Ben put it:  
>> >>  
>> >>> An "exciting" 4.0 release to me is one that is stable and  
>>> usable  
>> >>> with no perf regressions on day 1 and includes some of the  
>>> big  
>> >>> internal changes mentioned previously.  
>> >>>  
>> >>> This will set the community up well for some awesome and  
>>> exciting  
>> >>> stuff that will still be in the pipeline if it doesn't make  
>>> it to  
>> 4.0.  
>> >>  
>> >> That sounds great to me, too.  
>> >>  
>> >> – Scott  
>> >  
>> >   
>> -  
>> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
>> > For additional commands, e-mail: dev-h...@cassandra.apache.org  
>> >  
>>  
>>   
> -  
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
>> For additional commands, e-mail: dev-h...@cassandra.apache.org  
>>  
>>  
>>  
>>  
>>  
>>   
>>> -  
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
>> For additional commands, e-mail: dev-h...@cassandra.apache.org  
>>  
>>  
>>> 

Re: Repair scheduling tools

2018-04-05 Thread Jonathan Haddad
To be fair, reaper in 2016 only worked with 2.0 and was just sitting
around, more or less.

Since then we've had 401 commits changing tens of thousands of lines of
code, dealing with fault tolerance, repair retries, scalability, etc.
We've had 1 reaper node managing repairs across dozens of clusters and
thousands of nodes.  It's a totally different situation today.


On Thu, Apr 5, 2018 at 11:17 AM benjamin roth  wrote:

> That would be totally awesome!
>
> Not sure if it helps here but for completeness:
> We completely "dumped" regular repairs - no matter if 'nodetool repair' or
> reaper - and run our own tool that does simply CL_ALL scraping over the
> whole cluster.
> It runs now for over a year in production and the only problem we
> encountered was that we got timeouts when scraping (too) large / tombstoned
> partitions. It turned out that the large partitions weren't even readable
> with CQL / cqlsh / DevCenter. So that wasn't a problem of the repair. It
> was rather a design problem. Storing data that can't be read doesn't make
> sense anyway.
>
> What I can tell from our experience:
> - It works much more reliable than what we had before - also more reliable
> than reaper (state of 2016)
> - It runs totally smooth and much faster than regular repairs as it only
> streams what needs to be streamed
> - It's easily manageable, interruptible, resumable on a very fine-grained
> level. The only thing you need to do is to store state (KS/CF/Last Token)
> in a simple storage like redis
> - It works even pretty well when populating a empty node e.g. when changing
> RFs / bootstrapping DCs
> - You can easily control the cluster-load by tuning the concurrency of the
> scrape process
>
> I don't see a reason for us to ever go back to built-in repairs if they
> don't improve immensely. In many cases (especially with MVs) they are true
> resource killers.
>
> Just my 2 cent and experience.
>
> 2018-04-04 17:00 GMT+02:00 Ben Bromhead :
>
> > +1 to including the implementation in Cassandra itself. Makes managed
> > repair a first-class citizen, it nicely rounds out Cassandra's
> consistency
> > story and makes it 1000x more likely that repairs will get run.
> >
> >
> >
> >
> > On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad  wrote:
> >
> > > Implementation details aside, I’m firmly in the “it would be nice of C*
> > > could take care of it” camp.  Reaper is pretty damn easy to use and
> > people
> > > *still* don’t put it in prod.
> > >
> > >
> > > > On Apr 4, 2018, at 4:16 AM, Rahul Singh <
> rahul.xavier.si...@gmail.com>
> > > wrote:
> > > >
> > > > I understand the merits of both approaches. In working with other DBs
> > In
> > > the “old country” of SQL, we often had to write indexing sequences
> > manually
> > > for important tables. It was “built into the product” but in order to
> > > leverage the maximum benefits of indices we had to have different
> indices
> > > other than the clustered (physical index). The process still sucked.
> It’s
> > > never perfect.
> > > >
> > > > The JVM is already fraught with GC issues and putting another process
> > > being managed in the same heapspace is what I’m worried about.
> > Technically
> > the process could be in the same binary but started as a sidecar or in
> > the
> > > same main process.
> > > >
> > > > Consider a process called “cassandra-agent” that’s sitting around
> with
> > a
> > > scheduler based on config or a Cassandra table. Distributed in the same
> > > release. Shell / service scripts would start it. The end user knows it
> > only
> > > by examining the .sh files. This opens possibilities of including a GUI
> > > hosted in the same process without cluttering the core coolness of
> > > Cassandra.
> > > >
> > > > Best,
> > > >
> > > > --
> > > > Rahul Singh
> > > > rahul.si...@anant.us
> > > >
> > > > Anant Corporation
> > > >
> > > > On Apr 4, 2018, 2:50 AM -0400, Dor Laor , wrote:
> > > >> We at Scylla, implemented repair in a similar way to the Cassandra
> > > reaper.
> > > >> We do
> > > >> that using an external application, written in go that manages
> repair
> > > for
> > > >> multiple clusters
> > > >> and saves the data in an external Scylla cluster. The logic
> resembles
> > > the
> > > >> reaper one with
> > > >> some specific internal sharding optimizations and uses the Scylla
> rest
> > > api.
> > > >>
> > > >> However, I have doubts it's the ideal way. After playing a bit with
> > > >> CockroachDB, I realized
> > > >> it's super nice to have a single binary that repairs itself,
> provides
> > a
> > > GUI
> > > >> and is the core DB.
> > > >>
> > > >> Even while distributed, you can elect a leader node to manage the
> > > repair in
> > > >> a consistent
> > > >> way so the complexity can be reduced to a minimum. Repair can write
> > its
> > > >> status to the
> > > >> system tables and to provide an api for progress, rate control, etc.
> > > >>
> > > >> The big 

Re: Roadmap for 4.0

2018-04-05 Thread Stefan Podkowinski
June is too early.


On 05.04.18 19:32, Josh McKenzie wrote:
> Just as a matter of perspective, I'm personally mentally diffing from
> when 3.0 hit, not 3.10.
>
>> commit 96f407bce56b98cd824d18e32ee012dbb99a0286
>> Author: T Jake Luciani 
>> Date:   Fri Nov 6 14:38:34 2015 -0500
>>  3.0 release versions
> While June feels close to today relative to momentum for a release
> before this discussion, it's certainly long enough from when the
> previous traditional major released that it doesn't feel "too soon" to
> me.
>
> On Thu, Apr 5, 2018 at 12:46 PM, sankalp kohli  wrote:
>> We can take a look on 1st June how things are then decide if we want to
>> freeze it and what's in and what's out.
>>
>> On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg  wrote:
>>
>>> Hi,
>>>
>>> +1 to having a feature freeze date. June 1st is earlier than I would have
>>> picked.
>>>
>>> Ariel
>>>
>>> On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
 +1 here for June 1.

 On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown 
>>> wrote:
> +1
>
> On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston 
> wrote:
>
>> +1
>>
>> On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
>>
>> Earlier than I’d have personally picked, but I’m +1 too
>>
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> > On Apr 4, 2018, at 5:06 PM, Nate McCall 
> wrote:
>> >
>> > Top-posting as I think this summary is on point - thanks,
>>> Scott!
> (And
>> > great to have you back, btw).
>> >
>> > It feels to me like we are coalescing on two points:
>> > 1. June 1 as a freeze for alpha
>> > 2. "Stable" is the new "Exciting" (and the testing and
>>> dogfooding
>> > implied by such before a GA)
>> >
>> > How do folks feel about the above points?
>> >
>> >
>> >> Re-raising a point made earlier in the thread by Jeff and
>>> affirmed
>> by Josh:
>> >>
>> >> –––
>> >> Jeff:
>>  A hard date for a feature freeze makes sense, a hard date
>>> for a
>> release
>>  does not.
>> >>
>> >> Josh:
>> >>> Strongly agree. We should also collectively define what
>>> "Done"
>> looks like
>> >>> post freeze so we don't end up in bike-shedding hell like we
>>> have
>> in the
>> >>> past.
>> >> –––
>> >>
>> >> Another way of saying this: ensuring that the 4.0 release is
>>> of
>> high quality is more important than cutting the release on a specific
> date.
>> >>
>> >> If we adopt Sylvain's suggestion of freezing features on a
> "feature
>> complete" date (modulo a "definition of done" as Josh suggested),
>>> that
> will
>> help us align toward the polish, performance work, and dog-fooding
>>> needed
>> to feel great about shipping 4.0. It's a good time to start thinking
> about
>> the approaches to testing, profiling, and dog-fooding various
> contributors
>> will want to take on before release.
>> >>
>> >> I love how Ben put it:
>> >>
>> >>> An "exciting" 4.0 release to me is one that is stable and
>>> usable
>> >>> with no perf regressions on day 1 and includes some of the
>>> big
>> >>> internal changes mentioned previously.
>> >>>
>> >>> This will set the community up well for some awesome and
>>> exciting
>> >>> stuff that will still be in the pipeline if it doesn't make
>>> it to
>> 4.0.
>> >>
>> >> That sounds great to me, too.
>> >>
>> >> – Scott
>> >

Re: Repair scheduling tools

2018-04-05 Thread benjamin roth
That would be totally awesome!

Not sure if it helps here but for completeness:
We completely "dumped" regular repairs - no matter if 'nodetool repair' or
reaper - and run our own tool that does simply CL_ALL scraping over the
whole cluster.
It has now been running in production for over a year, and the only problem
we encountered was timeouts when scraping (too) large / tombstoned
partitions. It turned out that those large partitions weren't even readable
with CQL / cqlsh / DevCenter, so that wasn't a problem with the repair
itself; it was rather a design problem. Storing data that can't be read
doesn't make sense anyway.

What I can tell from our experience:
- It works much more reliably than what we had before - also more reliably
than reaper (state of 2016)
- It runs totally smoothly and much faster than regular repairs, as it only
streams what actually needs to be streamed
- It's easily manageable, interruptible and resumable on a very fine-grained
level. The only thing you need to do is store state (KS/CF/Last Token)
in a simple storage like redis
- It works pretty well even when populating an empty node, e.g. when changing
RFs / bootstrapping DCs
- You can easily control the cluster load by tuning the concurrency of the
scrape process (a rough sketch of the loop follows below)
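
To make that concrete, here is a rough sketch of what such a scrape loop can
look like (illustrative Java only - the class names, the StateStore and the
readRangeAtAll() helper are made up for this mail; this is not our actual
tool, which obviously has more error handling):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ClAllScraper
{
    private final int parallelism;          // the knob that controls cluster load
    private final ExecutorService pool;
    private final StateStore state;         // stores KS/CF/last token, e.g. in redis

    public ClAllScraper(int parallelism, StateStore state)
    {
        this.parallelism = parallelism;
        this.pool = Executors.newFixedThreadPool(parallelism);
        this.state = state;
    }

    // 'ranges' = sorted, non-wrapping (start, end] token ranges covering the ring
    public void scrape(String keyspace, String table, List<long[]> ranges) throws Exception
    {
        long resumeFrom = state.lastToken(keyspace, table);        // resumable
        List<long[]> todo = ranges.stream()
                                  .filter(r -> r[1] > resumeFrom)  // skip what was already scraped
                                  .collect(Collectors.toList());

        for (int i = 0; i < todo.size(); i += parallelism)
        {
            List<long[]> batch = todo.subList(i, Math.min(i + parallelism, todo.size()));
            List<Callable<Void>> tasks = new ArrayList<>();
            for (long[] r : batch)
                tasks.add(() -> { readRangeAtAll(keyspace, table, r[0], r[1]); return null; });

            for (Future<Void> f : pool.invokeAll(tasks))           // parallelism == load knob
                f.get();                                           // surface failures before checkpointing
            state.saveLastToken(keyspace, table, batch.get(batch.size() - 1)[1]);
        }
    }

    // Hypothetical placeholder: pages through
    //   SELECT * FROM <ks>.<table> WHERE token(pk) > ? AND token(pk) <= ?
    // at ConsistencyLevel.ALL, so every digest mismatch triggers a read repair.
    private void readRangeAtAll(String ks, String cf, long start, long end) { /* driver call */ }
}

// Hypothetical state store (KS/CF/Last Token), e.g. backed by redis.
interface StateStore
{
    long lastToken(String keyspace, String table);
    void saveLastToken(String keyspace, String table, long token);
}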

I don't see a reason for us to ever go back to built-in repairs if they
don't improve immensely. In many cases (especially with MVs) they are true
resource killers.

Just my 2 cent and experience.

2018-04-04 17:00 GMT+02:00 Ben Bromhead :

> +1 to including the implementation in Cassandra itself. Makes managed
> repair a first-class citizen, it nicely rounds out Cassandra's consistency
> story and makes it 1000x more likely that repairs will get run.
>
>
>
>
> On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad  wrote:
>
> > Implementation details aside, I’m firmly in the “it would be nice if C*
> > could take care of it” camp.  Reaper is pretty damn easy to use and
> people
> > *still* don’t put it in prod.
> >
> >
> > > On Apr 4, 2018, at 4:16 AM, Rahul Singh 
> > wrote:
> > >
> > > I understand the merits of both approaches. In working with other DBs
> In
> > the “old country” of SQL, we often had to write indexing sequences
> manually
> > for important tables. It was “built into the product” but in order to
> > leverage the maximum benefits of indices we had to have different indices
> > other than the clustered (physical index). The process still sucked. It’s
> > never perfect.
> > >
> > > The JVM is already fraught with GC issues and putting another process
> > being managed in the same heapspace is what I’m worried about.
> Technically
> > the process could be in the same binary but started as a sidecar or in
> the
> > same main process.
> > >
> > > Consider a process called “cassandra-agent” that’s sitting around with
> a
> > scheduler based on config or a Cassandra table. Distributed in the same
> > release. Shell / service scripts would start it. The end user knows it
> only
> > by examining the .sh files. This opens possibilities of including a GUI
> > hosted in the same process without cluttering the core coolness of
> > Cassandra.
> > >
> > > Best,
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Apr 4, 2018, 2:50 AM -0400, Dor Laor , wrote:
> > >> We at Scylla, implemented repair in a similar way to the Cassandra
> > reaper.
> > >> We do
> > >> that using an external application, written in go that manages repair
> > for
> > >> multiple clusters
> > >> and saves the data in an external Scylla cluster. The logic resembles
> > the
> > >> reaper one with
> > >> some specific internal sharding optimizations and uses the Scylla rest
> > api.
> > >>
> > >> However, I have doubts it's the ideal way. After playing a bit with
> > >> CockroachDB, I realized
> > >> it's super nice to have a single binary that repairs itself, provides
> a
> > GUI
> > >> and is the core DB.
> > >>
> > >> Even while distributed, you can elect a leader node to manage the
> > repair in
> > >> a consistent
> > >> way so the complexity can be reduced to a minimum. Repair can write
> its
> > >> status to the
> > >> system tables and to provide an api for progress, rate control, etc.
> > >>
> > >> The big advantage for repair to embedded in the core is that there is
> no
> > >> need to expose
> > >> internal state to the repair logic. So an external program doesn't
> need
> > to
> > >> deal with different
> > >> version of Cassandra, different repair capabilities of the core (such
> as
> > >> incremental on/off)
> > >> and so forth. A good database should schedule its own repair, it knows
> > >> whether the threshold
> > >> of hinted handoff was crossed or not, it knows whether nodes were
> > replaced,
> > >> etc,
> > >>
> > >> My 2 cents. Dor
> > >>
> > >> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <
> > >> dinesh.jo...@yahoo.com.invalid> wrote:
> > >>
> > >>> Simon,
> > >>> 

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
I think it's informative that Dor, Vinay, and I, who have all built sidecar
repair systems, think it's crucial to have the scheduling component in
the same process as the repair execution component. Like I said in the
ticket/design, it is *really* hard for the repair scheduling process to
determine the internal state of the repair execution process. In our
current production system we have significant complexity in the code to
account for the differing daemon/sidecar life-cycles, repair state loss,
flakey JMX connections, authentication for the sidecar to speak JMX and
CQL, etc... It does seem though like there is significant concern that we
can't iterate quickly in the main process, and it would be easier to
iterate as a tool/sidecar, so I'll spend some time this week sketching out
in the design the additional components and resiliency factors required to
put the scheduler into such a tool.

I do have a hard time buying that opt-in repair *scheduling* is going to
cause heap problems or impact the daemon significantly; the scheduler
literally reads a few bytes out of a Cassandra table and makes a function
call or two, and then sleeps for 2 minutes. Repair *execution* is the
actual heap intense part and is already part of the Cassandra daemon. If
the concern is that users will start actually running repair and expose
heap issues in repair, then that's great; let's fix it!
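
To be concrete, the scheduling loop I have in mind is roughly the following
shape (purely illustrative Java; the class names and the ScheduleStore /
RepairRunner collaborators are made up for this mail and are not the actual
code in the ticket):

public class RepairSchedulerTask implements Runnable
{
    // Hypothetical collaborators, standing in for "read a few rows from a
    // schedule table" and "hand off to the existing repair machinery".
    interface ScheduleStore { ScheduledRange nextDueRange(); boolean tryClaim(ScheduledRange r); }
    interface RepairRunner  { void repair(ScheduledRange r); }
    static final class ScheduledRange
    {
        final String keyspace; final String range;
        ScheduledRange(String keyspace, String range) { this.keyspace = keyspace; this.range = range; }
    }

    private static final long POLL_INTERVAL_MS = 2 * 60 * 1000;   // "sleeps for 2 minutes"

    private final ScheduleStore store;
    private final RepairRunner runner;

    public RepairSchedulerTask(ScheduleStore store, RepairRunner runner)
    {
        this.store = store;
        this.runner = runner;
    }

    @Override
    public void run()
    {
        while (!Thread.currentThread().isInterrupted())
        {
            try
            {
                ScheduledRange next = store.nextDueRange();   // a few bytes out of a table
                if (next != null && store.tryClaim(next))     // e.g. an LWT so only one node runs it
                    runner.repair(next);                      // repair *execution* stays where it is today
            }
            catch (Exception e)
            {
                // log and try again on the next tick; the scheduler itself holds no heavy state
            }
            sleep(POLL_INTERVAL_MS);
        }
    }

    private static void sleep(long millis)
    {
        try { Thread.sleep(millis); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}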

If we had a Cassandra sidecar I think it would generally be great to move
all the background tasks (compaction, repair, streaming, backup, etc...)
into the sidecar to cleanly separate the "latency critical" process from
the "throughput critical" process. This would also be great from an ops
perspective because you could choose to run the sidecar in a cgroup to
control usage of network, cpu and ram (you could even pin compaction and
repair to dedicated cores so that they do not interfere with the main
process), and you could upgrade the background process much more easily and
with less risk. I think a key part of this, though, is the leading "if": as
far as I know we don't have a ticket or concrete proposal for a dedicated
Cassandra sidecar. Separately, sidecars are actually hard to do well, but I
think it's still a good direction for Cassandra to go in the longer term.

-Joey

On Wed, Apr 4, 2018 at 8:00 AM, Ben Bromhead  wrote:

> +1 to including the implementation in Cassandra itself. Makes managed
> repair a first-class citizen, it nicely rounds out Cassandra's consistency
> story and makes it 1000x more likely that repairs will get run.
>
>
>
>
> On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad  wrote:
>
> > Implementation details aside, I’m firmly in the “it would be nice if C*
> > could take care of it” camp.  Reaper is pretty damn easy to use and
> people
> > *still* don’t put it in prod.
> >
> >
> > > On Apr 4, 2018, at 4:16 AM, Rahul Singh 
> > wrote:
> > >
> > > I understand the merits of both approaches. In working with other DBs
> In
> > the “old country” of SQL, we often had to write indexing sequences
> manually
> > for important tables. It was “built into the product” but in order to
> > leverage the maximum benefits of indices we had to have different indices
> > other than the clustered (physical index). The process still sucked. It’s
> > never perfect.
> > >
> > > The JVM is already fraught with GC issues and putting another process
> > being managed in the same heapspace is what I’m worried about.
> Technically
> > the process could be in the same binary but started as a sidecar or in
> the
> > same main process.
> > >
> > > Consider a process called “cassandra-agent” that’s sitting around with
> a
> > scheduler based on config or a Cassandra table. Distributed in the same
> > release. Shell / service scripts would start it. The end user knows it
> only
> > by examining the .sh files. This opens possibilities of including a GUI
> > hosted in the same process without cluttering the core coolness of
> > Cassandra.
> > >
> > > Best,
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Apr 4, 2018, 2:50 AM -0400, Dor Laor , wrote:
> > >> We at Scylla, implemented repair in a similar way to the Cassandra
> > reaper.
> > >> We do
> > >> that using an external application, written in go that manages repair
> > for
> > >> multiple clusters
> > >> and saves the data in an external Scylla cluster. The logic resembles
> > the
> > >> reaper one with
> > >> some specific internal sharding optimizations and uses the Scylla rest
> > api.
> > >>
> > >> However, I have doubts it's the ideal way. After playing a bit with
> > >> CockroachDB, I realized
> > >> it's super nice to have a single binary that repairs itself, provides
> a
> > GUI
> > >> and is the core DB.
> > >>
> > >> Even while distributed, you can elect a leader node to manage the
> > repair in
> > >> a consistent
> > >> way so the complexity can be 

Re: Roadmap for 4.0

2018-04-05 Thread Michael Shuler
On 04/05/2018 12:32 PM, Josh McKenzie wrote:
> Just as a matter of perspective, I'm personally mentally diffing from
> when 3.0 hit, not 3.10.
> 
>> commit 96f407bce56b98cd824d18e32ee012dbb99a0286
>> Author: T Jake Luciani 
>> Date:   Fri Nov 6 14:38:34 2015 -0500
>>  3.0 release versions
> 
> While June feels close to today relative to momentum for a release
> before this discussion, it's certainly long enough from when the
> previous traditional major released that it doesn't feel "too soon" to
> me.

Since I couldn't recall the dates, I went and looked. Just a little
additional history:


mshuler@hana:~/svn/cassandra$ svn log -r r1772680

r1772680 | mshuler | 2016-12-05 08:35:47 -0600 (Mon, 05 Dec 2016) | 2 lines

Change EOLs to "after 4.0 release (date TBD)"


mshuler@hana:~/svn/cassandra$ svn diff -c r1772680 site/src/download.md
Index: site/src/download.md
===================================================================
--- site/src/download.md    (revision 1772679)
+++ site/src/download.md    (revision 1772680)
@@ -21,9 +21,9 @@

 The following older Cassandra releases are still supported:

-* Apache Cassandra 3.0 is supported until **May 2017**. The latest
release is {{ "3.0" | full_release_link }}.
-* Apache Cassandra 2.2 is supported until **November 2016**. The latest
release is {{ "2.2" | full_release_link }}.
-* Apache Cassandra 2.1 is supported until **November 2016** with
**critical fixes only**. The latest release is
+* Apache Cassandra 3.0 is supported until **6 months after 4.0 release
(date TBD)**. The latest release is {{ "3.0" | full_release_link }}.
+* Apache Cassandra 2.2 is supported until **4.0 release (date TBD)**.
The latest release is {{ "2.2" | full_release_link }}.
+* Apache Cassandra 2.1 is supported until **4.0 release (date TBD)**
with **critical fixes only**. The latest release is
   {{ "2.1" | full_release_link }}.

 Older (unsupported) versions of Cassandra are [archived
here](http://archive.apache.org/dist/cassandra/).

-- 
Warm regards,
Michael




Re: Roadmap for 4.0

2018-04-05 Thread Josh McKenzie
Just as a matter of perspective, I'm personally mentally diffing from
when 3.0 hit, not 3.10.

> commit 96f407bce56b98cd824d18e32ee012dbb99a0286
> Author: T Jake Luciani 
> Date:   Fri Nov 6 14:38:34 2015 -0500
>  3.0 release versions

While June feels close to today relative to the momentum for a release
before this discussion started, it's certainly long enough since the
previous traditional major was released that it doesn't feel "too soon" to
me.

On Thu, Apr 5, 2018 at 12:46 PM, sankalp kohli  wrote:
> We can take a look on 1st June how things are then decide if we want to
> freeze it and what's in and what's out.
>
> On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg  wrote:
>
>> Hi,
>>
>> +1 to having a feature freeze date. June 1st is earlier than I would have
>> picked.
>>
>> Ariel
>>
>> On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
>> > +1 here for June 1.
>> >
>> > On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown 
>> wrote:
>> >
>> > > +1
>> > >
>> > > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston 
>> > > wrote:
>> > >
>> > > > +1
>> > > >
>> > > > On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
>> > > >
>> > > > Earlier than I’d have personally picked, but I’m +1 too
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Jeff Jirsa
>> > > >
>> > > >
>> > > > > On Apr 4, 2018, at 5:06 PM, Nate McCall 
>> > > wrote:
>> > > > >
>> > > > > Top-posting as I think this summary is on point - thanks,
>> Scott!
>> > > (And
>> > > > > great to have you back, btw).
>> > > > >
>> > > > > It feels to me like we are coalescing on two points:
>> > > > > 1. June 1 as a freeze for alpha
>> > > > > 2. "Stable" is the new "Exciting" (and the testing and
>> dogfooding
>> > > > > implied by such before a GA)
>> > > > >
>> > > > > How do folks feel about the above points?
>> > > > >
>> > > > >
>> > > > >> Re-raising a point made earlier in the thread by Jeff and
>> affirmed
>> > > > by Josh:
>> > > > >>
>> > > > >> –––
>> > > > >> Jeff:
>> > > >  A hard date for a feature freeze makes sense, a hard date
>> for a
>> > > > release
>> > > >  does not.
>> > > > >>
>> > > > >> Josh:
>> > > > >>> Strongly agree. We should also collectively define what
>> "Done"
>> > > > looks like
>> > > > >>> post freeze so we don't end up in bike-shedding hell like we
>> have
>> > > > in the
>> > > > >>> past.
>> > > > >> –––
>> > > > >>
>> > > > >> Another way of saying this: ensuring that the 4.0 release is
>> of
>> > > > high quality is more important than cutting the release on a specific
>> > > date.
>> > > > >>
>> > > > >> If we adopt Sylvain's suggestion of freezing features on a
>> > > "feature
>> > > > complete" date (modulo a "definition of done" as Josh suggested),
>> that
>> > > will
>> > > > help us align toward the polish, performance work, and dog-fooding
>> needed
>> > > > to feel great about shipping 4.0. It's a good time to start thinking
>> > > about
>> > > > the approaches to testing, profiling, and dog-fooding various
>> > > contributors
>> > > > will want to take on before release.
>> > > > >>
>> > > > >> I love how Ben put it:
>> > > > >>
>> > > > >>> An "exciting" 4.0 release to me is one that is stable and
>> usable
>> > > > >>> with no perf regressions on day 1 and includes some of the
>> big
>> > > > >>> internal changes mentioned previously.
>> > > > >>>
>> > > > >>> This will set the community up well for some awesome and
>> exciting
>> > > > >>> stuff that will still be in the pipeline if it doesn't make
>> it to
>> > > > 4.0.
>> > > > >>
>> > > > >> That sounds great to me, too.
>> > > > >>
>> > > > >> – Scott
>> > > > >

Re: Roadmap for 4.0

2018-04-05 Thread sankalp kohli
We can take a look on 1st June at how things are, then decide if we want to
freeze it and what's in and what's out.

On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg  wrote:

> Hi,
>
> +1 to having a feature freeze date. June 1st is earlier than I would have
> picked.
>
> Ariel
>
> On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
> > +1 here for June 1.
> >
> > On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown 
> wrote:
> >
> > > +1
> > >
> > > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
> > > >
> > > > Earlier than I’d have personally picked, but I’m +1 too
> > > >
> > > >
> > > >
> > > > --
> > > > Jeff Jirsa
> > > >
> > > >
> > > > > On Apr 4, 2018, at 5:06 PM, Nate McCall 
> > > wrote:
> > > > >
> > > > > Top-posting as I think this summary is on point - thanks,
> Scott!
> > > (And
> > > > > great to have you back, btw).
> > > > >
> > > > > It feels to me like we are coalescing on two points:
> > > > > 1. June 1 as a freeze for alpha
> > > > > 2. "Stable" is the new "Exciting" (and the testing and
> dogfooding
> > > > > implied by such before a GA)
> > > > >
> > > > > How do folks feel about the above points?
> > > > >
> > > > >
> > > > >> Re-raising a point made earlier in the thread by Jeff and
> affirmed
> > > > by Josh:
> > > > >>
> > > > >> –––
> > > > >> Jeff:
> > > >  A hard date for a feature freeze makes sense, a hard date
> for a
> > > > release
> > > >  does not.
> > > > >>
> > > > >> Josh:
> > > > >>> Strongly agree. We should also collectively define what
> "Done"
> > > > looks like
> > > > >>> post freeze so we don't end up in bike-shedding hell like we
> have
> > > > in the
> > > > >>> past.
> > > > >> –––
> > > > >>
> > > > >> Another way of saying this: ensuring that the 4.0 release is
> of
> > > > high quality is more important than cutting the release on a specific
> > > date.
> > > > >>
> > > > >> If we adopt Sylvain's suggestion of freezing features on a
> > > "feature
> > > > complete" date (modulo a "definition of done" as Josh suggested),
> that
> > > will
> > > > help us align toward the polish, performance work, and dog-fooding
> needed
> > > > to feel great about shipping 4.0. It's a good time to start thinking
> > > about
> > > > the approaches to testing, profiling, and dog-fooding various
> > > contributors
> > > > will want to take on before release.
> > > > >>
> > > > >> I love how Ben put it:
> > > > >>
> > > > >>> An "exciting" 4.0 release to me is one that is stable and
> usable
> > > > >>> with no perf regressions on day 1 and includes some of the
> big
> > > > >>> internal changes mentioned previously.
> > > > >>>
> > > > >>> This will set the community up well for some awesome and
> exciting
> > > > >>> stuff that will still be in the pipeline if it doesn't make
> it to
> > > > 4.0.
> > > > >>
> > > > >> That sounds great to me, too.
> > > > >>
> > > > >> – Scott
> > > > >
>


Re: Roadmap for 4.0

2018-04-05 Thread Ariel Weisberg
Hi,

+1 to having a feature freeze date. June 1st is earlier than I would have 
picked.

Ariel

On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
> +1 here for June 1.
> 
> On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown  wrote:
> 
> > +1
> >
> > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston 
> > wrote:
> >
> > > +1
> > >
> > > On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
> > >
> > > Earlier than I’d have personally picked, but I’m +1 too
> > >
> > >
> > >
> > > --
> > > Jeff Jirsa
> > >
> > >
> > > > On Apr 4, 2018, at 5:06 PM, Nate McCall 
> > wrote:
> > > >
> > > > Top-posting as I think this summary is on point - thanks, Scott!
> > (And
> > > > great to have you back, btw).
> > > >
> > > > It feels to me like we are coalescing on two points:
> > > > 1. June 1 as a freeze for alpha
> > > > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > > > implied by such before a GA)
> > > >
> > > > How do folks feel about the above points?
> > > >
> > > >
> > > >> Re-raising a point made earlier in the thread by Jeff and affirmed
> > > by Josh:
> > > >>
> > > >> –––
> > > >> Jeff:
> > >  A hard date for a feature freeze makes sense, a hard date for a
> > > release
> > >  does not.
> > > >>
> > > >> Josh:
> > > >>> Strongly agree. We should also collectively define what "Done"
> > > looks like
> > > >>> post freeze so we don't end up in bike-shedding hell like we have
> > > in the
> > > >>> past.
> > > >> –––
> > > >>
> > > >> Another way of saying this: ensuring that the 4.0 release is of
> > > high quality is more important than cutting the release on a specific
> > date.
> > > >>
> > > >> If we adopt Sylvain's suggestion of freezing features on a
> > "feature
> > > complete" date (modulo a "definition of done" as Josh suggested), that
> > will
> > > help us align toward the polish, performance work, and dog-fooding needed
> > > to feel great about shipping 4.0. It's a good time to start thinking
> > about
> > > the approaches to testing, profiling, and dog-fooding various
> > contributors
> > > will want to take on before release.
> > > >>
> > > >> I love how Ben put it:
> > > >>
> > > >>> An "exciting" 4.0 release to me is one that is stable and usable
> > > >>> with no perf regressions on day 1 and includes some of the big
> > > >>> internal changes mentioned previously.
> > > >>>
> > > >>> This will set the community up well for some awesome and exciting
> > > >>> stuff that will still be in the pipeline if it doesn't make it to
> > > 4.0.
> > > >>
> > > >> That sounds great to me, too.
> > > >>
> > > >> – Scott
> > > >



Re: Roadmap for 4.0

2018-04-05 Thread Josh McKenzie
+1 here for June 1.

On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown  wrote:

> +1
>
> On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston 
> wrote:
>
> > +1
> >
> > On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
> >
> > Earlier than I’d have personally picked, but I’m +1 too
> >
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Apr 4, 2018, at 5:06 PM, Nate McCall 
> wrote:
> > >
> > > Top-posting as I think this summary is on point - thanks, Scott!
> (And
> > > great to have you back, btw).
> > >
> > > It feels to me like we are coalescing on two points:
> > > 1. June 1 as a freeze for alpha
> > > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > > implied by such before a GA)
> > >
> > > How do folks feel about the above points?
> > >
> > >
> > >> Re-raising a point made earlier in the thread by Jeff and affirmed
> > by Josh:
> > >>
> > >> –––
> > >> Jeff:
> >  A hard date for a feature freeze makes sense, a hard date for a
> > release
> >  does not.
> > >>
> > >> Josh:
> > >>> Strongly agree. We should also collectively define what "Done"
> > looks like
> > >>> post freeze so we don't end up in bike-shedding hell like we have
> > in the
> > >>> past.
> > >> –––
> > >>
> > >> Another way of saying this: ensuring that the 4.0 release is of
> > high quality is more important than cutting the release on a specific
> date.
> > >>
> > >> If we adopt Sylvain's suggestion of freezing features on a
> "feature
> > complete" date (modulo a "definition of done" as Josh suggested), that
> will
> > help us align toward the polish, performance work, and dog-fooding needed
> > to feel great about shipping 4.0. It's a good time to start thinking
> about
> > the approaches to testing, profiling, and dog-fooding various
> contributors
> > will want to take on before release.
> > >>
> > >> I love how Ben put it:
> > >>
> > >>> An "exciting" 4.0 release to me is one that is stable and usable
> > >>> with no perf regressions on day 1 and includes some of the big
> > >>> internal changes mentioned previously.
> > >>>
> > >>> This will set the community up well for some awesome and exciting
> > >>> stuff that will still be in the pipeline if it doesn't make it to
> > 4.0.
> > >>
> > >> That sounds great to me, too.
> > >>
> > >> – Scott
> > >


Re: Roadmap for 4.0

2018-04-05 Thread Jason Brown
+1

On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston 
wrote:

> +1
>
> On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
>
> Earlier than I’d have personally picked, but I’m +1 too
>
>
>
> --
> Jeff Jirsa
>
>
> > On Apr 4, 2018, at 5:06 PM, Nate McCall  wrote:
> >
> > Top-posting as I think this summary is on point - thanks, Scott! (And
> > great to have you back, btw).
> >
> > It feels to me like we are coalescing on two points:
> > 1. June 1 as a freeze for alpha
> > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > implied by such before a GA)
> >
> > How do folks feel about the above points?
> >
> >
> >> Re-raising a point made earlier in the thread by Jeff and affirmed
> by Josh:
> >>
> >> –––
> >> Jeff:
>  A hard date for a feature freeze makes sense, a hard date for a
> release
>  does not.
> >>
> >> Josh:
> >>> Strongly agree. We should also collectively define what "Done"
> looks like
> >>> post freeze so we don't end up in bike-shedding hell like we have
> in the
> >>> past.
> >> –––
> >>
> >> Another way of saying this: ensuring that the 4.0 release is of
> high quality is more important than cutting the release on a specific date.
> >>
> >> If we adopt Sylvain's suggestion of freezing features on a "feature
> complete" date (modulo a "definition of done" as Josh suggested), that will
> help us align toward the polish, performance work, and dog-fooding needed
> to feel great about shipping 4.0. It's a good time to start thinking about
> the approaches to testing, profiling, and dog-fooding various contributors
> will want to take on before release.
> >>
> >> I love how Ben put it:
> >>
> >>> An "exciting" 4.0 release to me is one that is stable and usable
> >>> with no perf regressions on day 1 and includes some of the big
> >>> internal changes mentioned previously.
> >>>
> >>> This will set the community up well for some awesome and exciting
> >>> stuff that will still be in the pipeline if it doesn't make it to
> 4.0.
> >>
> >> That sounds great to me, too.
> >>
> >> – Scott
> >