Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread kurt greaves
To add to the above, hackathons would make sense in the lead up to the feature freeze IMO to get things through the door, but not necessarily afterwards (debatable). And they can be flexible too; I suspect once people start on reviewing something they'll be much more likely to see it through to

Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread kurt greaves
I like the idea and we would be willing to take part (no committers here but I'm sure we can help). > I think it better to pick some JIRAs per 2-3 weeks and have people review > them. In my experience, it is hard to synchronize all people across > companies during one 72 hour slot. It is hard,

Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread Nate McCall
That could work as well. My goal is that we figure out how to resource and focus on this for a bit. On Fri, Apr 6, 2018, 5:02 PM sankalp kohli wrote: > I think it better to pick some JIRAs per 2-3 weeks and have people review > them. In my experience, it is hard to

Re: [Discuss] patch review virtual hackathon

2018-04-05 Thread sankalp kohli
I think it better to pick some JIRAs per 2-3 weeks and have people review them. In my experience, it is hard to synchronize all people across companies during one 72 hour slot. On Thu, Apr 5, 2018 at 9:48 PM, Nate McCall wrote: > Per Kurt's point in our release thread, we

[Discuss] patch review virtual hackathon

2018-04-05 Thread Nate McCall
Per Kurt's point in our release thread, we have a lot to do here. What do folks feel about setting aside a 72hr period at some point soon where we get some allotment from our employers to spend a window or two of time therein reviewing patches? I have seen a couple of other ASF communities do

Re: Roadmap for 4.0

2018-04-05 Thread kurt greaves
> > Lay our cards on the table about what we want included in 4.0 and work to > get those in Are you saying we're back to where we started?  For those wanting to delay, are we just dancing around inclusion of > some pet features? This is fine, I just think we need to communicate > what we

Re: Repair scheduling tools

2018-04-05 Thread kurt greaves
Vnodes is related and because we made it a default lots of people are using it. Repairing a cluster with vnodes is a catastrophe (even a small one is often problematic), but we have to deal with it if we build in repair scheduling. Repair scheduling is very important and we should definitely

Re: Roadmap for 4.0

2018-04-05 Thread Nate McCall
>> >> So long as non-user-visible improvements, including big ones, can still go >> in 4.0 at that stage, I’m all for it. > > > My understanding is that after June 1st the 4.0 branch would be created and > would be bugfix only. It's not really a feature freeze if you allow > improvements after

Re: Roadmap for 4.0

2018-04-05 Thread kurt greaves
> > So long as non-user-visible improvements, including big ones, can still go > in 4.0 at that stage, I’m all for it. My understanding is that after June 1st the 4.0 branch would be created and would be bugfix only. It's not really a feature freeze if you allow improvements after that, which is

Re: Repair scheduling tools

2018-04-05 Thread Nate McCall
I think a takeaway here is that we can't assume a level of operational maturity will coincide automatically with scale. To make our core features robust, we have to account for less-experienced users. A lot of folks on this thread have *really* strong ops and OpsViz stories. Let's not forget that

Re: Repair scheduling tools

2018-04-05 Thread Jonathan Haddad
Off the top of my head I can remember clusters with 600 or 700 nodes with 256 tokens. Not the best situation, but it’s real. 256 has been the default for better or worse. On Thu, Apr 5, 2018 at 7:41 PM Joseph Lynch wrote: > > > > We see this in larger clusters regularly.

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
> > We see this in larger clusters regularly. Usually folks have just > 'grown into it' because it was the default. > I could understand a few dozen nodes with 256 vnodes, but hundreds is surprising. I have a whitepaper draft lying around showing how vnodes decrease availability in large clusters

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
Sorry sent early. To explain further, the scheduler is entirely decentralized in the proposed design, and no node holds all the information you're talking about in heap at once (in fact no one node would ever hold that information). Each node is responsible only for tokens that they are "primary"
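To illustrate the "primary" ownership idea described above, here is a minimal sketch on a toy token ring. It assumes the usual convention that a node's primary range for one of its tokens runs from the previous token on the ring (exclusive) up to its own token (inclusive). The function name, the ring layout, and the node names are hypothetical and are not taken from the proposed design.

```python
def primary_ranges(ring, node):
    """Return the (start, end] token ranges for which `node` is primary.

    `ring` maps each token to the node that owns it (hypothetical layout).
    """
    tokens = sorted(ring)
    ranges = []
    for i, token in enumerate(tokens):
        if ring[token] == node:
            # tokens[i - 1] wraps around to the last token when i == 0.
            ranges.append((tokens[i - 1], token))
    return ranges


# Toy ring: three nodes with two tokens each.
ring = {0: "a", 100: "b", 200: "c", 300: "a", 400: "b", 500: "c"}
for name in ("a", "b", "c"):
    print(name, primary_ranges(ring, name))
```

Each node can work out its own list from ring metadata alone, so no single node needs to hold the full repair plan for the cluster.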

Re: Repair scheduling tools

2018-04-05 Thread Nate McCall
> > Somewhat beside the point, I wasn't aware there were any 100 node + > clusters running with vnodes, if my math is correct they would be > excessively vulnerable to outages with that many vnodes and that many > nodes. Most of the large clusters I've heard of (100 nodes plus) are > running with

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
> > I wouldn't trivialize it, scheduling can end up dealing with more than a > single repair. If there are 1000 keyspace/tables, with 400 nodes and 256 > vnodes on each that's a lot of repairs to plan out and keep track of and can > easily cause heap allocation spikes if opted in. > > Chris The
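For a sense of scale, the figures quoted from Chris can be multiplied out. This is a rough sketch that assumes, hypothetically, that every table/token-range pair is tracked as its own unit of repair work. That is an upper bound for illustration, not a description of how any particular scheduler plans repairs.

```python
# Figures quoted in the message above.
tables = 1000
nodes = 400
vnodes_per_node = 256

# One token range per vnode across the whole ring.
token_ranges = nodes * vnodes_per_node   # 102,400

# Assumption: each (table, token range) pair is a separate repair unit.
repair_units = token_ranges * tables     # 102,400,000

print(f"token ranges in the ring: {token_ranges:,}")
print(f"(table, range) pairs:     {repair_units:,}")
```

Whether all of those units ever sit in one heap at the same time depends on the scheduler design, which is the point under debate in this thread.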

Re: Repair scheduling tools

2018-04-05 Thread Chris Lohfink
> I do have a hard time buying that an opt-in repair *scheduling* is going to > cause heap problems or impact the daemon significantly; the scheduler > literally reads a few bytes out of a Cassandra table and makes a function > call or two, and then sleeps for 2 minutes. I wouldn't trivialize

Re: Roadmap for 4.0

2018-04-05 Thread Aleksey Yeshchenko
So long as non-user-visible improvements, including big ones, can still go in 4.0 at that stage, I’m all for it. — AY On 5 April 2018 at 21:14:03, Nate McCall (zznat...@gmail.com) wrote: >>> My understanding, from Nate's summary, was June 1 is the freeze date for >>> features. I expect we

Re: Roadmap for 4.0

2018-04-05 Thread Nate McCall
>>> My understanding, from Nate's summary, was June 1 is the freeze date for >>> features. I expect we would go for at least 4 months (if not longer) >>> testing, fixing bugs, early dogfooding, and so on. I also equated June 1 >>> with the date on which we would create a 'cassandra-4.0' branch, and

Re: Repair scheduling tools

2018-04-05 Thread Rahul Singh
A simpler scheduler is never simple. I agree in principle, à la “Cassandra-Agent”, which could manage any order of tasks, schedules, etc. needing to prune and manage the C* engine. Cassandra has enough TPs it needs to manage already. On Apr 5, 2018, 3:09 PM -0400, Joseph Lynch

Re: Roadmap for 4.0

2018-04-05 Thread Josh McKenzie
I'm in line w/your thinking here Jason. On Thu, Apr 5, 2018 at 3:25 PM, Jonathan Haddad wrote: > That’s exactly what I was thinking too. > > There’s also nothing preventing features from being merged into trunk after > we create the 4.0 branch, which in my opinion is a better

Re: Roadmap for 4.0

2018-04-05 Thread Jonathan Haddad
That’s exactly what I was thinking too. There’s also nothing preventing features from being merged into trunk after we create the 4.0 branch, which in my opinion is a better approach than trying to jam everything in right before the release. On Thu, Apr 5, 2018 at 12:06 PM Jason Brown

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
I think that getting into the various repair strategies in this discussion is perhaps orthogonal to how we schedule repair. Whether we end up with incremental, full, tickers (read @ALL), continuous repair, mutation based

Re: Roadmap for 4.0

2018-04-05 Thread Jason Brown
My understanding, from Nate's summary, was June 1 is the freeze date for features. I expect we would go for at least 4 months (if not longer) testing, fixing bugs, early dogfooding, and so on. I also equated June 1 with the date on which we would create a 'cassandra-4.0' branch, and thus the merge

Re: Repair scheduling tools

2018-04-05 Thread benjamin roth
I'm not saying Reaper is the problem. I don't want to do Reaper an injustice, but in the end it is "just" instrumentation for CS's built-in repairs that slices and schedules, right? The problem I see is that the built-in repairs are rather inefficient (for many, maybe not all, use cases) due to many

Re: Roadmap for 4.0

2018-04-05 Thread Aleksey Yeshchenko
June feels a bit too early to me as well. I personally would prefer the end of August / beginning of September. +1 to the idea of having a fixed date, though, just not this one. — AY On 5 April 2018 at 19:20:12, Stefan Podkowinski (s...@apache.org) wrote: June is too early. On 05.04.18

Re: Repair scheduling tools

2018-04-05 Thread Jonathan Haddad
To be fair, reaper in 2016 only worked with 2.0 and was just sitting around, more or less. Since then we've had 401 commits changing tens of thousands of lines of code, dealing with fault tolerance, repair retries, scalability, etc. We've had 1 reaper node managing repairs across dozens of

Re: Roadmap for 4.0

2018-04-05 Thread Stefan Podkowinski
June is too early. On 05.04.18 19:32, Josh McKenzie wrote: > Just as a matter of perspective, I'm personally mentally diffing from > when 3.0 hit, not 3.10. > >> commit 96f407bce56b98cd824d18e32ee012dbb99a0286 >> Author: T Jake Luciani >> Date: Fri Nov 6 14:38:34 2015 -0500

Re: Repair scheduling tools

2018-04-05 Thread benjamin roth
That would be totally awesome! Not sure if it helps here but for completeness: We completely "dumped" regular repairs - whether 'nodetool repair' or Reaper - and run our own tool that simply does CL_ALL scraping over the whole cluster. It has now run in production for over a year and the only

Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
I think it's informative that Dor, Vinay, and I, who have built sidecar repair systems, think that it's crucial to have the scheduling component in the same process as the repair execution component. Like I said in the ticket/design, it is *really* hard for a repair scheduling process to determine the

Re: Roadmap for 4.0

2018-04-05 Thread Michael Shuler
On 04/05/2018 12:32 PM, Josh McKenzie wrote: > Just as a matter of perspective, I'm personally mentally diffing from > when 3.0 hit, not 3.10. > >> commit 96f407bce56b98cd824d18e32ee012dbb99a0286 >> Author: T Jake Luciani >> Date: Fri Nov 6 14:38:34 2015 -0500 >> 3.0

Re: Roadmap for 4.0

2018-04-05 Thread Josh McKenzie
Just as a matter of perspective, I'm personally mentally diffing from when 3.0 hit, not 3.10. > commit 96f407bce56b98cd824d18e32ee012dbb99a0286 > Author: T Jake Luciani > Date: Fri Nov 6 14:38:34 2015 -0500 > 3.0 release versions While June feels close to today relative

Re: Roadmap for 4.0

2018-04-05 Thread sankalp kohli
We can take a look on 1st June at how things are, then decide if we want to freeze it and what's in and what's out. On Thu, Apr 5, 2018 at 9:31 AM, Ariel Weisberg wrote: > Hi, > > +1 to having a feature freeze date. June 1st is earlier than I would have > picked. > > Ariel > > On

Re: Roadmap for 4.0

2018-04-05 Thread Ariel Weisberg
Hi, +1 to having a feature freeze date. June 1st is earlier than I would have picked. Ariel On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote: > +1 here for June 1. > > On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown wrote: > > > +1 > > > > On Wed, Apr 4, 2018 at 8:31

Re: Roadmap for 4.0

2018-04-05 Thread Josh McKenzie
+1 here for June 1. On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown wrote: > +1 > > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston > wrote: > > > +1 > > > > On 4/4/18, 5:48 PM, "Jeff Jirsa" wrote: > > > > Earlier than I’d have

Re: Roadmap for 4.0

2018-04-05 Thread Jason Brown
+1 On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston wrote: > +1 > > On 4/4/18, 5:48 PM, "Jeff Jirsa" wrote: > > Earlier than I’d have personally picked, but I’m +1 too > > > > -- > Jeff Jirsa > > > > On Apr 4, 2018, at 5:06 PM, Nate