Re: Repair scheduling tools

2018-04-03 Thread Qingcun Zhou
Repair has been a problem for us at Uber. In general I'm in favor of
including the scheduling logic in the Cassandra daemon. It has the benefit
of enabling something like load-aware repair, e.g., only schedule repair
while there is no ongoing compaction or while traffic is low. As proposed
by others, we can expose keyspace/table-level configurations so that users
can opt in. Regarding the risk, yes, there will be problems at the
beginning, but in the long run users will appreciate that repair works out
of the box, just like compaction. We have large Cassandra deployments and
can work with the Netflix folks on intensive testing to boost user
confidence.
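
For illustration, a load-aware gate could be as simple as checking the
node's pending compactions over JMX before kicking off a repair. This is
only an untested sketch: the host/port and the zero-pending threshold are
assumptions, and logic living inside the daemon would read the metric
directly rather than going through JMX.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LoadAwareRepairGate
    {
        public static void main(String[] args) throws Exception
        {
            // Default JMX endpoint of a local Cassandra node (assumption).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url))
            {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // Cassandra's published metric for pending compaction tasks.
                ObjectName pending = new ObjectName(
                        "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
                int pendingTasks = (Integer) mbs.getAttribute(pending, "Value");
                if (pendingTasks == 0)
                    System.out.println("No pending compactions; OK to schedule repair");
                else
                    System.out.println("Node busy (" + pendingTasks
                                       + " pending compactions); skipping repair for now");
            }
        }
    }

A real implementation would also look at request latencies or throughput
before deciding the node is idle; this only shows the shape of the check.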

On the other hand, have we looked into how other NoSQL databases handle
repair? Is there a sidecar process?


On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli 
wrote:

> Repair is critical for running C* and I agree with Roopa that it needs to
> be part of the offering. I think we should make it easy for new users to
> run C*.
>
> Can we have a sidecar process which we can add to the Apache Cassandra
> offering, and we can put this repair there? I am also fine putting it in
> C* if a sidecar is more long-term.
>
> On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <
> rtangir...@netflix.com.invalid> wrote:
>
> > In seeing so many companies grapple with running repairs successfully
> > in production, and seeing the success of distributed scheduled repair
> > here at Netflix, I strongly believe that adding this to Cassandra would
> > be a great addition to the database.  I am hoping we as a community
> > will make it easy for teams to operate and run Cassandra by enhancing
> > the core product, and making maintenance tasks like repairs and
> > compactions part of the database without external tooling. We can have
> > an experimental flag for the feature, and only teams who are confident
> > with the service can enable it, while others can fall back to default
> > repairs.
> >
> >
> > *Regards,*
> >
> > *Roopa Tangirala*
> >
> > Engineering Manager CDE
> >
> > *(408) 438-3156 - mobile*
> >
> >
> >
> >
> >
> > On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <
> > kenbrot...@yahoo.com.invalid> wrote:
> >
> > > Why not make it configurable?
> > > auto_manage_repair_consistency: true (default: false)
> > >
> > > Then users can use the built-in auto-repair function that would be
> > > created, or continue to handle it as they do now.  Default behavior
> > > would be "false" so nothing changes on its own.  Just wondering why
> > > not have that option?  It might accelerate progress as others have
> > > already suggested.
> > >
> > > Kenneth Brotman
> > >
> > > -Original Message-
> > > From: Nate McCall [mailto:zznat...@gmail.com]
> > > Sent: Tuesday, April 03, 2018 1:37 PM
> > > To: dev
> > > Subject: Re: Repair scheduling tools
> > >
> > > This document does a really good job of listing out some of the
> > > issues of coordinating the scheduling of repair. Regardless of which
> > > camp you fall into, it is certainly worth a read.
> > >
> > > On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch 
> > > wrote:
> > > > I just want to say I think it would be great for our users if we
> moved
> > > > repair scheduling into Cassandra itself. The team here at Netflix has
> > > > opened the ticket
> > > > 
> > > > and have written a detailed design document
> > > >  t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger>
> > > > that includes problem discussion and prior art if anyone wants to
> > > > contribute to that. We tried to fairly discuss existing solutions,
> > > > what their drawbacks are, and a proposed solution.
> > > >
> > > > If we were to put this as part of the main Cassandra daemon, I think
> > > > it should probably be marked experimental and of course be something
> > > > that users opt into (table by table or cluster by cluster) with the
> > > > understanding that it might not fully work out of the box the first
> > > > time we ship it. We have to be willing to take risks but we also have
> > > > to be honest with our users. It may help build confidence if a few
> > > > major deployments use it (such as Netflix) and we are happy of course
> > > > to provide that QA as best we can.
> > > >
> > > > -Joey
> > > >
> > > > On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston
> > > > 
> > > > wrote:
> > > >
> > > >> Hi dev@,
> > > >>
> > > >>
> > > >>
> > > >> The question of the best way to schedule repairs came up on
> > > >> CASSANDRA-14346, and I thought it would be good to bring up the idea
> > > >> of an external tool on the dev list.
> > > >>
> > > >>
> > > >>
> > > >> Cassandra lacks any sort of tools for automating routine tasks that
> > > >> are required for running clusters, specifically repair. Regular
> > > >> repair is a must for most clusters, like compaction. This means that,
> > > >> especially as far as eventual consistency is concerned, Cassandra
> > > >> isn’t totally functional out of the box.

Re: Repair scheduling tools

2018-04-03 Thread sankalp kohli
Repair is critical for running C* and I agree with Roopa that it needs to
be part of the offering. I think we should make it easy for new users to
run C*.

Can we have a sidecar process which we can add to the Apache Cassandra
offering, and we can put this repair there? I am also fine putting it in C*
if a sidecar is more long-term.

On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <
rtangir...@netflix.com.invalid> wrote:

> In seeing so many companies grapple with running repairs successfully in
> production, and seeing the success of distributed scheduled repair here at
> Netflix, I strongly believe that adding this to Cassandra would be a great
> addition to the database.  I am hoping we as a community will make it easy
> for teams to operate and run Cassandra by enhancing the core product, and
> making maintenance tasks like repairs and compactions part of the database
> without external tooling. We can have an experimental flag for the feature,
> and only teams who are confident with the service can enable it, while
> others can fall back to default repairs.
>
>
> *Regards,*
>
> *Roopa Tangirala*
>
> Engineering Manager CDE
>
> *(408) 438-3156 - mobile*
>
>
>
>
>
> On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> > Why not make it configurable?
> > auto_manage_repair_consistency: true (default: false)
> >
> > Then users can use the built-in auto-repair function that would be
> > created, or continue to handle it as they do now.  Default behavior
> > would be "false" so nothing changes on its own.  Just wondering why not
> > have that option?  It might accelerate progress as others have already
> > suggested.
> >
> > Kenneth Brotman
> >
> > -Original Message-
> > From: Nate McCall [mailto:zznat...@gmail.com]
> > Sent: Tuesday, April 03, 2018 1:37 PM
> > To: dev
> > Subject: Re: Repair scheduling tools
> >
> > This document does a really good job of listing out some of the issues
> > of coordinating the scheduling of repair. Regardless of which camp you
> > fall into, it is certainly worth a read.
> >
> > On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch 
> > wrote:
> > > I just want to say I think it would be great for our users if we moved
> > > repair scheduling into Cassandra itself. The team here at Netflix has
> > > opened the ticket
> > > 
> > > and have written a detailed design document
> > >  > > bFSEyGzEtM/edit#heading=h.iasguic42ger>
> > > that includes problem discussion and prior art if anyone wants to
> > > contribute to that. We tried to fairly discuss existing solutions,
> > > what their drawbacks are, and a proposed solution.
> > >
> > > If we were to put this as part of the main Cassandra daemon, I think
> > > it should probably be marked experimental and of course be something
> > > that users opt into (table by table or cluster by cluster) with the
> > > understanding that it might not fully work out of the box the first
> > > time we ship it. We have to be willing to take risks but we also have
> > > to be honest with our users. It may help build confidence if a few
> > > major deployments use it (such as Netflix) and we are happy of course
> > > to provide that QA as best we can.
> > >
> > > -Joey
> > >
> > > On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston
> > > 
> > > wrote:
> > >
> > >> Hi dev@,
> > >>
> > >>
> > >>
> > >> The question of the best way to schedule repairs came up on
> > >> CASSANDRA-14346, and I thought it would be good to bring up the idea
> > >> of an external tool on the dev list.
> > >>
> > >>
> > >>
> > >> Cassandra lacks any sort of tools for automating routine tasks that
> > >> are required for running clusters, specifically repair. Regular
> > >> repair is a must for most clusters, like compaction. This means that,
> > >> especially as far as eventual consistency is concerned, Cassandra
> > >> isn’t totally functional out of the box. Operators either need to
> > >> find a 3rd party solution or implement one themselves. Adding this to
> > >> Cassandra would make it easier to use.
> > >>
> > >>
> > >>
> > >> Is this something we should be doing? If so, what should it look like?
> > >>
> > >>
> > >>
> > >> Personally, I feel like this is a pretty big gap in the project and
> > >> would like to see an out of process tool offered. Ideally, Cassandra
> > >> would just take care of itself, but writing a distributed repair
> > >> scheduler that you trust to run in production is a lot harder than
> > >> writing a single process management application that can failover.
> > >>
> > >>
> > >>
> > >> Any thoughts on this?
> > >>
> > >>
> > >>
> > >> Thanks,
> > >>
> > >>
> > >>
> > >> Blake
> > >>
> > >>
> >

Re: Repair scheduling tools

2018-04-03 Thread Roopa Tangirala
In seeing so many companies grapple with running repairs successfully in
production, and seeing the success of distributed scheduled repair here at
Netflix, I strongly believe that adding this to Cassandra would be a great
addition to the database.  I am hoping we as a community will make it easy
for teams to operate and run Cassandra by enhancing the core product, and
making maintenance tasks like repairs and compactions part of the database
without external tooling. We can have an experimental flag for the feature,
and only teams who are confident with the service can enable it, while
others can fall back to default repairs.


*Regards,*

*Roopa Tangirala*

Engineering Manager CDE

*(408) 438-3156 - mobile*





On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Why not make it configurable?
> auto_manage_repair_consistency: true (default: false)
>
> Then users can use the built-in auto-repair function that would be
> created, or continue to handle it as they do now.  Default behavior would
> be "false" so nothing changes on its own.  Just wondering why not have
> that option?  It might accelerate progress as others have already
> suggested.
>
> Kenneth Brotman
>
> -Original Message-
> From: Nate McCall [mailto:zznat...@gmail.com]
> Sent: Tuesday, April 03, 2018 1:37 PM
> To: dev
> Subject: Re: Repair scheduling tools
>
> This document does a really good job of listing out some of the issues of
> coordinating the scheduling of repair. Regardless of which camp you fall
> into, it is certainly worth a read.
>
> On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch 
> wrote:
> > I just want to say I think it would be great for our users if we moved
> > repair scheduling into Cassandra itself. The team here at Netflix has
> > opened the ticket
> > 
> > and have written a detailed design document
> >  > bFSEyGzEtM/edit#heading=h.iasguic42ger>
> > that includes problem discussion and prior art if anyone wants to
> > contribute to that. We tried to fairly discuss existing solutions,
> > what their drawbacks are, and a proposed solution.
> >
> > If we were to put this as part of the main Cassandra daemon, I think
> > it should probably be marked experimental and of course be something
> > that users opt into (table by table or cluster by cluster) with the
> > understanding that it might not fully work out of the box the first
> > time we ship it. We have to be willing to take risks but we also have
> > to be honest with our users. It may help build confidence if a few
> > major deployments use it (such as Netflix) and we are happy of course
> > to provide that QA as best we can.
> >
> > -Joey
> >
> > On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston
> > 
> > wrote:
> >
> >> Hi dev@,
> >>
> >>
> >>
> >> The question of the best way to schedule repairs came up on
> >> CASSANDRA-14346, and I thought it would be good to bring up the idea
> >> of an external tool on the dev list.
> >>
> >>
> >>
> >> Cassandra lacks any sort of tools for automating routine tasks that
> >> are required for running clusters, specifically repair. Regular
> >> repair is a must for most clusters, like compaction. This means that,
> >> especially as far as eventual consistency is concerned, Cassandra
> >> isn’t totally functional out of the box. Operators either need to
> >> find a 3rd party solution or implement one themselves. Adding this to
> >> Cassandra would make it easier to use.
> >>
> >>
> >>
> >> Is this something we should be doing? If so, what should it look like?
> >>
> >>
> >>
> >> Personally, I feel like this is a pretty big gap in the project and
> >> would like to see an out of process tool offered. Ideally, Cassandra
> >> would just take care of itself, but writing a distributed repair
> >> scheduler that you trust to run in production is a lot harder than
> >> writing a single process management application that can failover.
> >>
> >>
> >>
> >> Any thoughts on this?
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >>
> >> Blake
> >>
> >>
>


RE: Repair scheduling tools

2018-04-03 Thread Kenneth Brotman
Why not make it configurable?
auto_manage_repair_consistency: true (default: false)

Then users can use the built-in auto-repair function that would be created,
or continue to handle it as they do now.  Default behavior would be "false"
so nothing changes on its own.  Just wondering why not have that option?
It might accelerate progress as others have already suggested.
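
To make the option concrete, here is a small untested sketch of the
default-false semantics. The option name follows the proposal above;
reading it from a system property is purely illustrative, since a real
patch would plumb it through cassandra.yaml and DatabaseDescriptor instead.

    import java.util.concurrent.TimeUnit;

    // Illustrative only: shows the opt-in defaulting to "false" so that
    // nothing changes unless an operator explicitly enables it.
    public class AutoRepairGate
    {
        public static void main(String[] args) throws InterruptedException
        {
            boolean autoManageRepair = Boolean.parseBoolean(
                    System.getProperty("auto_manage_repair_consistency", "false"));

            if (!autoManageRepair)
            {
                System.out.println("Auto repair disabled (default); operators keep running repair as before");
                return;
            }

            // Stand-in for the built-in scheduler loop that would be created.
            while (true)
            {
                System.out.println("Auto repair enabled; scheduler would pick the next range here");
                TimeUnit.HOURS.sleep(1);
            }
        }
    }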

Kenneth Brotman

-Original Message-
From: Nate McCall [mailto:zznat...@gmail.com] 
Sent: Tuesday, April 03, 2018 1:37 PM
To: dev
Subject: Re: Repair scheduling tools

This document does a really good job of listing out some of the issues of
coordinating the scheduling of repair. Regardless of which camp you fall
into, it is certainly worth a read.

On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch  wrote:
> I just want to say I think it would be great for our users if we moved 
> repair scheduling into Cassandra itself. The team here at Netflix has 
> opened the ticket 
> 
> and have written a detailed design document 
>  bFSEyGzEtM/edit#heading=h.iasguic42ger>
> that includes problem discussion and prior art if anyone wants to 
> contribute to that. We tried to fairly discuss existing solutions, 
> what their drawbacks are, and a proposed solution.
>
> If we were to put this as part of the main Cassandra daemon, I think 
> it should probably be marked experimental and of course be something 
> that users opt into (table by table or cluster by cluster) with the 
> understanding that it might not fully work out of the box the first 
> time we ship it. We have to be willing to take risks but we also have 
> to be honest with our users. It may help build confidence if a few 
> major deployments use it (such as Netflix) and we are happy of course 
> to provide that QA as best we can.
>
> -Joey
>
> On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston 
> 
> wrote:
>
>> Hi dev@,
>>
>>
>>
>> The question of the best way to schedule repairs came up on 
>> CASSANDRA-14346, and I thought it would be good to bring up the idea 
>> of an external tool on the dev list.
>>
>>
>>
>> Cassandra lacks any sort of tools for automating routine tasks that 
>> are required for running clusters, specifically repair. Regular 
>> repair is a must for most clusters, like compaction. This means that, 
>> especially as far as eventual consistency is concerned, Cassandra 
>> isn’t totally functional out of the box. Operators either need to 
>> find a 3rd party solution or implement one themselves. Adding this to 
>> Cassandra would make it easier to use.
>>
>>
>>
>> Is this something we should be doing? If so, what should it look like?
>>
>>
>>
>> Personally, I feel like this is a pretty big gap in the project and 
>> would like to see an out of process tool offered. Ideally, Cassandra 
>> would just take care of itself, but writing a distributed repair 
>> scheduler that you trust to run in production is a lot harder than 
>> writing a single process management application that can failover.
>>
>>
>>
>> Any thoughts on this?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Blake
>>
>>




Re: Repair scheduling tools

2018-04-03 Thread Rahul Singh
Agree on including it in the distribution, but I think repair can live
independently and be run/configured separately.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 3, 2018, 4:37 PM -0400, Nate McCall , wrote:
> This document does a really good job of listing out some of the issues
> of coordinating the scheduling of repair. Regardless of which camp you
> fall into, it is certainly worth a read.
>
> On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch  wrote:
> > I just want to say I think it would be great for our users if we moved
> > repair scheduling into Cassandra itself. The team here at Netflix has
> > opened the ticket and have written a detailed design document
> > that includes problem discussion and prior art if anyone wants to
> > contribute to that. We tried to fairly discuss existing solutions, what
> > their drawbacks are, and a proposed solution.
> >
> > If we were to put this as part of the main Cassandra daemon, I think it
> > should probably be marked experimental and of course be something that
> > users opt into (table by table or cluster by cluster) with the
> > understanding that it might not fully work out of the box the first time we
> > ship it. We have to be willing to take risks but we also have to be honest
> > with our users. It may help build confidence if a few major deployments use
> > it (such as Netflix) and we are happy of course to provide that QA as best
> > we can.
> >
> > -Joey
> >
> > On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston  > wrote:
> >
> > > Hi dev@,
> > >
> > >
> > >
> > > The question of the best way to schedule repairs came up on
> > > CASSANDRA-14346, and I thought it would be good to bring up the idea of an
> > > external tool on the dev list.
> > >
> > >
> > >
> > > Cassandra lacks any sort of tools for automating routine tasks that are
> > > required for running clusters, specifically repair. Regular repair is a
> > > must for most clusters, like compaction. This means that, especially
> > > as far as eventual consistency is concerned, Cassandra isn’t totally
> > > functional
> > > out of the box. Operators either need to find a 3rd party solution or
> > > implement one themselves. Adding this to Cassandra would make it easier to
> > > use.
> > >
> > >
> > >
> > > Is this something we should be doing? If so, what should it look like?
> > >
> > >
> > >
> > > Personally, I feel like this is a pretty big gap in the project and would
> > > like to see an out of process tool offered. Ideally, Cassandra would just
> > > take care of itself, but writing a distributed repair scheduler that you
> > > trust to run in production is a lot harder than writing a single process
> > > management application that can failover.
> > >
> > >
> > >
> > > Any thoughts on this?
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > Blake
> > >
> > >
>


Re: Roadmap for 4.0

2018-04-03 Thread Josh McKenzie
>
> A hard date for a feature freeze makes sense; a hard date for a release
> does not.

Strongly agree. We should also collectively define what "Done" looks like
post-freeze so we don't end up in bike-shedding hell like we have in the
past.



On Tue, Apr 3, 2018 at 5:35 PM, Jeff Jirsa  wrote:

> A hard date for a feature freeze makes sense; a hard date for a release
> does not.
>
> --
> Jeff Jirsa
>
>
> > On Apr 3, 2018, at 2:29 PM, Michael Shuler 
> wrote:
> >
> > On 04/03/2018 03:51 PM, Nate McCall wrote:
> >>> My concrete proposal would be to declare a feature freeze for 4.0 in
> >>> 2 months, so say June 1st. That leaves some time for finishing
> >>> features that are in progress, but not too much to get derailed. And
> >>> let's be strict on that freeze.
> >>
> >> I quite like this suggestion. Thanks, Sylvain.
> >
> > Should we s/TBD/somedate/ on the downloads page and get the word out?
> >
> > Apache Cassandra 3.0 is supported until 6 months after 4.0 release (date
> > TBD).
> > Apache Cassandra 2.2 is supported until 4.0 release (date TBD).
> > Apache Cassandra 2.1 is supported until 4.0 release (date TBD) with
> > critical fixes only.
> >
> > --
> > Kind regards,
> > Michael
> >


Re: Roadmap for 4.0

2018-04-03 Thread Jeff Jirsa
A hard date for a feature freeze makes sense; a hard date for a release
does not.

-- 
Jeff Jirsa


> On Apr 3, 2018, at 2:29 PM, Michael Shuler  wrote:
> 
> On 04/03/2018 03:51 PM, Nate McCall wrote:
>>> My concrete proposal would be to declare a feature freeze for 4.0 in 2
>>> months, so say June 1st. That leaves some time for finishing features
>>> that are in progress, but not too much to get derailed. And let's be
>>> strict on that freeze.
>> 
>> I quite like this suggestion. Thanks, Sylvain.
> 
> Should we s/TBD/somedate/ on the downloads page and get the word out?
> 
> Apache Cassandra 3.0 is supported until 6 months after 4.0 release (date
> TBD).
> Apache Cassandra 2.2 is supported until 4.0 release (date TBD).
> Apache Cassandra 2.1 is supported until 4.0 release (date TBD) with
> critical fixes only.
> 
> -- 
> Kind regards,
> Michael
> 



Re: Roadmap for 4.0

2018-04-03 Thread Michael Shuler
On 04/03/2018 03:51 PM, Nate McCall wrote:
>> My concrete proposal would be to declare a feature freeze for 4.0 in 2
>> months, so say June 1st. That leaves some time for finishing features
>> that are in progress, but not too much to get derailed. And let's be
>> strict on that freeze.
> 
> I quite like this suggestion. Thanks, Sylvain.

Should we s/TBD/somedate/ on the downloads page and get the word out?

Apache Cassandra 3.0 is supported until 6 months after 4.0 release (date
TBD).
Apache Cassandra 2.2 is supported until 4.0 release (date TBD).
Apache Cassandra 2.1 is supported until 4.0 release (date TBD) with
critical fixes only.

-- 
Kind regards,
Michael




Re: Roadmap for 4.0

2018-04-03 Thread Nate McCall
> My concrete proposal would be to declare a feature freeze for 4.0 in 2
> months, so say June 1st. That leaves some time for finishing features
> that are in progress, but not too much to get derailed. And let's be
> strict on that freeze.

I quite like this suggestion. Thanks, Sylvain.

> After that, we'll see how quickly we can get things to stabilize, but I'd
> suggest aiming for an alpha 3-4 weeks after that.

Yes. We've had folks step up and offer to dog-food 4.0 and can
probably solicit some more with some outreach.




Re: Repair scheduling tools

2018-04-03 Thread Nate McCall
This document does a really good job of listing out some of the issues
of coordinating the scheduling of repair. Regardless of which camp you
fall into, it is certainly worth a read.

On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch  wrote:
> I just want to say I think it would be great for our users if we moved
> repair scheduling into Cassandra itself. The team here at Netflix has
> opened the ticket 
> and have written a detailed design document
> 
> that includes problem discussion and prior art if anyone wants to
> contribute to that. We tried to fairly discuss existing solutions, what
> their drawbacks are, and a proposed solution.
>
> If we were to put this as part of the main Cassandra daemon, I think it
> should probably be marked experimental and of course be something that
> users opt into (table by table or cluster by cluster) with the
> understanding that it might not fully work out of the box the first time we
> ship it. We have to be willing to take risks but we also have to be honest
> with our users. It may help build confidence if a few major deployments use
> it (such as Netflix) and we are happy of course to provide that QA as best
> we can.
>
> -Joey
>
> On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston 
> wrote:
>
>> Hi dev@,
>>
>>
>>
>> The question of the best way to schedule repairs came up on
>> CASSANDRA-14346, and I thought it would be good to bring up the idea of an
>> external tool on the dev list.
>>
>>
>>
>> Cassandra lacks any sort of tools for automating routine tasks that are
>> required for running clusters, specifically repair. Regular repair is a
>> must for most clusters, like compaction. This means that, especially as far
>> as eventual consistency is concerned, Cassandra isn’t totally functional
>> out of the box. Operators either need to find a 3rd party solution or
>> implement one themselves. Adding this to Cassandra would make it easier to
>> use.
>>
>>
>>
>> Is this something we should be doing? If so, what should it look like?
>>
>>
>>
>> Personally, I feel like this is a pretty big gap in the project and would
>> like to see an out of process tool offered. Ideally, Cassandra would just
>> take care of itself, but writing a distributed repair scheduler that you
>> trust to run in production is a lot harder than writing a single process
>> management application that can failover.
>>
>>
>>
>> Any thoughts on this?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Blake
>>
>>




Re: Repair scheduling tools

2018-04-03 Thread Joseph Lynch
I just want to say I think it would be great for our users if we moved
repair scheduling into Cassandra itself. The team here at Netflix has
opened the ticket 
and have written a detailed design document

that includes problem discussion and prior art if anyone wants to
contribute to that. We tried to fairly discuss existing solutions, what
their drawbacks are, and a proposed solution.

If we were to put this as part of the main Cassandra daemon, I think it
should probably be marked experimental and of course be something that
users opt into (table by table or cluster by cluster) with the
understanding that it might not fully work out of the box the first time we
ship it. We have to be willing to take risks but we also have to be honest
with our users. It may help build confidence if a few major deployments use
it (such as Netflix) and we are happy of course to provide that QA as best
we can.

-Joey

On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston 
wrote:

> Hi dev@,
>
>
>
> The question of the best way to schedule repairs came up on
> CASSANDRA-14346, and I thought it would be good to bring up the idea of an
> external tool on the dev list.
>
>
>
> Cassandra lacks any sort of tools for automating routine tasks that are
> required for running clusters, specifically repair. Regular repair is a
> must for most clusters, like compaction. This means that, especially as far
> as eventual consistency is concerned, Cassandra isn’t totally functional
> out of the box. Operators either need to find a 3rd party solution or
> implement one themselves. Adding this to Cassandra would make it easier to
> use.
>
>
>
> Is this something we should be doing? If so, what should it look like?
>
>
>
> Personally, I feel like this is a pretty big gap in the project and would
> like to see an out of process tool offered. Ideally, Cassandra would just
> take care of itself, but writing a distributed repair scheduler that you
> trust to run in production is a lot harder than writing a single process
> management application that can failover.
>
>
>
> Any thoughts on this?
>
>
>
> Thanks,
>
>
>
> Blake
>
>


Re: Repair scheduling tools

2018-04-03 Thread Carl Mueller
The Last Pickle's Reaper should be the starting point of any discussion on
repair scheduling.

On Tue, Apr 3, 2018 at 12:48 PM, Blake Eggleston 
wrote:

> Hi dev@,
>
>
>
> The question of the best way to schedule repairs came up on
> CASSANDRA-14346, and I thought it would be good to bring up the idea of an
> external tool on the dev list.
>
>
>
> Cassandra lacks any sort of tools for automating routine tasks that are
> required for running clusters, specifically repair. Regular repair is a
> must for most clusters, like compaction. This means that, especially as far
> as eventual consistency is concerned, Cassandra isn’t totally functional
> out of the box. Operators either need to find a 3rd party solution or
> implement one themselves. Adding this to Cassandra would make it easier to
> use.
>
>
>
> Is this something we should be doing? If so, what should it look like?
>
>
>
> Personally, I feel like this is a pretty big gap in the project and would
> like to see an out of process tool offered. Ideally, Cassandra would just
> take care of itself, but writing a distributed repair scheduler that you
> trust to run in production is a lot harder than writing a single process
> management application that can failover.
>
>
>
> Any thoughts on this?
>
>
>
> Thanks,
>
>
>
> Blake
>
>


Re: Roadmap for 4.0

2018-04-03 Thread Jon Haddad
I’d prefer to time-box it as well.  I like Sylvain’s suggestion, although I’d
also be comfortable with setting a more aggressive cutoff date for features
(maybe a month), given all the stuff that’s already in.

If we plan on a follow-up (4.1/5.0) in 6 months, I *hope* there will be less
of a desire to do a bunch of last-minute feature merges; maybe I’m too
optimistic.

Jon

 

> On Apr 3, 2018, at 9:48 AM, Ben Bromhead  wrote:
> 
> +1
> 
> Even though I suggested clearing blockers, I'm equally happy with a
> time-boxed event to draw the line in the sand. As long as it's something
> clear to work towards with appropriate commitment from folks.
> 
> On Tue, Apr 3, 2018 at 8:10 AM Sylvain Lebresne  >
> wrote:
> 
>> For what it's worth (and based on the project experience), I think the
>> strategy of "let's agree on a list of tickets everyone would love to get
>> in before we freeze 4.0" doesn't work very well (it's largely useless,
>> except for making us feel good about not releasing anything). Those lists
>> always end up being too big, especially given we have no control over
>> people's ability to contribute (some stuff will always lag for a very
>> long time, even when it sounds really cool on paper).
>> 
>> I'm also a bit sad that we seem to be getting back to our old demons of
>> trying to shove as much as we possibly can in the next major, as if having
>> a feature miss it means it will never happen. The 4.0 changelog is big
>> already and we haven't made a release with new features in almost a year
>> now, so I personally think we should start being a bit more aggressive
>> with it and learn to get comfortable letting features slip if they are
>> not ready.
>> 
>> My concrete proposal would be to declare a feature freeze for 4.0 in 2
>> months, so say June 1st. That leaves some time for finishing features
>> that are in progress, but not too much to get derailed. And let's be
>> strict on that freeze. After that, we'll see how quickly we can get
>> things to stabilize, but I'd suggest aiming for an alpha 3-4 weeks after
>> that.
>> 
>> Of course, we should probably (re-(re-(re-)))start a discussion on release
>> "strategy" in parallel because it doesn't seem we have one right now, but
>> that's imo a discussion we should keep separate.
>> 
>> --
>> Sylvain
>> 
>> 
>> On Mon, Apr 2, 2018 at 4:54 PM DuyHai Doan  wrote:
>> 
>>> My wish list:
>>> 
>>> * Add support for arithmetic operators (CASSANDRA-11935)
>>> * Allow IN restrictions on column families with collections
>>> (CASSANDRA-12654)
>>> * Add support for + and - operations on dates (CASSANDRA-11936)
>>> * Add the currentTimestamp, currentDate, currentTime and currentTimeUUID
>>> functions (CASSANDRA-13132)
>>> * Allow selecting Map values and Set elements (CASSANDRA-7396)
>>> 
>>> Those are mostly useful for time-series data models, and I guess they
>>> have no significant impact on the internals and operations, so the risk
>>> of regression is low
>>> 
>>> On Mon, Apr 2, 2018 at 4:33 PM, Jeff Jirsa  wrote:
>>> 
 9608 (java9)
 
 --
 Jeff Jirsa
 
 
> On Apr 2, 2018, at 3:45 AM, Jason Brown 
>> wrote:
> 
> The only additional tickets I'd like to mention are:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-13971 - Automatic
> certificate management using Vault
> - Stefan's Vault integration work. A sub-ticket, CASSANDRA-14102,
> addresses encryption at-rest, subsumes CASSANDRA-9633 (SSTable
> encryption) - which I doubt I would be able to get to any time this year.
> It would definitely be nice to have a clarified encryption/security story
> for 4.0.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-11990 - Address rows
> rather than partitions in SASI
> - a nice update for SASI, but not critical.
> 
> -Jason
> 
>> On Sat, Mar 31, 2018 at 6:53 PM, Ben Bromhead wrote:
>>
>> Apologies all, I didn't realize I was responding to this discussion
>> only on the @user list. One of the perils of responding to a thread
>> that is on both user and dev...
>>
>> For context, I have included my response to Kurt's previous discussion
>> on this topic as it only ended up on the user list.
>>
>> *After some further discussions with folks offline, I'd like to revive
>> this discussion. *
>>
>> *As Kurt mentioned, to keep it simple, if we can simply build consensus
>> around what is in for 4.0 and what is out, we can then start the
>> process of working off a 4.0 branch towards betas and release
>> candidates. Again as Kurt mentioned, assigning a timeline to it right
>> now is difficult, but having a firm line in the sand around what
>> features/patches are in, then limiting future 4.0 work to bug fixes
>> will give folks a less nebulous target to work on. *

Repair scheduling tools

2018-04-03 Thread Blake Eggleston
Hi dev@,

 

The question of the best way to schedule repairs came up on CASSANDRA-14346, 
and I thought it would be good to bring up the idea of an external tool on the 
dev list.

 

Cassandra lacks any sort of tools for automating routine tasks that are 
required for running clusters, specifically repair. Regular repair is a must 
for most clusters, like compaction. This means that, especially as far as 
eventual consistency is concerned, Cassandra isn’t totally functional out of 
the box. Operators either need to find a 3rd party solution or implement one 
themselves. Adding this to Cassandra would make it easier to use.

 

Is this something we should be doing? If so, what should it look like?

 

Personally, I feel like this is a pretty big gap in the project and would like 
to see an out of process tool offered. Ideally, Cassandra would just take care 
of itself, but writing a distributed repair scheduler that you trust to run in 
production is a lot harder than writing a single process management application 
that can failover.
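
To illustrate the difference, a bare-bones version of that single process
application could be little more than a loop that shells out to nodetool.
This untested sketch assumes nodetool is on the PATH and uses an
illustrative keyspace and interval; everything that makes a real tool
trustworthy (failover between scheduler instances, locking so only one
repair runs at a time, subrange splitting, progress tracking) is
deliberately omitted.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class ExternalRepairScheduler
    {
        public static void main(String[] args)
        {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            // Run a primary-range repair once a week (interval is an assumption).
            executor.scheduleWithFixedDelay(() -> {
                try
                {
                    Process p = new ProcessBuilder("nodetool", "repair", "-pr", "my_keyspace")
                            .inheritIO()
                            .start();
                    int exit = p.waitFor();
                    System.out.println("nodetool repair exited with " + exit);
                }
                catch (Exception e)
                {
                    // A real tool would alert and retry here.
                    e.printStackTrace();
                }
            }, 0, 7, TimeUnit.DAYS);
        }
    }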

 

Any thoughts on this?

 

Thanks,

 

Blake



Re: Roadmap for 4.0

2018-04-03 Thread Ben Bromhead
+1

Even though I suggested clearing blockers, I'm equally happy with a
time-boxed event to draw the line in the sand. As long as it's something
clear to work towards with appropriate commitment from folks.

On Tue, Apr 3, 2018 at 8:10 AM Sylvain Lebresne 
wrote:

> For what it's worth (and based on the project experience), I think the
> strategy of "let's agree on a list of tickets everyone would love to get
> in before we freeze 4.0" doesn't work very well (it's largely useless,
> except for making us feel good about not releasing anything). Those lists
> always end up being too big, especially given we have no control over
> people's ability to contribute (some stuff will always lag for a very
> long time, even when it sounds really cool on paper).
>
> I'm also a bit sad that we seem to be getting back to our old demons of
> trying to shove as much as we possibly can in the next major, as if
> having a feature miss it means it will never happen. The 4.0 changelog is
> big already and we haven't made a release with new features in almost a
> year now, so I personally think we should start being a bit more
> aggressive with it and learn to get comfortable letting features slip if
> they are not ready.
>
> My concrete proposal would be to declare a feature freeze for 4.0 in 2
> months, so say June 1st. That leaves some time for finishing features
> that are in progress, but not too much to get derailed. And let's be
> strict on that freeze. After that, we'll see how quickly we can get
> things to stabilize, but I'd suggest aiming for an alpha 3-4 weeks after
> that.
>
> Of course, we should probably (re-(re-(re-)))start a discussion on release
> "strategy" in parallel because it doesn't seem we have one right now, but
> that's imo a discussion we should keep separate.
>
> --
> Sylvain
>
>
> On Mon, Apr 2, 2018 at 4:54 PM DuyHai Doan  wrote:
>
> > My wish list:
> >
> > * Add support for arithmetic operators (CASSANDRA-11935)
> > * Allow IN restrictions on column families with collections
> > (CASSANDRA-12654)
> > * Add support for + and - operations on dates (CASSANDRA-11936)
> > * Add the currentTimestamp, currentDate, currentTime and currentTimeUUID
> > functions (CASSANDRA-13132)
> > * Allow selecting Map values and Set elements (CASSANDRA-7396)
> >
> > Those are mostly useful for time-series data models, and I guess they
> > have no significant impact on the internals and operations, so the risk
> > of regression is low
> >
> > On Mon, Apr 2, 2018 at 4:33 PM, Jeff Jirsa  wrote:
> >
> > > 9608 (java9)
> > >
> > > --
> > > Jeff Jirsa
> > >
> > >
> > > > On Apr 2, 2018, at 3:45 AM, Jason Brown 
> wrote:
> > > >
> > > > The only additional tickets I'd like to mention are:
> > > >
> > > > https://issues.apache.org/jira/browse/CASSANDRA-13971 - Automatic
> > > > certificate management using Vault
> > > > - Stefan's Vault integration work. A sub-ticket, CASSANDRA-14102,
> > > > addresses encryption at-rest, subsumes CASSANDRA-9633 (SSTable
> > > > encryption) - which I doubt I would be able to get to any time this
> > > > year. It would definitely be nice to have a clarified
> > > > encryption/security story for 4.0.
> > > >
> > > > https://issues.apache.org/jira/browse/CASSANDRA-11990 - Address rows
> > > > rather than partitions in SASI
> > > > - a nice update for SASI, but not critical.
> > > >
> > > > -Jason
> > > >
> > > >> On Sat, Mar 31, 2018 at 6:53 PM, Ben Bromhead wrote:
> > > >>
> > > >> Apologies all, I didn't realize I was responding to this
> > > >> discussion only on the @user list. One of the perils of responding
> > > >> to a thread that is on both user and dev...
> > > >>
> > > >> For context, I have included my response to Kurt's previous
> > > >> discussion on this topic as it only ended up on the user list.
> > > >>
> > > >> *After some further discussions with folks offline, I'd like to
> > > >> revive this discussion. *
> > > >>
> > > >> *As Kurt mentioned, to keep it simple, if we can simply build
> > > >> consensus around what is in for 4.0 and what is out, we can then
> > > >> start the process of working off a 4.0 branch towards betas and
> > > >> release candidates. Again as Kurt mentioned, assigning a timeline
> > > >> to it right now is difficult, but having a firm line in the sand
> > > >> around what features/patches are in, then limiting future 4.0 work
> > > >> to bug fixes will give folks a less nebulous target to work on. *
> > > >>
> > > >> *The other thing to mention is that once we have a 4.0 branch to
> > > >> work off, we at Instaclustr have a commitment to dogfooding the
> > > >> release candidates on our internal staging and internal production
> > > >> workloads before 4.0 becomes generally available. I know other
> > > >> folks have similar commitments and simply

Re: CommitLogSegmentManager verbose debug log

2018-04-03 Thread Nicolas Guyomar
Hi Jay,

Well, the log in itself does not provide useful information (like a segment
number or something like that), so IMHO trace would be a better level for
this one.
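
Something like the following untested sketch is what I have in mind; the
segmentName parameter is illustrative, as the real CommitLogSegmentManager
would pass whatever identifies the segment it is about to create:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class SegmentLogging
    {
        private static final Logger logger = LoggerFactory.getLogger(SegmentLogging.class);

        static void onFreshSegmentNeeded(String segmentName)
        {
            // Demoted to trace, and carrying an identifier so the line is useful.
            if (logger.isTraceEnabled())
                logger.trace("No segments in reserve; creating a fresh one ({})", segmentName);
        }
    }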

I agree that one log per second may not be seen as that verbose!

Thank you

On 30 March 2018 at 06:36, Jay Zhuang  wrote:

> It's changed to trace() in cassandra-3.0 with CASSANDRA-10241:
> https://github.com/pauloricardomg/cassandra/commit/3ef1b18fa76dce7cd65b73977fc30e51301f3fed#diff-d07279710c482983e537aed26df80400
>
> In cassandra-3.11 (and trunk), it's changed back to debug() with
> CASSANDRA-10202:
> https://github.com/apache/cassandra/commit/e8907c16abcd84021a39cdaac79b609fcc64a43c#diff-85e13493c70723764c539dd222455979
>
> The message is logged when a new commit-log segment is created, so it's not
> that verbose from my point of view. But I'm also fine with changing it back
> to trace.
>
> Here is a sample of debug.log while running cassandra-stress:
> https://gist.githubusercontent.com/cooldoger/12f507da9b41b232d8869bbcd2bfd02b/raw/241cd8f0639269966aa53e2b10cee613f8ed8cfe/gistfile1.txt
>
>
>
> On Thursday, March 29, 2018, 8:47:54 AM PDT, Nicolas Guyomar <
> nicolas.guyo...@gmail.com> wrote:
>
>
> Hi guys,
>
> I'm trying to understand the meaning of the following log
> in org.apache.cassandra.db.commitlog.CommitLogSegmentManager.java
>
> logger.debug("No segments in reserve; creating a fresh one");
>
> I feel like it could be removed, as it seems to be part of the continuous
> task of providing segments.
>
> Any thoughts on removing this log? (my debug.log is quite full of it)
>
> Thank you
>
> Nicolas
>