Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-20 Thread Jason Brown
+1 to everything Blake said. Bonus points for property-based state testing
(a la ScalaCheck/QuickCheck).
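
For anyone unfamiliar with the technique: the core idea is to generate random
operation sequences, run them against both a reference model and the system
under test, and assert the two never diverge. A minimal sketch of that loop in
plain Java follows (illustrative only, not Cassandra code; a real harness such
as ScalaCheck/QuickCheck would also shrink a failing sequence down to a minimal
counterexample):

    import java.util.*;

    public class MapStateProperty {
        public static void main(String[] args) {
            // Seeded RNG: every failing sequence is reproducible from its seed.
            for (long seed = 0; seed < 1000; seed++) {
                Random rnd = new Random(seed);
                Map<Integer, Integer> model = new TreeMap<>(); // reference model
                Map<Integer, Integer> sut = new HashMap<>();   // stand-in "system under test"
                for (int step = 0; step < 100; step++) {
                    int key = rnd.nextInt(10);
                    switch (rnd.nextInt(3)) {
                        case 0: model.put(key, step); sut.put(key, step); break;
                        case 1: model.remove(key);    sut.remove(key);    break;
                        default: // the property: model and SUT agree on every read
                            if (!Objects.equals(model.get(key), sut.get(key)))
                                throw new AssertionError("diverged: seed=" + seed + ", step=" + step);
                    }
                }
            }
            System.out.println("all sequences agree");
        }
    }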

That said, are we drifting from the original topic of this thread, which was
deciding features for 4.0? It seems to me the testing concerns might belong in
another thread. Then again, is it worthwhile to vote on features without a
clearer testing/release roadmap and plan?

Thanks,

Jason

On Sun, Nov 20, 2016 at 11:23 Blake Eggleston  wrote:

> > I'm not sure how the Apache team does this. Perhaps individual engineers
> can run some modern version at a company of theirs, although that seems
> unlikely, but as an Apache org, I just don't see how that happens.
>
> > To me it seems like the Apache Cassandra infrastructure itself needs to
> stand up a multinode live instance running some 'real-world' example
> that is getting pounded, so that we can stage feature branches to really
> test them.
>
> Not having access to test hardware as an Apache org is a problem, but
> there’s also a lot of room for improvement on the junit testing and
> testability side of things. That’s true for both local and distributed
> components, but more junit coverage of the distributed mechanisms would
> make not having test hardware suck less. With distributed algorithms (like
> gossip 2.0) one of the limitations of testing with live nodes is that
> you’re often just testing the happy path. Reliably and repeatably testing
> how the system responds to weird edge cases involving specific ordering of
> events across nodes is very difficult to do.
>
> I’d written epaxos with this sort of testing in mind, and was able to do a
> lot of testing of obscure failure scenarios (see
> https://github.com/bdeggleston/cassandra/blob/CASSANDRA-6246-trunk/test/unit/org/apache/cassandra/service/epaxos/integration/EpaxosIntegrationRF3Test.java#L144
>  for
> an example). This doesn’t obviate the need to test on real clusters of
> course, but it does increase confidence that the system will behave
> correctly under load, and reduce the amount of things you’re relying on a
> loaded test cluster to reveal.
>
> On November 20, 2016 at 9:02:55 AM, Dave Brosius (dbros...@mebigfatguy.com)
> wrote:
>
> >> We fully intend to "engineer and test the snot out of" the changes
> we are working on as the whole point of us working on them is so we
> *can* run them in production, at our scale.
>
> I'm not sure how the Apache team does this. Perhaps individual engineers
> can run some modern version at a company of theirs, although that seems
> unlikely, but as an Apache org, I just don't see how that happens.
>
> To me it seems like the Apache Cassandra infrastructure itself needs to
> stand up a multinode live instance running some 'real-world' example
> that is getting pounded, so that we can stage feature branches to really
> test them.
>
> Otherwise we will forever be basing versions on the poor test saps who
> decide they are willing to risk all to upgrade to the cutting edge, and
> that's why everyone believes in the adage: don't upgrade until at least .6
>
> --dave
>
>
> On 11/20/2016 09:50 AM, Jason Brown wrote:
> > Hey all,
> >
> > One of the goals on my team, when working on large patches, is to get
> > community feedback on these initiatives before throwing them into prod.
> > This gets us a wider net of feedback (see Sylvain's continuing excellent
> > rounds of feedback to my work on CASSANDRA-8457), as well as making sure
> we
> > don't go too far off the deep end in terms of straying from the community
> > version. The latter point is crucial because if we make too many
> > incompatible changes to, for example, the internode messaging protocol or
> > the CQL protocol or the sstable file format, and deploy that, it may be
> > very difficult, if not impossible, to rectify with future, in-development
> > versions of cassandra.
> >
> > We fully intend to "engineer and test the snot out of" the changes we are
> > working on as the whole point of us working on them is so we *can* run
> them
> > in production, at our scale. We aren't expecting others in the community
> to
> > dog food it for us. There will be a delay between committing something
> > upstream, and us backporting it to a current version we run in production
> > and actually deploying it. However, you can be sure that any bugs we find
> > will be fixed ASAP; we have many users counting on it.
> >
> > Thanks for listening,
> >
> > -Jason
> >
> >
> > On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston 
> > wrote:
> >
> >> I think Ed's just using gossip 2.0 as a hypothetical example. His point
> is
> >> that we should only commit things when we have a high degree of
> confidence
> >> that they work correctly, not with the expectation that they don't.
> >>
> >>
> >> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (
> >> mkjell...@internalcircle.com) wrote:
> >>
> >> Jason has asked for review and feedback many times. Maybe be
> constructive
> >> and review his code instead of just 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-20 Thread Blake Eggleston
> I'm not sure how the Apache team does this. Perhaps individual engineers
can run some modern version at a company of theirs, although that seems
unlikely, but as an Apache org, I just don't see how that happens.

> To me it seems like the Apache Cassandra infrastructure itself needs to 
stand up a multinode live instance running some 'real-world' example 
that is getting pounded, so that we can stage feature branches to really 
test them. 

Not having access to test hardware as an Apache org is a problem, but there’s
also a lot of room for improvement on the junit testing and testability side of 
things. That’s true for both local and distributed components, but more junit 
coverage of the distributed mechanisms would make not having test hardware suck 
less. With distributed algorithms (like gossip 2.0) one of the limitations of 
testing with live nodes is that you’re often just testing the happy path. 
Reliably and repeatably testing how the system responds to weird edge cases 
involving specific ordering of events across nodes is very difficult to do.

I’d written epaxos with this sort of testing in mind, and was able to do a lot 
of testing of obscure failure scenarios (see 
https://github.com/bdeggleston/cassandra/blob/CASSANDRA-6246-trunk/test/unit/org/apache/cassandra/service/epaxos/integration/EpaxosIntegrationRF3Test.java#L144
 for an example). This doesn’t obviate the need to test on real clusters of 
course, but it does increase confidence that the system will behave correctly 
under load, and reduce the amount of things you’re relying on a loaded test 
cluster to reveal.
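
To sketch the pattern that makes this possible (a generic illustration only;
all names below are hypothetical, not the actual epaxos test scaffolding):
route every message through an in-memory "network" that the test drains in
whatever order it chooses, so a specific interleaving of events across nodes
can be replayed exactly.

    import java.util.*;
    import java.util.function.BiConsumer;

    // A message captured in flight between two simulated nodes.
    class Message {
        final int from, to; final String payload;
        Message(int from, int to, String payload) { this.from = from; this.to = to; this.payload = payload; }
    }

    // The test-controlled "network": nothing is delivered until the test says
    // so, and the test decides the order (FIFO, LIFO, drop, duplicate, ...).
    class SimulatedNetwork {
        private final Deque<Message> inFlight = new ArrayDeque<>();
        private final Map<Integer, BiConsumer<Integer, String>> handlers = new HashMap<>();

        void register(int node, BiConsumer<Integer, String> handler) { handlers.put(node, handler); }
        void send(int from, int to, String payload) { inFlight.add(new Message(from, to, payload)); }

        void deliverOldest() { Message m = inFlight.pollFirst(); if (m != null) handlers.get(m.to).accept(m.from, m.payload); }
        void deliverNewest() { Message m = inFlight.pollLast(); if (m != null) handlers.get(m.to).accept(m.from, m.payload); }
        void dropOldest()    { inFlight.pollFirst(); }
    }

With delivery under test control, the "stale message from node A arrives after
a newer message from node B" orderings that are nearly impossible to provoke
reliably on live nodes become one-line test cases.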

On November 20, 2016 at 9:02:55 AM, Dave Brosius (dbros...@mebigfatguy.com) 
wrote:

>> We fully intend to "engineer and test the snot out of" the changes  
we are working on as the whole point of us working on them is so we  
*can* run them in production, at our scale.  

I'm not sure how the Apache team does this. Perhaps individual engineers
can run some modern version at a company of theirs, although that seems
unlikely, but as an Apache org, I just don't see how that happens.

To me it seems like the Apache Cassandra infrastructure itself needs to  
stand up a multinode live instance running some 'real-world' example  
that is getting pounded, so that we can stage feature branches to really  
test them.  

Otherwise we will forever be basing versions on the poor test saps who  
decide they are willing to risk all to upgrade to the cutting edge, and  
that's why everyone believes in the adage: don't upgrade until at least .6

--dave  


On 11/20/2016 09:50 AM, Jason Brown wrote:  
> Hey all,  
>  
> One of the goals on my team, when working on large patches, is to get  
> community feedback on these initiatives before throwing them into prod.  
> This gets us a wider net of feedback (see Sylvain's continuing excellent  
> rounds of feedback to my work on CASSANDRA-8457), as well as making sure we  
> don't go too far off the deep end in terms of straying from the community  
> version. The latter point is crucial because if we make too many  
> incompatible changes to, for example, the internode messaging protocol or  
> the CQL protocol or the sstable file format, and deploy that, it may be  
> very difficult, if not impossible, to rectify with future, in-development  
> versions of cassandra.  
>  
> We fully intend to "engineer and test the snot out of" the changes we are  
> working on as the whole point of us working on them is so we *can* run them  
> in production, at our scale. We aren't expecting others in the community to  
> dog food it for us. There will be a delay between committing something  
> upstream, and us backporting it to a current version we run in production  
> and actually deploying it. However, you can be sure that any bugs we find  
> will be fixed ASAP; we have many users counting on it.  
>  
> Thanks for listening,  
>  
> -Jason  
>  
>  
> On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston   
> wrote:  
>  
>> I think Ed's just using gossip 2.0 as a hypothetical example. His point is  
>> that we should only commit things when we have a high degree of confidence  
>> that they work correctly, not with the expectation that they don't.  
>>  
>>  
>> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (  
>> mkjell...@internalcircle.com) wrote:  
>>  
>> Jason has asked for review and feedback many times. Maybe be constructive  
>> and review his code instead of just complaining (once again)?  
>>  
>> Sent from my iPhone  
>>  
>>> On Nov 19, 2016, at 1:49 PM, Edward Capriolo   
>> wrote:  
>>> I would say start with a mindset like 'people will run this in  
>> production'  
>>> not like 'why would you expect this to work'.  
>>>  
>>> Now how does this logic affect feature development? Maybe use gossip 2.0
>> as  
>>> an example.  
>>>  
>>> I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
>>> the logic of 'don't

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-20 Thread Sankalp Kohli
This was not for the Dev list :)

> On Nov 20, 2016, at 09:06, Sankalp Kohli  wrote:
> 
> I have asked him to calm down as these things are never constructive for the
> community. Making personal comments puts him in a bad light more than anything
> else.
> I will speak with him in person when we are in office.
> 
> Thanks for keeping an eye on these things for us. I will set up another
> meeting with you to talk about Cassandra strategies. 
> 
>> On Nov 20, 2016, at 06:50, Jason Brown  wrote:
>> 
>> Hey all,
>> 
>> One of the goals on my team, when working on large patches, is to get
>> community feedback on these initiatives before throwing them into prod.
>> This gets us a wider net of feedback (see Sylvain's continuing excellent
>> rounds of feedback to my work on CASSANDRA-8457), as well as making sure we
>> don't go too far off the deep end in terms of straying from the community
>> version. The latter point is crucial because if we make too many
>> incompatible changes to, for example, the internode messaging protocol or
>> the CQL protocol or the sstable file format, and deploy that, it may be
>> very difficult, if not impossible, to rectify with future, in-development
>> versions of cassandra.
>> 
>> We fully intend to "engineer and test the snot out of" the changes we are
>> working on as the whole point of us working on them is so we *can* run them
>> in production, at our scale. We aren't expecting others in the community to
>> dog food it for us. There will be a delay between committing something
>> upstream, and us backporting it to a current version we run in production
>> and actually deploying it. However, you can be sure that any bugs we find
>> will be fixed ASAP; we have many users counting on it.
>> 
>> Thanks for listening,
>> 
>> -Jason
>> 
>> 
>> On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston 
>> wrote:
>> 
>>> I think Ed's just using gossip 2.0 as a hypothetical example. His point is
>>> that we should only commit things when we have a high degree of confidence
>>> that they work correctly, not with the expectation that they don't.
>>> 
>>> 
>>> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (
>>> mkjell...@internalcircle.com) wrote:
>>> 
>>> Jason has asked for review and feedback many times. Maybe be constructive
>>> and review his code instead of just complaining (once again)?
>>> 
>>> Sent from my iPhone
>>> 
 On Nov 19, 2016, at 1:49 PM, Edward Capriolo 
>>> wrote:
 
 I would say start with a mindset like 'people will run this in
>>> production'
 not like 'why would you expect this to work'.
 
 Now how does this logic affect feature development? Maybe use gossip 2.0
>>> as
 an example.
 
 I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
 the logic of 'don't expect it to work': unleash 4.0 onto hordes of noobs
>>> with
 a Twitter announcement of the release, and let bugs trickle in.
 
 One could also do something comprehensive like test on clusters of 2 to
 1000 nodes. Test with jepsen to see what happens during partitions,
>>> inject
 things like JVM pauses and account for behavior. Log convergence times
 after given events.
 
 Take a stand and say look "we engineered and beat the crap out of this
 feature. I deployed this release feature at my company and eat my
>>> dogfood.
 You are not my crash test dummy."
 
 
> On Saturday, November 19, 2016, Jeff Jirsa  wrote:
> 
> Any proposal to solve the problem you describe?
> 
> --
> Jeff Jirsa
> 
> 
>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo  wrote:
>> 
>> This is especially relevant if people wish to focus on removing things.
>> 
>> For example, gossip 2.0 sounds great, but seems geared toward huge
> clusters
>> which is not likely a majority of users. For those with a 20 node
>>> cluster
>> are the indirect benefits worth it?
>> 
>> Also there seems to be a first push to remove things like compact
>>> storage
>> or thrift. Fine, great. But what is the realistic update path for
>>> someone?
>> If the big players are running 2.1 and maintaining backports, the
>>> average
>> shop without a dedicated team is going to be stuck saying (great
>>> features
>> in 4.0 that improve performance, I would probably switch but it's not
> stable
>> and we have that one compact storage cf and who knows what is going to
>> happen performance wise when)
>> 
>> We really need to lose this "release won't be stable for 6 minor
>>> versions"
>> concept.
>> 
>> On Saturday, November 19, 2016, Edward Capriolo
>> wrote:
>> 
>>> 
>>> 
>>> On Friday, November 18, 2016, Jeff Jirsa
>>> 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-20 Thread Sankalp Kohli
I have asked him to calm down as these things are never constructive for the
community. Making personal comments puts him in a bad light more than anything
else.
I will speak with him in person when we are in office.

Thanks for keeping an eye on these things for us. I will set up another meeting
with you to talk about Cassandra strategies. 

> On Nov 20, 2016, at 06:50, Jason Brown  wrote:
> 
> Hey all,
> 
> One of the goals on my team, when working on large patches, is to get
> community feedback on these initiatives before throwing them into prod.
> This gets us a wider net of feedback (see Sylvain's continuing excellent
> rounds of feedback to my work on CASSANDRA-8457), as well as making sure we
> don't go too far off the deep end in terms of straying from the community
> version. The latter point is crucial because if we make too many
> incompatible changes to, for example, the internode messaging protocol or
> the CQL protocol or the sstable file format, and deploy that, it may be
> very difficult, if not impossible, to rectify with future, in-development
> versions of cassandra.
> 
> We fully intend to "engineer and test the snot out of" the changes we are
> working on as the whole point of us working on them is so we *can* run them
> in production, at our scale. We aren't expecting others in the community to
> dog food it for us. There will be a delay between committing something
> upstream, and us backporting it to a current version we run in production
> and actually deploying it. However, you can be sure that any bugs we find
> will be fixed ASAP; we have many users counting on it.
> 
> Thanks for listening,
> 
> -Jason
> 
> 
> On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston 
> wrote:
> 
>> I think Ed's just using gossip 2.0 as a hypothetical example. His point is
>> that we should only commit things when we have a high degree of confidence
>> that they work correctly, not with the expectation that they don't.
>> 
>> 
>> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (
>> mkjell...@internalcircle.com) wrote:
>> 
>> Jason has asked for review and feedback many times. Maybe be constructive
>> and review his code instead of just complaining (once again)?
>> 
>> Sent from my iPhone
>> 
>>> On Nov 19, 2016, at 1:49 PM, Edward Capriolo 
>> wrote:
>>> 
>>> I would say start with a mindset like 'people will run this in
>> production'
>>> not like 'why would you expect this to work'.
>>> 
>>> Now how does this logic affect feature development? Maybe use gossip 2.0
>> as
>>> an example.
>>> 
>>> I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
>>> the logic of 'don't expect it to work': unleash 4.0 onto hordes of noobs
>> with
>>> a Twitter announcement of the release, and let bugs trickle in.
>>> 
>>> One could also do something comprehensive like test on clusters of 2 to
>>> 1000 nodes. Test with jepsen to see what happens during partitions,
>> inject
>>> things like JVM pauses and account for behavior. Log convergence times
>>> after given events.
>>> 
>>> Take a stand and say look "we engineered and beat the crap out of this
>>> feature. I deployed this release feature at my company and eat my
>> dogfood.
>>> You are not my crash test dummy."
>>> 
>>> 
 On Saturday, November 19, 2016, Jeff Jirsa  wrote:
 
 Any proposal to solve the problem you describe?
 
 --
 Jeff Jirsa
 
 
> On Nov 19, 2016, at 8:50 AM, Edward Capriolo wrote:
> 
> This is especially relevant if people wish to focus on removing things.
> 
> For example, gossip 2.0 sounds great, but seems geared toward huge
 clusters
> which is not likely a majority of users. For those with a 20 node
>> cluster
> are the indirect benefits worth it?
> 
> Also there seems to be a first push to remove things like compact
>> storage
> or thrift. Fine, great. But what is the realistic update path for
>> someone?
> If the big players are running 2.1 and maintaining backports, the
>> average
> shop without a dedicated team is going to be stuck saying (great
>> features
> in 4.0 that improve performance, I would probably switch but it's not
 stable
> and we have that one compact storage cf and who knows what is going to
> happen performance wise when)
> 
> We really need to lose this "release won't be stable for 6 minor
>> versions"
> concept.
> 
> On Saturday, November 19, 2016, Edward Capriolo
> wrote:
> 
>> 
>> 
>> On Friday, November 18, 2016, Jeff Jirsa 
 wrote:
>> 
>>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>>> 4.0-and-beyond here in a few minutes.
>>> 
>>> The advantage of a prod release every 6 months 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-20 Thread Dave Brosius
>> We fully intend to "engineer and test the snot out of" the changes 
we are working on as the whole point of us working on them is so we 
*can* run them in production, at our scale.


I'm not sure how the Apache team does this. Perhaps individual engineers
can run some modern version at a company of theirs, although that seems
unlikely, but as an Apache org, I just don't see how that happens.


To me it seems like the Apache Cassandra infrastructure itself needs to 
stand up a multinode live instance running some 'real-world' example 
that is getting pounded, so that we can stage feature branches to really 
test them.


Otherwise we will forever be basing versions on the poor test saps who 
decide they are willing to risk all to upgrade to the cutting edge, and 
that's why everyone believes in the adage: don't upgrade until at least .6


--dave


On 11/20/2016 09:50 AM, Jason Brown wrote:

Hey all,

One of the goals on my team, when working on large patches, is to get
community feedback on these initiatives before throwing them into prod.
This gets us a wider net of feedback (see Sylvain's continuing excellent
rounds of feedback to my work on CASSANDRA-8457), as well as making sure we
don't go too far off the deep end in terms of straying from the community
version. The latter point is crucial because if we make too many
incompatible changes to, for example, the internode messaging protocol or
the CQL protocol or the sstable file format, and deploy that, it may be
very difficult, if not impossible, to rectify with future, in-development
versions of cassandra.

We fully intend to "engineer and test the snot out of" the changes we are
working on as the whole point of us working on them is so we *can* run them
in production, at our scale. We aren't expecting others in the community to
dog food it for us. There will be a delay between committing something
upstream, and us backporting it to a current version we run in production
and actually deploying it. However, you can be sure that any bugs we find
will be fixed ASAP; we have many users counting on it.

Thanks for listening,

-Jason


On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston 
wrote:


I think Ed's just using gossip 2.0 as a hypothetical example. His point is
that we should only commit things when we have a high degree of confidence
that they work correctly, not with the expectation that they don't.


On November 19, 2016 at 10:52:38 AM, Michael Kjellman (
mkjell...@internalcircle.com) wrote:

Jason has asked for review and feedback many times. Maybe be constructive
and review his code instead of just complaining (once again)?

Sent from my iPhone


On Nov 19, 2016, at 1:49 PM, Edward Capriolo 

wrote:

I would say start with a mindset like 'people will run this in

production'

not like 'why would you expect this to work'.

Now how does this logic affect feature development? Maybe use gossip 2.0

as

an example.

I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
the logic of 'don't expect it to work': unleash 4.0 onto hordes of noobs

with

a Twitter announcement of the release, and let bugs trickle in.

One could also do something comprehensive like test on clusters of 2 to
1000 nodes. Test with jepsen to see what happens during partitions,

inject

things like JVM pauses and account for behavior. Log convergence times
after given events.

Take a stand and say look "we engineered and beat the crap out of this
feature. I deployed this release feature at my company and eat my

dogfood.

You are not my crash test dummy."



On Saturday, November 19, 2016, Jeff Jirsa  wrote:

Any proposal to solve the problem you describe?

--
Jeff Jirsa



On Nov 19, 2016, at 8:50 AM, Edward Capriolo wrote:

This is especially relevant if people wish to focus on removing things.

For example, gossip 2.0 sounds great, but seems geared toward huge

clusters

which is not likely a majority of users. For those with a 20 node

cluster

are the indirect benefits worth it?

Also there seems to be a first push to remove things like compact

storage

or thrift. Fine, great. But what is the realistic update path for

someone?

If the big players are running 2.1 and maintaining backports, the

average

shop without a dedicated team is going to be stuck saying (great

features

in 4.0 that improve performance, I would probably switch but it's not

stable

and we have that one compact storage cf and who knows what is going to
happen performance wise when)

We really need to lose this "release won't be stable for 6 minor

versions"

concept.

On Saturday, November 19, 2016, Edward Capriolo

wrote:



On Friday, November 18, 2016, Jeff Jirsa 


wrote:

We should assume that we’re ditching tick/tock. I’ll post a thread on
4.0-and-beyond here in a few minutes.

The advantage of a 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-20 Thread Jason Brown
Hey all,

One of the goals on my team, when working on large patches, is to get
community feedback on these initiatives before throwing them into prod.
This gets us a wider net of feedback (see Sylvain's continuing excellent
rounds of feedback to my work on CASSANDRA-8457), as well as making sure we
don't go too far off the deep end in terms of straying from the community
version. The latter point is crucial because if we make too many
incompatible changes to, for example, the internode messaging protocol or
the CQL protocol or the sstable file format, and deploy that, it may be
very difficult, if not impossible, to rectify with future, in-development
versions of cassandra.

We fully intend to "engineer and test the snot out of" the changes we are
working on as the whole point of us working on them is so we *can* run them
in production, at our scale. We aren't expecting others in the community to
dog food it for us. There will be a delay between committing something
upstream, and us backporting it to a current version we run in production
and actually deploying it. However, you can be sure that any bugs we find
will be fixed ASAP; we have many users counting on it.

Thanks for listening,

-Jason


On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston 
wrote:

> I think Ed's just using gossip 2.0 as a hypothetical example. His point is
> that we should only commit things when we have a high degree of confidence
> that they work correctly, not with the expectation that they don't.
>
>
> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (
> mkjell...@internalcircle.com) wrote:
>
> Jason has asked for review and feedback many times. Maybe be constructive
> and review his code instead of just complaining (once again)?
>
> Sent from my iPhone
>
> > On Nov 19, 2016, at 1:49 PM, Edward Capriolo 
> wrote:
> >
> > I would say start with a mindset like 'people will run this in
> production'
> > not like 'why would you expect this to work'.
> >
> > Now how does this logic affect feature development? Maybe use gossip 2.0
> as
> > an example.
> >
> > I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
> > the logic of 'don't expect it to work': unleash 4.0 onto hordes of noobs
> with
> > a Twitter announcement of the release, and let bugs trickle in.
> >
> > One could also do something comprehensive like test on clusters of 2 to
> > 1000 nodes. Test with jepsen to see what happens during partitions,
> inject
> > things like JVM pauses and account for behavior. Log convergence times
> > after given events.
> >
> > Take a stand and say look "we engineered and beat the crap out of this
> > feature. I deployed this release feature at my company and eat my
> dogfood.
> > You are not my crash test dummy."
> >
> >
> >> On Saturday, November 19, 2016, Jeff Jirsa  wrote:
> >>
> >> Any proposal to solve the problem you describe?
> >>
> >> --
> >> Jeff Jirsa
> >>
> >>
> >>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo  wrote:
> >>>
> >>> This is especially relevant if people wish to focus on removing things.
> >>>
> >>> For example, gossip 2.0 sounds great, but seems geared toward huge
> >> clusters
> >>> which is not likely a majority of users. For those with a 20 node
> cluster
> >>> are the indirect benefits worth it?
> >>>
> >>> Also there seems to be a first push to remove things like compact
> storage
> >>> or thrift. Fine, great. But what is the realistic update path for
> someone?
> >>> If the big players are running 2.1 and maintaining backports, the
> average
> >>> shop without a dedicated team is going to be stuck saying (great
> features
> >>> in 4.0 that improve performance, I would probably switch but it's not
> >> stable
> >>> and we have that one compact storage cf and who knows what is going to
> >>> happen performance wise when)
> >>>
> >>> We really need to lose this "release won't be stable for 6 minor
> versions"
> >>> concept.
> >>>
> >>> On Saturday, November 19, 2016, Edward Capriolo
> >>> wrote:
> >>>
> 
> 
>  On Friday, November 18, 2016, Jeff Jirsa
> >> wrote:
> 
> > We should assume that we’re ditching tick/tock. I’ll post a thread on
> > 4.0-and-beyond here in a few minutes.
> >
> > The advantage of a prod release every 6 months is less incentive to
> >> push
> > unfinished work into a release.
> > The disadvantage of a prod release every 6 months is then we either
> >> have
> > a very short lifespan per-release, or we have to maintain lots of
> >> active
> > releases.
> >
> > 2.1 has been out for over 2 years, and a lot of people (including us)
> >> are
> > running it in prod – if we have a release every 6 months, that means
> >> we’d
> > be supporting 4+ releases at a time, just to keep parity with what we
> >> 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Michael Kjellman
Jason has asked for review and feedback many times. Maybe be constructive and 
review his code instead of just complaining (once again)?

Sent from my iPhone

> On Nov 19, 2016, at 1:49 PM, Edward Capriolo  wrote:
> 
> I would say start with a mindset like 'people will run this in production'
> not like 'why would you expect this to work'.
> 
> Now how does this logic affect feature development? Maybe use gossip 2.0 as
> an example.
> 
> I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
> the logic of 'don't expect it to work': unleash 4.0 onto hordes of noobs with
> a Twitter announcement of the release, and let bugs trickle in.
> 
> One could also do something comprehensive like test on clusters of 2 to
> 1000 nodes. Test with jepsen to see what happens during partitions, inject
> things like JVM pauses and account for behavior. Log convergence times
> after given events.
> 
> Take a stand and say look "we engineered and beat the crap out of this
> feature. I deployed this release feature at my company and eat my dogfood.
> You are not my crash test dummy."
> 
> 
>> On Saturday, November 19, 2016, Jeff Jirsa  wrote:
>> 
>> Any proposal to solve the problem you describe?
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo  wrote:
>>> 
>>> This is especially relevant if people wish to focus on removing things.
>>> 
>>> For example, gossip 2.0 sounds great, but seems geared toward huge
>> clusters
>>> which is not likely a majority of users. For those with a 20 node cluster
>>> are the indirect benefits worth it?
>>> 
>>> Also there seems to be a first push to remove things like compact storage
>>> or thrift. Fine, great. But what is the realistic update path for someone?
>>> If the big players are running 2.1 and maintaining backports, the average
>>> shop without a dedicated team is going to be stuck saying (great features
>>> in 4.0 that improve performance, I would probably switch but it's not
>> stable
>>> and we have that one compact storage cf and who knows what is going to
>>> happen performance wise when)
>>> 
>>> We really need to lose this "release won't be stable for 6 minor versions"
>>> concept.
>>> 
>>> On Saturday, November 19, 2016, Edward Capriolo
>>> wrote:
>>> 
 
 
 On Friday, November 18, 2016, Jeff Jirsa
>> wrote:
 
> We should assume that we’re ditching tick/tock. I’ll post a thread on
> 4.0-and-beyond here in a few minutes.
> 
> The advantage of a prod release every 6 months is less incentive to
>> push
> unfinished work into a release.
> The disadvantage of a prod release every 6 months is then we either
>> have
> a very short lifespan per-release, or we have to maintain lots of
>> active
> releases.
> 
> 2.1 has been out for over 2 years, and a lot of people (including us)
>> are
> running it in prod – if we have a release every 6 months, that means
>> we’d
> be supporting 4+ releases at a time, just to keep parity with what we
>> have
> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+
>> year
> old branches.
> 
> 
> On 11/18/16, 3:10 PM, "beggles...@apple.com  on behalf
>> of Blake
> Eggleston" > wrote:
> 
>>> While stability is important if we push back large "core" changes
> until later we're just setting ourselves up to face the same issues
>> later on
>> 
>> In theory, yes. In practice, when incomplete features are earmarked
>> for
> a certain release, those features are often rushed out, and not always
> fully baked.
>> 
>> In any case, I don’t think it makes sense to spend too much time
> planning what goes into 4.0, and what goes into the next major release
>> with
> so many release strategy related decisions still up in the air. Are we
> going to ditch tick-tock? If so, what will its replacement look like?
> Specifically, when will the next “production” release happen? Without
> knowing that, it's hard to say if something should go in 4.0, or 4.5,
>> or
> 5.0, or whatever.
>> 
>> The reason I suggested a production release every 6 months is because
> (in my mind) it’s frequent enough that people won’t be tempted to rush
> features to hit a given release, but not so frequent that it’s not
> practical to support. It wouldn’t be the end of the world if some of
>> these
> tickets didn’t make it into 4.0, because 4.5 would be fine.
>> 
>> On November 18, 2016 at 1:57:21 PM, kurt Greaves (
>> k...@instaclustr.com )
> wrote:
>> 
>>> On 18 November 2016 at 18:25, Jason Brown > > wrote:

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
I would say start with a mindset like 'people will run this in production'
not like 'why would you expect this to work'.

Now how does this logic affect feature development? Maybe use gossip 2.0 as
an example.

I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
the logic of 'don't expect it to work': unleash 4.0 onto hordes of noobs with
a Twitter announcement of the release, and let bugs trickle in.

One could also do something comprehensive like test on clusters of 2 to
1000 nodes. Test with jepsen to see what happens during partitions, inject
things like JVM pauses and account for behavior. Log convergence times
after given events.

Take a stand and say look "we engineered and beat the crap out of this
feature. I deployed this release feature at my company and eat my dogfood.
You are not my crash test dummy."
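
As a toy illustration of the "inject faults, log convergence times" idea above
(a sketch only; a real effort would be Jepsen or a multi-node harness driving
actual Cassandra nodes): nodes gossip a value each round, half of them sit out
for a while to stand in for a partition or a long JVM pause, and the test
counts the rounds until everyone converges after the fault heals.

    import java.util.*;

    public class ConvergenceSim {
        public static void main(String[] args) {
            int n = 20;
            int[] value = new int[n];
            value[0] = 42;                                     // one node learns a new value
            boolean[] paused = new boolean[n];
            for (int i = n / 2; i < n; i++) paused[i] = true;  // fault: half the cluster unreachable

            Random rnd = new Random(1);
            for (int round = 1; round <= 1000; round++) {
                if (round == 10) Arrays.fill(paused, false);   // fault heals at round 10
                for (int i = 0; i < n; i++) {
                    if (paused[i]) continue;
                    int peer = rnd.nextInt(n);                 // gossip with a random peer
                    if (!paused[peer]) {
                        int max = Math.max(value[i], value[peer]);
                        value[i] = max; value[peer] = max;
                    }
                }
                if (Arrays.stream(value).allMatch(v -> v == 42)) {
                    System.out.println("converged after round " + round);
                    return;
                }
            }
            System.out.println("did not converge within 1000 rounds");
        }
    }

Logging that convergence number after each injected event is exactly the kind
of measurement worth tracking across releases.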


On Saturday, November 19, 2016, Jeff Jirsa  wrote:

> Any proposal to solve the problem you describe?
>
> --
> Jeff Jirsa
>
>
> > On Nov 19, 2016, at 8:50 AM, Edward Capriolo  wrote:
> >
> > This is especially relevant if people wish to focus on removing things.
> >
> > For example, gossip 2.0 sounds great, but seems geared toward huge
> clusters
> > which is not likely a majority of users. For those with a 20 node cluster
> > are the indirect benefits worth it?
> >
> > Also there seems to be a first push to remove things like compact storage
> > or thrift. Fine, great. But what is the realistic update path for someone?
> > If the big players are running 2.1 and maintaining backports, the average
> > shop without a dedicated team is going to be stuck saying (great features
> > in 4.0 that improve performance, I would probably switch but it's not
> stable
> > and we have that one compact storage cf and who knows what is going to
> > happen performance wise when)
> >
> > We really need to lose this "release won't be stable for 6 minor versions"
> > concept.
> >
> > On Saturday, November 19, 2016, Edward Capriolo
> > wrote:
> >
> >>
> >>
> >> On Friday, November 18, 2016, Jeff Jirsa
> wrote:
> >>
> >>> We should assume that we’re ditching tick/tock. I’ll post a thread on
> >>> 4.0-and-beyond here in a few minutes.
> >>>
> >>> The advantage of a prod release every 6 months is less incentive to
> push
> >>> unfinished work into a release.
> >>> The disadvantage of a prod release every 6 months is then we either
> have
> >>> a very short lifespan per-release, or we have to maintain lots of
> active
> >>> releases.
> >>>
> >>> 2.1 has been out for over 2 years, and a lot of people (including us)
> are
> >>> running it in prod – if we have a release every 6 months, that means
> we’d
> >>> be supporting 4+ releases at a time, just to keep parity with what we
> have
> >>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+
> year
> >>> old branches.
> >>>
> >>>
> >>> On 11/18/16, 3:10 PM, "beggles...@apple.com  on behalf
> of Blake
> >>> Eggleston" > wrote:
> >>>
> > While stability is important if we push back large "core" changes
> >>> until later we're just setting ourselves up to face the same issues
> later on
> 
>  In theory, yes. In practice, when incomplete features are earmarked
> for
> >>> a certain release, those features are often rushed out, and not always
> >>> fully baked.
> 
>  In any case, I don’t think it makes sense to spend too much time
> >>> planning what goes into 4.0, and what goes into the next major release
> with
> >>> so many release strategy related decisions still up in the air. Are we
> >>> going to ditch tick-tock? If so, what will its replacement look like?
> >>> Specifically, when will the next “production” release happen? Without
> >>> knowing that, it's hard to say if something should go in 4.0, or 4.5,
> or
> >>> 5.0, or whatever.
> 
>  The reason I suggested a production release every 6 months is because
> >>> (in my mind) it’s frequent enough that people won’t be tempted to rush
> >>> features to hit a given release, but not so frequent that it’s not
> >>> practical to support. It wouldn’t be the end of the world if some of
> these
> >>> tickets didn’t make it into 4.0, because 4.5 would be fine.
> 
>  On November 18, 2016 at 1:57:21 PM, kurt Greaves (
> k...@instaclustr.com )
> >>> wrote:
> 
> > On 18 November 2016 at 18:25, Jason Brown  wrote:
> >
> > #11559 (enhanced node representation) - decided it's *not* something
> we
> > need wrt #7544 storage port configurable per node, so we are punting
> on
> >
> 
>  #12344 - Forward writes to replacement node with same address during
> >>> replace
>  depends on #11559. To be honest I'd say #12344 is 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Jeff Jirsa
Any proposal to solve the problem you describe?

-- 
Jeff Jirsa


> On Nov 19, 2016, at 8:50 AM, Edward Capriolo  wrote:
> 
> This is especially relevant if people wish to focus on removing things.
> 
> For example, gossip 2.0 sounds great, but seems geared toward huge clusters
> which is not likely a majority of users. For those with a 20 node cluster
> are the indirect benefits worth it?
> 
> Also there seems to be a first push to remove things like compact storage
> or thrift. Fine, great. But what is the realistic update path for someone?
> If the big players are running 2.1 and maintaining backports, the average
> shop without a dedicated team is going to be stuck saying (great features
> in 4.0 that improve performance, I would probably switch but it's not stable
> and we have that one compact storage cf and who knows what is going to
> happen performance wise when)
> 
> We really need to lose this "release won't be stable for 6 minor versions"
> concept.
> 
> On Saturday, November 19, 2016, Edward Capriolo 
> wrote:
> 
>> 
>> 
>> On Friday, November 18, 2016, Jeff Jirsa  wrote:
>> 
>>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>>> 4.0-and-beyond here in a few minutes.
>>> 
>>> The advantage of a prod release every 6 months is less incentive to push
>>> unfinished work into a release.
>>> The disadvantage of a prod release every 6 months is then we either have
>>> a very short lifespan per-release, or we have to maintain lots of active
>>> releases.
>>> 
>>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>>> running it in prod – if we have a release every 6 months, that means we’d
>>> be supporting 4+ releases at a time, just to keep parity with what we have
>>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>>> old branches.
>>> 
>>> 
>>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
>>> Eggleston"  wrote:
>>> 
> While stability is important if we push back large "core" changes
>>> until later we're just setting ourselves up to face the same issues later on
 
 In theory, yes. In practice, when incomplete features are earmarked for
>>> a certain release, those features are often rushed out, and not always
>>> fully baked.
 
 In any case, I don’t think it makes sense to spend too much time
>>> planning what goes into 4.0, and what goes into the next major release with
>>> so many release strategy related decisions still up in the air. Are we
>>> going to ditch tick-tock? If so, what will its replacement look like?
>>> Specifically, when will the next “production” release happen? Without
>>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>>> 5.0, or whatever.
 
 The reason I suggested a production release every 6 months is because
>>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>>> features to hit a given release, but not so frequent that it’s not
>>> practical to support. It wouldn’t be the end of the world if some of these
>>> tickets didn’t make it into 4.0, because 4.5 would be fine.
 
 On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com)
>>> wrote:
 
> On 18 November 2016 at 18:25, Jason Brown  wrote:
> 
> #11559 (enhanced node representation) - decided it's *not* something we
> need wrt #7544 storage port configurable per node, so we are punting on
> 
 
 #12344 - Forward writes to replacement node with same address during
>>> replace
 depends on #11559. To be honest I'd say #12344 is pretty important,
 otherwise it makes it difficult to replace nodes without potentially
 requiring client code/configuration changes. It would be nice to get
>>> #12344
 in for 4.0. It's marked as an improvement but I'd consider it a bug and
 thus think it could be included in a later minor release.
 
 Introducing all of these in a single release seems pretty risky. I think
>>> it
> would be safer to spread these out over a few 4.x releases (as they’re
> finished) and give them time to stabilize before including them in an
>>> LTS
> release. The downside would be having to maintain backwards
>>> compatibility
> across the 4.x versions, but that seems preferable to delaying the
>>> release
> of 4.0 to include these, and having another big bang release.
 
 
 I don't think anyone expects 4.0.0 to be stable. It's a major version
 change with lots of new features; in the production world people don't
 normally move to a new major version until it has been out for quite some
 time and several minor releases have passed. Really, most people are only
 migrating to 3.0.x now. While stability is important if we push back
>>> large
 "core" 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Michael Kjellman
Honest question: are you *ever* positive, Ed?

Maybe give it a shot once in a while. It will be good for your mental health. 


Sent from my iPhone

> On Nov 19, 2016, at 11:50 AM, Edward Capriolo  wrote:
> 
> This is especially relevant if people wish to focus on removing things.
> 
> For example, gossip 2.0 sounds great, but seems geared toward huge clusters
> which is not likely a majority of users. For those with a 20 node cluster
> are the indirect benefits worth it?
> 
> Also there seems to be a first push to remove things like compact storage
> or thrift. Fine, great. But what is the realistic update path for someone?
> If the big players are running 2.1 and maintaining backports, the average
> shop without a dedicated team is going to be stuck saying (great features
> in 4.0 that improve performance, I would probably switch but it's not stable
> and we have that one compact storage cf and who knows what is going to
> happen performance wise when)
> 
> We really need to lose this "release won't be stable for 6 minor versions"
> concept.
> 
> On Saturday, November 19, 2016, Edward Capriolo 
> wrote:
> 
>> 
>> 
>> On Friday, November 18, 2016, Jeff Jirsa  wrote:
>> 
>>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>>> 4.0-and-beyond here in a few minutes.
>>> 
>>> The advantage of a prod release every 6 months is less incentive to push
>>> unfinished work into a release.
>>> The disadvantage of a prod release every 6 months is then we either have
>>> a very short lifespan per-release, or we have to maintain lots of active
>>> releases.
>>> 
>>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>>> running it in prod – if we have a release every 6 months, that means we’d
>>> be supporting 4+ releases at a time, just to keep parity with what we have
>>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>>> old branches.
>>> 
>>> 
>>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
>>> Eggleston"  wrote:
>>> 
> While stability is important if we push back large "core" changes
>>> until later we're just setting ourselves up to face the same issues later on
 
 In theory, yes. In practice, when incomplete features are earmarked for
>>> a certain release, those features are often rushed out, and not always
>>> fully baked.
 
 In any case, I don’t think it makes sense to spend too much time
>>> planning what goes into 4.0, and what goes into the next major release with
>>> so many release strategy related decisions still up in the air. Are we
>>> going to ditch tick-tock? If so, what will its replacement look like?
>>> Specifically, when will the next “production” release happen? Without
>>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>>> 5.0, or whatever.
 
 The reason I suggested a production release every 6 months is because
>>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>>> features to hit a given release, but not so frequent that it’s not
>>> practical to support. It wouldn’t be the end of the world if some of these
>>> tickets didn’t make it into 4.0, because 4.5 would be fine.
 
 On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com)
>>> wrote:
 
> On 18 November 2016 at 18:25, Jason Brown  wrote:
> 
> #11559 (enhanced node representation) - decided it's *not* something we
> need wrt #7544 storage port configurable per node, so we are punting on
> 
 
 #12344 - Forward writes to replacement node with same address during
>>> replace
 depends on #11559. To be honest I'd say #12344 is pretty important,
 otherwise it makes it difficult to replace nodes without potentially
 requiring client code/configuration changes. It would be nice to get
>>> #12344
 in for 4.0. It's marked as an improvement but I'd consider it a bug and
 thus think it could be included in a later minor release.
 
 Introducing all of these in a single release seems pretty risky. I think
>>> it
> would be safer to spread these out over a few 4.x releases (as they’re
> finished) and give them time to stabilize before including them in an
>>> LTS
> release. The downside would be having to maintain backwards
>>> compatibility
> across the 4.x versions, but that seems preferable to delaying the
>>> release
> of 4.0 to include these, and having another big bang release.
 
 
 I don't think anyone expects 4.0.0 to be stable. It's a major version
 change with lots of new features; in the production world people don't
 normally move to a new major version until it has been out for quite some
 time and several minor releases have passed. Really, most people are only
 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
This is especially relevant if people wish to focus on removing things.

For example, gossip 2.0 sounds great, but seems geared toward huge clusters
which is not likely a majority of users. For those with a 20 node cluster
are the indirect benefits worth it?

Also there seems to be a first push to remove things like compact storage
or thrift. Fine, great. But what is the realistic update path for someone?
If the big players are running 2.1 and maintaining backports, the average
shop without a dedicated team is going to be stuck saying (great features
in 4.0 that improve performance, I would probably switch but it's not stable
and we have that one compact storage cf and who knows what is going to
happen performance wise when)

We really need to lose this "release won't be stable for 6 minor versions"
concept.

On Saturday, November 19, 2016, Edward Capriolo 
wrote:

>
>
> On Friday, November 18, 2016, Jeff Jirsa  wrote:
>
>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>> 4.0-and-beyond here in a few minutes.
>>
>> The advantage of a prod release every 6 months is less incentive to push
>> unfinished work into a release.
>> The disadvantage of a prod release every 6 months is then we either have
>> a very short lifespan per-release, or we have to maintain lots of active
>> releases.
>>
>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>> running it in prod – if we have a release every 6 months, that means we’d
>> be supporting 4+ releases at a time, just to keep parity with what we have
>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>> old branches.
>>
>>
>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
>> Eggleston"  wrote:
>>
>> >> While stability is important if we push back large "core" changes
>> until later we're just setting ourselves up to face the same issues later on
>> >
>> >In theory, yes. In practice, when incomplete features are earmarked for
>> a certain release, those features are often rushed out, and not always
>> fully baked.
>> >
>> >In any case, I don’t think it makes sense to spend too much time
>> planning what goes into 4.0, and what goes into the next major release with
>> so many release strategy related decisions still up in the air. Are we
>> going to ditch tick-tock? If so, what will its replacement look like?
>> Specifically, when will the next “production” release happen? Without
>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>> 5.0, or whatever.
>> >
>> >The reason I suggested a production release every 6 months is because
>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>> features to hit a given release, but not so frequent that it’s not
>> practical to support. It wouldn’t be the end of the world if some of these
>> tickets didn’t make it into 4.0, because 4.5 would be fine.
>> >
>> >On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com)
>> wrote:
>> >
>> >On 18 November 2016 at 18:25, Jason Brown  wrote:
>> >
>> >> #11559 (enhanced node representation) - decided it's *not* something we
>> >> need wrt #7544 storage port configurable per node, so we are punting on
>> >>
>> >
>> >#12344 - Forward writes to replacement node with same address during
>> replace
>> >depends on #11559. To be honest I'd say #12344 is pretty important,
>> >otherwise it makes it difficult to replace nodes without potentially
>> >requiring client code/configuration changes. It would be nice to get
>> #12344
>> >in for 4.0. It's marked as an improvement but I'd consider it a bug and
>> >thus think it could be included in a later minor release.
>> >
>> >Introducing all of these in a single release seems pretty risky. I think
>> it
>> >> would be safer to spread these out over a few 4.x releases (as they’re
>> >> finished) and give them time to stabilize before including them in an
>> LTS
>> >> release. The downside would be having to maintain backwards
>> compatibility
>> >> across the 4.x versions, but that seems preferable to delaying the
>> release
>> >> of 4.0 to include these, and having another big bang release.
>> >
>> >
>> >I don't think anyone expects 4.0.0 to be stable. It's a major version
>> >change with lots of new features; in the production world people don't
>> >normally move to a new major version until it has been out for quite some
>> >time and several minor releases have passed. Really, most people are only
>> >migrating to 3.0.x now. While stability is important if we push back
>> large
>> >"core" changes until later we're just setting ourselves up to face the
>> same
>> >issues later on. There should be enough uptake on the early releases of
>> 4.0
>> >from new users to help test and get it to a production-ready state.
>> >
>> >
>> >Kurt Greaves
>> 

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
On Friday, November 18, 2016, Jeff Jirsa  wrote:

> We should assume that we’re ditching tick/tock. I’ll post a thread on
> 4.0-and-beyond here in a few minutes.
>
> The advantage of a prod release every 6 months is less incentive to push
> unfinished work into a release.
> The disadvantage of a prod release every 6 months is then we either have a
> very short lifespan per-release, or we have to maintain lots of active
> releases.
>
> 2.1 has been out for over 2 years, and a lot of people (including us) are
> running it in prod – if we have a release every 6 months, that means we’d
> be supporting 4+ releases at a time, just to keep parity with what we have
> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
> old branches.
>
>
> On 11/18/16, 3:10 PM, "beggles...@apple.com  on behalf of
> Blake Eggleston" > wrote:
>
> >> While stability is important if we push back large "core" changes until
> later we're just setting ourselves up to face the same issues later on
> >
> >In theory, yes. In practice, when incomplete features are earmarked for a
> certain release, those features are often rushed out, and not always fully
> baked.
> >
> >In any case, I don’t think it makes sense to spend too much time planning
> what goes into 4.0, and what goes into the next major release with so many
> release strategy related decisions still up in the air. Are we going to
> ditch tick-tock? If so, what will its replacement look like? Specifically,
> when will the next “production” release happen? Without knowing that, it's
> hard to say if something should go in 4.0, or 4.5, or 5.0, or whatever.
> >
> >The reason I suggested a production release every 6 months is because (in
> my mind) it’s frequent enough that people won’t be tempted to rush features
> to hit a given release, but not so frequent that it’s not practical to
> support. It wouldn’t be the end of the world if some of these tickets
> didn’t make it into 4.0, because 4.5 would be fine.
> >
> >On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com
> ) wrote:
> >
> >On 18 November 2016 at 18:25, Jason Brown  wrote:
> >
> >> #11559 (enhanced node representation) - decided it's *not* something we
> >> need wrt #7544 storage port configurable per node, so we are punting on
> >>
> >
> >#12344 - Forward writes to replacement node with same address during
> replace
> >depends on #11559. To be honest I'd say #12344 is pretty important,
> >otherwise it makes it difficult to replace nodes without potentially
> >requiring client code/configuration changes. It would be nice to get
> #12344
> >in for 4.0. It's marked as an improvement but I'd consider it a bug and
> >thus think it could be included in a later minor release.
> >
> >Introducing all of these in a single release seems pretty risky. I think
> it
> >> would be safer to spread these out over a few 4.x releases (as they’re
> >> finished) and give them time to stabilize before including them in an
> LTS
> >> release. The downside would be having to maintain backwards
> compatibility
> >> across the 4.x versions, but that seems preferable to delaying the
> release
> >> of 4.0 to include these, and having another big bang release.
> >
> >
> >I don't think anyone expects 4.0.0 to be stable. It's a major version
> >change with lots of new features; in the production world people don't
> >normally move to a new major version until it has been out for quite some
> >time and several minor releases have passed. Really, most people are only
> >migrating to 3.0.x now. While stability is important if we push back large
> >"core" changes until later we're just setting ourselves up to face the
> same
> >issues later on. There should be enough uptake on the early releases of
> 4.0
> >from new users to help test and get it to a production-ready state.
> >
> >
> >Kurt Greaves
> >k...@instaclustr.com 
>
>
> I don't think anyone expects 4.0.0 to be stable

Someone previously described 3.0 as the "break everything release".

We know that many people are still on 2.1 and 3.0. Cassandra will always be
maintaining 3 or 4 active branches, and it will have adoption issues if
releases are not stable and usable.

Given that Cassandra hit 1.0 years ago, I expect things to be stable.
Half-working features, or "added this, broke that" releases, are not
appealing to me.



-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-18 Thread Jeff Jirsa
We should assume that we’re ditching tick/tock. I’ll post a thread on 
4.0-and-beyond here in a few minutes.

The advantage of a prod release every 6 months is less incentive to push 
unfinished work into a release.
The disadvantage of a prod release every 6 months is then we either have a very 
short lifespan per-release, or we have to maintain lots of active releases. 

2.1 has been out for over 2 years, and a lot of people (including us) are 
running it in prod – if we have a release every 6 months, that means we’d be 
supporting 4+ releases at a time, just to keep parity with what we have now? 
Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year old 
branches. 


On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake Eggleston" 
 wrote:

>> While stability is important, if we push back large "core" changes until 
>> later we're just setting ourselves up to face the same issues later on
>
>In theory, yes. In practice, when incomplete features are earmarked for a 
>certain release, those features are often rushed out, and not always fully 
>baked.
>
>In any case, I don’t think it makes sense to spend too much time planning what 
>goes into 4.0, and what goes into the next major release with so many release 
>strategy related decisions still up in the air. Are we going to ditch 
>tick-tock? If so, what will its replacement look like? Specifically, when 
>will the next “production” release happen? Without knowing that, it's hard to 
>say if something should go in 4.0, or 4.5, or 5.0, or whatever.
>
>The reason I suggested a production release every 6 months is because (in my 
>mind) it’s frequent enough that people won’t be tempted to rush features to 
>hit a given release, but not so frequent that it’s not practical to support. 
>It wouldn’t be the end of the world if some of these tickets didn’t make it 
>into 4.0, because 4.5 would be fine.
>
>On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com) wrote:
>
>On 18 November 2016 at 18:25, Jason Brown  wrote:  
>
>> #11559 (enhanced node representation) - decided it's *not* something we  
>> need wrt #7544 storage port configurable per node, so we are punting on  
>>  
>
>#12344 - Forward writes to replacement node with same address during replace  
>depends on #11559. To be honest I'd say #12344 is pretty important,  
>otherwise it makes it difficult to replace nodes without potentially  
>requiring client code/configuration changes. It would be nice to get #12344  
>in for 4.0. It's marked as an improvement but I'd consider it a bug and  
>thus think it could be included in a later minor release.  
>
>> Introducing all of these in a single release seems pretty risky. I think it  
>> would be safer to spread these out over a few 4.x releases (as they’re  
>> finished) and give them time to stabilize before including them in an LTS  
>> release. The downside would be having to maintain backwards compatibility  
>> across the 4.x versions, but that seems preferable to delaying the release  
>> of 4.0 to include these, and having another big bang release.  
>
>
>I don't think anyone expects 4.0.0 to be stable. It's a major version  
>change with lots of new features; in the production world people don't  
>normally move to a new major version until it has been out for quite some  
>time and several minor releases have passed. Really, most people are only  
>migrating to 3.0.x now. While stability is important, if we push back large  
>"core" changes until later we're just setting ourselves up to face the same  
>issues later on. There should be enough uptake on the early releases of 4.0  
>from new users to help test and get it to a production-ready state.  
>
>
>Kurt Greaves  
>k...@instaclustr.com  





Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-18 Thread Blake Eggleston
> While stability is important, if we push back large "core" changes until later 
> we're just setting ourselves up to face the same issues later on

In theory, yes. In practice, when incomplete features are earmarked for a 
certain release, those features are often rushed out, and not always fully 
baked.

In any case, I don’t think it makes sense to spend too much time planning what 
goes into 4.0, and what goes into the next major release with so many release 
strategy related decisions still up in the air. Are we going to ditch 
tick-tock? If so, what will its replacement look like? Specifically, when will 
the next “production” release happen? Without knowing that, it's hard to say if 
something should go in 4.0, or 4.5, or 5.0, or whatever.

The reason I suggested a production release every 6 months is because (in my 
mind) it’s frequent enough that people won’t be tempted to rush features to hit 
a given release, but not so frequent that it’s not practical to support. It 
wouldn’t be the end of the world if some of these tickets didn’t make it into 
4.0, because 4.5 would be fine.

On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com) wrote:

On 18 November 2016 at 18:25, Jason Brown  wrote:  

> #11559 (enhanced node representation) - decided it's *not* something we  
> need wrt #7544 storage port configurable per node, so we are punting on  
>  

#12344 - Forward writes to replacement node with same address during replace  
depends on #11559. To be honest I'd say #12344 is pretty important,  
otherwise it makes it difficult to replace nodes without potentially  
requiring client code/configuration changes. It would be nice to get #12344  
in for 4.0. It's marked as an improvement but I'd consider it a bug and  
thus think it could be included in a later minor release.  

> Introducing all of these in a single release seems pretty risky. I think it  
> would be safer to spread these out over a few 4.x releases (as they’re  
> finished) and give them time to stabilize before including them in an LTS  
> release. The downside would be having to maintain backwards compatibility  
> across the 4.x versions, but that seems preferable to delaying the release  
> of 4.0 to include these, and having another big bang release.  


I don't think anyone expects 4.0.0 to be stable. It's a major version  
change with lots of new features; in the production world people don't  
normally move to a new major version until it has been out for quite some  
time and several minor releases have passed. Really, most people are only  
migrating to 3.0.x now. While stability is important, if we push back large  
"core" changes until later we're just setting ourselves up to face the same  
issues later on. There should be enough uptake on the early releases of 4.0  
from new users to help test and get it to a production-ready state.  


Kurt Greaves  
k...@instaclustr.com  
www.instaclustr.com  


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-18 Thread kurt Greaves
On 18 November 2016 at 18:25, Jason Brown  wrote:

> #11559 (enhanced node representation) - decided it's *not* something we
> need wrt #7544 storage port configurable per node, so we are punting on
>

#12344 - Forward writes to replacement node with same address during replace
depends on #11559. To be honest I'd say #12344 is pretty important,
otherwise it makes it difficult to replace nodes without potentially
requiring client code/configuration changes. It would be nice to get #12344
in for 4.0. It's marked as an improvement but I'd consider it a bug and
thus think it could be included in a later minor release.
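
To make the client-side cost concrete, here is a minimal sketch of how
applications commonly pin contact points by IP, using the DataStax Java
driver 3.x API (the address and class name are hypothetical, purely
illustrative). Config like this, plus the firewall rules and service
discovery entries keyed on the same IPs, is what has to change whenever a
dead node's replacement cannot reuse its address:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class PinnedContactPoints {
        public static void main(String[] args) {
            // Contact points pinned by IP: if the node at 10.0.1.5 dies and
            // its replacement has to come up at a different address, this
            // client needs a config change. #12344 is about making
            // same-address replacement work by forwarding writes to the
            // replacement node, so nothing outside the cluster has to move.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.1.5") // hypothetical seed address
                    .build();
            Session session = cluster.connect();
            System.out.println("Connected to " + cluster.getMetadata().getClusterName());
            session.close();
            cluster.close();
        }
    }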

> Introducing all of these in a single release seems pretty risky. I think it
> would be safer to spread these out over a few 4.x releases (as they’re
> finished) and give them time to stabilize before including them in an LTS
> release. The downside would be having to maintain backwards compatibility
> across the 4.x versions, but that seems preferable to delaying the release
> of 4.0 to include these, and having another big bang release.


I don't think anyone expects 4.0.0 to be stable. It's a major version
change with lots of new features; in the production world people don't
normally move to a new major version until it has been out for quite some
time and several minor releases have passed. Really, most people are only
migrating to 3.0.x now. While stability is important, if we push back large
"core" changes until later we're just setting ourselves up to face the same
issues later on. There should be enough uptake on the early releases of 4.0
from new users to help test and get it to a production-ready state.


Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-18 Thread Blake Eggleston
Introducing all of these in a single release seems pretty risky. I think it 
would be safer to spread these out over a few 4.x releases (as they’re 
finished) and give them time to stabilize before including them in an LTS 
release. The downside would be having to maintain backwards compatibility 
across the 4.x versions, but that seems preferable to delaying the release of 
4.0 to include these, and having another big bang release.

The other problem here is uncertainty about the frequency and length of support 
of so-called LTS releases. There was a thread about getting off of tick-tock a 
while ago, but it died without coming to any kind of conclusion. Personally, 
I’d like to see us do (non-dev) releases every 6 months, support them for 1 
year, and provide critical fixes for 2 years.


On November 18, 2016 at 10:25:31 AM, Jason Brown (jasedbr...@gmail.com) wrote:

Hey all,  

Here's an update on the following items:  

NIO messaging/streaming - first is being reviewed; second is getting close  
to review time  
Gossip 2.0 - TL;DR I don't plan on moving cluster metadata (the current  
"gossip" data) onto the new gossip/membership stack until 5.0, so it's not  
a 4.0 blocker. I'll update #12345 with the details I'm thinking about. I  
still want to start getting this code in, though, or at least in discussion.  
Birch - on track  
#11559 (enhanced node representation) - decided it's *not* something we  
need wrt #7544 storage port configurable per node, so we are punting on  
#11559  
#6246 epaxos - if we're targeting Q1 2017 for 4.0, we probably can't get it  
ready by then  
#7544 storage port configurable per node - on track  

So basically, I've removed two items off that list of blockers for 4.0.  
Hope that helps  

-Jason  
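
For readers skimming the ticket numbers: the #11559/#7544 pairing above is
about how a peer is identified internally. Today the storage port is a single
cluster-wide cassandra.yaml setting, so a peer is effectively just an
InetAddress; making the port configurable per node means the port has to
travel with the address wherever peers are tracked. A minimal sketch of that
shape (hypothetical names and assumptions of mine, not the actual patch):

    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.net.UnknownHostException;

    // Hypothetical sketch: with a per-node storage port (#7544), internode
    // identity becomes an (address, port) pair rather than a bare address,
    // which is the kind of "enhanced node representation" #11559 covered.
    public final class PeerEndpoint {
        private final InetAddress address;
        private final int storagePort; // per-node, not the global default

        public PeerEndpoint(InetAddress address, int storagePort) {
            this.address = address;
            this.storagePort = storagePort;
        }

        public InetSocketAddress socketAddress() {
            return new InetSocketAddress(address, storagePort);
        }

        // e.g. "10.0.1.5:7012" -- two nodes behind one NAT'd IP could then
        // listen on different storage ports.
        public static PeerEndpoint parse(String hostAndPort) throws UnknownHostException {
            int idx = hostAndPort.lastIndexOf(':');
            InetAddress addr = InetAddress.getByName(hostAndPort.substring(0, idx));
            int port = Integer.parseInt(hostAndPort.substring(idx + 1));
            return new PeerEndpoint(addr, port);
        }
    }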



On Fri, Nov 18, 2016 at 9:25 AM, sankalp kohli   
wrote:  

> Hi Nate,  
> Most of the JIRAs in the middle are being rebased or reviewed, and the  
> code is already out there. These will make 4.0 a very solid  
> release.  
>  
> Thanks,  
> Sankalp  
>  
> On Thu, Nov 17, 2016 at 5:10 PM, Ben Bromhead  wrote:  
>  
> > We are happy to start testing against completed features. Ideally once  
> > everything is ready for an RC (to catch interaction bugs), but we can  
> > start sooner for features where it makes sense and they are finished  
> > earlier.  
> >  
> > On Thu, 17 Nov 2016 at 16:47 Nate McCall  wrote:  
> >  
> > > To sum up that other thread (I very much appreciate everyone's input,  
> > > btw), here is an aggregate list of large, breaking 4.0 proposed  
> > > changes:  
> > >  
> > > CASSANDRA-9425 Immutable node-local schema  
> > > CASSANDRA-10699 Strongly consistent schema alterations  
> > > --  
> > > CASSANDRA-12229 NIO streaming  
> > > CASSANDRA-8457 NIO messaging  
> > > CASSANDRA-12345 Gossip 2.0  
> > > CASSANDRA-9754 Birch trees  
> > > CASSANDRA-11559 enhanced node representation  
> > > CASSANDRA-6246 epaxos  
> > > CASSANDRA-7544 storage port configurable per node  
> > > --  
> > > CASSANDRA-5 remove thrift support  
> > > CASSANDRA-10857 dropping compact storage  
> > >  
> > > Again, this is the "big things that will probably break stuff" list  
> > > and thus should happen with a major (did I miss anything?). There  
> > > were/are/will be other smaller issues, but we don't really need to  
> > > keep them in front of us for this discussion as they can/will just  
> > > kind of happen w/o necessarily affecting anything else.  
> > >  
> > > That all said, since we are 'doing a software' we need to start  
> > > thinking about the above in balance with resources and time. However,  
> > > a lot of the above items do have a substantial amount of code written  
> > > against them so it's not as daunting as it seems.  
> > >  
> > > What I would like us to discuss is rough timelines and what is needed  
> > > to get these out the door.  
> > >  
> > > One thing that sticks out to me: that big chunk in the middle there is  
> > > coming out of the same shop in Cupertino. I'm nervous about that. Not  
> > > that y'all are not capable; I'm solely looking at it from the  
> > > "that is a big list of some pretty hard shit" perspective.  
> > >  
> > > So what else do we need to discuss to get these completed? How and  
> > > where can other folks pitch in?  
> > >  
> > > -Nate  
> > >  
> > --  
> > Ben Bromhead  
> > CTO | Instaclustr   
> > +1 650 284 9692  
> > Managed Cassandra / Spark on AWS, Azure and Softlayer  
> >  
>  


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-18 Thread sankalp kohli
Hi Nate,
 Most of the JIRAs in the middle are being rebased or reviewed, and
the code is already out there. These will make 4.0 a very solid
release.

Thanks,
Sankalp

On Thu, Nov 17, 2016 at 5:10 PM, Ben Bromhead  wrote:

> We are happy to start testing against completed features. Ideally once
> everything is ready for an RC (to catch interaction bugs), but we can start
> sooner for features where it makes sense and they are finished earlier.
>
> On Thu, 17 Nov 2016 at 16:47 Nate McCall  wrote:
>
> > To sum up that other thread (I very much appreciate everyone's input,
> > btw), here is an aggregate list of large, breaking 4.0 proposed
> > changes:
> >
> > CASSANDRA-9425 Immutable node-local schema
> > CASSANDRA-10699 Strongly consistent schema alterations
> > --
> > CASSANDRA-12229 NIO streaming
> > CASSANDRA-8457 NIO messaging
> > CASSANDRA-12345 Gossip 2.0
> > CASSANDRA-9754 Birch trees
> > CASSANDRA-11559 enhanced node representation
> > CASSANDRA-6246 epaxos
> > CASSANDRA-7544 storage port configurable per node
> > --
> > CASSANDRA-5 remove thrift support
> > CASSANDRA-10857 dropping compact storage
> >
> > Again, this is the "big things that will probably break stuff" list
> > and thus should happen with a major (did I miss anything?). There
> > were/are/will be other smaller issues, but we don't really need to
> > keep them in front of us for this discussion as they can/will just
> > kind of happen w/o necessarily affecting anything else.
> >
> > That all said, since we are 'doing a software' we need to start
> > thinking about the above in balance with resources and time. However,
> > a lot of the above items do have a substantial amount of code written
> > against them so it's not as daunting as it seems.
> >
> > What I would like us to discuss is rough timelines and what is needed
> > to get these out the door.
> >
> > One thing that sticks out to me: that big chunk in the middle there is
> > coming out of the same shop in Cupertino. I'm nervous about that. Not
> > that y'all are not capable; I'm solely looking at it from the
> > "that is a big list of some pretty hard shit" perspective.
> >
> > So what else do we need to discuss to get these completed? How and
> > where can other folks pitch in?
> >
> > -Nate
> >
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-17 Thread Ben Bromhead
We are happy to start testing against completed features. Ideally once
everything is ready for an RC (to catch interaction bugs), but we can start
sooner for features where it makes sense and they are finished earlier.

On Thu, 17 Nov 2016 at 16:47 Nate McCall  wrote:

> To sum up that other thread (I very much appreciate everyone's input,
> btw), here is an aggregate list of large, breaking 4.0 proposed
> changes:
>
> CASSANDRA-9425 Immutable node-local schema
> CASSANDRA-10699 Strongly consistent schema alterations
> --
> CASSANDRA-12229 NIO streaming
> CASSANDRA-8457 NIO messaging
> CASSANDRA-12345 Gossip 2.0
> CASSANDRA-9754 Birch trees
> CASSANDRA-11559 enhanced node representation
> CASSANDRA-6246 epaxos
> CASSANDRA-7544 storage port configurable per node
> --
> CASSANDRA-5 remove thrift support
> CASSANDRA-10857 dropping compact storage
>
> Again, this is the "big things that will probably break stuff" list
> and thus should happen with a major (did I miss anything?). There
> were/are/will be other smaller issues, but we don't really need to
> keep them in front of us for this discussion as they can/will just
> kind of happen w/o necessarily affecting anything else.
>
> That all said, since we are 'doing a software' we need to start
> thinking about the above in balance with resources and time. However,
> a lot of the above items do have a substantial amount of code written
> against them so it's not as daunting as it seems.
>
> What I would like us to discuss is rough timelines and what is needed
> to get these out the door.
>
> One thing that sticks out to me: that big chunk in the middle there is
> coming out of the same shop in Cupertino. I'm nervous about that. Not
> that y'all are not capable; I'm solely looking at it from the
> "that is a big list of some pretty hard shit" perspective.
>
> So what else do we need to discuss to get these completed? How and
> where can other folks pitch in?
>
> -Nate
>
-- 
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer


Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-17 Thread Nate McCall
To sum up that other thread (I very much appreciate everyone's input,
btw), here is an aggregate list of large, breaking 4.0 proposed
changes:

CASSANDRA-9425 Immutable node-local schema
CASSANDRA-10699 Strongly consistent schema alterations
--
CASSANDRA-12229 NIO streaming
CASSANDRA-8457 NIO messaging
CASSANDRA-12345 Gossip 2.0
CASSANDRA-9754 Birch trees
CASSANDRA-11559 enhanced node representation
CASSANDRA-6246 epaxos
CASSANDRA-7544 storage port configurable per node
--
CASSANDRA-5 remove thrift support
CASSANDRA-10857 dropping compact storage

Again, this is the "big things that will probably break stuff" list
and thus should happen with a major (did I miss anything?). There
were/are/will be other smaller issues, but we don't really need to
keep them in front of us for this discussion as they can/will just
kind of happen w/o necessarily affecting anything else.

That all said, since we are 'doing a software' we need to start
thinking about the above in balance with resources and time. However,
a lot of the above items do have a substantial amount of code written
against them so it's not as daunting as it seems.

What I would like us to discuss is rough timelines and what is needed
to get these out the door.

One thing that sticks out to me: that big chunk in the middle there is
coming out of the same shop in Cupertino. I'm nervous about that. Not
that y'all are not capable; I'm solely looking at it from the
"that is a big list of some pretty hard shit" perspective.

So what else do we need to discuss to get these completed? How and
where can other folks pitch in?

-Nate