+1 to this general proposal. I think the time has finally come for us to try something new, and this sounds legit. Thanks!
On Thu, Mar 19, 2015 at 12:49 AM, Phil Yang <ud1...@gmail.com> wrote:
> Can I regard the odd version as the "development preview" and the even version as the "production ready"?
>
> IMO, for a database infrastructure project, being "stable" is more important than for other kinds of projects. LTS is a good idea, but if we don't support non-LTS releases for long enough to fix their bugs, users on a non-LTS release may have to upgrade to a new major release to get the fixes, and may then have to handle new bugs introduced by the new features. I'm afraid that eventually people would only think about the LTS one.
>
>
> 2015-03-19 8:48 GMT+08:00 Pavel Yaskevich <pove...@gmail.com>:
>
> > +1
> >
> > On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman <mkjell...@internalcircle.com> wrote:
> >
> > > For most of my life I’ve lived on the software bleeding edge both personally and professionally. Maybe it’s a personal weakness, but I guess I get a thrill out of the problem solving aspect?
> > >
> > > Recently I came to a bit of an epiphany: the closer I keep to the daily build, generally the happier I am on a daily basis. Bugs happen, but for the most part (aside from show stopper bugs), pain points for myself in a given daily build can generally be debugged to 1 or maybe 2 root causes, fixed in ~24 hours, and then life is better the next day again. In comparison, the old waterfall model generally means taking an “official” release at some point and waiting for some poor soul (or developer) to actually run the thing. No matter how good the QA team is, until it’s actually used in the real world, most bugs aren’t found.
> > >
> > > If you and your organization can wait 24 hours * number of bugs discovered after people actually started using the thing, you end up with a “usable build” around the holy-grail minor X.X.5 release of Cassandra.
> > >
> > > I love the idea of the LTS model Jonathan describes because it means more code can get real testing and “bake” for longer instead of sitting largely unused on some git repository in a datacenter far far away. A lot of code has changed between 2.0 and trunk today. The code has diverged to the point that if you write something for 2.0 (as the most stable major branch currently available), merging it forward to 3.0 or after generally means rewriting it. If the only thing that comes out of this is a smaller delta of LOC between the deployable version/branch and what we can develop against and what QA is focused on, I think that’s a massive win.
> > >
> > > Something like CASSANDRA-8099 will need 2x the baking time of even many of the more risky changes the project has made. While I wouldn’t want to run a build with CASSANDRA-8099 in it anytime soon, there are now hundreds of other changes blocked, most likely many containing new bugs of their own, that have no exposure at all to even the most involved C* developers.
> > >
> > > I really think this will be a huge win for the project and I’m super thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this change to a much more sustainable release model for the entire community.
> > >
> > > best,
> > > kjellman
> > >
> > >
> > > > On Mar 18, 2015, at 3:02 PM, Ariel Weisberg <ariel.weisb...@datastax.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Keep in mind it is a bug fix release every month and a feature release every two months.
> > > >
> > > > For development that is really a two month cycle with all bug fixes being backported one release. As a developer, if you want to get something in a release you have two months, and you should be sizing pieces of large tasks so they ship at least every two months.
> > > >
> > > > Ariel
> > > >> On Mar 18, 2015, at 5:58 PM, Terrance Shepherd <tscana...@gmail.com> wrote:
> > > >>
> > > >> I like the idea but I agree that every month is a bit aggressive. I have no say but:
> > > >>
> > > >> I would say 4 releases a year instead of 12, with 2 months of new features and 1 month of bug squashing per release. With the 4th quarter just bugs.
> > > >>
> > > >> I would also propose 2 year LTS releases for the releases after the 4th quarter. So everyone could get a new feature release every quarter and the stability of super major versions for 2 years.
> > > >>
> > > >> On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius <dbros...@mebigfatguy.com> wrote:
> > > >>
> > > >>> It would seem the practical implication of this is that there would be significantly more development on branches, with potentially more significant delays on merging these branches. This would imply to me that more Jenkins servers would need to be set up to handle auto-testing of more branches, because if feature work spends more time on external branches, it is then likely to be less tested (even if by accident) as fewer developers would be working on that branch. Only when a feature was blessed to make it to the release-tracked branch would it become exposed to the majority of developers/testers, etc. doing normal running/playing/testing.
> > > >>>
> > > >>> This isn't to knock the idea in any way, just wanted to mention what I think the outcome would be.
> > > >>>
> > > >>> dave
> > > >>>
> > > >>>
> > > >>>>>> On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > > >>>>>>> Cassandra 2.1 was released in September, which means that if we were on track with our stated goal of six month releases, 3.0 would be done about now. Instead, we haven't even delivered a beta. The immediate cause this time is blocking for 8099 <https://issues.apache.org/jira/browse/CASSANDRA-8099>, but the reality is that nobody should really be surprised. Something always comes up -- we've averaged about nine months since 1.0, with 2.1 taking an entire year.
> > > >>>>>>>
> > > >>>>>>> We could make theory align with reality by acknowledging, "if nine months is our 'natural' release schedule, then so be it." But I think we can do better.
> > > >>>>>>>
> > > >>>>>>> Broadly speaking, we have two constituencies with Cassandra releases:
> > > >>>>>>>
> > > >>>>>>> First, we have the users who are building or porting an application on Cassandra. These users want the newest features to make their job easier. If 2.1.0 has a few bugs, it's not the end of the world. They have time to wait for 2.1.x to stabilize while they write their code. They would like to see us deliver on our six month schedule or even faster.
> > > >>>>>>>
> > > >>>>>>> Second, we have the users who have an application in production. These users, or their bosses, want Cassandra to be as stable as possible.
> > > >>>>>>> Assuming they deploy on a stable release like 2.0.12, they don't want to touch it. They would like to see us release *less* often. (Because that means they have to do fewer upgrades while remaining in our backwards compatibility window.)
> > > >>>>>>>
> > > >>>>>>> With our current "big release every X months" model, these users' needs are in tension.
> > > >>>>>>>
> > > >>>>>>> We discussed this six months ago, and ended up with this:
> > > >>>>>>>
> > > >>>>>>>> What if we tried a [four month] release cycle, BUT we would guarantee that you could do a rolling upgrade until we bump the supermajor version? So 2.0 could upgrade to 3.0 without having to go through 2.1. (But to go to 3.1 or 4.0 you would have to go through 3.0.)
> > > >>>>>>>
> > > >>>>>>> Crucially, I added
> > > >>>>>>>
> > > >>>>>>>> Whether this is reasonable depends on how fast we can stabilize releases. 2.1.0 will be a good test of this.
> > > >>>>>>>
> > > >>>>>>> Unfortunately, even after DataStax hired half a dozen full-time test engineers, 2.1.0 continued the proud tradition of being unready for production use, with "wait for .5 before upgrading" once again looking like a good guideline.
> > > >>>>>>>
> > > >>>>>>> I’m starting to think that the entire model of “write a bunch of new features all at once and then try to stabilize it for release” is broken.
> > > >>>>>>> We’ve been trying that for years and empirically speaking the evidence is that it just doesn’t work, either from a stability standpoint or even just shipping on time.
> > > >>>>>>>
> > > >>>>>>> A big reason that it takes us so long to stabilize new releases now is that, because our major release cycle is so long, it’s super tempting to slip in “just one” new feature into bugfix releases, and I’m as guilty of that as anyone.
> > > >>>>>>>
> > > >>>>>>> For similar reasons, it’s difficult to do a meaningful freeze with big feature releases. A look at 3.0 shows why: we have 8099 coming, but we also have significant work done (but not finished) on 6230, 7970, 6696, and 6477, all of which are meaningful improvements that address demonstrated user pain. So if we keep doing what we’ve been doing, our choices are to either delay 3.0 further while we finish and stabilize these, or we wait nine months to a year for the next release. Either way, one of our constituencies gets disappointed.
> > > >>>>>>>
> > > >>>>>>> So, I’d like to try something different. I think we were on the right track with shorter releases with more compatibility. But I’d like to throw in a twist. Intel cuts down on risk with a “tick-tock” schedule for new architectures and process shrinks instead of trying to do both at once. We can do something similar here:
> > > >>>>>>>
> > > >>>>>>> One month releases. Period. If it’s not done, it can wait.
> > > >>>>>>> *Every other release only accepts bug fixes.*
> > > >>>>>>>
> > > >>>>>>> By itself, one-month releases are going to dramatically reduce the complexity of testing and debugging new releases -- and bugs that do slip past us will only affect a smaller percentage of users, avoiding the “big release has a bunch of bugs no one has seen before and pretty much everyone is hit by something” scenario. But by adding in the second rule, I think we have a real chance to make a quantum leap here: stable, production-ready releases every two months.
> > > >>>>>>>
> > > >>>>>>> So here is my proposal for 3.0:
> > > >>>>>>>
> > > >>>>>>> We’re just about ready to start serious review of 8099. When that’s done, we branch 3.0 and cut a beta and then release candidates. Whatever isn’t done by then, has to wait; unlike prior betas, we will only accept bug fixes into 3.0 after branching.
> > > >>>>>>>
> > > >>>>>>> One month after 3.0, we will ship 3.1 (with new features). At the same time, we will branch 3.2. New features in trunk will go into 3.3. The 3.2 branch will only get bug fixes. We will maintain backwards compatibility for all of 3.x; eventually (no less than a year) we will pick a release to be 4.0, and drop deprecated features and old backwards compatibilities. Otherwise there will be nothing special about the 4.0 designation. (Note that with an “odd releases have new features, even releases only have bug fixes” policy, 4.0 will actually be *more* stable than 3.11.)
> > > >>>>>>>
> > > >>>>>>> Larger features can continue to be developed in separate branches, the way 8099 is being worked on today, and committed to trunk when ready. So this is not saying that we are limited only to features we can build in a single month.
> > > >>>>>>>
> > > >>>>>>> Some things will have to change with our dev process, for the better. In particular, with one month to commit new features, we don’t have room for committing sloppy work and stabilizing it later. Trunk has to be stable at all times. I asked Ariel Weisberg to put together his thoughts separately on what worked for his team at VoltDB, and how we can apply that to Cassandra -- see his email from Friday <http://bit.ly/1MHaOKX>. (TLDR: Redefine “done” to include automated tests. Infrastructure to run tests against github branches before merging to trunk. A new test harness for long-running regression tests.)
> > > >>>>>>>
> > > >>>>>>> I’m optimistic that as we improve our process this way, our even releases will become increasingly stable. If so, we can skip sub-minor releases (3.2.x) entirely, and focus on keeping the release train moving. In the meantime, we will continue delivering 2.1.x stability releases.
> > > >>>>>>>
> > > >>>>>>> This won’t be an entirely smooth transition.
> > > >>>>>>> In particular, you will have noticed that 3.1 will get more than a month’s worth of new features while we stabilize 3.0 as the last of the old way of doing things, so some patience is in order as we try this out. By 3.4 and 3.6 later this year we should have a good idea if this is working, and we can make adjustments as warranted.
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Jonathan Ellis
> > > >>>>>>> Project Chair, Apache Cassandra
> > > >>>>>>> co-founder, http://www.datastax.com
> > > >>>>>>> @spyced
>
> --
> Thanks,
> Phil Yang