I agree that regular (monthly) releases and smaller, more frequent feature releases are the best part of tick/tock. The downside of tick/tock, as others have mentioned, is that there isn't enough time for user feedback and testing to catch new bugs before the next feature release.
I would personally like to see a hybrid. The proposal Jon mentions, a new feature release every three months plus 6 months of bug fixes for each release, seems like a good balance to me.

On Thu, Sep 15, 2016 at 1:59 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> I don't think it's binary; we don't have to choose between year-long insanity and bleeding-edge craziness.
>
> How about a release every 3 months, with each release accepting 6 months of patches? (oldstable & newstable) Also provide nightly builds & stick to the idea of a stable trunk.
>
> The issue is the number of bug fixes a given release gets. A single bug fix release for each new feature release is just terrible. The community as a whole despises this system, and it is lowering confidence in the project.
>
> Jon
>
> On Thu, Sep 15, 2016 at 11:48 AM Jake Luciani <jak...@gmail.com> wrote:
>> I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to change.
>>
>> The problem for me is that going back to the old way doesn't sound great. There are parts of tick-tock I really like, for example, the cadence and limited scope per release.
>>
>> I know a lot of ideas were thrown around at the summit, and I can regurgitate them, but perhaps people who have been thinking about this would like to chime in and present ideas?
>>
>> -Jake
>>
>> On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>>> I agree tick-tock is a failure, but for two reasons, IMO:
>>>
>>> 1) Ultimately, the users are the real testers, and it takes a while for a release to percolate into the wild for feedback. The reality is that a release doesn't have its tires properly kicked for at least three months after it's cut.
>>> So if we are to have any tocks, they should be completely unwed from the ticks, and should probably happen on a ~3-month cadence to keep the labour down but the utility up (and there should probably still be more than one tock per tick).
>>>
>>> 2) The resources promised for improving the process never materialized. We hadn't even reached parity with the 2.1 release until very recently, i.e. no failing u/dtests.
>>>
>>> On 15 September 2016 at 19:08, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>>> I know we’ve got a lot of folks following the dev list without a lot of background, so let’s make sure we get some context here so everyone can be on the same page.
>>>>
>>>> Let me preface this wall of text by saying I’m +1 on a 3.5.1 (and 3.3.1, etc.) if it’s done AFTER 3.9. I think we need to get 3.9 out first, before the RE manpower is spent on backporting fixes, even critical fixes, because 3.9 has multiple critical fixes for people running 3.7.
>>>>
>>>> Now some background:
>>>>
>>>> For many years, Cassandra had a dev process that kept three active branches: a “bleeding edge”, a “stable”, and an “old stable” branch. Developers would commit ALL new contributions to the bleeding edge, non-API-breaking changes to stable, and only bug fixes to old stable. While the API changed and major features were added, that bleeding edge would just be ‘trunk’, and it’d get cut into a major version when it was ready to ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 / 1.2, and before that 2.0 / 1.2 / 1.1). When that bleeding edge got released as a major x.y.0, the third, oldest, most stable branch went EOL, and new features would go into trunk for the next major version.
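The three-branch rotation described in this background can be sketched as a small Python model. It is a hypothetical illustration rather than project tooling: the class, its method names, and the three change categories are invented here, and routing bug fixes to all three branches is one reading of the commit policy Jeff summarizes, not a documented rule.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeBranchModel:
    """Hypothetical model of the pre-3.0 branch policy (all names are illustrative)."""
    trunk: str       # "bleeding edge": every new contribution lands here
    stable: str      # also receives non-API-breaking changes
    old_stable: str  # also receives bug fixes only
    eol: list = field(default_factory=list)

    def branches_for(self, change: str) -> list:
        """Branches a change is committed to (an assumed reading of the policy)."""
        if change == "feature":       # new features / API changes: trunk only
            return [self.trunk]
        if change == "non_breaking":  # compatible improvements: trunk + stable
            return [self.trunk, self.stable]
        if change == "bugfix":        # bug fixes: all three active branches
            return [self.trunk, self.stable, self.old_stable]
        raise ValueError(f"unknown change kind: {change!r}")

    def release_trunk(self, next_trunk: str) -> str:
        """Ship trunk as a major x.y.0, rotate the branches, return the branch going EOL."""
        retired = self.old_stable
        self.eol.append(retired)
        # The right-hand side is evaluated first, so each branch shifts down one slot.
        self.old_stable, self.stable, self.trunk = self.stable, self.trunk, next_trunk
        return retired


if __name__ == "__main__":
    model = ThreeBranchModel(trunk="2.1", stable="2.0", old_stable="1.2")
    print(model.release_trunk("2.2"))                   # 1.2
    print(model.trunk, model.stable, model.old_stable)  # 2.2 2.1 2.0
```

Each call to `release_trunk` shifts every branch down one slot, which reproduces the 2.0 / 1.2 / 1.1 → 2.1 / 2.0 / 1.2 → 2.2 / 2.1 / 2.0 progression listed in the thread.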
>>>>
>>>> Two big negatives were observed with this model.
>>>>
>>>> The first big negative is that if multiple major new features were in flight, releases were prone to delay. Nobody wants to break an API in an x.y.1 release, and nobody wants to add a new feature to an x.y.2 release, so the project would delay the x.y releases if major features were close, and then there’d be pressure to either slip them in before they were fully tested or cut features to avoid delaying the release. This pressure was observed to be bad for the project: it forced technical compromises.
>>>>
>>>> The second downside was that nobody would try to run the new versions when they launched, because they were buggy from being filled with new features. 2.2, for example, introduced RBAC, commitlog compression, and user-defined functions: major features that needed to be tested. Unfortunately, because there were few real-world testers, major bugs were still being found for months; the first production-ready version of 2.2 is probably in the 2.2.5 or 2.2.6 range.
>>>>
>>>> For version 3, we moved to an alternate release model, based on Intel’s tick/tock: https://en.wikipedia.org/wiki/Tick-Tock_model
>>>>
>>>> The intention was to allow new features into 3.even releases (3.0, 3.2, 3.4, 3.6, and so on), with bug fixes in 3.odd releases (3.1, …). The hope was that more frequent releases would address the first big negative (a flood of new features that blocked releases), while also helping to address the second: with fewer major features in a release, each feature should get more and better test coverage.
>>>>
>>>> In the tick/tock model, anyone running 3.odd (like 3.5) should be looking for bug fixes in 3.7.
>>>> It’s certainly true that 3.5 is horribly broken (as are 3.3, 3.4, etc.), but with this release model, the bug fix SHOULD BE in 3.7. As I mentioned previously, we have precedent for backporting critical fixes, but we don’t have a well-defined bar (that I see) for what’s critical enough for a backport.
>>>>
>>>> What Jon is noting (and what many of us who run Cassandra in production have known for a very long time) is that nobody wants to run 3.newest (even or odd), because 3.newest is likely broken: it’s a complex distributed database, testing is hard, and it takes time and complex workloads to find bugs. In the tick/tock model, because new features went into 3.6, 3.7 contains new features that may not be adequately tested or validated, which a user of 3.5 doesn’t want and isn’t willing to risk.
>>>>
>>>> The bottom line here is that tick/tock is probably a well-intentioned but failed attempt to bring stability to Cassandra’s releases. The problems tick/tock was meant to solve are real problems, but tick/tock doesn’t seem to be addressing them: new features invalidate old testing, which makes it difficult or impossible for real users to sit on the 3.odd versions.
>>>>
>>>> We’re due to cut 3.9 and 3.0.9, and we have limited RE manpower to get those out. Only after those are out would I be +1 on a 3.5.1, and then only because if I were running 3.5 and hit this bug, I wouldn’t want to spend the ~$100k it would cost my organization to validate 3.7 prior to upgrading, and I don’t think it’s reasonable to ask users to recompile a release for a ~10-line fix for a very nasty bug.
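The even/odd numbering rule, and why a 3.5 user cannot pick up a fix without also taking 3.6's features, can be made concrete with a small Python sketch. The function names are hypothetical; the sketch covers only the 3.x tick/tock line, ignores the separate 3.0.x maintenance releases, and looks only at the minor number, so a point release such as 3.1.1 is treated like 3.1.

```python
def _minor(version: str) -> int:
    """Minor number of a 3.x tick/tock version such as "3.5" (extra parts ignored)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if major != 3:
        raise ValueError(f"tick/tock numbering applies to 3.x only: {version!r}")
    return minor


def release_kind(version: str) -> str:
    """Even minor versions may add features; odd minor versions are meant to be fixes."""
    return "feature" if _minor(version) % 2 == 0 else "bugfix"


def next_bugfix_release(version: str) -> str:
    """The next odd-numbered (bugfix) release after `version`."""
    m = _minor(version)
    return f"3.{m + 1 if m % 2 == 0 else m + 2}"


def feature_releases_absorbed(current: str, target: str) -> list:
    """Even (feature) releases whose changes ride along when upgrading current -> target."""
    start, end = _minor(current), _minor(target)
    return [f"3.{m}" for m in range(start + 1, end + 1) if m % 2 == 0]


if __name__ == "__main__":
    print(release_kind("3.5"), next_bugfix_release("3.5"))  # bugfix 3.7
    print(feature_releases_absorbed("3.5", "3.7"))          # ['3.6']
```

The last call is the argument in miniature: the next bugfix release after 3.5 necessarily carries everything that went into 3.6.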
>>>>
>>>> I also very strongly recommend that we (committers/PMC) reconsider tick/tock for 4.x releases, because this is exactly the type of problem that will continue to happen as we move forward. I suggest that we either go back to the old model and do a better job of dealing with feature creep and testing, or better define what gets backported, because the community needs a stable version to run, and the latest odd release of tick/tock isn’t it.
>>>>
>>>> - Jeff
>>>>
>>>> On 9/15/16, 10:31 AM, "dave_les...@apple.com on behalf of Dave Lester" <dave_les...@apple.com> wrote:
>>>>> How would cutting a 3.5.1 release possibly confuse users of the software? It would be easy to document the change and to send release notes.
>>>>>
>>>>> Given the bug’s critical nature and that it's a minor fix, I’m +1 (non-binding) to a new release.
>>>>>
>>>>> Dave
>>>>>
>>>>> On Sep 15, 2016, at 7:18 AM, Jeremiah D Jordan <jeremiah.jordan@gmail.com> wrote:
>>>>>> I’m with Jeff on this: 3.7 (bug fixes on 3.6) has already been released with the fix. Since the fix applies cleanly, anyone is free to put it on top of 3.5 on their own if they like, but I see no reason to put out a 3.5.1 right now and confuse people further.
>>>>>>
>>>>>> -Jeremiah
>>>>>>
>>>>>> On Sep 15, 2016, at 9:07 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>>>> As a follow-up, I suppose I'm only advocating for a fix to the odd releases. Sadly, tick-tock versioning is misleading.
>>>>>>>
>>>>>>> If tick-tock were to continue (and I'm very much against how it currently works), the whole even-features/odd-fixes thing needs to stop ASAP; all it does is confuse people.
>>>>>>>
>>>>>>> The follow-up to 3.4 (3.5) should have been 3.4.1, following semver, so people know it contains only bug fixes to 3.4.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>> On Wed, Sep 14, 2016 at 10:37 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>>>>> In this particular case, I'd say adding a bug fix release for every version that's affected would be the right thing. The issue is so easily reproducible and will likely result in massive data loss for anyone on 3.X WHERE X < 6 who uses the "date" type.
>>>>>>>>
>>>>>>>> This is how easy it is to reproduce:
>>>>>>>>
>>>>>>>> 1. Start Cassandra 3.5
>>>>>>>> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
>>>>>>>> 3. use test;
>>>>>>>> 4. create table fail (id int primary key, d date);
>>>>>>>> 5. delete d from fail where id = 1;
>>>>>>>> 6. Stop Cassandra
>>>>>>>> 7. Start Cassandra
>>>>>>>>
>>>>>>>> You will get this, and startup will fail:
>>>>>>>>
>>>>>>>> ERROR 05:32:09 Exiting due to error while processing commit log during initialization.
>>>>>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4r0000gn/T/mutation6313332720566971713dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: org.apache.cassandra.serializers.MarshalException: Expected 4 byte long for date (0)
>>>>>>>>
>>>>>>>> I mean... come on. It's an easy fix. It cleanly merges against 3.5 (and probably the other releases) and requires very little investment from anyone.
>>>>>>>>
>>>>>>>> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>>>>>>>> We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes, but we certainly didn’t/won’t go back and cut new releases from every branch for every critical bug in future releases, so I think we need to draw the line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems like you’ve got options (either stay on the tick and go up to 3.7, or bail down to 3.0.x).
>>>>>>>>>
>>>>>>>>> Perhaps, though, this highlights the fact that tick/tock may not be the best option long term. We’ve tried it for a year; perhaps we should instead discuss whether or not it should continue, or whether there’s another process that gives us a better way to get useful patches into versions people are willing to run in production.
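Whether a given release already carries the CASSANDRA-11618 fix can be checked mechanically. This Python sketch encodes only the versions named in this thread (fixed in 3.6 on the tick/tock line, and in 3.0.6 and later on the 3.0.x line); the function names are hypothetical, it refuses versions outside 3.x because the thread says nothing about them, and the fix versions recorded on the JIRA ticket remain the authoritative source.

```python
def parse_version(version: str) -> tuple:
    """Turn "3.5" or "3.0.6" into a (major, minor, patch) tuple, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected version format: {version!r}")
    return tuple(parts + [0] * (3 - len(parts)))


def has_11618_fix(version: str) -> bool:
    """True if, per the versions named in the thread, this 3.x release has the fix."""
    major, minor, patch = parse_version(version)
    if major != 3:
        raise ValueError("the thread only discusses 3.x releases")
    if minor == 0:
        return patch >= 6  # 3.0.x maintenance line: fixed from 3.0.6
    return minor >= 6      # tick/tock line: fixed from 3.6


if __name__ == "__main__":
    for v in ["3.0.5", "3.0.6", "3.1.1", "3.5", "3.6", "3.7"]:
        print(v, "has fix" if has_11618_fix(v) else "no fix")
```

For 3.5 the check reports no fix, which leaves the two options Jeff lists: move up to 3.7 or down to 3.0.6 or later.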
>>>>>>>>>
>>>>>>>>> On 9/14/16, 8:55 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
>>>>>>>>>> Common sense is what prevents someone from upgrading to yet another completely unknown version with new features, which have probably broken even more stuff that nobody is aware of. The folks I'm helping right now deployed 3.5 when they got started because http://cassandra.apache.org suggests it's acceptable for production. It turns out that using any of 4 of the database's built-in datatypes results in the server being unable to restart without clearing out the commit logs and running a repair. That screams critical to me. You shouldn't even be able to install 3.5 without the patch I've supplied; that bug is a ticking time bomb for anyone who installs it.
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler <mich...@pbandjelly.org> wrote:
>>>>>>>>>>> What's preventing the use of the 3.6 or 3.7 releases, where this bug is already fixed? This is also fixed in the 3.0.6/7/8 releases.
>>>>>>>>>>>
>>>>>>>>>>> Michael
>>>>>>>>>>>
>>>>>>>>>>> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
>>>>>>>>>>>> Unfortunately, CASSANDRA-11618 was fixed in 3.6 but was not backported to 3.5, and it makes Cassandra effectively unusable if someone is using any of the 4 affected types in any of their schemas.
>>>>>>>>>>>>
>>>>>>>>>>>> I have cherry-picked & merged the patch back to here and will put it in a JIRA as well tonight; I just wanted to get the ball rolling ASAP on this.
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/rustyrazorblade/cassandra/tree/fix_commitlog_exception
>>>>>>>>>>>>
>>>>>>>>>>>> Jon
>>
>> --
>> http://twitter.com/tjake

--
Tyler Hobbs
DataStax <http://datastax.com/>
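The startup failure Jon reported ("Expected 4 byte long for date (0)") appears to come from commit log replay rejecting a zero-length value for a fixed-size type. The Python sketch below is a hypothetical illustration of that failure mode, not Cassandra's code or the actual CASSANDRA-11618 patch: it assumes, as the "(0)" in the message suggests, that the deleted `d` column is replayed as an empty value, and it contrasts a strict 4-byte check with one that treats an empty value as "no value". All names are invented for the example.

```python
import struct

DATE_SIZE = 4  # the quoted error says a date value is expected to be 4 bytes


class MarshalError(Exception):
    """Stand-in for a serializer validation failure (hypothetical)."""


def read_date_strict(value: bytes) -> int:
    """Reject anything that is not exactly 4 bytes, including an empty value."""
    if len(value) != DATE_SIZE:
        raise MarshalError(f"Expected 4 byte long for date ({len(value)})")
    return struct.unpack(">I", value)[0]  # big-endian unsigned 32-bit integer


def read_date_lenient(value: bytes):
    """Treat a zero-length value (e.g. a deleted cell) as absent instead of failing."""
    if len(value) == 0:
        return None
    return read_date_strict(value)


def replay(logged_values, reader):
    """Replay logged cell values; a single bad value aborts the whole replay."""
    return [reader(v) for v in logged_values]


if __name__ == "__main__":
    log = [b"\x80\x00\x00\x00", b""]  # a 4-byte date value, then a deleted column
    print(replay(log, read_date_lenient))  # [2147483648, None]
    try:
        replay(log, read_date_strict)
    except MarshalError as err:
        print("startup would fail:", err)
```

In the sketch one bad value aborts the entire replay, mirroring the "Exiting due to error while processing commit log during initialization" behavior in Jon's log: a single delete is enough to keep the node from starting.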