Just another user perspective from someone who manages many clusters:

Tick-tock doesn't really make sense unless at some point you stop ticking.
You can't expect to release features constantly and have a stable product
every tock. Not unless you have really high development, code review, and
testing standards. Even then you're probably dreaming, as it's impossible
for any one person to have enough knowledge to account for every possible
implication of a feature (or bug fix for that matter).

It seriously doesn't work when people are only running tock releases
because that's the obvious choice. In reality the majority of bug fixes in
the latest tock release will not be fixes for the latest feature release.
That is, 3.7 is likely to have more bug fixes for 3.4/3.5 features than for
3.6, as hardly anyone would be running 3.6. And the 3.6 features may well
have introduced bugs that don't get picked up until 3.7 is well used in
production.

I'll throw in some ideas that may warrant discussion, so I'm not just some
negative nancy:
tick-tock-tock releases. How you version these would be questionable; I
suppose to minimise confusion you could just skip the even numbers for the
second tock (e.g. 3.4 -> 3.5 -> 3.7). It keeps roughly the same release
cycle, but at least users could be more confident in the latest tock
release. However, it still suffers from the flaw that 3.5 is unlikely to
have bug fixes relevant to 3.4 features.

Another alternative is to eventually stop ticking altogether: you might
tick up until 3.6 but then only tock from then on, pushing all further
features to the next major release. In this case, after 6 months the
features would stop and you'd have 6 months of only bug fixes. Users
looking for stability can wait for 3.7 onwards, whilst risk takers and
people in development phases can start on earlier releases. At most, people
would be waiting 7-8 months from the last version of the previous major
before upgrading, with a year in total to get any new features
(e.g. 3.7 -> 4.7).

As it stands, I typically wouldn't recommend 3.x for critical production
loads, and I've been telling people they're better off waiting for 3.9 or
later, assuming the rate of new features being introduced slows down.

Also, in my opinion, if you start going back to minor releases and
backporting patches, you've defeated the purpose of tick-tock entirely and
you may as well do away with it. I think you're better off marking those
releases as unstable and pushing out a new "stable" release as fast as
possible. This is obviously further complicated by the fact that the next
logical release may be a tick, which risks introducing new bugs; in that
case I'd say if a critical fix is necessary, you skip the tick.

Just my 1c.

On 15 September 2016 at 15:18, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> Where did we come from?
>
> We came from a place where we would say, "You probably do not want to run
> 2.0.X until it reaches 2.0.6"
>
> One thing about Cassandra is we get into a situation where we can only go
> forward. For example, when you update from version X to version Y, version
> Y might start writing a new version of sstables.
>
> X - sstables-v1
> Y - sstables-v2
>
> This is very scary from the operations side because you can not bring the
> system back to running version X, as Y's data is unreadable.
>
> Where are we at now?
>
> We now seem to be in a place where you say "Problem in 3.5 (trunk at a
> given day)? Go to 3.9 (trunk at the last tick-tock release)"
>
> http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
>
> "To get there, we are investing significant effort in making trunk “always
> releasable,” with the goal that each release, or at least each odd-numbered
> bugfix release, should be usable in production. "
>
> I support releasable trunk, but the qualifying statement "or at least each
> odd number release" undoes the assertion of "always releasable". Not trying
> to nit pick here. I realize it may be hard to get to the desired state of
> releasable trunk in a short time.
>
> Anecdotally I notice a lot of "movement" in class names/names of functions.
> Generally, I can look at a stack trace of a piece of software and I can
> bring up the line number in github and it is dead on, or fairly close to
> the line of code. Recently I have tried this in versions fairly close
> together and seen some drastic changes.
>
> Some things I personally do not like:
> 1) lack of stable-ish api's in the codebase
> 2) use of singletons rather than simple dependency injection (like even
> constructor based injection)
>
> IMHO these do not fit well with 'release often' while always producing a
> 'high quality release'.
>
> I do not love the concept of a 'bug fix release'. I would not mind waiting
> longer for a feature as long as I could have a high trust factor in it
> working right the first time.
>
> Take a feature like trickle_fs. By the description it sounds like a clear
> optimization win. It is off by default. The description says "turn on for
> ssd", but elsewhere in the configuration there is
> # disk_optimization_strategy: ssd. Are we tuning for ssd by default or not?
>
> By being false by default, it is not tested in the wild. How is it covered
> and trusted during tests? How many tests have it off vs on?
>
> I think the concept that trickle_fs can be added as a feature, set to
> false, and only possibly gain real-world coverage is not comforting to me.
> I do not want to turn it on and hit some weird issue because no one else
> is running with it. I would rather it be added on by default with extreme
> confidence or not added at all.
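
For reference, the two settings being contrasted here look roughly like
this in a stock 3.x cassandra.yaml (assuming "trickle_fs" above refers to
the trickle_fsync option; values are the shipped defaults):

    # fsync incrementally during sequential writes to avoid sudden IO
    # spikes when the OS flushes dirty pages; recommended for SSDs, but
    # shipped disabled
    trickle_fsync: false
    trickle_fsync_interval_in_kb: 10240

    # disk model used elsewhere for IO tuning; shipped commented out,
    # with ssd as the implied default
    # disk_optimization_strategy: ssd

So the IO tuning assumes SSDs by default while the SSD-oriented fsync
behaviour stays off, which is the inconsistency being pointed out.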
>
>
>
> On Thu, Sep 15, 2016 at 1:37 AM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
> > In this particular case, I'd say adding a bug fix release for every
> > version that's affected would be the right thing.  The issue is so easily
> > reproducible and will likely result in massive data loss for anyone on
> > 3.X WHERE X < 6 and uses the "date" type.
> >
> > This is how easy it is to reproduce:
> >
> > 1. Start Cassandra 3.5
> > 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> > 'replication_factor': 1};
> > 3. use test;
> > 4. create table fail (id int primary key, d date);
> > 5. delete d from fail where id = 1;
> > 6. Stop Cassandra
> > 7. Start Cassandra
> >
> > You will get this, and startup will fail:
> >
> > ERROR 05:32:09 Exiting due to error while processing commit log during
> > initialization.
> > org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> > Unexpected error deserializing mutation; saved to
> > /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4r0000gn/T/mutation6313332720566971713dat.
> > This may be caused by replaying a mutation against a table with the same
> > name but incompatible schema.  Exception follows:
> > org.apache.cassandra.serializers.MarshalException: Expected 4 byte long
> > for date (0)
> >
> > I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
> > probably the other releases) and requires very little investment from
> > anyone.
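
Condensed, steps 2-5 of that repro are just the following CQL (a sketch to
paste into cqlsh on a stock 3.5 node; stopping and starting the node
afterwards is what triggers the commitlog replay failure):

    CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
        'replication_factor': 1};
    USE test;
    CREATE TABLE fail (id int PRIMARY KEY, d date);
    -- deleting a single 'date' column leaves a mutation in the commitlog
    -- that fails to replay on restart
    DELETE d FROM fail WHERE id = 1;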
> >
> >
> > On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa <jeff.ji...@crowdstrike.com>
> > wrote:
> >
> > > We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
> > > but we certainly didn’t/won’t go back and cut new releases from every
> > > branch for every critical bug in future releases, so I think we need to
> > > draw the line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it
> > > seems like you’ve got options (either stay on the tick and go up to
> > > 3.7, or bail down to 3.0.x)
> > >
> > > Perhaps, though, this highlights the fact that tick/tock may not be the
> > > best option long term. We’ve tried it for a year, perhaps we should
> > > instead discuss whether or not it should continue, or if there’s another
> > > process that gives us a better way to get useful patches into versions
> > > people are willing to run in production.
> > >
> > >
> > >
> > > On 9/14/16, 8:55 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
> > >
> > > >Common sense is what prevents someone from upgrading to yet another
> > > >completely unknown version with new features which have probably broken
> > > >even more stuff that nobody is aware of.  The folks I'm helping right
> > > >now deployed 3.5 when they got started because
> > > >http://cassandra.apache.org suggests it's acceptable for production.
> > > >It turns out using 4 of the built in datatypes of the database results
> > > >in the server being unable to restart without clearing out the commit
> > > >logs and running a repair.  That screams critical to me.  You shouldn't
> > > >even be able to install 3.5 without the patch I've supplied - that bug
> > > >is a ticking time bomb for anyone that installs it.
> > > >
> > > >On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler <mich...@pbandjelly.org>
> > > >wrote:
> > > >
> > > >> What's preventing the use of the 3.6 or 3.7 releases where this bug
> > > >> is already fixed? This is also fixed in the 3.0.6/7/8 releases.
> > > >>
> > > >> Michael
> > > >>
> > > >> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
> > > >> > Unfortunately CASSANDRA-11618 was fixed in 3.6 but was not back
> > > >> > ported to 3.5 as well, and it makes Cassandra effectively unusable
> > > >> > if someone is using any of the 4 types affected in any of their
> > > >> > schema.
> > > >> >
> > > >> > I have cherry picked & merged the patch back to here and will put
> > > >> > it in a JIRA as well tonight, I just wanted to get the ball rolling
> > > >> > asap on this.
> > > >> >
> > > >> >
> > > >>
> > > >> > https://github.com/rustyrazorblade/cassandra/tree/fix_commitlog_exception
> > > >> >
> > > >> > Jon
> > > >> >
> > > >>
> > > >>
> > >
> >
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
