Failing tests 2016-09-15

2016-09-15 Thread Joel Knighton
cassandra-3.9: No new runs


trunk
===
testall: 6 failures
  org.apache.cassandra.cql3.KeyCacheCqlTest
  .test2iKeyCachePathsShallowIndexEntry

  org.apache.cassandra.cql3.KeyCacheCqlTest
  .test2iKeyCachePathsShallowIndexEntry-compression
  CASSANDRA-12650 for the two failures above. New flaky failure.

  org.apache.cassandra.cql3.validation.entities.SecondaryIndexTest
  .testAllowFilteringOnPartitionKeyWithSecondaryIndex

  org.apache.cassandra.cql3.validation.entities.SecondaryIndexTest
  .testAllowFilteringOnPartitionKeyWithSecondaryIndex-compression
  CASSANDRA-12651 for the two failures above. New flaky failure.

  org.apache.cassandra.index.sasi.SASIIndexTest
  .testMultiExpressionQueriesWhereRowSplitBetweenSSTables
  Looks like an environmental problem where the forked JVM exited.
  I'm holding off on creating a JIRA for now.

  org.apache.cassandra.index.sasi.SASIIndexTest
  .testStaticIndex-compression
  CASSANDRA-12652. New flaky failure.

===
dtest: 4 failures
  cdc_test.TestCDC.test_cdc_data_available_in_cdc_raw
  CASSANDRA-11811. Known flaky failure.

  materialized_views_test.TestMaterializedViews
  .add_node_after_mv_test
  CASSANDRA-12140. Known flaky failure.

  materialized_views_test.TestMaterializedViews
  .really_complex_repair_test
  CASSANDRA-12475. Known flaky failure.

  snitch_test.TestGossipingPropertyFileSnitch
  .test_prefer_local_reconnect_on_listen_address
  A typo fix was committed to trunk without updating the test
  that looks for the log message.
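
  This failure is mechanical rather than a product bug: dtests commonly
  assert on node log output, so changing the message wording breaks the
  grep. A minimal sketch in Python (both message strings below are made up
  for illustration; the real dtest helper and log text differ):

    import re

    def grep_log(log_lines, pattern):
        # Stand-in for a dtest helper that greps a node's log for a regex.
        return [line for line in log_lines if re.search(pattern, line)]

    # Pattern still containing the pre-fix typo (hypothetical wording).
    STALE_PATTERN = r"Intiated reconnect to an Internal IP"

    # The node now emits the corrected message, so the stale pattern matches
    # nothing and the test's log assertion fails.
    log = ["INFO  Initiated reconnect to an Internal IP /10.0.0.2"]
    assert grep_log(log, STALE_PATTERN) == []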

===
novnode: 6 failures
  paging_test.TestPagingData
  .test_paging_with_filtering_on_partition_key

  paging_test.TestPagingData
  .test_paging_with_filtering_on_partition_key_on_clustering_columns

  paging_test.TestPagingData
  .test_paging_with_filtering_on_partition_key_on_clustering_columns_with_contains

  paging_test.TestPagingData
  .test_paging_with_filtering_on_partition_key_on_counter_columns
  Four new failures; a bisect suggests they are due to
  CASSANDRA-11031. They failed only on novnode. I've asked Alex
  Petrov to take a look. No JIRA yet.

  snitch_test.TestGossipingPropertyFileSnitch
  .test_prefer_local_reconnect_on_listen_address
  Same as the vnode failure above.

  replication_test.SnitchConfigurationUpdateTest
  .test_rf_collapse_property_file_snitch
  New flaky failure. No JIRA created yet.

===
upgrade: All passed!


Re: Proposal - 3.5.1

2016-09-15 Thread Mick Semb Wever
Totally agree with all the frustrations felt by Jon here.


TL;DR
Here's a proposal for 4.0 and beyond that puts together the comments
from Benedict, Jon, Tyler, Jeremy, and Ed:

 - keep bimonthly feature releases,
 - revert from tick-tock to SemVer numbering scheme,
 - during the release vote also vote on the quality label (feature branches
start with an 'Alpha' and the first patch release as 'Beta'),
 - accept that every feature release isn't by default initially supported,
and its branch might never be,
 - maintain 3 'GA' branches at any one time,
 - accept that it's not going to be the oldest GA branches that necessarily
reach EOL first.


Background and rationale…

IMO the problem with Tick-Tock is that it introduces two separate concepts:
   - incremental development, and
   - limiting patch releases.

The first concept, having bimonthly tocks, made C* development more
incremental. A needed improvement.
It's no coincidence that, at the same time tick-tock was introduced, there
was also a lot of effort being put into testing and a QA framework.
From this we've seen a lot of fantastic features incrementally added to C*!

The second concept, having bimonthly ticks, limited C* to having only one
patch release per tock release.
The only real benefit to this was to reduce the effort involved in
maintenance, required because of the more frequent tock releases.
The consequence is that instability has gone bananas, as Jon clearly
demonstrates. Someone went and let the monkey out.

A quick comparison of before and after tick-tock:

   * Before tick-tock: against 6-12 months of development it took a
time-frame of 3-6 months and 6+ patch releases to stabilise C*.

   * After tick-tock: against 2 months of development we could have
expected the same time-frame of 3-6 months (because adoption is dictated by
users, not developers) and *over* this period 1-2 patch releases to
stabilise. It seemed to have been a fool's errand to force this into 1
patch release after only one month. It seems that the notion of incremental
development was applied for the developers, whereas the waterfall model was
applied to QA in production for the users. (Note: all this is not taking
into account the advantages of incremental development, an improved QA
framework, and a move towards a stable master.)

The question remains how many of these releases the community can afford
to support. Realistically, much of this effort relies upon the commercial
entities around the community. For example, having 1 year of support means
having to support 6 feature releases, and there's probably not the people
power to do that. It also means that, in effect, any release is actually
only supported for 6-9 months, since it took 3-6 months for it to get to
production-ready.

A typical Apache release process is that each new major release gets voted
on as only 'Alpha' or 'Beta'. As patch releases are made, it is ascertained
whether enough people are using it (e.g. in production) and the quality
label is appropriately raised to either 'Beta' or 'GA'. The quality label
can be proposed in the vote or left to be voted upon by everyone. The
quality label is itself not part of the version number, so that the version
number can follow strict SemVer.
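
A minimal sketch of that separation, with illustrative names only (none of
this is project tooling): the version string stays strict SemVer, while the
quality label rides alongside it and is raised by vote.

    from dataclasses import dataclass

    LABELS = ("Alpha", "Beta", "GA")  # order in which a branch is promoted

    @dataclass(frozen=True)
    class Release:
        major: int
        minor: int
        patch: int
        label: str = "Alpha"  # feature branches start as 'Alpha'

        def version(self):
            # Pure SemVer; the quality label is deliberately not part of it.
            return f"{self.major}.{self.minor}.{self.patch}"

        def promoted(self):
            # Raise the quality label one step, e.g. after a release vote.
            nxt = min(LABELS.index(self.label) + 1, len(LABELS) - 1)
            return Release(self.major, self.minor, self.patch, LABELS[nxt])

    r = Release(4, 0, 0).promoted()  # still "4.0.0", now labelled 'Beta'
    assert r.version() == "4.0.0" and r.label == "Beta"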

Then the community can say, for example, it supports 3 'GA' branches. This
permits some major releases to never make it to GA, and others to hang over
for a bit longer. It's something that the community gets a feel for by
appreciating the users and actors around it. The number of branches
supported depends on what the community can sustain (including the new
non-GA branches). The community also becomes a bit more honest about the
quality of x.y.0 releases.

The proposal is an example that embraces incremental development and the
release-often mentality, while keeping a realistic and flexible approach to
how many branches can be supported. The cost of supporting branches is
still very real, and pushing for a stable master means no feature branch is
cut without passing everything in the QA framework and 100% belief that it
can be put into a user's production. That is, there's no return to
thinking of feature branches as a place for ongoing stabilisation
efforts just because they carry an 'Alpha/Beta' label. The onus of work is
put upon the developer, who must maintain branches for features targeted
at master, rather than on the community, which would otherwise have to
stabilise and support feature branches.

BTW has anyone figured out whether it's the tick or the tock that
represents the feature release??   I probably got it wrong here :-)


~mck


Re: [VOTE] Release Apache Cassandra 3.0.9

2016-09-15 Thread Sam Tunnicliffe
+1

On 15 Sep 2016 19:58, "Jake Luciani"  wrote:

> I propose the following artifacts for release as 3.0.9.
>
> sha1: d600f51ee1a3eb7b30ce3c409129567b70c22012
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> shortlog;h=refs/tags/3.0.9-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1124/org/apache/cassandra/apache-cassandra/3.0.9/
> Staging repository:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1124/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: https://goo.gl/JKkE05 (CHANGES.txt)
> [2]: https://goo.gl/Hi8X71 (NEWS.txt)
>


Re: [VOTE] Release Apache Cassandra 3.0.9

2016-09-15 Thread Jason Brown
+1

On Thu, Sep 15, 2016 at 3:20 PM, Nate McCall  wrote:

> +1
>
> On Fri, Sep 16, 2016 at 6:57 AM, Jake Luciani  wrote:
>
> > I propose the following artifacts for release as 3.0.9.
> >
> > sha1: d600f51ee1a3eb7b30ce3c409129567b70c22012
> > Git:
> > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> > shortlog;h=refs/tags/3.0.9-tentative
> > Artifacts:
> > https://repository.apache.org/content/repositories/
> > orgapachecassandra-1124/org/apache/cassandra/apache-cassandra/3.0.9/
> > Staging repository:
> > https://repository.apache.org/content/repositories/
> > orgapachecassandra-1124/
> >
> > The artifacts as well as the debian package are also available here:
> > http://people.apache.org/~jake
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: https://goo.gl/JKkE05 (CHANGES.txt)
> > [2]: https://goo.gl/Hi8X71 (NEWS.txt)
> >
>
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: [VOTE] Release Apache Cassandra 3.0.9

2016-09-15 Thread Nate McCall
+1

On Fri, Sep 16, 2016 at 6:57 AM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.9.
>
> sha1: d600f51ee1a3eb7b30ce3c409129567b70c22012
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> shortlog;h=refs/tags/3.0.9-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1124/org/apache/cassandra/apache-cassandra/3.0.9/
> Staging repository:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1124/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: https://goo.gl/JKkE05 (CHANGES.txt)
> [2]: https://goo.gl/Hi8X71 (NEWS.txt)
>



-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Proposal - 3.5.1

2016-09-15 Thread Jonathan Haddad
If the releases can be tagged as alpha / beta so that people don't
accidentally put them in prod (or at least, will do so less), that would be
totally reasonable.

On Thu, Sep 15, 2016 at 12:27 PM Tyler Hobbs  wrote:

> On Thu, Sep 15, 2016 at 2:22 PM, Benedict Elliott Smith <
> bened...@apache.org
> > wrote:
>
> > Feature releases don't have to be on the same cadence as bug fixes.
> They're
> > naturally different beasts.
> >
>
> With the exception of critical bug fixes (which can warrant an immediate
> release), I think keeping a regular cadence makes us less likely to slip
> and fall behind on releases.
>
>
> >
> > Why not stick with monthly feature releases, but mark every third (or
> > sixth) as a supported release that gets quarterly updates for 2-3
> quarters?
> >
>
> That's also a good idea.
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Proposal - 3.5.1

2016-09-15 Thread Benedict Elliott Smith
Yes, agreed. I'm advocating a different cadence, not a random cadence.

On Thursday, 15 September 2016, Tyler Hobbs  wrote:

> On Thu, Sep 15, 2016 at 2:22 PM, Benedict Elliott Smith <
> bened...@apache.org 
> > wrote:
>
> > Feature releases don't have to be on the same cadence as bug fixes.
> They're
> > naturally different beasts.
> >
>
> With the exception of critical bug fixes (which can warrant an immediate
> release), I think keeping a regular cadence makes us less likely to slip
> and fall behind on releases.
>
>
> >
> > Why not stick with monthly feature releases, but mark every third (or
> > sixth) as a supported release that gets quarterly updates for 2-3
> quarters?
> >
>
> That's also a good idea.
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Proposal - 3.5.1

2016-09-15 Thread Tyler Hobbs
On Thu, Sep 15, 2016 at 2:22 PM, Benedict Elliott Smith  wrote:

> Feature releases don't have to be on the same cadence as bug fixes. They're
> naturally different beasts.
>

With the exception of critical bug fixes (which can warrant an immediate
release), I think keeping a regular cadence makes us less likely to slip
and fall behind on releases.


>
> Why not stick with monthly feature releases, but mark every third (or
> sixth) as a supported release that gets quarterly updates for 2-3 quarters?
>

That's also a good idea.

-- 
Tyler Hobbs
DataStax 


Re: Proposal - 3.5.1

2016-09-15 Thread Tyler Hobbs
I agree that regular (monthly) releases, and smaller, more frequent feature
releases are the best part of tick/tock.  The downside of tick/tock, as
mentioned above, is that there isn't enough time for user feedback and
testing to catch new bugs before the next feature release.

I would personally like to see a hybrid.  The proposal that Jon mentions of
doing a new feature release every three months plus 6 months of bugfixes
for any release seems like a good balance to me.
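
As a rough sketch of the windows that cadence implies (the numbers come from
the proposal above; the helper itself is purely illustrative): a release cut
every 3 months with 6 months of patches means exactly two lines, oldstable
and newstable, are in support at any time.

    def supported_releases(month, cadence=3, support=6):
        # Releases cut at months 0, 3, 6, ... still in support at `month`.
        return [m for m in range(0, month + 1, cadence) if month - m < support]

    assert supported_releases(7) == [3, 6]  # two supported releases at once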

On Thu, Sep 15, 2016 at 1:59 PM, Jonathan Haddad  wrote:

> I don't think it's binary - we don't have to do year long insanity or
> bleeding edge crazyness.
>
> How about a release every 3 months, with each release accepting 6 months of
> patches?  (oldstable & newstable)  Also provide nightly builds & stick to
> the idea of stable trunk.
>
> The issue is the number of bug fixes a given release gets.  1 bug fix
> release for a new feature is just terrible.  The community as a whole
> despises this system and is lowering confidence in the project.
>
> Jon
>
>
> On Thu, Sep 15, 2016 at 11:48 AM Jake Luciani  wrote:
>
> > I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to
> > change.
> >
> > The problem for me is going back to the old way doesn't sound great.
> There
> > are parts of tick-tock I really like,
> > for example, the cadence and limited scope per release.
> >
> > I know at the summit there were a lot of ideas thrown around I can
> > regurgitate but perhaps people
> > who have been thinking about this would like to chime in and present
> ideas?
> >
> > -Jake
> >
> > On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith <
> > bened...@apache.org
> > > wrote:
> >
> > > I agree tick-tock is a failure.  But for two reasons IMO:
> > >
> > > 1) Ultimately, the users are the real testers and it takes a while for
> a
> > > release to percolate into the wild for feedback.  The reality is that a
> > > release doesn't have its tires properly kicked for at least three
> months
> > > after it's cut.  So if we are to have any tocks, they should be
> > completely
> > > unwed from the ticks, and should probably happen on a ~3M cadence to
> keep
> > > the labour down but the utility up (and there should probably still be
> > more
> > > than one tock per tick)
> > >
> > > 2) Those promised resources to improved process never happened.  We
> > haven't
> > > even reached parity with the 2.1 release until very recently, i.e. no
> > > failing u/dtests.
> > >
> > >
> > > On 15 September 2016 at 19:08, Jeff Jirsa 
> > > wrote:
> > >
> > > > I know we’ve got a lot of folks following the dev list without a lot
> of
> > > > background, so let’s make sure we get some context here so everyone
> can
> > > be
> > > > on the same page.
> > > >
> > > > Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and
> > > 3.3.1,
> > > > etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first
> > before
> > > > the RE manpower is spent on backporting fixes, even critical fixes,
> > > because
> > > > 3.9 has multiple critical fixes for people running 3.7).
> > > >
> > > > Now some background:
> > > >
> > > > For many years, Cassandra used to have a dev process that kept 3
> active
> > > > branches - “bleeding edge”, a “stable”, and an “old stable” branch,
> > where
> > > > developers would be committing ALL new contributions to the bleeding
> > > edge,
> > > > non-api-breaking changes to stable, and bugfixes only to old stable.
> > > While
> > > > the api changed and major features were added, that bleeding edge
> would
> > > > just be ‘trunk’, and it’d get cut into a major version when it was
> > ready
> > > to
> > > > ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 /
> > 1.2,
> > > > and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got
> released
> > > as
> > > > a major x.y.0, the third, oldest, most stable branch went EOL, and
> new
> > > > features would go into trunk for the next major version.
> > > >
> > > > There were two big negatives observed with this:
> > > >
> > > > The first big negative is that if multiple major new features were in
> > > > flight, releases were prone to delay. Nobody wants to break an API
> on a
> > > > x.y.1 release, and nobody wants to add a new feature to a x.y.2
> > release,
> > > so
> > > > the project would delay the x.y releases if major features were
> close,
> > > and
> > > > then there’d be pressure to slip them in before they were fully
> tested,
> > > or
> > > > cut features to avoid delaying the release. This pressure was
> observed
> > to
> > > > be bad for the project – it forced technical compromises.
> > > >
> > > > The second downside that was observed was that nobody would try to
> run
> > > the
> > > > new versions when they launched, because they were buggy because they
> > > were
> > > > filled with new features. 2.2, for example, introduced RBAC,
> commitlog
> > > > 

Re: Proposal - 3.5.1

2016-09-15 Thread Jeremy Hanna
Right - I think like Jake and others have said, it seems appropriate to do 
something at this point.  Would a clearer, more liberal backport policy to the 
odd versions be worthwhile until we find our footing?  As Jeremiah said, it 
does seem like the big bang 3.0 release has caused much of the baggage that 
we’re facing.  Combine with that the slow uptake on any specific version so far 
at least partly because of the newness of the release model.

To me, the hard thing about 3 month releases is that then you get into
the larger untested feature releases, which is what it was originally
supposed to get away from.

So in essence, would we
1) do nothing and see it through
2) have a more liberal backport policy in the 3.x line and revisit once we get 
to 4
3) do a tick-tock(-tock-tock) sort of model
4) do some sort of LTS
5) go back to the drawing board
6) go back to the old model

I think the earlier numbers imply some confidence in the thinking behind 
tick-tock.  Would 2 be acceptable to see the 3.x line through with the current 
release model?  Or do we need to do something more extensive at this stage?

> On Sep 15, 2016, at 1:59 PM, Jonathan Haddad  wrote:
> 
> I don't think it's binary - we don't have to do year long insanity or
> bleeding edge crazyness.
> 
> How about a release every 3 months, with each release accepting 6 months of
> patches?  (oldstable & newstable)  Also provide nightly builds & stick to
> the idea of stable trunk.
> 
> The issue is the number of bug fixes a given release gets.  1 bug fix
> release for a new feature is just terrible.  The community as a whole
> despises this system and is lowering confidence in the project.
> 
> Jon
> 
> 
> On Thu, Sep 15, 2016 at 11:48 AM Jake Luciani  wrote:
> 
>> I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to
>> change.
>> 
>> The problem for me is going back to the old way doesn't sound great. There
>> are parts of tick-tock I really like,
>> for example, the cadence and limited scope per release.
>> 
>> I know at the summit there were a lot of ideas thrown around I can
>> regurgitate but perhaps people
>> who have been thinking about this would like to chime in and present ideas?
>> 
>> -Jake
>> 
>> On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith <
>> bened...@apache.org
>>> wrote:
>> 
>>> I agree tick-tock is a failure.  But for two reasons IMO:
>>> 
>>> 1) Ultimately, the users are the real testers and it takes a while for a
>>> release to percolate into the wild for feedback.  The reality is that a
>>> release doesn't have its tires properly kicked for at least three months
>>> after it's cut.  So if we are to have any tocks, they should be
>> completely
>>> unwed from the ticks, and should probably happen on a ~3M cadence to keep
>>> the labour down but the utility up (and there should probably still be
>> more
>>> than one tock per tick)
>>> 
>>> 2) Those promised resources to improved process never happened.  We
>> haven't
>>> even reached parity with the 2.1 release until very recently, i.e. no
>>> failing u/dtests.
>>> 
>>> 
>>> On 15 September 2016 at 19:08, Jeff Jirsa 
>>> wrote:
>>> 
 I know we’ve got a lot of folks following the dev list without a lot of
 background, so let’s make sure we get some context here so everyone can
>>> be
 on the same page.
 
 Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and
>>> 3.3.1,
 etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first
>> before
 the RE manpower is spent on backporting fixes, even critical fixes,
>>> because
 3.9 has multiple critical fixes for people running 3.7).
 
 Now some background:
 
 For many years, Cassandra used to have a dev process that kept 3 active
 branches - “bleeding edge”, a “stable”, and an “old stable” branch,
>> where
 developers would be committing ALL new contributions to the bleeding
>>> edge,
 non-api-breaking changes to stable, and bugfixes only to old stable.
>>> While
 the api changed and major features were added, that bleeding edge would
 just be ‘trunk’, and it’d get cut into a major version when it was
>> ready
>>> to
 ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 /
>> 1.2,
 and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released
>>> as
 a major x.y.0, the third, oldest, most stable branch went EOL, and new
 features would go into trunk for the next major version.
 
 There were two big negatives observed with this:
 
 The first big negative is that if multiple major new features were in
 flight, releases were prone to delay. Nobody wants to break an API on a
 x.y.1 release, and nobody wants to add a new feature to a x.y.2
>> release,
>>> so
 the project would delay the x.y releases if major features were close,
>>> and
 then there’d be pressure to slip them in 

Re: [VOTE] Release Apache Cassandra 3.0.9

2016-09-15 Thread Tyler Hobbs
+1

On Thu, Sep 15, 2016 at 1:57 PM, Jake Luciani  wrote:

> I propose the following artifacts for release as 3.0.9.
>
> sha1: d600f51ee1a3eb7b30ce3c409129567b70c22012
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> shortlog;h=refs/tags/3.0.9-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1124/org/apache/cassandra/apache-cassandra/3.0.9/
> Staging repository:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1124/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: https://goo.gl/JKkE05 (CHANGES.txt)
> [2]: https://goo.gl/Hi8X71 (NEWS.txt)
>



-- 
Tyler Hobbs
DataStax 


Re: [VOTE] Release Apache Cassandra 3.0.9

2016-09-15 Thread Aleksey Yeschenko
+1

-- 
AY

On 15 September 2016 at 11:58:24, Jake Luciani (j...@apache.org) wrote:

I propose the following artifacts for release as 3.0.9.  

sha1: d600f51ee1a3eb7b30ce3c409129567b70c22012  
Git:  
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.9-tentative
  
Artifacts:  
https://repository.apache.org/content/repositories/orgapachecassandra-1124/org/apache/cassandra/apache-cassandra/3.0.9/
  
Staging repository:  
https://repository.apache.org/content/repositories/orgapachecassandra-1124/  

The artifacts as well as the debian package are also available here:  
http://people.apache.org/~jake  

The vote will be open for 72 hours (longer if needed).  

[1]: https://goo.gl/JKkE05 (CHANGES.txt)  
[2]: https://goo.gl/Hi8X71 (NEWS.txt)  


Re: Proposal - 3.5.1

2016-09-15 Thread Jonathan Haddad
I don't think it's binary - we don't have to do year long insanity or
bleeding edge craziness.

How about a release every 3 months, with each release accepting 6 months of
patches?  (oldstable & newstable)  Also provide nightly builds & stick to
the idea of stable trunk.

The issue is the number of bug fixes a given release gets.  1 bug fix
release for a new feature is just terrible.  The community as a whole
despises this system, and it is lowering confidence in the project.

Jon


On Thu, Sep 15, 2016 at 11:48 AM Jake Luciani  wrote:

> I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to
> change.
>
> The problem for me is going back to the old way doesn't sound great. There
> are parts of tick-tock I really like,
> for example, the cadence and limited scope per release.
>
> I know at the summit there were a lot of ideas thrown around I can
> regurgitate but perhaps people
> who have been thinking about this would like to chime in and present ideas?
>
> -Jake
>
> On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith <
> bened...@apache.org
> > wrote:
>
> > I agree tick-tock is a failure.  But for two reasons IMO:
> >
> > 1) Ultimately, the users are the real testers and it takes a while for a
> > release to percolate into the wild for feedback.  The reality is that a
> > release doesn't have its tires properly kicked for at least three months
> > after it's cut.  So if we are to have any tocks, they should be
> completely
> > unwed from the ticks, and should probably happen on a ~3M cadence to keep
> > the labour down but the utility up (and there should probably still be
> more
> > than one tock per tick)
> >
> > 2) Those promised resources to improved process never happened.  We
> haven't
> > even reached parity with the 2.1 release until very recently, i.e. no
> > failing u/dtests.
> >
> >
> > On 15 September 2016 at 19:08, Jeff Jirsa 
> > wrote:
> >
> > > I know we’ve got a lot of folks following the dev list without a lot of
> > > background, so let’s make sure we get some context here so everyone can
> > be
> > > on the same page.
> > >
> > > Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and
> > 3.3.1,
> > > etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first
> before
> > > the RE manpower is spent on backporting fixes, even critical fixes,
> > because
> > > 3.9 has multiple critical fixes for people running 3.7).
> > >
> > > Now some background:
> > >
> > > For many years, Cassandra used to have a dev process that kept 3 active
> > > branches - “bleeding edge”, a “stable”, and an “old stable” branch,
> where
> > > developers would be committing ALL new contributions to the bleeding
> > edge,
> > > non-api-breaking changes to stable, and bugfixes only to old stable.
> > While
> > > the api changed and major features were added, that bleeding edge would
> > > just be ‘trunk’, and it’d get cut into a major version when it was
> ready
> > to
> > > ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 /
> 1.2,
> > > and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released
> > as
> > > a major x.y.0, the third, oldest, most stable branch went EOL, and new
> > > features would go into trunk for the next major version.
> > >
> > > There were two big negatives observed with this:
> > >
> > > The first big negative is that if multiple major new features were in
> > > flight, releases were prone to delay. Nobody wants to break an API on a
> > > x.y.1 release, and nobody wants to add a new feature to a x.y.2
> release,
> > so
> > > the project would delay the x.y releases if major features were close,
> > and
> > > then there’d be pressure to slip them in before they were fully tested,
> > or
> > > cut features to avoid delaying the release. This pressure was observed
> to
> > > be bad for the project – it forced technical compromises.
> > >
> > > The second downside that was observed was that nobody would try to run
> > the
> > > new versions when they launched, because they were buggy because they
> > were
> > > filled with new features. 2.2, for example, introduced RBAC, commitlog
> > > compression, and user defined functions – major features that needed to
> > be
> > > tested. Unfortunately, because there were few real-world testers, there
> > > were still major bugs being found for months – the first
> production-ready
> > > version of 2.2 is probably in the 2.2.5 or 2.2.6 range.
> > >
> > > For version 3, we moved to an alternate release, modeled on Intel’s
> > > tick/tock https://en.wikipedia.org/wiki/Tick-Tock_model
> > >
> > > The intention was to allow new features into 3.even releases (3.0, 3.2,
> > > 3.4, 3.6, and so on), with bugfixes in 3.odd releases (3.1, … ). The
> hope
> > > was to allow more frequent releases to address the first big negative
> > > (flood of new features that blocked releases), while also helping to
> > > address the second – with fewer major 

[VOTE] Release Apache Cassandra 3.0.9

2016-09-15 Thread Jake Luciani
I propose the following artifacts for release as 3.0.9.

sha1: d600f51ee1a3eb7b30ce3c409129567b70c22012
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.9-tentative
Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1124/org/apache/cassandra/apache-cassandra/3.0.9/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-1124/

The artifacts as well as the debian package are also available here:
http://people.apache.org/~jake

The vote will be open for 72 hours (longer if needed).

[1]: https://goo.gl/JKkE05 (CHANGES.txt)
[2]: https://goo.gl/Hi8X71 (NEWS.txt)


Re: Proposal - 3.5.1

2016-09-15 Thread Jeremiah D Jordan
Because tick-tock started based off of the 3.0 big bang “we broke everything”
release, I don't think we can judge whether or not it is working until we are
another 6 months in, AKA when we would have been releasing the next big bang
release.  Right now a lot, if not most, of the bugs in a given tick-tock
release are bugs that were introduced in 3.0.  Even the bug mentioned here is
not a tick-tock bug; it is a 3.0 bug.


> On Sep 15, 2016, at 1:48 PM, Jake Luciani  wrote:
> 
> I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to
> change.
> 
> The problem for me is going back to the old way doesn't sound great. There
> are parts of tick-tock I really like,
> for example, the cadence and limited scope per release.
> 
> I know at the summit there were a lot of ideas thrown around I can
> regurgitate but perhaps people
> who have been thinking about this would like to chime in and present ideas?
> 
> -Jake
> 
> On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith  wrote:
> 
>> I agree tick-tock is a failure.  But for two reasons IMO:
>> 
>> 1) Ultimately, the users are the real testers and it takes a while for a
>> release to percolate into the wild for feedback.  The reality is that a
>> release doesn't have its tires properly kicked for at least three months
>> after it's cut.  So if we are to have any tocks, they should be completely
>> unwed from the ticks, and should probably happen on a ~3M cadence to keep
>> the labour down but the utility up (and there should probably still be more
>> than one tock per tick)
>> 
>> 2) Those promised resources to improved process never happened.  We haven't
>> even reached parity with the 2.1 release until very recently, i.e. no
>> failing u/dtests.
>> 
>> 
>> On 15 September 2016 at 19:08, Jeff Jirsa 
>> wrote:
>> 
>>> I know we’ve got a lot of folks following the dev list without a lot of
>>> background, so let’s make sure we get some context here so everyone can
>> be
>>> on the same page.
>>> 
>>> Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and
>> 3.3.1,
>>> etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first before
>>> the RE manpower is spent on backporting fixes, even critical fixes,
>> because
>>> 3.9 has multiple critical fixes for people running 3.7).
>>> 
>>> Now some background:
>>> 
>>> For many years, Cassandra used to have a dev process that kept 3 active
>>> branches - “bleeding edge”, a “stable”, and an “old stable” branch, where
>>> developers would be committing ALL new contributions to the bleeding
>> edge,
>>> non-api-breaking changes to stable, and bugfixes only to old stable.
>> While
>>> the api changed and major features were added, that bleeding edge would
>>> just be ‘trunk’, and it’d get cut into a major version when it was ready
>> to
>>> ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 / 1.2,
>>> and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released
>> as
>>> a major x.y.0, the third, oldest, most stable branch went EOL, and new
>>> features would go into trunk for the next major version.
>>> 
>>> There were two big negatives observed with this:
>>> 
>>> The first big negative is that if multiple major new features were in
>>> flight, releases were prone to delay. Nobody wants to break an API on a
>>> x.y.1 release, and nobody wants to add a new feature to a x.y.2 release,
>> so
>>> the project would delay the x.y releases if major features were close,
>> and
>>> then there’d be pressure to slip them in before they were fully tested,
>> or
>>> cut features to avoid delaying the release. This pressure was observed to
>>> be bad for the project – it forced technical compromises.
>>> 
>>> The second downside that was observed was that nobody would try to run
>> the
>>> new versions when they launched, because they were buggy because they
>> were
>>> filled with new features. 2.2, for example, introduced RBAC, commitlog
>>> compression, and user defined functions – major features that needed to
>> be
>>> tested. Unfortunately, because there were few real-world testers, there
>>> were still major bugs being found for months – the first production-ready
>>> version of 2.2 is probably in the 2.2.5 or 2.2.6 range.
>>> 
>>> For version 3, we moved to an alternate release, modeled on Intel’s
>>> tick/tock https://en.wikipedia.org/wiki/Tick-Tock_model
>>> 
>>> The intention was to allow new features into 3.even releases (3.0, 3.2,
>>> 3.4, 3.6, and so on), with bugfixes in 3.odd releases (3.1, … ). The hope
>>> was to allow more frequent releases to address the first big negative
>>> (flood of new features that blocked releases), while also helping to
>>> address the second – with fewer major features in a release, they better
>>> get more/better test coverage.
>>> 
>>> In the tick/tock model, anyone running 3.odd (like 3.5) should be looking
>>> for bugfixes in 3.7. It’s certainly 

Re: Proposal - 3.5.1

2016-09-15 Thread Jake Luciani
I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to
change.

The problem for me is that going back to the old way doesn't sound great.
There are parts of tick-tock I really like,
for example, the cadence and limited scope per release.

I know at the summit there were a lot of ideas thrown around that I can
regurgitate, but perhaps people
who have been thinking about this would like to chime in and present ideas?

-Jake

On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith  wrote:

> I agree tick-tock is a failure.  But for two reasons IMO:
>
> 1) Ultimately, the users are the real testers and it takes a while for a
> release to percolate into the wild for feedback.  The reality is that a
> release doesn't have its tires properly kicked for at least three months
> after it's cut.  So if we are to have any tocks, they should be completely
> unwed from the ticks, and should probably happen on a ~3M cadence to keep
> the labour down but the utility up (and there should probably still be more
> than one tock per tick)
>
> 2) Those promised resources to improved process never happened.  We haven't
> even reached parity with the 2.1 release until very recently, i.e. no
> failing u/dtests.
>
>
> On 15 September 2016 at 19:08, Jeff Jirsa 
> wrote:
>
> > I know we’ve got a lot of folks following the dev list without a lot of
> > background, so let’s make sure we get some context here so everyone can
> be
> > on the same page.
> >
> > Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and
> 3.3.1,
> > etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first before
> > the RE manpower is spent on backporting fixes, even critical fixes,
> because
> > 3.9 has multiple critical fixes for people running 3.7).
> >
> > Now some background:
> >
> > For many years, Cassandra used to have a dev process that kept 3 active
> > branches - “bleeding edge”, a “stable”, and an “old stable” branch, where
> > developers would be committing ALL new contributions to the bleeding
> edge,
> > non-api-breaking changes to stable, and bugfixes only to old stable.
> While
> > the api changed and major features were added, that bleeding edge would
> > just be ‘trunk’, and it’d get cut into a major version when it was ready
> to
> > ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 / 1.2,
> > and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released
> as
> > a major x.y.0, the third, oldest, most stable branch went EOL, and new
> > features would go into trunk for the next major version.
> >
> > There were two big negatives observed with this:
> >
> > The first big negative is that if multiple major new features were in
> > flight, releases were prone to delay. Nobody wants to break an API on a
> > x.y.1 release, and nobody wants to add a new feature to a x.y.2 release,
> so
> > the project would delay the x.y releases if major features were close,
> and
> > then there’d be pressure to slip them in before they were fully tested,
> or
> > cut features to avoid delaying the release. This pressure was observed to
> > be bad for the project – it forced technical compromises.
> >
> > The second downside that was observed was that nobody would try to run
> the
> > new versions when they launched, because they were buggy because they
> were
> > filled with new features. 2.2, for example, introduced RBAC, commitlog
> > compression, and user defined functions – major features that needed to
> be
> > tested. Unfortunately, because there were few real-world testers, there
> > were still major bugs being found for months – the first production-ready
> > version of 2.2 is probably in the 2.2.5 or 2.2.6 range.
> >
> > For version 3, we moved to an alternate release, modeled on Intel’s
> > tick/tock https://en.wikipedia.org/wiki/Tick-Tock_model
> >
> > The intention was to allow new features into 3.even releases (3.0, 3.2,
> > 3.4, 3.6, and so on), with bugfixes in 3.odd releases (3.1, … ). The hope
> > was to allow more frequent releases to address the first big negative
> > (flood of new features that blocked releases), while also helping to
> > address the second – with fewer major features in a release, they better
> > get more/better test coverage.
> >
> > In the tick/tock model, anyone running 3.odd (like 3.5) should be looking
> > for bugfixes in 3.7. It’s certainly true that 3.5 is horribly broken (as
> is
> > 3.3, and 3.4, etc), but with this release model, the bugfix SHOULD BE in
> > 3.7. As I mentioned previously, we have precedent for backporting
> critical
> > fixes, but we don’t have a well defined bar (that I see) for what’s
> > critical enough for a backport.
> >
> > Jon is noting (and what many of us who run Cassandra in production have
> > really known for a very long time) is that nobody wants to run 3.newest
> > (even or odd), because 3.newest is likely broken (because it’s a complex
> > distributed database, and testing is 

Re: Proposal - 3.5.1

2016-09-15 Thread Benedict Elliott Smith
I agree tick-tock is a failure.  But for two reasons IMO:

1) Ultimately, the users are the real testers and it takes a while for a
release to percolate into the wild for feedback.  The reality is that a
release doesn't have its tires properly kicked for at least three months
after it's cut.  So if we are to have any tocks, they should be completely
unwed from the ticks, and should probably happen on a ~3M cadence to keep
the labour down but the utility up (and there should probably still be more
than one tock per tick)

2) Those promised resources for improved process never happened.  We hadn't
even reached parity with the 2.1 release until very recently, i.e. no
failing u/dtests.


On 15 September 2016 at 19:08, Jeff Jirsa 
wrote:

> I know we’ve got a lot of folks following the dev list without a lot of
> background, so let’s make sure we get some context here so everyone can be
> on the same page.
>
> Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and 3.3.1,
> etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first before
> the RE manpower is spent on backporting fixes, even critical fixes, because
> 3.9 has multiple critical fixes for people running 3.7).
>
> Now some background:
>
> For many years, Cassandra used to have a dev process that kept 3 active
> branches - “bleeding edge”, a “stable”, and an “old stable” branch, where
> developers would be committing ALL new contributions to the bleeding edge,
> non-api-breaking changes to stable, and bugfixes only to old stable. While
> the api changed and major features were added, that bleeding edge would
> just be ‘trunk’, and it’d get cut into a major version when it was ready to
> ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 / 1.2,
> and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released as
> a major x.y.0, the third, oldest, most stable branch went EOL, and new
> features would go into trunk for the next major version.
>
> There were two big negatives observed with this:
>
> The first big negative is that if multiple major new features were in
> flight, releases were prone to delay. Nobody wants to break an API on a
> x.y.1 release, and nobody wants to add a new feature to a x.y.2 release, so
> the project would delay the x.y releases if major features were close, and
> then there’d be pressure to slip them in before they were fully tested, or
> cut features to avoid delaying the release. This pressure was observed to
> be bad for the project – it forced technical compromises.
>
> The second downside that was observed was that nobody would try to run the
> new versions when they launched, because they were buggy because they were
> filled with new features. 2.2, for example, introduced RBAC, commitlog
> compression, and user defined functions – major features that needed to be
> tested. Unfortunately, because there were few real-world testers, there
> were still major bugs being found for months – the first production-ready
> version of 2.2 is probably in the 2.2.5 or 2.2.6 range.
>
> For version 3, we moved to an alternate release, modeled on Intel’s
> tick/tock https://en.wikipedia.org/wiki/Tick-Tock_model
>
> The intention was to allow new features into 3.even releases (3.0, 3.2,
> 3.4, 3.6, and so on), with bugfixes in 3.odd releases (3.1, … ). The hope
> was to allow more frequent releases to address the first big negative
> (flood of new features that blocked releases), while also helping to
> address the second – with fewer major features in a release, they better
> get more/better test coverage.
>
> In the tick/tock model, anyone running 3.odd (like 3.5) should be looking
> for bugfixes in 3.7. It’s certainly true that 3.5 is horribly broken (as is
> 3.3, and 3.4, etc), but with this release model, the bugfix SHOULD BE in
> 3.7. As I mentioned previously, we have precedent for backporting critical
> fixes, but we don’t have a well defined bar (that I see) for what’s
> critical enough for a backport.
>
> Jon is noting (and what many of us who run Cassandra in production have
> really known for a very long time) is that nobody wants to run 3.newest
> (even or odd), because 3.newest is likely broken (because it’s a complex
> distributed database, and testing is hard, and it takes time and complex
> workloads to find bugs). In the tick/tock model, because new features went
> into 3.6, there are new features that may not be adequately
> tested/validated in 3.7 a user of 3.5 doesn’t want, and isn’t willing to
> accept the risk.
>
> The bottom line here is that tick/tock is probably a well intentioned but
> failed attempt to bring stability to Cassandra’s releases. The problems
> tick/tock was meant to solve are real problems, but tick/tock doesn’t seem
> to be addressing them – new features invalidate old testing, which makes it
> difficult/impossible for real users to sit on the 3.odd versions.
>
> We’re due for cutting 3.9 and 3.0.9, and we have limited 

Re: Proposal - 3.5.1

2016-09-15 Thread Jeff Jirsa
I know we’ve got a lot of folks following the dev list without a lot of 
background, so let’s make sure we get some context here so everyone can be on 
the same page. 

Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and 3.3.1, etc) 
if it’s done AFTER 3.9 (I think we need to get 3.9 out first before the RE 
manpower is spent on backporting fixes, even critical fixes, because 3.9 has 
multiple critical fixes for people running 3.7). 

Now some background: 

For many years, Cassandra used to have a dev process that kept 3 active 
branches - “bleeding edge”, a “stable”, and an “old stable” branch, where 
developers would be committing ALL new contributions to the bleeding edge, 
non-api-breaking changes to stable, and bugfixes only to old stable. While the 
api changed and major features were added, that bleeding edge would just be 
‘trunk’, and it’d get cut into a major version when it was ready to ship. We 
saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 / 1.2, and before 
that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released as a major x.y.0, 
the third, oldest, most stable branch went EOL, and new features would go into 
trunk for the next major version. 
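
A sketch of that routing rule (the branch names and change categories here
are illustrative, not project tooling):

    def target_branches(change):
        # change is one of: "feature", "non_breaking", "bugfix".
        branches = ["trunk"]                   # everything lands on bleeding edge
        if change in ("non_breaking", "bugfix"):
            branches.append("stable")          # e.g. the 2.1 branch
        if change == "bugfix":
            branches.append("old_stable")      # e.g. the 2.0 branch
        return branches

    assert target_branches("feature") == ["trunk"]
    assert target_branches("bugfix") == ["trunk", "stable", "old_stable"]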

There were two big negatives observed with this:

The first big negative is that if multiple major new features were in flight, 
releases were prone to delay. Nobody wants to break an API on a x.y.1 release, 
and nobody wants to add a new feature to a x.y.2 release, so the project would 
delay the x.y releases if major features were close, and then there’d be 
pressure to slip them in before they were fully tested, or cut features to 
avoid delaying the release. This pressure was observed to be bad for the 
project – it forced technical compromises. 

The second downside that was observed was that nobody would try to run the new 
versions when they launched, because they were buggy because they were filled 
with new features. 2.2, for example, introduced RBAC, commitlog compression, 
and user defined functions – major features that needed to be tested. 
Unfortunately, because there were few real-world testers, there were still 
major bugs being found for months – the first production-ready version of 2.2 
is probably in the 2.2.5 or 2.2.6 range. 

For version 3, we moved to an alternate release, modeled on Intel’s tick/tock 
https://en.wikipedia.org/wiki/Tick-Tock_model

The intention was to allow new features into 3.even releases (3.0, 3.2, 3.4, 
3.6, and so on), with bugfixes in 3.odd releases (3.1, … ). The hope was to 
allow more frequent releases to address the first big negative (flood of new 
features that blocked releases), while also helping to address the second – 
with fewer major features in a release, they better get more/better test 
coverage.
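
The numbering rule itself is mechanical (even minor means features, odd
minor means bugfix-only), as this tiny, purely illustrative check shows:

    def release_kind(version):
        minor = int(version.split(".")[1])
        return "feature" if minor % 2 == 0 else "bugfix"

    assert release_kind("3.6") == "feature"  # 3.0, 3.2, 3.4, 3.6, ...
    assert release_kind("3.5") == "bugfix"   # 3.1, 3.3, 3.5, ...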

In the tick/tock model, anyone running 3.odd (like 3.5) should be looking for 
bugfixes in 3.7. It’s certainly true that 3.5 is horribly broken (as is 3.3, 
and 3.4, etc), but with this release model, the bugfix SHOULD BE in 3.7. As I 
mentioned previously, we have precedent for backporting critical fixes, but we 
don’t have a well defined bar (that I see) for what’s critical enough for a 
backport. 

Jon is noting (and what many of us who run Cassandra in production have really 
known for a very long time) is that nobody wants to run 3.newest (even or odd), 
because 3.newest is likely broken (because it’s a complex distributed database, 
and testing is hard, and it takes time and complex workloads to find bugs). In 
the tick/tock model, because new features went into 3.6, there are new features 
that may not be adequately tested/validated in 3.7 a user of 3.5 doesn’t want, 
and isn’t willing to accept the risk.

The bottom line here is that tick/tock is probably a well intentioned but 
failed attempt to bring stability to Cassandra’s releases. The problems 
tick/tock was meant to solve are real problems, but tick/tock doesn’t seem to 
be addressing them – new features invalidate old testing, which makes it 
difficult/impossible for real users to sit on the 3.odd versions.   

We’re due for cutting 3.9 and 3.0.9, and we have limited RE manpower to get 
those out. Only after those are out would I be +1 on a 3.5.1, and then only 
because if I were running 3.5, and I hit this bug, I wouldn’t want to spend the 
~$100k it would cost my organization to validate 3.7 prior to upgrading, and I 
don’t think it’s reasonable to ask users to recompile a release for a ~10 line 
fix for a very nasty bug. 

I’m also very strongly recommend we (committers/PMC) reconsider tick/tock for 
4.x releases, because this is exactly the type of problem that will continue to 
happen as we move forward. I suggest that we either need to go back to the old 
model and do a better job of dealing with feature creep and testing, or we need 
to better define what gets backported, because the community needs a stable 
version to run, and running latest odd release of tick/tock isn’t it.

- Jeff


On 

Re: Proposal - 3.5.1

2016-09-15 Thread Benedict Elliott Smith
It's worth noting more clearly that 3.5 is an arbitrary point in time.  All
3.X releases < 3.6 are affected.

If we backport to 3.5, it seems like 3.1 and 3.3 should get the same
treatment.  I do recall commitments to backport critical fixes, but exactly
what the bar is was never well defined.

I also cannot see how there would be any added confusion.


On 15 September 2016 at 18:31, Dave Lester  wrote:

> How would cutting a 3.5.1 release possibly confuse users of the software?
> It would be easy to document the change and to send release notes.
>
> Given the bug’s critical nature and that it's a minor fix, I’m +1
> (non-binding) to a new release.
>
> Dave
>
> > On Sep 15, 2016, at 7:18 AM, Jeremiah D Jordan <
> jeremiah.jor...@gmail.com> wrote:
> >
> > I’m with Jeff on this, 3.7 (bug fixes on 3.6) has already been released
> with the fix.  Since the fix applies cleanly anyone is free to put it on
> top of 3.5 on their own if they like, but I see no reason to put out a
> 3.5.1 right now and confuse people further.
> >
> > -Jeremiah
> >
> >
> >> On Sep 15, 2016, at 9:07 AM, Jonathan Haddad  wrote:
> >>
> >> As I follow up, I suppose I'm only advocating for a fix to the odd
> >> releases.  Sadly, Tick Tock versioning is misleading.
> >>
> >> If tick tock were to continue (and I'm very much against how it
> currently
> >> works) the whole even-features odd-fixes thing needs to stop ASAP, all
> it
> >> does it confuse people.
> >>
> >> The follow up to 3.4 (3.5) should have been 3.4.1, following semver, so
> >> people know it's bug fixes only to 3.4.
> >>
> >> Jon
> >>
> >> On Wed, Sep 14, 2016 at 10:37 PM Jonathan Haddad 
> wrote:
> >>
> >>> In this particular case, I'd say adding a bug fix release for every
> >>> version that's affected would be the right thing.  The issue is so
> easily
> >>> reproducible and will likely result in massive data loss for anyone on
> 3.X
> >>> WHERE X < 6 and uses the "date" type.
> >>>
> >>> This is how easy it is to reproduce:
> >>>
> >>> 1. Start Cassandra 3.5
> >>> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> >>> 'replication_factor': 1};
> >>> 3. use test;
> >>> 4. create table fail (id int primary key, d date);
> >>> 5. delete d from fail where id = 1;
> >>> 6. Stop Cassandra
> >>> 7. Start Cassandra
> >>>
> >>> You will get this, and startup will fail:
> >>>
> >>> ERROR 05:32:09 Exiting due to error while processing commit log during
> >>> initialization.
> >>> org.apache.cassandra.db.commitlog.CommitLogReplayer$
> CommitLogReplayException:
> >>> Unexpected error deserializing mutation; saved to
> >>> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/
> mutation6313332720566971713dat.
> >>> This may be caused by replaying a mutation against a table with the
> same
> >>> name but incompatible schema.  Exception follows:
> >>> org.apache.cassandra.serializers.MarshalException: Expected 4 byte
> long for
> >>> date (0)
> >>>
> >>> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5
> (and
> >>> probably the other releases) and requires very little investment from
> >>> anyone.
> >>>
> >>>
> >>> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa  >
> >>> wrote:
> >>>
>  We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
>  but we certainly didn’t/won’t go back and cut new releases from every
>  branch for every critical bug in future releases, so I think we need
> to
>  draw the line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it
> seems
>  like you’ve got options (either stay on the tick and go up to 3.7, or
> bail
>  down to 3.0.x)
> 
>  Perhaps, though, this highlights the fact that tick/tock may not be
> the
>  best option long term. We’ve tried it for a year, perhaps we should
> instead
>  discuss whether or not it should continue, or if there’s another
> process
>  that gives us a better way to get useful patches into versions people
> are
>  willing to run in production.
> 
> 
> 
>  On 9/14/16, 8:55 PM, "Jonathan Haddad"  wrote:
> 
> > Common sense is what prevents someone from upgrading to yet another
> > completely unknown version with new features which have probably
> broken
> > even more stuff that nobody is aware of.  The folks I'm helping right now
> > deployed 3.5 when they got started because http://cassandra.apache.org
> > suggests
> > it's acceptable for production.  It turns out using 4 of the built in
> > datatypes of the database result in the server being unable to
> restart
> > without clearing out the commit logs and running a repair.  That

Re: Proposal - 3.5.1

2016-09-15 Thread Edward Capriolo
Where did we come from?

We came from a place where we would say, "You probably do not want to run
2.0.X until it reaches 2.0.6"

One thing about Cassandra is we get into a situation where we can only go
forward. For example, when you update from version X to version Y, version
Y might start writing a new versions of sstables.

X - sstables-v1
Y - sstables-v2

This is very scary from the operations side because you cannot bring the
system back to running version X, as Y's data is unreadable.
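
A generic sketch of why the downgrade fails (the format tokens "v1"/"v2" are
hypothetical, not real sstable format names): the newer node writes sstables
stamped with a format the older node cannot read.

    READABLE_BY = {
        "X": {"v1"},        # old version X only understands v1 sstables
        "Y": {"v1", "v2"},  # new version Y reads both, but writes v2
    }

    def can_downgrade(on_disk_formats, old_node="X"):
        # Downgrade is safe only if every sstable on disk is readable by X.
        return on_disk_formats <= READABLE_BY[old_node]

    assert can_downgrade({"v1"})            # before upgrading: safe
    assert not can_downgrade({"v1", "v2"})  # after Y flushes/compacts: stuck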

Where are we at now?

We now seem to be in a place where you say "Problem in 3.5 (trunk on a
given day)? Go to 3.9 (trunk at the last tick-tock release)."

http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/

"To get there, we are investing significant effort in making trunk “always
releasable,” with the goal that each release, or at least each odd-numbered
bugfix release, should be usable in production. "

I support a releasable trunk, but the qualifying statement "or at least each
odd-numbered bugfix release" undoes the assertion of "always releasable". Not
trying to nitpick here. I realize it may be hard to get to the desired state
of a releasable trunk in a short time.

Anecdotally I notice a lot of "movement" in class names/names of functions.
Generally, I can look at a stack trace of a piece of software and I can
bring up the line number in github and it is dead on, or fairly close to
the line of code. Recently I have tried this in versions fairly close
together and seen some drastic changes.

We know some things I personally do not like:
1) lack of stable-ish APIs in the codebase
2) use of singletons rather than simple dependency injection (like even
constructor-based injection)

IMHO these do not fit well with 'release often' while always producing a
'high quality release'.

I do not love the concept of a 'bug fix release'. I would not mind waiting
longer for a feature as long as I could have a high trust factor in it
working right the first time.

Take a feature like trickle_fs. By the description it sounds like a clear
optimization win. It is off by default. The description says "turn on for
ssd", but elsewhere in the configuration there is # disk_optimization_strategy:
ssd. Are we tuning for ssd by default or not?

By being false, it is not tested in the wild. How is it covered and trusted
during tests? How many tests have it off vs. on?

I think the idea that trickle_fs can be added as a feature, set false, and
only maybe gain real-world coverage is not comforting. I do not want to turn
it on and get some weird issue because no one else is running it. I would
rather it be added on by default with extreme confidence, or not added at
all.



On Thu, Sep 15, 2016 at 1:37 AM, Jonathan Haddad  wrote:

> In this particular case, I'd say adding a bug fix release for every version
> that's affected would be the right thing.  The issue is so easily
> reproducible and will likely result in massive data loss for anyone on 3.X
> WHERE X < 6 and uses the "date" type.
>
> This is how easy it is to reproduce:
>
> 1. Start Cassandra 3.5
> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> 3. use test;
> 4. create table fail (id int primary key, d date);
> 5. delete d from fail where id = 1;
> 6. Stop Cassandra
> 7. Start Cassandra
>
> You will get this, and startup will fail:
>
> ERROR 05:32:09 Exiting due to error while processing commit log during
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/mutation6313332720566971713dat.
> This may be caused by replaying a mutation against a table with the same
> name but incompatible schema.  Exception follows:
> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long
> for
> date (0)
>
> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
> probably the other releases) and requires very little investment from
> anyone.
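
To make the failure mode concrete: the affected fixed-width types validate
value length on replay, but deleting a column serializes it with an empty
value (a tombstone), which the validator then rejects. Below is a hedged
sketch of that class of bug and fix; the names are illustrative, and the
real patch is the one attached to CASSANDRA-11618.

import java.nio.ByteBuffer;

// Hedged sketch, not the actual patch: a fixed-width validator that rejects
// zero-length values breaks commit log replay, because a deleted column is
// serialized with an empty value rather than a 4-byte payload.
public final class DateValidationSketch {
    static void validateBuggy(ByteBuffer bytes) {
        if (bytes.remaining() != 4)
            throw new IllegalArgumentException(
                    "Expected 4 byte long for date (" + bytes.remaining() + ")");
    }

    static void validateFixed(ByteBuffer bytes) {
        // Accept empty values: they are legal and mean "deleted" on replay.
        if (bytes.remaining() != 0 && bytes.remaining() != 4)
            throw new IllegalArgumentException(
                    "Expected 4 byte long for date (" + bytes.remaining() + ")");
    }

    public static void main(String[] args) {
        ByteBuffer tombstone = ByteBuffer.allocate(0);
        validateFixed(tombstone);                  // passes after the fix
        try {
            validateBuggy(tombstone);              // the replay path before the fix
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());    // "Expected 4 byte long for date (0)"
        }
    }
}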
>
>
> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa 
> wrote:
>
> > We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
> but
> > we certainly didn’t/won’t go back and cut new releases from every branch
> > for every critical bug in future releases, so I think we need to draw the
> > line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems like
> > you’ve got options (either stay on the tick and go up to 3.7, or bail
> down
> > to 3.0.x)
> >
> > Perhaps, though, this highlights the fact that tick/tock may not be the
> > best option long term. We’ve tried it for a year, perhaps we should
> instead
> > discuss whether or not it should continue, or if there’s another process
> > that gives us a better way to get useful patches into versions people are
> > willing to run in production.
> >
> >
> >
> > On 9/14/16, 8:55 PM, "Jonathan Haddad"  wrote:
> >
> > >Common 

Re: Proposal - 3.5.1

2016-09-15 Thread Jeremiah D Jordan
I’m with Jeff on this: 3.7 (bug fixes on 3.6) has already been released with
the fix.  Since the fix applies cleanly, anyone is free to put it on top of
3.5 on their own if they like, but I see no reason to put out a 3.5.1 right
now and confuse people further.

-Jeremiah


> On Sep 15, 2016, at 9:07 AM, Jonathan Haddad  wrote:
> 
> As a follow-up, I suppose I'm only advocating for a fix to the odd
> releases.  Sadly, Tick-Tock versioning is misleading.
> 
> If tick-tock were to continue (and I'm very much against how it currently
> works), the whole even-features odd-fixes thing needs to stop ASAP; all it
> does is confuse people.
> 
> The follow-up to 3.4 (3.5) should have been 3.4.1, following SemVer, so
> people know it's bug fixes only on top of 3.4.
> 
> Jon
> 
> On Wed, Sep 14, 2016 at 10:37 PM Jonathan Haddad  wrote:
> 
>> In this particular case, I'd say adding a bug fix release for every
>> version that's affected would be the right thing.  The issue is so easily
>> reproducible and will likely result in massive data loss for anyone on 3.X
>> WHERE X < 6 and uses the "date" type.
>> 
>> This is how easy it is to reproduce:
>> 
>> 1. Start Cassandra 3.5
>> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
>> 'replication_factor': 1};
>> 3. use test;
>> 4. create table fail (id int primary key, d date);
>> 5. delete d from fail where id = 1;
>> 6. Stop Cassandra
>> 7. Start Cassandra
>> 
>> You will get this, and startup will fail:
>> 
>> ERROR 05:32:09 Exiting due to error while processing commit log during
>> initialization.
>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
>> Unexpected error deserializing mutation; saved to
>> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/mutation6313332720566971713dat.
>> This may be caused by replaying a mutation against a table with the same
>> name but incompatible schema.  Exception follows:
>> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long for
>> date (0)
>> 
>> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
>> probably the other releases) and requires very little investment from
>> anyone.
>> 
>> 
>> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa 
>> wrote:
>> 
>>> We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
>>> but we certainly didn’t/won’t go back and cut new releases from every
>>> branch for every critical bug in future releases, so I think we need to
>>> draw the line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems
>>> like you’ve got options (either stay on the tick and go up to 3.7, or bail
>>> down to 3.0.x)
>>> 
>>> Perhaps, though, this highlights the fact that tick/tock may not be the
>>> best option long term. We’ve tried it for a year, perhaps we should instead
>>> discuss whether or not it should continue, or if there’s another process
>>> that gives us a better way to get useful patches into versions people are
>>> willing to run in production.
>>> 
>>> 
>>> 
>>> On 9/14/16, 8:55 PM, "Jonathan Haddad"  wrote:
>>> 
 Common sense is what prevents someone from upgrading to yet another
 completely unknown version with new features which have probably broken
 even more stuff that nobody is aware of.  The folks I'm helping right now
 deployed 3.5 when they got started because http://cassandra.apache.org
 suggests it's acceptable for production.  It turns out using 4 of the
 built-in datatypes of the database results in the server being unable to restart
 without clearing out the commit logs and running a repair.  That screams
 critical to me.  You shouldn't even be able to install 3.5 without the
 patch I've supplied - that bug is a ticking time bomb for anyone that
 installs it.
 
 On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler 
 wrote:
 
> What's preventing the use of the 3.6 or 3.7 releases where this bug is
> already fixed? This is also fixed in the 3.0.6/7/8 releases.
> 
> Michael
> 
> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
>> Unfortunately CASSANDRA-11618 was fixed in 3.6 but was not back
>>> ported to
>> 3.5 as well, and it makes Cassandra effectively unusable if someone
>>> is
>> using any of the 4 types affected in any of their schema.
>> 
>> I have cherry picked & merged the patch back to here and will put it
>>> in a
>> JIRA as well tonight, I just wanted to get the ball rolling asap on
>>> this.
>> 
>> 
> 
>>> 

Re: Proposal - 3.5.1

2016-09-15 Thread Jonathan Haddad
As a follow-up, I suppose I'm only advocating for a fix to the odd
releases.  Sadly, Tick-Tock versioning is misleading.

If tick-tock were to continue (and I'm very much against how it currently
works), the whole even-features odd-fixes thing needs to stop ASAP; all it
does is confuse people.

The follow-up to 3.4 (3.5) should have been 3.4.1, following SemVer, so
people know it's bug fixes only on top of 3.4.

Jon

On Wed, Sep 14, 2016 at 10:37 PM Jonathan Haddad  wrote:

> In this particular case, I'd say adding a bug fix release for every
> version that's affected would be the right thing.  The issue is so easily
> reproducible and will likely result in massive data loss for anyone on 3.X
> WHERE X < 6 and uses the "date" type.
>
> This is how easy it is to reproduce:
>
> 1. Start Cassandra 3.5
> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> 3. use test;
> 4. create table fail (id int primary key, d date);
> 5. delete d from fail where id = 1;
> 6. Stop Cassandra
> 7. Start Cassandra
>
> You will get this, and startup will fail:
>
> ERROR 05:32:09 Exiting due to error while processing commit log during
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/mutation6313332720566971713dat.
> This may be caused by replaying a mutation against a table with the same
> name but incompatible schema.  Exception follows:
> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long for
> date (0)
>
> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
> probably the other releases) and requires very little investment from
> anyone.
>
>
> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa 
> wrote:
>
>> We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
>> but we certainly didn’t/won’t go back and cut new releases from every
>> branch for every critical bug in future releases, so I think we need to
>> draw the line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems
>> like you’ve got options (either stay on the tick and go up to 3.7, or bail
>> down to 3.0.x)
>>
>> Perhaps, though, this highlights the fact that tick/tock may not be the
>> best option long term. We’ve tried it for a year, perhaps we should instead
>> discuss whether or not it should continue, or if there’s another process
>> that gives us a better way to get useful patches into versions people are
>> willing to run in production.
>>
>>
>>
>> On 9/14/16, 8:55 PM, "Jonathan Haddad"  wrote:
>>
>> >Common sense is what prevents someone from upgrading to yet another
>> >completely unknown version with new features which have probably broken
>> >even more stuff that nobody is aware of.  The folks I'm helping right now
>> >deployed 3.5 when they got started because http://cassandra.apache.org
>> >suggests it's acceptable for production.  It turns out using 4 of the
>> >built-in datatypes of the database results in the server being unable to restart
>> >without clearing out the commit logs and running a repair.  That screams
>> >critical to me.  You shouldn't even be able to install 3.5 without the
>> >patch I've supplied - that bug is a ticking time bomb for anyone that
>> >installs it.
>> >
>> >On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler 
>> >wrote:
>> >
>> >> What's preventing the use of the 3.6 or 3.7 releases where this bug is
>> >> already fixed? This is also fixed in the 3.0.6/7/8 releases.
>> >>
>> >> Michael
>> >>
>> >> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
>> >> > Unfortunately CASSANDRA-11618 was fixed in 3.6 but was not back
>> ported to
>> >> > 3.5 as well, and it makes Cassandra effectively unusable if someone
>> is
>> >> > using any of the 4 types affected in any of their schema.
>> >> >
>> >> > I have cherry picked & merged the patch back to here and will put it
>> in a
>> >> > JIRA as well tonight, I just wanted to get the ball rolling asap on
>> this.
>> >> >
>> >> >
>> >>
>> https://github.com/rustyrazorblade/cassandra/tree/fix_commitlog_exception
>> >> >
>> >> > Jon
>> >> >
>> >>
>> >>
>>
>


Re: Failing tests 2016-09-14

2016-09-15 Thread Oleksandr Petrov
> SelectTest is starting to get pretty big.

Agree, I've started to get that feeling as well.

On Thu, Sep 15, 2016 at 9:42 AM Benjamin Lerer 
wrote:

> SelectTest is starting to get pretty big. It makes sense to split it into
> separate test classes. For example, we could extract all the filtering
> tests into a new test class: FilteringTest or SelectWithFilteringTest.
>
> On Thu, Sep 15, 2016 at 8:34 AM, Oleksandr Petrov <
> oleksandr.pet...@gmail.com> wrote:
>
> > > CASSANDRA-11031
> >
> > Yes, sorry for the delay with the #11031 dtests. I ran the updated dtests
> > yesterday and they were clean to merge. I just wanted to make sure someone
> > else took a quick glance. By now they're merged, so hopefully today it's
> > going to be better.
> >
> > As regards the environmental timeouts, it looks like certain methods are
> > more prone to them (in particular, the view filtering test does quite a
> > lot). I realise they don't hang; they just execute slower on the CI machine
> > than we anticipate. But what should we do about this generally? Increasing
> > timeouts won't really help, so what comes to mind is:
> >   * Splitting tests
> >   * Modularising to make sure unnecessary components don't get started
> >   * Running "slow-prone" tests sequentially to make sure they get enough
> > processor time
> >   * Taking a deeper look, in case there's a performance issue hiding
> > behind this
> >   * Thread-dumping, in case there is some sort of deadlock that's hard to
> > reproduce on a "faster" machine (however improbable that might sound)
> >
> > Since it looks like the tests are generally in much better shape, it might
> > be a good time to start thinking about the ones that time out.
> >
> >
> >
> >
> > On Thu, Sep 15, 2016 at 7:51 AM Joel Knighton <
> joel.knigh...@datastax.com>
> > wrote:
> >
> > > cassandra-3.9
> > > ===
> > > testall: 8 failures
> > >   org.apache.cassandra.cql3.ViewFilteringTest
> > >   .testPartitionKeyAndClusteringKeyFilteringRestrictions
> > >
> > >   org.apache.cassandra.cql3.ViewFilteringTest
> > >   .testMVCreationSelectRestrictions
> > >
> > >   org.apache.cassandra.cql3.ViewTest.testCompoundPartitionKey
> > >
> > >   org.apache.cassandra.cql3.validation.entities.UFTest.testEmptyString
> > >
> > >   org.apache.cassandra.cql3.validation.operations.AggregationTest
> > >   .testFunctionsWithCompactStorage
> > >
> > >   org.apache.cassandra.cql3.validation.operations.SelectTest
> > >   .testAllowFiltering
> > >   These six test failures are due to environmental timeouts.
> > >
> > >   org.apache.cassandra.db.compaction
> > >   .TimeWindowCompactionStrategyTest
> > >   .testDropExpiredSSTables-compression
> > >   New flaky failure. CASSANDRA-12645 opened.
> > >
> > >   org.apache.cassandra.service.RemoveTest.testBadHostId
> > > CASSANDRA-12487. Flaky failure in a test utility setup method.
> > >
> > > ===
> > > dtest: 1 failure
> > >   user_types_test.TestUserTypes.test_type_as_part_of_pkey
> > > Should have been fixed as part of CASSANDRA-11031. Incorrect
> > > version gating still - I'll follow up and get this fixed tomorrow.
> > >
> > > ===
> > > novnode: 4 failures
> > >   user_types_test.TestUserTypes.test_type_as_part_of_pkey
> > >   Same as above.
> > >
> > >   cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
> > >   .test_bulk_round_trip_with_single_core
> > >   New failure - looks like a schema agreement problem. A JIRA
> > >   hasn't been created yet.
> > >
> > >   cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
> > >   .test_reading_max_insert_errors
> > >   New failure - looks like Netty detected a leak. A JIRA hasn't
> been
> > >   created yet.
> > >
> > >   batch_test.TestBatch.logged_batch_doesnt_throw_uae_test
> > >   CASSANDRA-12383. Flaky failure.
> > >
> > > ===
> > > upgrade: 1 failure
> > >   upgrade_tests.cql_tests
> > >   .TestCQLNodes3RF3_Upgrade_current_2_1_x_To_indev_3_x
> > >   .bug_5732_test
> > >   CASSANDRA-12457. Patch available that needs a reviewer.
> > >
> > >
> > > Since there are a few open opportunities based on the 3.9 failures, I'm
> > > only covering 3.9 in today's email.
> > >
> > --
> > Alex Petrov
> >
>
-- 
Alex Petrov


Re: Failing tests 2016-09-14

2016-09-15 Thread Benjamin Lerer
SelectTest is starting to get pretty big. It makes sense to split it into
separate test classes. For example, we could extract all the filtering tests
into a new test class: FilteringTest or SelectWithFilteringTest.
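
A hedged sketch of what the extraction might look like; the test name and
body are assumptions, while CQLTester is the in-tree harness that SelectTest
already extends.

import org.junit.Test;
import org.apache.cassandra.cql3.CQLTester;

// Filtering cases move out of SelectTest into their own class, so no single
// test class grows large enough to trip the CI timeouts discussed below.
public class SelectWithFilteringTest extends CQLTester
{
    @Test
    public void testAllowFilteringOnRegularColumn() throws Throwable
    {
        createTable("CREATE TABLE %s (pk int PRIMARY KEY, v int)");
        execute("INSERT INTO %s (pk, v) VALUES (?, ?)", 1, 10);
        assertRows(execute("SELECT * FROM %s WHERE v = ? ALLOW FILTERING", 10),
                   row(1, 10));
    }
}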

On Thu, Sep 15, 2016 at 8:34 AM, Oleksandr Petrov <
oleksandr.pet...@gmail.com> wrote:

> > CASSANDRA-11031
>
> Yes, sorry for the delay with the #11031 dtests. I ran the updated dtests
> yesterday and they were clean to merge. I just wanted to make sure someone
> else took a quick glance. By now they're merged, so hopefully today it's
> going to be better.
>
> As regards the environmental timeouts, it looks like certain methods are
> more prone to them (in particular, the view filtering test does quite a
> lot). I realise they don't hang; they just execute slower on the CI machine
> than we anticipate. But what should we do about this generally? Increasing
> timeouts won't really help, so what comes to mind is:
>   * Splitting tests
>   * Modularising to make sure unnecessary components don't get started
>   * Running "slow-prone" tests sequentially to make sure they get enough
> processor time
>   * Taking a deeper look, in case there's a performance issue hiding
> behind this
>   * Thread-dumping, in case there is some sort of deadlock that's hard to
> reproduce on a "faster" machine (however improbable that might sound)
>
> Since it looks like the tests are generally in much better shape, it might
> be a good time to start thinking about the ones that time out.
>
>
>
>
> On Thu, Sep 15, 2016 at 7:51 AM Joel Knighton 
> wrote:
>
> > cassandra-3.9
> > ===
> > testall: 8 failures
> >   org.apache.cassandra.cql3.ViewFilteringTest
> >   .testPartitionKeyAndClusteringKeyFilteringRestrictions
> >
> >   org.apache.cassandra.cql3.ViewFilteringTest
> >   .testMVCreationSelectRestrictions
> >
> >   org.apache.cassandra.cql3.ViewTest.testCompoundPartitionKey
> >
> >   org.apache.cassandra.cql3.validation.entities.UFTest.testEmptyString
> >
> >   org.apache.cassandra.cql3.validation.operations.AggregationTest
> >   .testFunctionsWithCompactStorage
> >
> >   org.apache.cassandra.cql3.validation.operations.SelectTest
> >   .testAllowFiltering
> >   These six test failures are due to environmental timeouts.
> >
> >   org.apache.cassandra.db.compaction
> >   .TimeWindowCompactionStrategyTest
> >   .testDropExpiredSSTables-compression
> >   New flaky failure. CASSANDRA-12645 opened.
> >
> >   org.apache.cassandra.service.RemoveTest.testBadHostId
> > CASSANDRA-12487. Flaky failure in a test utility setup method.
> >
> > ===
> > dtest: 1 failure
> >   user_types_test.TestUserTypes.test_type_as_part_of_pkey
> > Should have been fixed as part of CASSANDRA-11031. Incorrect
> > version gating still - I'll follow up and get this fixed tomorrow.
> >
> > ===
> > novnode: 4 failures
> >   user_types_test.TestUserTypes.test_type_as_part_of_pkey
> >   Same as above.
> >
> >   cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
> >   .test_bulk_round_trip_with_single_core
> >   New failure - looks like a schema agreement problem. A JIRA
> >   hasn't been created yet.
> >
> >   cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
> >   .test_reading_max_insert_errors
> >   New failure - looks like Netty detected a leak. A JIRA hasn't been
> >   created yet.
> >
> >   batch_test.TestBatch.logged_batch_doesnt_throw_uae_test
> >   CASSANDRA-12383. Flaky failure.
> >
> > ===
> > upgrade: 1 failure
> >   upgrade_tests.cql_tests
> >   .TestCQLNodes3RF3_Upgrade_current_2_1_x_To_indev_3_x
> >   .bug_5732_test
> >   CASSANDRA-12457. Patch available that needs a reviewer.
> >
> >
> > Since there are a few open opportunities based on the 3.9 failures, I'm
> > only covering 3.9 in today's email.
> >
> --
> Alex Petrov
>


Re: Failing tests 2016-09-14

2016-09-15 Thread Oleksandr Petrov
> CASSANDRA-11031

Yes, sorry for the delay with the #11031 dtests. I ran the updated dtests
yesterday and they were clean to merge. I just wanted to make sure someone
else took a quick glance. By now they're merged, so hopefully today it's
going to be better.

As regards the environmental timeouts, it looks like certain methods are
more prone to them (in particular, the view filtering test does quite a
lot). I realise they don't hang; they just execute slower on the CI machine
than we anticipate. But what should we do about this generally? Increasing
timeouts won't really help, so what comes to mind is:
  * Splitting tests
  * Modularising to make sure unnecessary components don't get started
  * Running "slow-prone" tests sequentially to make sure they get enough
processor time
  * Taking a deeper look, in case there's a performance issue hiding
behind this
  * Thread-dumping, in case there is some sort of deadlock that's hard to
reproduce on a "faster" machine (however improbable that might sound); a
sketch of such a probe follows below

Since it looks like the tests are generally in much better shape, it might
be a good time to start thinking about the ones that time out.
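
On the thread-dumping option: a minimal sketch of a probe a test watchdog
could call before failing a timed-out run. It uses only the JDK's management
API; wiring it into the harness is assumed, and running jstack against the
CI JVM works just as well.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class DeadlockProbe {
    /** Prints full stacks and lock-owner chains if the JVM detects a deadlock. */
    public static void dumpDeadlocksIfAny() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
        if (ids == null)
            return;
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE))
            System.err.println(info);
    }
}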




On Thu, Sep 15, 2016 at 7:51 AM Joel Knighton 
wrote:

> cassandra-3.9
> ===
> testall: 8 failures
>   org.apache.cassandra.cql3.ViewFilteringTest
>   .testPartitionKeyAndClusteringKeyFilteringRestrictions
>
>   org.apache.cassandra.cql3.ViewFilteringTest
>   .testMVCreationSelectRestrictions
>
>   org.apache.cassandra.cql3.ViewTest.testCompoundPartitionKey
>
>   org.apache.cassandra.cql3.validation.entities.UFTest.testEmptyString
>
>   org.apache.cassandra.cql3.validation.operations.AggregationTest
>   .testFunctionsWithCompactStorage
>
>   org.apache.cassandra.cql3.validation.operations.SelectTest
>   .testAllowFiltering
>   These six test failures are due to environmental timeouts.
>
>   org.apache.cassandra.db.compaction
>   .TimeWindowCompactionStrategyTest
>   .testDropExpiredSSTables-compression
>   New flaky failure. CASSANDRA-12645 opened.
>
>   org.apache.cassandra.service.RemoveTest.testBadHostId
> CASSANDRA-12487. Flaky failure in a test utility setup method.
>
> ===
> dtest: 1 failure
>   user_types_test.TestUserTypes.test_type_as_part_of_pkey
> Should have been fixed as part of CASSANDRA-11031. Incorrect
> version gating still - I'll follow up and get this fixed tomorrow.
>
> ===
> novnode: 4 failures
>   user_types_test.TestUserTypes.test_type_as_part_of_pkey
>   Same as above.
>
>   cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
>   .test_bulk_round_trip_with_single_core
>   New failure - looks like a schema agreement problem. A JIRA
>   hasn't been created yet.
>
>   cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
>   .test_reading_max_insert_errors
>   New failure - looks like Netty detected a leak. A JIRA hasn't been
>   created yet.
>
>   batch_test.TestBatch.logged_batch_doesnt_throw_uae_test
>   CASSANDRA-12383. Flaky failure.
>
> ===
> upgrade: 1 failure
>   upgrade_tests.cql_tests
>   .TestCQLNodes3RF3_Upgrade_current_2_1_x_To_indev_3_x
>   .bug_5732_test
>   CASSANDRA-12457. Patch available that needs a reviewer.
>
>
> Since there are a few open opportunities based on the 3.9 failures, I'm
> only covering 3.9 in today's email.
>
-- 
Alex Petrov