Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
i’m not trying to get into a fight here jeremiah. and this will be my last 
reply on this as i’ve made my opinion pretty clear. but ask yourself: would you 
run c* in the idea debugger and then do performance testing? no. because it’s a 
DEBUGger.

> On Mar 18, 2018, at 11:43 AM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
> 
> If there are some log messages you think should be improved to make them more 
> useful please do so.  Saying things are “crap” is not productive.
> 
> I have seen having the extra information from the debug.log be very helpful 
> in debugging production issues after the fact on operational clusters many 
> times.
> 
> Also if you think there are things logged at DEBUG, since it was cleaned 
> up, that are not useful, then please improve them or change their logging 
> level.
> 
> You are also free to change the logging level on clusters you run if you 
> don’t want the extra information.
> 
> And again we are only talking about versions where DEBUG has been cleaned up. 
> When running 2.1 or earlier, yes there is a ton of stuff at DEBUG and you 
> would not want that on by default, even asynchronously.
> 
> It is up to reviewers and committers to understand the impact of and rules 
> around the use of different log levels. Said reviewers and committers should 
> teach new contributors those rules during reviews if they are violated.
> 
> -Jeremiah
> 
>> On Mar 18, 2018, at 2:31 PM, Michael Kjellman <kjell...@apple.com> wrote:
>> 
>> what really baffles me with this entire thing is that as a project we don’t 
>> even log things like partition keys along with the tombstone overwhelming or 
>> batch too large log messages.. that would immediately be helpful to thousands 
>> and thousands of people... yet somehow we think it’s okay to log tons of 
>> crap at debug to users’ drives, shortening the life of their ssds and 
>> objectively reducing the performance of the actual database due to logging 
>> overhead, for some possible day in the future when we might need it to debug 
>> a problem we really should have figured out and reproduced ourselves in the 
>> first place.
>> 
>>> On Mar 18, 2018, at 11:24 AM, Michael Kjellman <kjell...@apple.com> wrote:
>>> 
>>> it’s too easy to make a regression there. and does anyone even have a 
>>> splunk (or equivalent) infrastructure to actually keep debug logs around 
>>> for a long enough retention period to even have them be helpful?
>>> 
>>> again: this is something engineers for the project want. it’s not in the 
>>> best interest for our users. 
>>> 
>>> 
>>>> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> 
>>>> That really depends on whether you're judicious in deciding what to log at
>>>> debug, doesn't it?
>>>> 
>>>> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman <kjell...@apple.com>
>>>> wrote:
>>>> 
>>>>> +1. this is how it works.
>>>>> 
>>>>> your computer doesn’t run at debug logging by default. your phone 
>>>>> doesn’t
>>>>> either. neither does your smart tv. your database can’t be running at 
>>>>> debug
>>>>> just because it makes our lives as engineers easier.
>>>>> 
>>>>>> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>>>>> a...@thelastpickle.com> wrote:
>>>>>> 
>>>>>> It's a tiny bit unusual to turn on debug logging for all users by default
>>>>>> though, and there should be occasions to turn it on when facing issues
>>>>> that
>>>>>> you want to debug (if they can be easily reproduced).
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jonathan Ellis
>>>> co-founder, http://www.datastax.com
>>>> @spyced
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
> 
> 


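An aside for readers of this thread: Jeremiah's "change the logging level on clusters you run" is exposed as a JMX operation, which is what `nodetool setlogginglevel` drives under the hood. A hedged sketch of invoking it directly; the host, port (7199 is the usual Cassandra JMX port), and MBean coordinates are assumptions to verify against your Cassandra version:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hedged sketch: change a node's log level over JMX. The MBean name and
// operation signature match StorageServiceMBean.setLoggingLevel, but check
// them against your Cassandra version before relying on this.
final class SetLogLevel {
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        JMXConnector c = JMXConnectorFactory.connect(
                new JMXServiceURL(serviceUrl("127.0.0.1", 7199)));
        try {
            MBeanServerConnection mbs = c.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Equivalent to: nodetool setlogginglevel org.apache.cassandra INFO
            mbs.invoke(ss, "setLoggingLevel",
                       new Object[]{"org.apache.cassandra", "INFO"},
                       new String[]{"java.lang.String", "java.lang.String"});
        } finally {
            c.close();
        }
    }
}
```

The same effect without code: `nodetool setlogginglevel org.apache.cassandra INFO`.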


Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
what really baffles me with this entire thing is that as a project we don’t even log 
things like partition keys along with the tombstone overwhelming or batch too 
large log messages.. that would immediately be helpful to thousands and 
thousands of people... yet somehow we think it’s okay to log tons of crap at 
debug to users’ drives, shortening the life of their ssds and objectively reducing 
the performance of the actual database due to logging overhead, for some possible 
day in the future when we might need it to debug a problem we really should 
have figured out and reproduced ourselves in the first place.

> On Mar 18, 2018, at 11:24 AM, Michael Kjellman <kjell...@apple.com> wrote:
> 
> it’s too easy to make a regression there. and does anyone even have a splunk 
> (or equivalent) infrastructure to actually keep debug logs around for a long 
> enough retention period to even have them be helpful?
> 
> again: this is something engineers for the project want. it’s not in the best 
> interest for our users. 
> 
> 
>> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> 
>> That really depends on whether you're judicious in deciding what to log at
>> debug, doesn't it?
>> 
>> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman <kjell...@apple.com>
>> wrote:
>> 
>>> +1. this is how it works.
>>> 
>>> your computer doesn’t run at debug logging by default. your phone doesn’t
>>> either. neither does your smart tv. your database can’t be running at debug
>>> just because it makes our lives as engineers easier.
>>> 
>>>> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>> 
>>>> It's a tiny bit unusual to turn on debug logging for all users by default
>>>> though, and there should be occasions to turn it on when facing issues
>>> that
>>>> you want to debug (if they can be easily reproduced).
>>> 
>> 
>> 
>> 
>> -- 
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
> 
> 
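The logging-overhead argument in this message can be made concrete with a toy sketch: with DEBUG disabled, an eagerly built message still pays its formatting cost on every call, while a guarded call (mirroring `logger.isDebugEnabled()`, or SLF4J's parameterized form) pays almost nothing. Everything below is illustrative, not Cassandra code:

```java
// Sketch (not actual Cassandra code) of why per-read DEBUG lines cost:
// the eager form builds its message unconditionally, the guarded form
// builds nothing unless the level is on.
final class DebugCostSketch {
    // Returns {messagesBuiltEagerly, messagesBuiltGuarded} for n log calls.
    static long[] run(int n, boolean debugEnabled) {
        long eager = 0, guarded = 0;
        for (int i = 0; i < n; i++) {
            // Eager: string concatenation happens on every call.
            String msg = "scanned partition " + i + " of table t";
            eager++;
            if (debugEnabled) write(msg);

            // Guarded: mirrors logger.isDebugEnabled(); nothing is built
            // unless DEBUG is actually on.
            if (debugEnabled) {
                write("scanned partition " + i + " of table t");
                guarded++;
            }
        }
        return new long[]{eager, guarded};
    }

    static void write(String msg) { /* stands in for the appender */ }

    public static void main(String[] args) {
        long[] off = run(1_000, false);
        System.out.println(off[0] + " eager builds vs " + off[1]
                + " guarded builds with DEBUG off");
    }
}
```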


Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
it’s too easy to make a regression there. and does anyone even have a splunk 
(or equivalent) infrastructure to actually keep debug logs around for a long 
enough retention period to even have them be helpful?

again: this is something engineers for the project want. it’s not in the best 
interest for our users. 


> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> 
> That really depends on whether you're judicious in deciding what to log at
> debug, doesn't it?
> 
> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman <kjell...@apple.com>
> wrote:
> 
>> +1. this is how it works.
>> 
>> your computer doesn’t run at debug logging by default. your phone doesn’t
>> either. neither does your smart tv. your database can’t be running at debug
>> just because it makes our lives as engineers easier.
>> 
>>> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>> 
>>> It's a tiny bit unusual to turn on debug logging for all users by default
>>> though, and there should be occasions to turn it on when facing issues
>> that
>>> you want to debug (if they can be easily reproduced).
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced




Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
+1. this is how it works.

your computer doesn’t run at debug logging by default. your phone doesn’t 
either. neither does your smart tv. your database can’t be running at debug 
just because it makes our lives as engineers easier. 

> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski  
> wrote:
> 
> It's a tiny bit unusual to turn on debug logging for all users by default
> though, and there should be occasions to turn it on when facing issues that
> you want to debug (if they can be easily reproduced).


Re: Debug logging enabled by default since 2.2

2018-03-17 Thread Michael Kjellman
i’ve never understood this change, and it’s been explained to me multiple times.

DEBUG shouldn’t run by default in prod. and it certainly shouldn’t be enabled 
by default for users.

but hey, what do i know! just my 2 cents. 

> On Mar 17, 2018, at 10:55 AM, J. D. Jordan  wrote:
> 
> We went through an exercise of setting things up so that asynchronous DEBUG 
> logging would give people a “production” debug log. 
> https://issues.apache.org/jira/browse/CASSANDRA-10241
> If there are some things going out at DEBUG that cause performance issues 
> then most likely those should be moved to TRACE so that debug logging can 
> stay enabled for all the useful information found there.
> 
> -Jeremiah
> 
>> On Mar 17, 2018, at 1:49 PM, Alexander Dejanovski  
>> wrote:
>> 
>> Hi folks,
>> 
>> we've been upgrading clusters from 2.0 to 2.2 recently and we've noticed
>> that debug logging was causing serious performance issues in some cases,
>> specifically because of its use in the query pager.
>> 
>> I've opened a ticket with some benchmarks and flame graphs :
>> https://issues.apache.org/jira/browse/CASSANDRA-14318
>> 
>> The problem should be less serious in the read path with Cassandra 3.0 and
>> above as the query pager code has been reworked and doesn't log at debug
>> level.
>> I think that debug logging shouldn't be turned on by default though, since
>> we see it doesn't come for free and that it lowers read performance in 2.2.
>> 
>> Was there any specific reason why it was enabled by default in 2.2 ?
>> 
>> Is anyone opposed to disabling debug logging by default in all branches ?
>> 
>> -- 
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>> 
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com


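For context, the asynchronous debug log referenced above (CASSANDRA-10241) moves disk I/O off the request path: the hot path only enqueues a message, and a background thread does the writing. A minimal toy sketch of that shape, with a StringBuilder standing in for debug.log (this is not Cassandra's actual logback configuration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy sketch of an async debug log: callers pay only for an enqueue;
// a background writer thread drains the queue.
final class AsyncDebugLog implements AutoCloseable {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    private final StringBuilder sink = new StringBuilder(); // stands in for debug.log
    private final Thread writer;
    private volatile boolean closed = false;

    AsyncDebugLog() {
        writer = new Thread(() -> {
            try {
                // Keep draining until close() is called AND the queue is empty.
                while (!closed || !queue.isEmpty()) {
                    String msg = queue.poll(10, TimeUnit.MILLISECONDS);
                    if (msg != null) sink.append(msg).append('\n');
                }
            } catch (InterruptedException ignored) {
            }
        });
        writer.start();
    }

    // Hot path: drop on overflow rather than block; real async appenders
    // make this policy configurable.
    void debug(String msg) {
        queue.offer(msg);
    }

    @Override
    public void close() {
        closed = true;
        try {
            writer.join(); // join makes the writer's appends visible to us
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    String contents() {
        return sink.toString();
    }

    public static void main(String[] args) {
        AsyncDebugLog log = new AsyncDebugLog();
        log.debug("read 42 tombstones in partition k1");
        log.debug("paging state updated");
        log.close();
        System.out.print(log.contents());
    }
}
```

The open question in the thread is exactly the residual cost of that enqueue (plus message formatting) on the hot path, which is what CASSANDRA-14318 benchmarks.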


Re: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018

2018-02-27 Thread Michael Kjellman
2.2: yes
2.1: no.

i don't think it's worth the effort to get it working on 2.1 at this point -- 
and i hope we've fully moved on from 2.1 by August 31, 2018 ;)

> On Feb 27, 2018, at 5:35 PM, kurt greaves <k...@instaclustr.com> wrote:
> 
> Not that much gets committed to 2.1 and 2.2 anymore, but is this also true
> for those branches?
> 
> On 27 February 2018 at 22:58, Michael Kjellman <kjell...@apple.com> wrote:
> 
>> FYI: we're already fully on circleci 2.0 for the 3.0, 3.11, and trunk
>> branches so no action required for us here!
>> 
>> best,
>> kjellman
>> 
>> Begin forwarded message:
>> 
>> From: The CircleCI Team <no-re...@circleci.com>
>> Subject: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018
>> Date: February 27, 2018 at 2:44:01 PM PST
>> To: mkjell...@internalcircle.com<mailto:mkjell...@internalcircle.com>
>> Reply-To: <no-re...@circleci.com<mailto:no-re...@circleci.com>>
>> 
>> 
>> Dear customer,
>> 
>> We wanted to let you know that we are planning on sunsetting CircleCI 1.0
>> on August 31st, 2018. Our goal as a company for 2018 is to invest in
>> delivering more features and better performance on CircleCI 2.0, which
>> unlocks faster builds and greater control. For more information, you can
>> read our blog post<http://go.circleci.com/Mk0H040mZ0a12006G0UM052> on
>> sunsetting CircleCI 1.0.
>> 
>> You’ll need to update all of your config files to the CircleCI 2.0 syntax
>> in order to migrate your projects to CircleCI 2.0 over the next 6 months.
>> 
>> If all of your projects are already on 2.0, congratulations! No action is
>> necessary. We are sending this announcement to all active users to make
>> sure you have all of the information you need. Take a look at your builds
>> dashboard<http://go.circleci.com/ra060H200M7GmkZ1U004020> to see if your
>> projects are still building on CircleCI 1.0:
>> 
>> [https://www2.circleci.com/rs/485-ZMH-626/images/CircleCI%20Version%20Number.png]
>> 
>> 
>> These resources will help you migrate your projects from 1.0 to 2.0:
>> 
>> Config.yml translator.<http://go.circleci.com/ZMkaU018240m720GHZ0> Note: This
>> will generate a baseline config.yml file that you can adjust to fit your needs.
>> 1.0 to 2.0 migration documentation.<http://go.circleci.com/X2G0U9M0k000H0am4021Z80>
>> Language-specific 2.0 tutorials.<http://go.circleci.com/Q00H0mM0Z92G40k1200aaU0>
>> 
>> We will be sending you email reminders periodically with additional
>> resources and links as they become available to help with your migration
>> plan. We will also be updating this page<http://go.circleci.com/ga0makG04201MH02Z000b0U>
>> with information relevant to sunsetting CircleCI 1.0. If you need additional
>> migration assistance, open a support request<http://go.circleci.com/R00Gam1M420020U0ZbH0ck0>
>> and our support team will
>> be in touch.
>> 
>> 
>> Cheers,
>> 
>> The CircleCI Team
>> 
>> 
>> 
>> 
>> 



Re: Timeout unit tests in trunk

2018-02-27 Thread Michael Kjellman
i've seen it timeout a lot too. if you think breaking it up will fix it that 
definitely sounds like a good approach!

> On Feb 27, 2018, at 2:57 PM, Dikang Gu <dikan...@gmail.com> wrote:
> 
> I took some look at the cql3.ViewTest, it seems too big and timeout very
> often. Any objections if I split it into two or multiple tests?
> 
> On Tue, Feb 27, 2018 at 1:32 PM, Michael Kjellman <kjell...@apple.com>
> wrote:
> 
>> well, turns out we already have a jira tracking the MV tests being broken
>> on trunk. they are legit broken :) thanks jaso
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-14194
>> 
>> not sure about the batch test timeout there though.. did you debug it at
>> all by chance?
>> 
>> 
>> On Feb 27, 2018, at 1:27 PM, Michael Kjellman <kjell...@apple.com> wrote:
>> 
>> hey dikang: just chatted a little bit about this. proposal: let's add the
>> equivalent of @resource_intensive to unit tests too.. and the first one is
>> to stop from running the MV unit tests in the free circleci containers.
>> thoughts?
>> 
>> also, might want to bug your management to see if you can get some paid
>> circleci resources. it's game changing!
>> 
>> best,
>> kjellman
>> 
>> On Feb 27, 2018, at 12:12 PM, Dinesh Joshi <dinesh.jo...@yahoo.com.INVALID> wrote:
>> 
>> Some tests might require additional resources to spin up the required
>> components. 2 CPU / 4GB might not be sufficient. You may need to bump up
>> the resources to 8CPU / 16GB.
>> Dinesh
>> 
>> On Tuesday, February 27, 2018, 11:24:34 AM PST, Dikang Gu <dikan...@gmail.com> wrote:
>> 
>> Looks like there are a few flaky/timeout unit tests in trunk, wondering is
>> there anyone looking at them already?
>> 
>> testBuildRange - org.apache.cassandra.db.view.ViewBuilderTaskTest
>> testUnloggedPartitionsPerBatch -
>> org.apache.cassandra.metrics.BatchMetricsTest
>> testViewBuilderResume - org.apache.cassandra.cql3.ViewTest
>> 
>> https://circleci.com/gh/DikangGu/cassandra/20
>> 
>> --
>> Dikang
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Dikang





Fwd: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018

2018-02-27 Thread Michael Kjellman
FYI: we're already fully on circleci 2.0 for the 3.0, 3.11, and trunk branches 
so no action required for us here!

best,
kjellman

Begin forwarded message:

From: The CircleCI Team <no-re...@circleci.com>
Subject: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018
Date: February 27, 2018 at 2:44:01 PM PST
To: mkjell...@internalcircle.com
Reply-To: <no-re...@circleci.com>


Dear customer,

We wanted to let you know that we are planning on sunsetting CircleCI 1.0 on 
August 31st, 2018. Our goal as a company for 2018 is to invest in delivering 
more features and better performance on CircleCI 2.0, which unlocks faster 
builds and greater control. For more information, you can read our blog 
post on sunsetting CircleCI 1.0.

You’ll need to update all of your config files to the CircleCI 2.0 syntax in 
order to migrate your projects to CircleCI 2.0 over the next 6 months.
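For reference, that migration amounts to replacing `circle.yml` with a `.circleci/config.yml` in the new syntax. A minimal sketch of the 2.0 shape; the Docker image and build step here are illustrative assumptions, not the project's actual config:

```yaml
# Hypothetical minimal CircleCI 2.0 config. The job layout follows the 2.0
# schema; image and commands are placeholder assumptions.
version: 2
jobs:
  build:
    docker:
      - image: openjdk:8-jdk   # any image with the toolchain you need
    steps:
      - checkout
      - run: ant test          # replace with your real build/test command
```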

If all of your projects are already on 2.0, congratulations! No action is 
necessary. We are sending this announcement to all active users to make sure 
you have all of the information you need. Take a look at your builds 
dashboard to see if your 
projects are still building on CircleCI 1.0:

[https://www2.circleci.com/rs/485-ZMH-626/images/CircleCI%20Version%20Number.png]


These resources will help you migrate your projects from 1.0 to 2.0:

Config.yml translator. Note: This will generate a baseline config.yml file that 
you can adjust to fit your needs.
1.0 to 2.0 migration documentation.
Language-specific 2.0 tutorials.

We will be sending you email reminders periodically with additional resources 
and links as they become available to help with your migration plan. We will 
also be updating this page with 
information relevant to sunsetting CircleCI 1.0. If you need additional 
migration assistance, open a support 
request and our support team 
will be in touch.


Cheers,

The CircleCI Team






Re: Timeout unit tests in trunk

2018-02-27 Thread Michael Kjellman
well, turns out we already have a jira tracking the MV tests being broken on 
trunk. they are legit broken :) thanks jaso

https://issues.apache.org/jira/browse/CASSANDRA-14194

not sure about the batch test timeout there though.. did you debug it at all by 
chance?


On Feb 27, 2018, at 1:27 PM, Michael Kjellman <kjell...@apple.com> wrote:

hey dikang: just chatted a little bit about this. proposal: let's add the 
equivalent of @resource_intensive to unit tests too.. and the first one is to 
stop from running the MV unit tests in the free circleci containers. thoughts?

also, might want to bug your management to see if you can get some paid 
circleci resources. it's game changing!

best,
kjellman

On Feb 27, 2018, at 12:12 PM, Dinesh Joshi <dinesh.jo...@yahoo.com.INVALID> wrote:

Some tests might require additional resources to spin up the required 
components. 2 CPU / 4GB might not be sufficient. You may need to bump up the 
resources to 8CPU / 16GB.
Dinesh

On Tuesday, February 27, 2018, 11:24:34 AM PST, Dikang Gu <dikan...@gmail.com> wrote:

Looks like there are a few flaky/timeout unit tests in trunk, wondering is
there anyone looking at them already?

testBuildRange - org.apache.cassandra.db.view.ViewBuilderTaskTest
testUnloggedPartitionsPerBatch -
org.apache.cassandra.metrics.BatchMetricsTest
testViewBuilderResume - org.apache.cassandra.cql3.ViewTest

https://circleci.com/gh/DikangGu/cassandra/20

--
Dikang






Re: Timeout unit tests in trunk

2018-02-27 Thread Michael Kjellman
hey dikang: just chatted a little bit about this. proposal: let's add the 
equivalent of @resource_intensive to unit tests too.. and the first one is to 
stop from running the MV unit tests in the free circleci containers. thoughts?

also, might want to bug your management to see if you can get some paid 
circleci resources. it's game changing!

best,
kjellman

> On Feb 27, 2018, at 12:12 PM, Dinesh Joshi  
> wrote:
> 
> Some tests might require additional resources to spin up the required 
> components. 2 CPU / 4GB might not be sufficient. You may need to bump up the 
> resources to 8CPU / 16GB.
> Dinesh 
> 
>On Tuesday, February 27, 2018, 11:24:34 AM PST, Dikang Gu 
>  wrote:  
> 
> Looks like there are a few flaky/timeout unit tests in trunk, wondering is
> there anyone looking at them already?
> 
> testBuildRange - org.apache.cassandra.db.view.ViewBuilderTaskTest
> testUnloggedPartitionsPerBatch -
> org.apache.cassandra.metrics.BatchMetricsTest
> testViewBuilderResume - org.apache.cassandra.cql3.ViewTest
> 
> https://circleci.com/gh/DikangGu/cassandra/20
> 
> -- 
> Dikang



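The @resource_intensive idea floated in this thread could be as small as a marker annotation plus a guard that skips annotated suites on constrained containers. All names below are hypothetical, not actual Cassandra test infrastructure:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical sketch of a @ResourceIntensive marker for unit tests.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ResourceIntensive {}

final class ResourceGuard {
    // Skip resource-hungry suites (e.g. the MV tests) when the environment
    // reports small containers; run everything otherwise.
    static boolean shouldRun(Class<?> testClass, boolean constrainedEnv) {
        return !(constrainedEnv && testClass.isAnnotationPresent(ResourceIntensive.class));
    }
}

@ResourceIntensive
final class FakeViewTest {}

final class ResourceGuardDemo {
    public static void main(String[] args) {
        System.out.println("run FakeViewTest on small container? "
                + ResourceGuard.shouldRun(FakeViewTest.class, true));
    }
}
```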


Re: Why isn't there a separate JVM per table?

2018-02-22 Thread Michael Kjellman
it's an interesting idea. i'd wonder how much overhead you'd end up with 
message parsing and negate any potential GC wins. rick branson had played 
around a bunch with running storage nodes and doubling down on the old "fat 
client" model. if you had 1 tables (yes, barely works but we don't 
explicitly prevent it) you can't really run that many jvm processes on a single 
box.

> On Feb 22, 2018, at 12:39 PM, Carl Mueller  
> wrote:
> 
> GC pauses may have been improved in newer releases, since we are in 2.1.x,
> but I was wondering why cassandra uses one jvm for all tables and
> keyspaces, intermingling the heap for on-JVM objects.
> 
> ... so why doesn't cassandra spin off a jvm per table so each jvm can be
> tuned per table and gc tuned and gc impacts not impact other tables? It
> would probably increase the number of endpoints if we avoid having an
> overarching query router.

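Part of the objection above is simple arithmetic: every JVM carries a fixed baseline (heap floor, metaspace, GC/JIT threads), and a process per table multiplies that baseline by table count. A back-of-envelope sketch; the 256 MB per-JVM floor is an illustrative assumption, not a measurement:

```java
// Back-of-envelope sketch: per-JVM baseline overhead multiplied by table
// count. All numbers are illustrative assumptions.
final class JvmPerTableSketch {
    static long totalOverheadMb(int tables, long perJvmBaselineMb) {
        return (long) tables * perJvmBaselineMb;
    }

    public static void main(String[] args) {
        long perJvmMb = 256; // assumed floor for one idle JVM
        for (int tables : new int[]{10, 100, 1000}) {
            System.out.println(tables + " tables -> "
                    + totalOverheadMb(tables, perJvmMb) + " MB of JVM baseline alone");
        }
    }
}
```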




Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Michael Kjellman
Please do send them! There was a *lot* of really hard great work by a lot of 
people over the past year to significantly improve the documentation in tree.

http://cassandra.apache.org/doc/latest/
https://github.com/apache/cassandra/tree/trunk/doc

I still didn't see a reply from you re: my request for your jira information so 
i'm unable to follow what issues you're referring to as you haven't linked to 
any in your emails either. If you still see holes in the new and improved 
documentation above, _please_ do create tickets to track that so we can improve 
that asap! a fresh set of eyes on areas not covered is obviously welcomed; 
especially those with overlap with the links you're referring to in your email 
obviously.

best,
kjellman

On Feb 21, 2018, at 4:13 PM, Kenneth Brotman wrote:



Jeff,



I already addressed everything you said.  Boy! Would I like to bring up the out 
of date articles on the web that trip people up and the lousy documentation on 
the Apache website but I can’t because a lot of folks don’t know me or why I’m 
saying these things.



I will be making another post that I hope clarifies what’s going on with me.  
After that I will either be a freakishly valuable asset to this community or I 
will be a freakishly valuable asset to another community.



You sure have a funny way of reining in people who are used to helping out.  
You sure misjudged me.  Wow.



Kenneth Brotman



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Wednesday, February 21, 2018 3:12 PM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!





On Wed, Feb 21, 2018 at 2:53 PM, Kenneth Brotman wrote:

Hi Akash,

I get the part about outside work which is why in replying to Jeff Jirsa I was 
suggesting the big companies could justify taking it on easy enough and you 
know actually pay the people who would be working at it so those people could 
have a life.

The part I don't get is the aversion to usability.  Isn't that what you think 
about when you are coding?  "Am I making this thing I'm building easy to use?"  
If you were programming for me, we would be constantly talking about what we 
are building and how we can make things easier for users.  If I had to fight 
with a developer, architect or engineer about usability all the time, they 
would be gone and quick.  How do approach programming if you aren't trying to 
make things easy.





There's no aversion to usability; you're assuming things that just aren't true. 
Nobody's against usability, we've just prioritized other things HIGHER. We make 
those decisions in part by looking at open JIRAs and determining what's asked 
for the most, what members of the community have contributed, and then balance 
that against what we ourselves care about. You're making a statement that it 
should be the top priority for the next release, with no JIRA, and history of 
contributing (and indeed, no real clear sign that you even understand the full 
extent of the database), no sign that you're willing to do the work yourself, 
and making a ton of assumptions about the level of effort and ROI.



I would love for Cassandra to be easier to use, I'm sure everyone does. There's 
a dozen features I'd love to add if I had infinite budget and infinite 
manpower. But what you're asking for is A LOT of effort and / or A LOT of 
money, and you're assuming someone's going to step up and foot the bill, but 
there's no real reason to believe that's the case.



In the mean time, everyone's spending hours replying to this thread that is 0% 
actionable. We would all have been objectively better off had everyone ignored 
this thread and just spent 10 minutes writing some section of the docs. So the 
next time I get the urge to reply, I'm just going to do that instead.










Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Michael Kjellman
kenneth: could you please send your jira information? i'm unable to even find 
an account on http://issues.apache.org with your name despite multiple 
attempts. thanks!

best,
kjellman

> On Feb 21, 2018, at 2:20 PM, Kenneth Brotman  
> wrote:
> 
> Jon,
> 
> Very sorry that you don't see the value of the time I'm taking for this.  I 
> don't have demands; I do have a stern warning and I'm right Jon.  Please be 
> very careful not to mischaracterize my words, Jon.
> 
> You suggest I put things in JIRA's, then seem to suggest that I'd be lucky if 
> anyone looked at it and did anything. That's what I figured too.  
> 
> I don't appreciate the hostility.  You will understand more fully in the next 
> post where I'm coming from.  Try to keep the conversation civilized.  I'm 
> trying or at least so you understand I think what I'm doing is saving your 
> gig and mine.  I really like a lot of people in this group.
> 
> I've come to a preliminary assessment on things.  Soon the cloud will clear 
> or I'll be gone.  Don't worry.  I'm a very peaceful person and like you I am 
> driven by real important projects that I feel compelled to work on for the 
> good of others.  I don't have time for people to hand hold a database and I 
> can't get stuck with my projects on the wrong stuff.  
> 
> Kenneth Brotman
> 
> 
> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: u...@cassandra.apache.org
> Cc: dev@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
> 
> Ken,
> 
> Maybe it’s not clear how open source projects work, so let me try to explain. 
>  There’s a bunch of us who either get paid by someone or volunteer on our 
> free time.  The folks that get paid, (yay!) usually take direction on what 
> the priorities are, and work on projects that directly affect our jobs.  That 
> means that someone needs to care enough about the features you want to work 
> on them, if you’re not going to do it yourself. 
> 
> Now as others have said already, please put your list of demands in JIRA, if 
> someone is interested, they will work on it.  You may need to contribute a 
> little more than you’ve done already, be prepared to get involved if you 
> actually want to to see something get done.  Perhaps learning a little more 
> about Cassandra’s internals and the people involved will reveal some of the 
> design decisions and priorities of the project.  
> 
> Third, you seem to be a little obsessed with market share.  While market 
> share is fun to talk about, *most* of us that are working on and contributing 
> to Cassandra do so because it does actually solve a problem we have, and 
> solves it reasonably well.  If some magic open source DB appears out of no 
> where and does everything you want Cassandra to, and is bug free, keeps your 
> data consistent, automatically does backups, comes with really nice cert 
> management, ad hoc querying, amazing materialized views that are perfect, no 
> caveats to secondary indexes, and somehow still gives you linear scalability 
> without any mental overhead whatsoever then sure, people might start using 
> it.  And that’s actually OK, because if that happens we’ll all be incredibly 
> pumped out of our minds because we won’t have to work as hard.  If on the 
> slim chance that doesn’t manifest, those of us that use Cassandra and are 
> part of the community will keep working on the things we care about, 
> iterating, and improving things.  Maybe someone will even take a look at your 
> JIRA issues.  
> 
> Further filling the mailing list with your grievances will likely not help 
> you progress towards your goal of a Cassandra that’s easier to use, so I 
> encourage you to try to be a little more productive and try to help rather 
> than just complain, which is not constructive.  I did a quick search for your 
> name on the mailing list, and I’ve seen very little from you, so to 
> everyone’s who’s been around for a while and trying to help you it looks like 
> you’re just some random dude asking for people to work for free on the things 
> you’re asking for, without offering anything back in return.
> 
> Jon
> 
> 
>> On Feb 21, 2018, at 11:56 AM, Kenneth Brotman  
>> wrote:
>> 
>> Josh,
>> 
>> To say nothing is indifference.  If you care about your community, sometimes 
>> don't you have to bring up a subject even though you know it's also 
>> temporarily adding some discomfort?  
>> 
>> As to opening a JIRA, I've got a very specific topic to try in mind now.  An 
>> easy one I'll work on and then announce.  Someone else will have to do the 
>> coding.  A year from now I would probably just knock it out to make sure 
>> it's as easy as I expect it to be but to be honest, as I've been saying, I'm 
>> not set up to do that right now.  I've barely looked at any Cassandra code; 
>> for one; 

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread Michael Kjellman
the things you are asking for are unfortunately not a tiny effort. as you don’t 
seem to have the time to contribute code, the best way you personally can create 
change would be (again) to file individual jiras for each enhancement or 
feature request.

highlight key ones you filed via the mailing list that you’d personally like to 
see prioritized - and advocate to have resources allocated towards implementing 
and ultimately get those scheduled for a release over other ones.

best,
kjellman

> On Feb 18, 2018, at 11:07 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
> wrote:
> 
> Hi Michael, actually I do very much like the database.  thanks for the 
> thoughts... a few comments:
> 
> 1) Lots of big companies like, let's see, Apple is a big one, probably could 
> easily justify contributing resources to finish up the basic development of 
> Cassandra. 
> 2) There are lots of big companies using Cassandra.  Each could contribute a 
> tiny effort and everyone would benefit greatly.
> 3) A focused effort by a small group of talented people like there are in 
> this group could knock it out easily.
> 4) Not everyone is a Cassandra coder.  It's not for me to do Michael.
> 5) I'm an individual.  I am not working at a big company at the moment 
> Michael.  
> 
> Best,
> Kenneth Brotman
> 
> 
> -Original Message-
> From: Michael Kjellman [mailto:kjell...@apple.com] 
> Sent: Sunday, February 18, 2018 10:18 PM
> To: dev@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
> 
> hi ken, sorry you don’t like the database. some thoughts:
> 
> 1) please file actionable jiras for places you feel need to be improved in 
> the database... this is the best way to make and encourage the change you’re 
> looking for. it seems you have quite a few ideas from your post that could be 
> broken down into individual actionable jiras.
> 2) please don’t cross post between mailing lists.
> 3) pull requests are always welcomed!
> 
> best,
> kjellman
> 
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
>> wrote:
>> 
>> Cassandra feels like an unfinished program to me.  The problem is not 
>> that it's open source or cutting edge.  It's an open source cutting 
>> edge program that lacks some of its basic functionality.  We are all 
>> stuck addressing fundamental mechanical tasks for Cassandra because 
>> the basic code that would do that part has not been contributed yet.
>> 
>> Ease of use issues need to be given much more attention.  For an 
>> administrator, the ease of use of Cassandra is very poor.
>> 
>> Furthermore, currently Cassandra is an idiot.  We have to do 
>> everything for Cassandra. Contrast that with the fact that we are in 
>> the dawn of artificial intelligence.
>> 
>> Software exists to automate tasks for humans, not mechanize humans to 
>> administer tasks for a database.  I'm an engineering type.  My job is 
>> to apply science and technology to solve real world problems.  And 
>> that's where I need an organization's I.T. talent to focus; not in 
>> crank starting an unfinished database.
>> 
>> For example, I should be able to go to any node, replace the 
>> Cassandra.yaml file and have a prompt on the display ask me if I want 
>> to update all the yaml files across the cluster.  I shouldn't have to 
>> manually modify yaml files on each node or have to create a script for 
>> some third party automation tool to do it.
>> 
>> I should not have to turn off service, clear directories, restart 
>> service in coordination with the other nodes.  It's already a computer 
>> system.  It can do those things on its own.
>> 
>> How about read repair.  First there is something wrong with the name.  
>> Maybe it should be called Consistency Repair.  An administrator 
>> shouldn't have to do anything.  It should be a behavior of Cassandra 
>> that is programmed in. It should consider the GC setting of each node, 
>> calculate how often it has to run repair, when it should run it so all 
>> the nodes aren't trying at the same time and when other circumstances 
>> indicate it should also run it.
>> 
>> Certificate management should be automated.
>> 
>> Cluster wide management should be a big theme in any next major release.
>> What is a major release?  How many major releases could a program have 
>> before all the coding for basic stuff like installation, configuration 
>> and maintenance is included!
>> 
>> Finish the basic coding of Cassandra, make it easy to use for 
>> administrators, make it smart, add cluster wide management.
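
[Editorial note: the cluster-wide yaml update Kenneth describes amounts to a
push-with-confirmation loop. A minimal sketch under stated assumptions: the node
list, remote path, and `push_yaml` helper are all invented for illustration and
not an existing Cassandra tool; it assumes plain ssh/scp access.]

```python
import subprocess

# Hypothetical inventory; a real tool would discover nodes from the cluster.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
REMOTE_YAML = "/etc/cassandra/cassandra.yaml"

def push_yaml(local_path, nodes=NODES, confirm=False, dry_run=True):
    """Copy one cassandra.yaml to every node, but only after explicit confirmation."""
    if not confirm:
        return []
    pushed = []
    for node in nodes:
        cmd = ["scp", local_path, "%s:%s" % (node, REMOTE_YAML)]
        if dry_run:
            # Show what would happen instead of touching any host.
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if any copy fails
        pushed.append(node)
    return pushed
```

A real version would also need to roll the restart node by node, which is the
harder half of the request.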

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Michael Kjellman
hi ken, sorry you don’t like the database. some thoughts:

1) please file actionable jiras for places you feel need to be improved in the 
database... this is the best way to make and encourage the change you’re 
looking for. it seems you have quite a few ideas from your post that could be 
broken down into individual actionable jiras.
2) please don’t cross post between mailing lists.
3) pull requests are always welcomed!

best,
kjellman

> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman  
> wrote:
> 
> Cassandra feels like an unfinished program to me.  The problem is not that
> it's open source or cutting edge.  It's an open source cutting edge program
> that lacks some of its basic functionality.  We are all stuck addressing
> fundamental mechanical tasks for Cassandra because the basic code that would
> do that part has not been contributed yet.
> 
> Ease of use issues need to be given much more attention.  For an
> administrator, the ease of use of Cassandra is very poor.  
> 
> Furthermore, currently Cassandra is an idiot.  We have to do everything for
> Cassandra. Contrast that with the fact that we are in the dawn of artificial
> intelligence.
> 
> Software exists to automate tasks for humans, not mechanize humans to
> administer tasks for a database.  I'm an engineering type.  My job is to
> apply science and technology to solve real world problems.  And that's where
> I need an organization's I.T. talent to focus; not in crank starting an
> unfinished database.
> 
> For example, I should be able to go to any node, replace the Cassandra.yaml
> file and have a prompt on the display ask me if I want to update all the
> yaml files across the cluster.  I shouldn't have to manually modify yaml
> files on each node or have to create a script for some third party
> automation tool to do it.  
> 
> I should not have to turn off service, clear directories, restart service in
> coordination with the other nodes.  It's already a computer system.  It can
> do those things on its own.
> 
> How about read repair.  First there is something wrong with the name.  Maybe
> it should be called Consistency Repair.  An administrator shouldn't have to
> do anything.  It should be a behavior of Cassandra that is programmed in. It
> should consider the GC setting of each node, calculate how often it has to
> run repair, when it should run it so all the nodes aren't trying at the same
> time and when other circumstances indicate it should also run it.
> 
> Certificate management should be automated.
> 
> Cluster wide management should be a big theme in any next major release.
> What is a major release?  How many major releases could a program have
> before all the coding for basic stuff like installation, configuration and
> maintenance is included!
> 
> Finish the basic coding of Cassandra, make it easy to use for
> administrators, make it smart, add cluster wide management.  Keep Cassandra
> competitive or it will soon be the old Model T we all remember fondly.
> 
> I ask the Committee to compile a list of all such items, make a plan, and
> commit to including the completed and tested code as part of major release
> 5.0.  I further ask that release 4.0 not be delayed and then there be an
> unusually short skip to version 5.0. 
> 
> Kenneth Brotman
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
No worries.. it looks like we didn't update the circleci config in tree for the 
cassandra-3.0 and cassandra-3.11 branches. We should do that -- i'll take that 
as an action item on me... for now though you can grab the same one from trunk:

CircleCI Configuration: 
https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml

FYI: kjellman/cassandra-test:0.4.3 was built from the Dockerfile as 
committed here: 
https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile

best,
kjellman

On Feb 14, 2018, at 10:24 AM, Michael Shuler 
<mich...@pbandjelly.org<mailto:mich...@pbandjelly.org>> wrote:

I apologize for being naive. The test runs do not come from the in-tree
circle.yml file and I need to do something different?

On 02/14/2018 12:21 PM, Michael Kjellman wrote:
please use the latest circleci config that is in trunk. looking at the config 
you used in your run you’re using the old circleci 1.0 based config.

On Feb 14, 2018, at 10:16 AM, Michael Shuler 
<mich...@pbandjelly.org<mailto:mich...@pbandjelly.org>> wrote:

So far, I have had very unpredictable test runs from CircleCI and ASF
Jenkins. Commit 890f319 resulted in 4 completely different test failures
for me today in CircleCI.

https://circleci.com/gh/mshuler/cassandra/150

I trust results from the static ASF slaves even less.

`ant test-all -Dtest.name=CommitLogSegmentBackpressureTest` passed for
me locally.

¯\_(ツ)_/¯

I do not see any permissions issues in the output of this repeating test
failure from our internal Jenkins, but it appears that an instance of
Cassandra is left running from some other test, perhaps, and this test
cannot bind.

I'm fine with cutting a new release set from 890f319 and chalking this
up to test flakiness.

--
Kind regards,
Michael

On 02/14/2018 11:50 AM, Michael Kjellman wrote:
the tests are writing something to java.io.tmpdir which for whatever reason 
isn’t writable. the same exact tests work locally and in the cassandra-test 
docker image.

given everyone can partake on circleci runs and asf jenkins i think we should 
send out those links and base the vote off those runs.

On Feb 14, 2018, at 9:48 AM, Michael Shuler <mich...@pbandjelly.org> wrote:

This is an internal Jenkins instance that's not reachable on the internet.

What's the permissions issue? The test runs on this internal instance
are exactly the way cassci used to run - launch a scratch machine, check
out, build, run tests.

I'll see if I can repro locally.

Michael

On 02/14/2018 11:44 AM, Michael Kjellman wrote:
i looked at this a few weeks back. this is asf jenkins right? if so, it’s a 
permissions issue on the build executors

On Feb 14, 2018, at 9:40 AM, Michael Shuler <mich...@pbandjelly.org> wrote:

Thanks for the feedback on 3.0.16 tentative release.

Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
(in both standard and -compression suites) in CI for me. This test has
failed 20 times in the last 20 runs. Test output attached.

Do we wish to fix this before the next cut, since we're here? :)

--
Kind regards,
Michael

On 02/14/2018 07:30 AM, Jason Brown wrote:
I think we can attempt another build and vote now.

On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown <jasedbr...@gmail.com> wrote:

CASSANDRA-14219 is committed and tests look clean (https://circleci.com/
workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).

On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams <dri...@gmail.com>
wrote:

I change my vote to -1 binding as well.

On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown <jasedbr...@gmail.com>
wrote:

-1, binding. Unit tests are broken:
https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50

Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to update
some logging messages, which broke ViewComplexTest. The errors like
this:

junit.framework.AssertionFailedError: Expected error message to contain
'Cannot drop column a on base table with materialized views', but got
'Cannot drop column a on base table table_21 with materialized views.'

Dave has a followup commit, 40148a178bd9b74b731591aa46b4158efb16b742,
which
fixed a few of the errors, but there are four outstanding failures. I
created CASSANDRA-14219 last week, and assigned it to Dave, but he might
have missed the notification. Dinesh Joshi has a patch that I will
review
ASAP.

Michael, is there a link of where you ran the tests? If so, can you
include
it in the future [VOTE] emails?

Thanks,

-Jason



On Tue, Feb 13, 2018 at 11:03 AM, Jon Haddad <j...@jonhaddad.com> wrote:

+1

On Feb 13, 2018, at 10:52 AM, Josh McKenzie <jmcken...@apache.org>
wrote:

+1

On Feb 13, 2018 9:20 AM, "Marcus Eriksson" <krum...@gmail.com>
wrote:

+1

On Tue, Feb 13, 2018 at 1:29 PM, Aleksey Yeshchenko <
alek...@apple.com>
wrote:

+1

—
AY

On 12 February 2018 at 20:31:23, Michael Shuler (
mich...@pbandjelly.org)
wrote:

> I propose the following artifacts for release as 3.0.16.

Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
please use the latest circleci config that is in trunk. looking at the config 
you used in your run you’re using the old circleci 1.0 based config. 
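
[Editorial note: for readers unfamiliar with the 1.0 vs 2.0 distinction, a
CircleCI 2.0-style config looks roughly like the sketch below. This is
illustrative only; the real file is `.circleci/config.yml` on trunk, and only
the image tag (mentioned elsewhere in this thread) comes from the source.]

```yaml
# Sketch of a CircleCI 2.0-style layout -- details differ from the real trunk file.
version: 2
jobs:
  build:
    docker:
      - image: kjellman/cassandra-test:0.4.3   # test image named in this thread
    steps:
      - checkout
      - run: ant clean jar   # illustrative build step
```

The key 2.0 changes are the explicit `version: 2` key, named jobs, and a
docker executor, none of which exist in the old 1.0 `circle.yml` format.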

> On Feb 14, 2018, at 10:16 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
> 
> So far, I have had very unpredictable test runs from CircleCI and ASF
> Jenkins. Commit 890f319 resulted in 4 completely different test failures
> for me today in CircleCI.
> 
> https://circleci.com/gh/mshuler/cassandra/150
> 
> I trust results from the static ASF slaves even less.
> 
> `ant test-all -Dtest.name=CommitLogSegmentBackpressureTest` passed for
> me locally.
> 
>  ¯\_(ツ)_/¯
> 
> I do not see any permissions issues in the output of this repeating test
> failure from our internal Jenkins, but it appears that an instance of
> Cassandra is left running from some other test, perhaps, and this test
> cannot bind.
> 
> I'm fine with cutting a new release set from 890f319 and chalking this
> up to test flakiness.
> 
> -- 
> Kind regards,
> Michael
> 
>> On 02/14/2018 11:50 AM, Michael Kjellman wrote:
>> the tests are writing something to java.io.tmpdir which for whatever reason 
>> isn’t writable. the same exact tests work locally and in the cassandra-test 
>> docker image.
>> 
>> given everyone can partake on circleci runs and asf jenkins i think we 
>> should send out those links and base the vote off those runs.
>> 
>>> On Feb 14, 2018, at 9:48 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
>>> 
>>> This is an internal Jenkins instance that's not reachable on the internet.
>>> 
>>> What's the permissions issue? The test runs on this internal instance
>>> are exactly the way cassci used to run - launch a scratch machine, check
>>> out, build, run tests.
>>> 
>>> I'll see if I can repro locally.
>>> 
>>> Michael
>>> 
>>>> On 02/14/2018 11:44 AM, Michael Kjellman wrote:
>>>> i looked at this a few weeks back. this is asf jenkins right? if so, it’s 
>>>> a permissions issue on the build executors 
>>>> 
>>>>> On Feb 14, 2018, at 9:40 AM, Michael Shuler <mich...@pbandjelly.org> 
>>>>> wrote:
>>>>> 
>>>>> Thanks for the feedback on 3.0.16 tentative release.
>>>>> 
>>>>> Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
>>>>> (in both standard and -compression suites) in CI for me. This test has
>>>>> failed 20 times in the last 20 runs. Test output attached.
>>>>> 
>>>>> Do we wish to fix this before the next cut, since we're here? :)
>>>>> 
>>>>> -- 
>>>>> Kind regards,
>>>>> Michael
>>>>> 
>>>>>> On 02/14/2018 07:30 AM, Jason Brown wrote:
>>>>>> I think we can attempt another build and vote now.
>>>>>> 
>>>>>>> On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown <jasedbr...@gmail.com> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> CASSANDRA-14219 is committed and tests look clean (https://circleci.com/
>>>>>>> workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).
>>>>>>> 
>>>>>>> On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams <dri...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I change my vote to -1 binding as well.
>>>>>>>> 
>>>>>>>> On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown <jasedbr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> -1, binding. Unit tests are broken:
>>>>>>>>> https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50
>>>>>>>>> 
>>>>>>>>> Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to 
>>>>>>>>> update
>>>>>>>>> some logging messages, which broke ViewComplexTest. The errors like
>>>>>>>> this:
>>>>>>>>> 
>>>>>>>>> junit.framework.AssertionFailedError: Expected error message to 
>>>>>>>>> contain
>>>>>>>>> 'Cannot drop column a on base table with materialized views', but got
>>>>>>>>> 'Cannot drop column a on base table table_21 with materialized views.'
>>>>>>>>> 

Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
the tests are writing something to java.io.tmpdir which for whatever reason 
isn’t writable. the same exact tests work locally and in the cassandra-test 
docker image.

given everyone can partake on circleci runs and asf jenkins i think we should 
send out those links and base the vote off those runs.
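
[Editorial note: the tmpdir failure mode described here is cheap to probe before
a run. On the Java side the property would be java.io.tmpdir; this is a Python
sketch of the same check, with a hypothetical helper name, not part of the build.]

```python
import tempfile

def tmpdir_is_writable():
    """Probe whether the temp directory (java.io.tmpdir's analogue here) accepts writes."""
    try:
        # Creating and deleting one scratch file is enough to detect
        # the "tests can't write to tmpdir" situation up front.
        with tempfile.NamedTemporaryFile(dir=tempfile.gettempdir()):
            pass
        return True
    except OSError:
        return False

print(tempfile.gettempdir(), tmpdir_is_writable())
```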

> On Feb 14, 2018, at 9:48 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
> 
> This is an internal Jenkins instance that's not reachable on the internet.
> 
> What's the permissions issue? The test runs on this internal instance
> are exactly the way cassci used to run - launch a scratch machine, check
> out, build, run tests.
> 
> I'll see if I can repro locally.
> 
> Michael
> 
>> On 02/14/2018 11:44 AM, Michael Kjellman wrote:
>> i looked at this a few weeks back. this is asf jenkins right? if so, it’s a 
>> permissions issue on the build executors 
>> 
>>> On Feb 14, 2018, at 9:40 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
>>> 
>>> Thanks for the feedback on 3.0.16 tentative release.
>>> 
>>> Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
>>> (in both standard and -compression suites) in CI for me. This test has
>>> failed 20 times in the last 20 runs. Test output attached.
>>> 
>>> Do we wish to fix this before the next cut, since we're here? :)
>>> 
>>> -- 
>>> Kind regards,
>>> Michael
>>> 
>>>> On 02/14/2018 07:30 AM, Jason Brown wrote:
>>>> I think we can attempt another build and vote now.
>>>> 
>>>>> On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown <jasedbr...@gmail.com> wrote:
>>>>> 
>>>>> CASSANDRA-14219 is committed and tests look clean (https://circleci.com/
>>>>> workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).
>>>>> 
>>>>> On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams <dri...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I change my vote to -1 binding as well.
>>>>>> 
>>>>>> On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown <jasedbr...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> -1, binding. Unit tests are broken:
>>>>>>> https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50
>>>>>>> 
>>>>>>> Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to update
>>>>>>> some logging messages, which broke ViewComplexTest. The errors like
>>>>>> this:
>>>>>>> 
>>>>>>> junit.framework.AssertionFailedError: Expected error message to contain
>>>>>>> 'Cannot drop column a on base table with materialized views', but got
>>>>>>> 'Cannot drop column a on base table table_21 with materialized views.'
>>>>>>> 
>>>>>>> Dave has a followup commit, 40148a178bd9b74b731591aa46b4158efb16b742,
>>>>>>> which
>>>>>>> fixed a few of the errors, but there are four outstanding failures. I
>>>>>>> created CASSANDRA-14219 last week, and assigned it to Dave, but he might
>>>>>>> have missed the notification. Dinesh Joshi has a patch that I will
>>>>>> review
>>>>>>> ASAP.
>>>>>>> 
>>>>>>> Michael, is there a link of where you ran the tests? If so, can you
>>>>>> include
>>>>>>> it in the future [VOTE] emails?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> -Jason
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Tue, Feb 13, 2018 at 11:03 AM, Jon Haddad <j...@jonhaddad.com> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>>> On Feb 13, 2018, at 10:52 AM, Josh McKenzie <jmcken...@apache.org>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> On Feb 13, 2018 9:20 AM, "Marcus Eriksson" <krum...@gmail.com>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> +1

Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
i looked at this a few weeks back. this is asf jenkins right? if so, it’s a 
permissions issue on the build executors 

> On Feb 14, 2018, at 9:40 AM, Michael Shuler  wrote:
> 
> Thanks for the feedback on 3.0.16 tentative release.
> 
> Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
> (in both standard and -compression suites) in CI for me. This test has
> failed 20 times in the last 20 runs. Test output attached.
> 
> Do we wish to fix this before the next cut, since we're here? :)
> 
> -- 
> Kind regards,
> Michael
> 
>> On 02/14/2018 07:30 AM, Jason Brown wrote:
>> I think we can attempt another build and vote now.
>> 
>>> On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown  wrote:
>>> 
>>> CASSANDRA-14219 is committed and tests look clean (https://circleci.com/
>>> workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).
>>> 
>>> On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams 
>>> wrote:
>>> 
 I change my vote to -1 binding as well.
 
 On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown 
 wrote:
 
> -1, binding. Unit tests are broken:
> https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50
> 
> Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to update
> some logging messages, which broke ViewComplexTest. The errors like
 this:
> 
> junit.framework.AssertionFailedError: Expected error message to contain
> 'Cannot drop column a on base table with materialized views', but got
> 'Cannot drop column a on base table table_21 with materialized views.'
> 
> Dave has a followup commit, 40148a178bd9b74b731591aa46b4158efb16b742,
> which
> fixed a few of the errors, but there are four outstanding failures. I
> created CASSANDRA-14219 last week, and assigned it to Dave, but he might
> have missed the notification. Dinesh Joshi has a patch that I will
 review
> ASAP.
> 
> Michael, is there a link of where you ran the tests? If so, can you
 include
> it in the future [VOTE] emails?
> 
> Thanks,
> 
> -Jason
> 
> 
> 
>> On Tue, Feb 13, 2018 at 11:03 AM, Jon Haddad  wrote:
>> 
>> +1
>> 
>>> On Feb 13, 2018, at 10:52 AM, Josh McKenzie 
>> wrote:
>>> 
>>> +1
>>> 
>>> On Feb 13, 2018 9:20 AM, "Marcus Eriksson" 
 wrote:
>>> 
 +1
 
 On Tue, Feb 13, 2018 at 1:29 PM, Aleksey Yeshchenko <
> alek...@apple.com>
 wrote:
 
> +1
> 
> —
> AY
> 
> On 12 February 2018 at 20:31:23, Michael Shuler (
>> mich...@pbandjelly.org)
> wrote:
> 
> I propose the following artifacts for release as 3.0.16.
> 
> sha1: 91e83c72de109521074b14a8eeae1309c3b1f215
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> shortlog;h=refs/tags/3.0.16-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1154/org/apache/cassandra/apache-
> cassandra/3.0.16/
> Staging repository:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1154/
> 
> Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
> 
> *** This release addresses an important fix for CASSANDRA-14092
 ***
> "Max ttl of 20 years will overflow localDeletionTime"
> https://issues.apache.org/jira/browse/CASSANDRA-14092
> 
> The vote will be open for 72 hours (longer if needed).
> 
> [1]: (CHANGES.txt) https://goo.gl/rLj59Z
> [2]: (NEWS.txt) https://goo.gl/EkrT4G
> 
> 
 
>> 
>> 
>> 
>> 
> 
 
>>> 
>>> 
>> 
> 
> <3.0-890f319-testall-failure.txt>
> 


Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Michael Kjellman
why are people inserting data with a 15+ year TTL? sorta curious about the 
actual use case for that.

> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> 
> The assertion was working fine until yesterday 03:14 UTC.
> 
> The long term solution would be to work with a long instead of a int. The
> serialized seems to be a variable-int already, so that should be fine
> already.
> 
> If you change the assertion to 15 years, then applications might fail, as
> they might be setting a 15+ year ttl.
> 
> regards,
> Christian
> 
> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
> wrote:
> 
>> Thanks for raising this. Agreed this is bad, when I filed
>> CASSANDRA-14092 I thought a write would fail when localDeletionTime
>> overflows (as it is with 2.1), but that doesn't seem to be the case on
>> 3.0+
>> 
>> I propose adding the assertion back so writes will fail, and reduce
>> the max TTL to something like 15 years for the time being while we
>> figure a long term solution.
>> 
>> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
>>> If you aren’t getting an error, then I agree, that is very bad.  Looking
>> at the 3.0 code it looks like the assertion checking for overflow was
>> dropped somewhere along the way, I had only been looking into 2.1 where you
>> get an assertion error that fails the query.
>>> 
>>> -Jeremiah
>>> 
 On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
>> wrote:
 
 
 Hi Jeremiah,
 Validation is on TTL value not on (system_time+ TTL). You can test it
>> with below example. Insert is successful, overflow happens silently and
>> data is lost:
 create table test(name text primary key,age int);
 insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
 select * from test where name='test_20yrs';
 
 name | age
 --+-
 
 (0 rows)
 
 insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
 InvalidRequest: Error from server: code=2200 [Invalid query]
 message="ttl is too large. requested (630720001) maximum (630720000)"
 Thanks,
 Anuj
   On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
>> jeremiah.jor...@gmail.com> wrote:
 
 Where is the dataloss?  Does the INSERT operation return successfully
>> to the client in this case?  From reading the linked issues it sounds like
>> you get an error client side.
 
 -Jeremiah
 
> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra 
>> wrote:
> 
> Hi,
> 
> For all those people who use MAX TTL=20 years for inserting/updating
>> data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092
>> can silently cause irrecoverable Data Loss. This seems like a certain TOP
>> MOST BLOCKER to me. I think the category of the JIRA must be raised to
>> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no
>> one seems to be actively working on it. Just like any other critical
>> vulnerability, this vulnerability demands immediate attention from some
>> very experienced folks to bring out an Urgent Fast Track Patch for all
>> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my
>> understanding of the JIRA comments, the changes may not be that trivial for
>> older releases. So, community support on the patch is very much appreciated.
> 
> Thanks
> Anuj
 
>>> 
>>> 
>>> 
>> 
>> 
>> 
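
[Editorial note: the underlying arithmetic in CASSANDRA-14092 is easy to check:
localDeletionTime is stored as a signed 32-bit count of epoch seconds, so a 2018
write carrying the 20-year maximum TTL lands past the 2038-01-19 limit. A quick
sketch; the constant names are illustrative, not Cassandra's own.]

```python
import datetime

INT32_MAX = 2**31 - 1           # localDeletionTime is a signed 32-bit int of epoch seconds
MAX_TTL = 20 * 365 * 86400      # the 20-year cap: 630,720,000 seconds

# Any write from early 2018 onward with the maximum TTL overflows 2038-01-19.
write_time = int(datetime.datetime(2018, 1, 25).timestamp())
local_deletion_time = write_time + MAX_TTL

print(local_deletion_time > INT32_MAX)
```

This is also why the thread discusses widening the field to a long as the
long-term fix and capping the TTL in the meantime.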



Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-03 Thread Michael Kjellman
no, i’m not. i just figured i should target python 3.6 if i was doing this work 
in the first place. the current Ubuntu LTS was pulling in a pretty old version. 
any concerns with using 3.6?

> On Jan 3, 2018, at 1:51 AM, Stefan Podkowinski <s...@apache.org> wrote:
> 
> The latest updates to your branch fixed the logging issue, thanks! Tests
> now seem to execute fine locally using pytest.
> 
> I was looking at the dockerfile and noticed that you explicitly use
> python 3.6 there. Are you aware of any issues with older python3
> versions, e.g. 3.5? Do I have to use 3.6 as well locally and do we have
> to do the same for jenkins?
> 
> 
>> On 02.01.2018 22:42, Michael Kjellman wrote:
>> I reproduced the NOTSET log issue locally... got a fix.. i'll push a commit 
>> up in a moment.
>> 
>>> On Jan 2, 2018, at 11:24 AM, Michael Kjellman 
>>> <mkjell...@internalcircle.com> wrote:
>>> 
>>> Comments Inline: Thanks for giving this a go!!
>>> 
>>>> On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski <s...@apache.org> wrote:
>>>> 
>>>> I was giving this a try today with some mixed results. First of all,
>>>> running pytest locally would fail with an "ccmlib.common.ArgumentError:
>>>> Unknown log level NOTSET" error for each test. Although I created a new
>>>> virtualenv for that as described in the readme (thanks for updating!)
>>>> and use both of your dtest and cassandra branches. But I haven't patched
>>>> ccm as described in the ticket, maybe that's why? Can you publish a
>>>> patched ccm branch to gh?
>>> 
>>> 99% sure this is an issue parsing the logging level passed to pytest to the 
>>> python logger... could you paste the exact command you're using to invoke 
>>> pytest? should be a small change - i'm sure i just missed an invocation case.
>>> 
>>>> 
>>>> The updated circle.yml is now using docker, which seems to be a good
>>>> idea to reduce clutter in the yaml file and gives us more control over
>>>> the test environment. Can you add the Dockerfile to the .circleci
>>>> directory as well? I couldn't find it when I was trying to solve the
>>>> pytest error mentioned above.
>>> 
>>> This is already tracked in a separate repo: 
>>> https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile
>>>> 
>>>> Next thing I did was to push your trunk_circle branch to my gh repo to
>>>> start a circleCI run. Finishing all dtests in 15 minutes sounds
>>>> exciting, but requires a paid tier plan to get that kind of
>>>> parallelization. Looks like the dtests have even been deliberately
>>>> disabled for non-paid accounts, so I couldn't test this any further.
>>> 
>>> the plan of action (i already already mentioned this in previous emails) is 
>>> to get dtests working for the free circleci oss accounts as well. part of 
>>> this work (already included in this pytest effort) is to have fixtures that 
>>> look at the system resources and dynamically include tests as resources permit.
>>> 
>>>> 
>>>> Running dtests from the pytest branch on builds.apache.org did not work
>>>> either. At least the run_dtests.py arguments will need to be updated in
>>>> cassandra-builds. We currently only use a single cassandra-dtest.sh
>>>> script for all builds. Maybe we should create a new job template that
>>>> would use an updated script with the wip-pytest dtest branch, to make
>>>> this work and testable in parallel.
>>> 
>>> yes, i didn't touch cassandra-builds yet.. focused on getting circleci and 
>>> local runs working first... once we're happy with that and stable we can 
>>> make the changes to jenkins configs pretty easily...
>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 21.12.2017 11:13, Michael Kjellman wrote:
>>>>> I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 
>>>>> which includes tons of details (and a patch available for review) with my 
>>>>> efforts to migrate dtests from nosetest to pytest (which ultimately ended 
>>>>> up also including porting the code from python 2.7 to python 3).
>>>>> 
>>>>> I'd love if people could pitch in in any way to help get this reviewed 
>>>>> and committed so we can reduce the natural drift that will occur with a 
>>>>> huge patch like this against the changes going into master.

Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-02 Thread Michael Kjellman
I reproduced the NOTSET log issue locally... got a fix.. i'll push a commit up 
in a moment.
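
[Editorial note: the "Unknown log level NOTSET" failure discussed below is the
classic shape of mapping a level string to the python logger. A defensive parse
looks roughly like this; the function name and default are illustrative, not the
actual dtest fix.]

```python
import logging

def parse_level(value, default=logging.INFO):
    """Map a log-level string (e.g. from a pytest CLI option) to a logging
    constant, treating missing or NOTSET values as a sane default."""
    if not value or str(value).upper() == "NOTSET":
        return default
    level = logging.getLevelName(str(value).upper())
    # getLevelName returns an int for known names, a string for unknown ones.
    return level if isinstance(level, int) else default
```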

> On Jan 2, 2018, at 11:24 AM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> Comments Inline: Thanks for giving this a go!!
> 
>> On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski <s...@apache.org> wrote:
>> 
>> I was giving this a try today with some mixed results. First of all,
>> running pytest locally would fail with an "ccmlib.common.ArgumentError:
>> Unknown log level NOTSET" error for each test. Although I created a new
>> virtualenv for that as described in the readme (thanks for updating!)
>> and use both of your dtest and cassandra branches. But I haven't patched
>> ccm as described in the ticket, maybe that's why? Can you publish a
>> patched ccm branch to gh?
> 
> 99% sure this is an issue parsing the logging level passed to pytest to the 
> python logger... could you paste the exact command you're using to invoke 
> pytest? should be a small change - i'm sure i just missed an invocation case.
> 
>> 
>> The updated circle.yml is now using docker, which seems to be a good
>> idea to reduce clutter in the yaml file and gives us more control over
>> the test environment. Can you add the Dockerfile to the .circleci
>> directory as well? I couldn't find it when I was trying to solve the
>> pytest error mentioned above.
> 
> This is already tracked in a separate repo: 
> https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile
>> 
>> Next thing I did was to push your trunk_circle branch to my gh repo to
>> start a circleCI run. Finishing all dtests in 15 minutes sounds
>> exciting, but requires a paid tier plan to get that kind of
>> parallelization. Looks like the dtests have even been deliberately
>> disabled for non-paid accounts, so I couldn't test this any further.
> 
> the plan of action (as i already mentioned in previous emails) is 
> to get dtests working for the free circleci oss accounts as well. part of 
> this work (already included in this pytest effort) is to have fixtures that 
> look at the system resources and dynamically include tests as resources allow.
> 
>> 
>> Running dtests from the pytest branch on builds.apache.org did not work
>> either. At least the run_dtests.py arguments will need to be updated in
>> cassandra-builds. We currently only use a single cassandra-dtest.sh
>> script for all builds. Maybe we should create a new job template that
>> would use an updated script with the wip-pytest dtest branch, to make
>> this work and testable in parallel.
> 
> yes, i didn't touch cassandra-builds yet.. focused on getting circleci and 
> local runs working first... once we're happy with that and stable we can make 
> the changes to jenkins configs pretty easily...
> 
>> 
>> 
>> 
>> On 21.12.2017 11:13, Michael Kjellman wrote:
>>> I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 which 
>>> includes tons of details (and a patch available for review) with my efforts 
>>> to migrate dtests from nosetest to pytest (which ultimately ended up also 
>>> including porting the code from python 2.7 to python 3).
>>> 
>>> I'd love if people could pitch in in any way to help get this reviewed and 
>>> committed so we can reduce the natural drift that will occur with a huge 
>>> patch like this against the changes going into master. I apologize for 
>>> sending this so close to the holidays, but I really have been working 
>>> non-stop trying to get things into a completed and stable state.
>>> 
>>> The latest CircleCI runs I did took roughly 15 minutes to run all the 
>>> dtests with only 6 failures remaining (when run with vnodes) and 12 
>>> failures remaining (when run without vnodes). For comparison the last ASF 
>>> Jenkins Dtest job to successfully complete took nearly 10 hours (9:51) and 
>>> we had 36 test failures. Of note, while I was working on this and trying to 
>>> determine a baseline for the existing tests I found that the ASF Jenkins 
>>> jobs were incorrectly configured due to a typo. The no-vnodes job is 
>>> actually running with vnodes (meaning the no-vnodes job is identical to the 
>>> with-vnodes ASF Jenkins job). There are some bootstrap tests that will 100% 
>>> reliably hang both nosetest and pytest on test cleanup, however this test 
>>> only runs in the no-vnodes configuration. I've debugged and fixed a lot of 
>>> these cases across many test cases over the past few weeks and I no longer 
>>> know of any tests that 

Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-02 Thread Michael Kjellman
Comments Inline: Thanks for giving this a go!!

> On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski <s...@apache.org> wrote:
> 
> I was giving this a try today with some mixed results. First of all,
> running pytest locally would fail with an "ccmlib.common.ArgumentError:
> Unknown log level NOTSET" error for each test. Although I created a new
> virtualenv for that as described in the readme (thanks for updating!)
> and use both of your dtest and cassandra branches. But I haven't patched
> ccm as described in the ticket, maybe that's why? Can you publish a
> patched ccm branch to gh?

99% sure this is an issue parsing the logging level passed to pytest to the 
python logger... could you paste the exact command you're using to invoke 
pytest? should be a small change - i'm sure i just missed an invocation case.

> 
> The updated circle.yml is now using docker, which seems to be a good
> idea to reduce clutter in the yaml file and gives us more control over
> the test environment. Can you add the Dockerfile to the .circleci
> directory as well? I couldn't find it when I was trying to solve the
> pytest error mentioned above.

This is already tracked in a separate repo: 
https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile
> 
> Next thing I did was to push your trunk_circle branch to my gh repo to
> start a circleCI run. Finishing all dtests in 15 minutes sounds
> exciting, but requires a paid tier plan to get that kind of
> parallelization. Looks like the dtests have even been deliberately
> disabled for non-paid accounts, so I couldn't test this any further.

the plan of action (as i already mentioned in previous emails) is to 
get dtests working for the free circleci oss accounts as well. part of this 
work (already included in this pytest effort) is to have fixtures that look at 
the system resources and dynamically include tests as resources allow.

> 
> Running dtests from the pytest branch on builds.apache.org did not work
> either. At least the run_dtests.py arguments will need to be updated in
> cassandra-builds. We currently only use a single cassandra-dtest.sh
> script for all builds. Maybe we should create a new job template that
> would use an updated script with the wip-pytest dtest branch, to make
> this work and testable in parallel.

yes, i didn't touch cassandra-builds yet.. focused on getting circleci and 
local runs working first... once we're happy with that and stable we can make 
the changes to jenkins configs pretty easily...

> 
> 
> 
> On 21.12.2017 11:13, Michael Kjellman wrote:
>> I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 which 
>> includes tons of details (and a patch available for review) with my efforts 
>> to migrate dtests from nosetest to pytest (which ultimately ended up also 
>> including porting the code from python 2.7 to python 3).
>> 
>> I'd love if people could pitch in in any way to help get this reviewed and 
>> committed so we can reduce the natural drift that will occur with a huge 
>> patch like this against the changes going into master. I apologize for 
>> sending this so close to the holidays, but I really have been working 
>> non-stop trying to get things into a completed and stable state.
>> 
>> The latest CircleCI runs I did took roughly 15 minutes to run all the dtests 
>> with only 6 failures remaining (when run with vnodes) and 12 failures 
>> remaining (when run without vnodes). For comparison the last ASF Jenkins 
>> Dtest job to successfully complete took nearly 10 hours (9:51) and we had 36 
>> test failures. Of note, while I was working on this and trying to determine 
>> a baseline for the existing tests I found that the ASF Jenkins jobs were 
>> incorrectly configured due to a typo. The no-vnodes job is actually running 
>> with vnodes (meaning the no-vnodes job is identical to the with-vnodes ASF 
>> Jenkins job). There are some bootstrap tests that will 100% reliably hang 
>> both nosetest and pytest on test cleanup, however this test only runs in the 
>> no-vnodes configuration. I've debugged and fixed a lot of these cases across 
>> many test cases over the past few weeks and I no longer know of any tests 
>> that can hang CI.
>> 
>> Thanks and I'm optimistic about making testing great for the project and 
>> most importantly for the OSS C* community!
>> 
>> best,
>> kjellman
>> 
>> Some highlights that I quickly thought of (in no particular order): {also 
>> included in the JIRA}
>> -Migrate dtests from executing using the nosetest framework to pytest
>> -Port the entire code base from Python 2.7 to Python 3.6
>> -Update run_dtests.py to work with pytest
>> -Ad

Re: Test patch to Cassandra.3.0.15 using dtests

2017-12-21 Thread Michael Kjellman
hi sergey:

took much longer than i hoped but i have a patch up for review to hopefully 
improve the dtest user experience.

https://issues.apache.org/jira/browse/CASSANDRA-14134

i sent an email summarizing it earlier this morning in a separate thread.

i moved all the undocumented environment variables to command line arguments 
and added help strings for all of them.

there is now some very basic environmental validation that happens up front:

either --cassandra-dir or --cassandra-version are required command line 
arguments. if you invoke with --help i even added a note that "ant clean jar" is 
required to be run beforehand on the cassandra dir that --cassandra-dir points 
at.

in addition:
-if you’re running on mac i check that the required loopback interfaces have 
been created (if not print an error message with the command to run and create 
the other loopback interfaces).
-upgrade tests aren’t invoked/collected by default
-for tests marked with the “resource_intensive” annotation (these tests tend to 
populate the ccm test cluster with 9 instances, requiring a good chunk of ram), 
instead of making the user figure out whether they should run these or not, 
i do a quick check at runtime to determine the amount of ram available on the 
system and dynamically enable or disable the resource_intensive annotated 
tests! (there are command line arguments of course to explicitly override this 
behavior if required for some reason).
-i added a bunch of extra documentation to README.md with the hope that it’s 
the first thing people see on GitHub and more likely to be read (how to start 
the tests, bootstrap and setup the required dependencies, and some tips on 
debugging tests)

curious on your thoughts from a user perspective of how these improvements 
will help someone like yourself who recently tried to test your patch against 
the dtests? any other areas i didn’t address yet that would make getting 
bootstrapped better? hopefully we can shortly get an updated and greatly 
enhanced circleci 2.0 yaml in the upstream repo that reliably will let even the 
most casual contributor make a change and run the unit and dtests against their 
branch in circleci (using just a free circleci OSS account) without any end 
user effort!

best,
kjellman

On Dec 13, 2017, at 12:28 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

i’ve been working on a story to improve this around the clock. including better 
documentation (and a --help flag with options to make it easy to know how to run 
dtests and a few runtime sanity checks about the environment)! stay tuned!

On Dec 13, 2017, at 3:24 AM, Sergey 
<cassandra.bu...@gmail.com> wrote:

Hi!

I am looking for a way to test the patch I made to specific version of 
Cassandra (3.0.15) by leveraging the dtest.

Documentation for the dtests says:
“The only thing the framework needs to know is the location of the (compiled) 
sources for Cassandra. There are two options:

Use existing sources:

CASSANDRA_DIR=~/path/to/cassandra nosetests”

So if I git clone the Cassandra sources to ~/path/to/Cassandra, switch to tag 
3.0.15, apply my patch and run ant – will this be enough for my purpose?

Best regards,
Sergey


Re: Cassandra Dtests: skip upgrade tests

2017-12-21 Thread Michael Kjellman
As part of the work i did for 
https://issues.apache.org/jira/browse/CASSANDRA-14134, one of the things I did 
was add a new command line argument “--execute-upgrade-tests”.

all the upgrade tests are now annotated with an upgrade_test pytest annotation. 
by default they aren’t run. adding a single flag (easily discoverable in the 
--help) will turn them on if necessary. or you can use the power features of 
pytest collection filtering when invoking pytest directly (look at the -m 
option).
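[Editorial sketch of the default-off collection policy described above; the marker and flag names come from this thread, but the helper below is a hypothetical model, not the real conftest.py from CASSANDRA-14134. In an actual conftest this decision would drive pytest_collection_modifyitems, adding pytest.mark.skip to each filtered item.]

```python
def upgrade_tests_to_skip(marks_by_test, execute_upgrade_tests=False):
    """Given {test_name: set of marker names}, return tests to skip.

    Tests carrying the "upgrade_test" marker are skipped unless the
    --execute-upgrade-tests flag (modeled by the boolean here) was passed.
    """
    if execute_upgrade_tests:
        return []
    return sorted(name for name, marks in marks_by_test.items()
                  if "upgrade_test" in marks)

# hypothetical collection result to illustrate the policy
collected = {
    "test_simple_bootstrap": set(),
    "test_upgrade_30x_to_311": {"upgrade_test"},
}
```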

hope this helps going forward!!

best,
kjellman

On Dec 8, 2017, at 2:03 PM, Michael Shuler 
> wrote:

Yep, that rm is a bit of a hack, since environment vars for
JDK{8,9}_HOME are not able to be set on the static slaves. The "proper"
way to skip them is just a normal nose exclude (drop --collect-only to
actually run 'em):

./run_dtests.py --nose-options="--collect-only -e upgrade_tests/"
 or
nosetests --collect-only -e upgrade_tests/

Also, to run only the upgrade_tests, since we're here :)

./run_dtests.py --nose-options="--collect-only upgrade_tests/"
 or
nosetests --collect-only upgrade_tests/

--
Michael

On 12/08/2017 12:07 PM, Jay Zhuang wrote:
Here is how the cassandra-builds jenkins job does it: $ rm -r upgrade_tests/
https://github.com/apache/cassandra-builds/blob/master/build-scripts/cassandra-dtest.sh#L50

   On Friday, December 8, 2017, 1:28:34 AM PST, Sergey 
> wrote:

Hi!

How to completely skip upgrade tests when running dtests?

Best regards,
Sergey



-
To unsubscribe, e-mail: 
dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 
dev-h...@cassandra.apache.org



[Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2017-12-21 Thread Michael Kjellman
I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 which 
includes tons of details (and a patch available for review) with my efforts to 
migrate dtests from nosetest to pytest (which ultimately ended up also 
including porting the code from python 2.7 to python 3).

I'd love if people could pitch in in any way to help get this reviewed and 
committed so we can reduce the natural drift that will occur with a huge patch 
like this against the changes going into master. I apologize for sending this 
so close to the holidays, but I really have been working non-stop trying to get 
things into a completed and stable state.

The latest CircleCI runs I did took roughly 15 minutes to run all the dtests 
with only 6 failures remaining (when run with vnodes) and 12 failures remaining 
(when run without vnodes). For comparison the last ASF Jenkins Dtest job to 
successfully complete took nearly 10 hours (9:51) and we had 36 test failures. 
Of note, while I was working on this and trying to determine a baseline for the 
existing tests I found that the ASF Jenkins jobs were incorrectly configured 
due to a typo. The no-vnodes job is actually running with vnodes (meaning the 
no-vnodes job is identical to the with-vnodes ASF Jenkins job). There are some 
bootstrap tests that will 100% reliably hang both nosetest and pytest on test 
cleanup, however this test only runs in the no-vnodes configuration. I've 
debugged and fixed a lot of these cases across many test cases over the past 
few weeks and I no longer know of any tests that can hang CI.

Thanks and I'm optimistic about making testing great for the project and most 
importantly for the OSS C* community!

best,
kjellman

Some highlights that I quickly thought of (in no particular order): {also 
included in the JIRA}
-Migrate dtests from executing using the nosetest framework to pytest
-Port the entire code base from Python 2.7 to Python 3.6
-Update run_dtests.py to work with pytest
-Add --dtest-print-tests-only option to run_dtests.py to get easily parsable 
list of all available collected tests
-Update README.md for executing the dtests with pytest
-Add new debugging tips section to README.md to help with some basics of 
debugging python3 and pytest
-Migrate all existing Environment Variable usage as a means to control dtest 
operation modes to argparse command line options with documented help on each 
toggle's intended usage
-Migration of old unittest and nose based test structure to modern pytest 
fixture approach
-Automatic detection of physical system resources to automatically determine if 
@pytest.mark.resource_intensive annotated tests should be collected and run on 
the system where they are being executed
-new pytest fixture replacements for @since and @pytest.mark.upgrade_test 
annotations
-Migration to python logging framework
-Upgrade thrift bindings to latest version with full python3 compatibility
-Remove deprecated cql and pycassa dependencies and migrate any remaining tests 
to fully remove those dependencies
-Fixed dozens of tests that would hang the pytest framework forever when run in 
CI environments
-Ran code nearly 300 times in CircleCI during the migration and to find, 
identify, and fix any tests capable of hanging CI
-Upgrade Tests do not yet run in CI and still need additional migration work 
(although all upgrade test classes compile successfully)
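[Editorial sketch of the automatic resource detection item above. The 9-node assumption matches this thread, but the RAM threshold and the override flags are illustrative guesses; the real cutoff in the patch may differ.]

```python
import os

# assumed threshold: resource_intensive dtests can spin up ~9 ccm nodes,
# so require a healthy amount of physical RAM before collecting them
MIN_RAM_BYTES = 9 * 1024 ** 3

def total_system_ram_bytes():
    """Total physical RAM via POSIX sysconf (works on Linux and macOS)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

def collect_resource_intensive(ram_bytes, force=False, skip=False):
    """Decide whether @pytest.mark.resource_intensive tests are collected.

    force/skip model the explicit command line overrides mentioned above.
    """
    if skip:
        return False
    if force:
        return True
    return ram_bytes >= MIN_RAM_BYTES
```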


Re: Test patch to Cassandra.3.0.15 using dtests

2017-12-13 Thread Michael Kjellman
i’ve been working on a story to improve this around the clock. including better 
documentation (and a --help flag with options to make it easy to know how to run 
dtests and a few runtime sanity checks about the environment)! stay tuned!

> On Dec 13, 2017, at 3:24 AM, Sergey  wrote:
> 
> Hi!
> 
> I am looking for a way to test the patch I made to specific version of 
> Cassandra (3.0.15) by leveraging the dtest.
> 
> Documentation for the dtests says:
> “The only thing the framework needs to know is the location of the (compiled) 
> sources for Cassandra. There are two options:
> 
> Use existing sources:
> 
> CASSANDRA_DIR=~/path/to/cassandra nosetests”
> 
> So if I git clone the Cassandra sources to ~/path/to/Cassandra, switch to tag 
> 3.0.15, apply my patch and run ant – will this be enough for my purpose?
> 
> Best regards,
> Sergey


Re: CCM dependency in dtests

2017-11-30 Thread Michael Kjellman
Hey Stefan, any updates on this? Thanks.

best,
kjellman

> On Nov 27, 2017, at 7:34 AM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> thanks for driving this Stefan this is definitely an issue that I 
> recently saw too trying to get all the dtests passing. having logic you need 
> to fix in 3 repos isn’t ideal at all. 
> 
>> On Nov 27, 2017, at 4:05 AM, Stefan Podkowinski <s...@apache.org> wrote:
>> 
>> Just wanted to bring a recent discussion about how to use ccm from
>> dtests to your attention:
>> https://github.com/apache/cassandra-dtest/pull/13
>> 
>> Basically the idea is to not depend on a released ccm artifact, but to
>> use a dedicated git branch in the ccm repo instead for executing dtests.
>> Motivation and details can be found in the PR, please feel free to comment.
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 



Re: [PROPOSAL] Migrate to pytest from nosetests for dtests

2017-11-29 Thread Michael Kjellman
s/handling/hanging

> On Nov 29, 2017, at 9:54 AM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> i keep seeing nose randomly handing after a test successfully completes 
> execution. i’m very far from a python guru but i spent a few hours with gdb 
> trying to debug the thing and get python stacks and got symbolicated native 
> stacks but it’s random but root causing while nose is sitting on a lock 
> forever eludes me. some tests are more reproducible than others. some i see 
> fail 1 in 10 runs.
> 
> the net of it all though is this makes people not trust dtests because it 
> randomly hangs and shows tests with “failures” that actually succeeded.
> 
> i’m not a huge fan of just blindly upgrading to fix a problem but in this 
> case I found that there is quite a lot of mistrust and dislike for nosetests 
> in the python community with most projects already moving to pytest. and if 
> it is some complicated set of interactions between threads we use in the 
> tests and how nose works do we really want to even debug it when the project 
> appears to be abandoned?
> 
> i think regardless of the root cause for making things more stable it seems 
> like there is little motivation to stick around on nose...
> 
> lmk!
> 
> best,
> kjellman
> 
>> On Nov 29, 2017, at 5:33 AM, Philip Thompson <philip.thomp...@datastax.com> 
>> wrote:
>> 
>> I don't have any objection to this, really. I know I rely on a handful of
>> nose plugins, and possibly others do, but those should be easy enough to
>> re-write. I am curious though, what's the impetus for this? Is there some
>> pytest feature we want that nose lacks? Is there some nosetest bug or
>> restriction getting in the way?
>> 
>>> On Tue, Nov 28, 2017 at 8:34 PM, Jon Haddad <j...@jonhaddad.com> wrote:
>>> 
>>> +1
>>> 
>>> I stopped using nose a long time ago in favor of py.test.  It’s a
>>> significant improvement.
>>> 
>>>> On Nov 28, 2017, at 10:49 AM, Michael Kjellman <kjell...@apple.com>
>>> wrote:
>>>> 
>>>> I'd like to propose we move from nosetest to pytest for the dtests. It
>>> looks like nosetests is basically abandoned, the python community doesn't
>>> like it, it hasn't been updated since 2015, and pytest even has nosetests
>>> support which would help us greatly during migration (
>>> https://docs.pytest.org/en/latest/nose.html).
>>>> 
>>>> Thoughts?
>>>> 
>>>> best,
>>>> kjellman
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


Re: [PROPOSAL] Migrate to pytest from nosetests for dtests

2017-11-29 Thread Michael Kjellman
i keep seeing nose randomly handing after a test successfully completes 
execution. i’m very far from a python guru but i spent a few hours with gdb 
trying to debug the thing and get python stacks and got symbolicated native 
stacks but it’s random but root causing while nose is sitting on a lock forever 
eludes me. some tests are more reproducible than others. some i see fail 1 in 
10 runs.

the net of it all though is this makes people not trust dtests because it 
randomly hangs and shows tests with “failures” that actually succeeded.

i’m not a huge fan of just blindly upgrading to fix a problem but in this case 
I found that there is quite a lot of mistrust and dislike for nosetests in the 
python community with most projects already moving to pytest. and if it is some 
complicated set of interactions between threads we use in the tests and how 
nose works do we really want to even debug it when the project appears to be 
abandoned?

i think regardless of the root cause for making things more stable it seems 
like there is little motivation to stick around on nose...

lmk!

best,
kjellman

> On Nov 29, 2017, at 5:33 AM, Philip Thompson <philip.thomp...@datastax.com> 
> wrote:
> 
> I don't have any objection to this, really. I know I rely on a handful of
> nose plugins, and possibly others do, but those should be easy enough to
> re-write. I am curious though, what's the impetus for this? Is there some
> pytest feature we want that nose lacks? Is there some nosetest bug or
> restriction getting in the way?
> 
>> On Tue, Nov 28, 2017 at 8:34 PM, Jon Haddad <j...@jonhaddad.com> wrote:
>> 
>> +1
>> 
>> I stopped using nose a long time ago in favor of py.test.  It’s a
>> significant improvement.
>> 
>>> On Nov 28, 2017, at 10:49 AM, Michael Kjellman <kjell...@apple.com>
>> wrote:
>>> 
>>> I'd like to propose we move from nosetest to pytest for the dtests. It
>> looks like nosetests is basically abandoned, the python community doesn't
>> like it, it hasn't been updated since 2015, and pytest even has nosetests
>> support which would help us greatly during migration (
>> https://docs.pytest.org/en/latest/nose.html).
>>> 
>>> Thoughts?
>>> 
>>> best,
>>> kjellman
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



[PROPOSAL] Migrate to pytest from nosetests for dtests

2017-11-28 Thread Michael Kjellman
I'd like to propose we move from nosetest to pytest for the dtests. It looks 
like nosetests is basically abandoned, the python community doesn't like it, it 
hasn't been updated since 2015, and pytest even has nosetests support which 
would help us greatly during migration 
(https://docs.pytest.org/en/latest/nose.html).

Thoughts?

best,
kjellman


Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
do you know why this is the case? shouldn’t -all test...all?

> On Nov 27, 2017, at 7:39 PM, Michael Shuler <mich...@pbandjelly.org> wrote:
> 
> The `test-cdc` target is not a dependent of `test-all`, so it was set up
> as a separate job in Jenkins:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-3.11-test-cdc/
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test-cdc/
> 
> -- 
> Michael
> 
>> On 11/27/2017 03:45 PM, Michael Kjellman wrote:
>> Hey Jay:
>> 
>> Thanks!! I just took a quick look at the JIRA and noticed that there is a 
>> “test-cdc” ant target? So, does that mean CDC gets no testing with ant 
>> test? Do you know any of the history around this?
>> 
>>> On Nov 27, 2017, at 9:44 AM, Jay Zhuang <jay.zhu...@yahoo.com.INVALID> 
>>> wrote:
>>> 
>>> I fixed one CDC uTest, please 
>>> review:https://issues.apache.org/jira/browse/CASSANDRA-14066
>>> 
>>> 
>>>   On Friday, November 17, 2017 6:34 AM, Josh McKenzie 
>>> <jmcken...@apache.org> wrote:
>>> 
>>> 
>>>> 
>>>> Do we have any volunteers to fix the broken Materialized Views and CDC
>>>> DTests?
>>> 
>>> I'll try to take a look at the CDC tests next week; looks like one of the
>>> base unit tests is failing as well.
>>> 
>>> On Fri, Nov 17, 2017 at 12:09 AM, Michael Kjellman <
>>> mkjell...@internalcircle.com> wrote:
>>> 
>>>> Quick update re: dtests and off-heap memtables:
>>>> 
>>>> I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException:
>>>> offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)
>>>> 
>>>> Looks like we’re gonna need to do some work to test this configuration and
>>>> right now it’s pretty broken...
>>>> 
>>>> Do we have any volunteers to fix the broken Materialized Views and CDC
>>>> DTests?
>>>> 
>>>> best,
>>>> kjellman
>>>> 
>>>> 
>>>>> On Nov 15, 2017, at 5:59 PM, Michael Kjellman <
>>>> mkjell...@internalcircle.com> wrote:
>>>>> 
>>>>> yes - true- some are flaky, but almost all of the ones i filed fail 100%
>>>> () of the time. i look forward to triaging just the remaining flaky ones
>>>> (hopefully - without powers combined - by the end of this month!!)
>>>>> 
>>>>> appreciate everyone’s help - no matter how small... i already personally
>>>> did a few “fun” random-python-class-is-missing-return-after-method stuff.
>>>>> 
>>>>> we’ve wanted this for a while and now is our time to actually execute
>>>> and make good on our previous dev list promises.
>>>>> 
>>>>> best,
>>>>> kjellman
>>>>> 
>>>>>> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>> 
>>>>>> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>>>>>> 
>>>>>> If you haven't been paying attention to JIRA, you likely didn't notice
>>>> that
>>>>>> Josh went through and triage/categorized a bunch of issues by adding
>>>>>> components, and Michael took the time to open a bunch of JIRAs for
>>>> failing
>>>>>> tests.
>>>>>> 
>>>>>> How many is a bunch? Something like 35 or so just for tests currently
>>>>>> failing on trunk.  If you're a regular contributor, you already know
>>>> that
>>>>>> dtests are flakey - it'd be great if a few of us can go through and fix
>>>> a
>>>>>> few. Even incremental improvements are improvements. Here's an easy
>>>> search
>>>>>> to find them:
>>>>>> 
>>>>>> https://issues.apache.org/jira/secure/IssueNavigator.
>>>> jspa?reset=true=project+%3D+CASSANDRA+AND+
>>>> component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+
>>>> DESC%2C+created+ASC=hide
>>>>>> 
>>>>>> If you're a new contributor, fixing tests is often a good way to learn a
>>>>>> new part of the codebase. Many of these are dtests, which live in a
>>>>>> different repo ( https://github.com/apache/cassandra-dtest ) and are in
>>>>>> python, but have no fear, the repo has instructions for setting up and
>>>>>> running dtests(
>>>>>> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
>>>>>> 
>>>>>> Normal contribution workflow applies: self-assign the ticket if you
>>>> want to
>>>>>> work on it, click on 'start progress' to indicate that you're working on
>>>>>> it, mark it 'patch available' when you've uploaded code to be reviewed
>>>> (in
>>>>>> a github branch, or as a standalone patch file attached to the JIRA). If
>>>>>> you have questions, feel free to email the dev list (that's what it's
>>>> here
>>>>>> for).
>>>>>> 
>>>>>> Many thanks will be given,
>>>>>> - Jeff
>>>> 
>>>> 
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
and just to make it super clear how awesome this is: currently the dtests when 
executed via ASF Jenkins (if they actually run successfully at all) take 
roughly 15+ hours to execute. Being able to run *everything* reliably and 
stably in 28 minutes is obviously a massive improvement (roughly 30x faster).

best,
kjellman

On Nov 27, 2017, at 2:43 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

(with 100 containers we can actually build the project, run all of the unit 
tests, and run all of the dtests in roughly 28 minutes!).



Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
Complicated question unfortunately — and something we’re actively working on 
improving:

Cassci is no longer being offered/run by Datastax, so we've needed to come up 
with a new solution; what that ultimately looks like is still a WIP. its loss 
was obviously huge, and a testament to the awesome resource and effort that was 
put into providing it to the community for all those years.

 - Short Term/Current: Tests (both dtests and unit tests) are being run via the 
ASF Jenkins (https://builds.apache.org) - but that solution isn’t hugely 
helpful as it’s resource constrained.
 - Short-Medium Term: we hope to get a fully baked CircleCI solution to get 
reliable fast test runs.
 - Long Term: Actively being discussed but I’m optimistic that we can get 
something awesome for the project with some stable combination of CircleCI + 
ASF Jenkins, and once we do I’m sure this will change any long term plans.

For Unit Tests (a.k.a the Java ones in tree - 
https://github.com/apache/cassandra/tree/trunk/test/unit/org/apache/cassandra):
Take a look at 
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/… 
looks like the last successful job to finish was #389. 
(https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/389/testReport/).
There are currently a total of 6 tests (all from CompressedInputStreamTest) 
failing on trunk via ASF Jenkins. These specific test failures are 
environmental. The only *unit* test on trunk that I currently know to be flaky 
is org.apache.cassandra.cql3.ViewTest.testRegularColumnTimestampUpdates 
(tracked as https://issues.apache.org/jira/browse/CASSANDRA-14054)

For Distributed Tests (DTests) (a.k.a the Python ones - 
https://github.com/apache/cassandra-dtest):
The situation is a great deal more complicated due to the length of time and 
the amount of resources that executing all of the dtests takes (and executing 
the tests across the various configurations)...

There are 4 dtest jobs on ASF Jenkins for trunk:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-large/
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-novnode/
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-offheap/

It looks like you’ll need to go back to run #353 
(https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/353/testReport/)
 to see the test results as the last 2 jobs that were triggered failed to 
execute. Depending on the environment variables set, tests are executed or 
skipped — so you'll see different tests being run in the no-vnode job, 
off-heap job, and regular dtest job (and some tests might be run multiple 
times).
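The env-var gating described above can be sketched roughly as follows. The variable names (DISABLE_VNODES, OFFHEAP_MEMTABLES) are assumptions inferred from the job names in this thread, not verified against the actual Jenkins job definitions:

```shell
# Sketch: each dtest job runs the same suite with different environment
# variables, and individual tests run or skip based on those variables.
# Variable names below are assumptions, not the verified job configs.
run_label() {
  if [ "${DISABLE_VNODES:-false}" = "true" ]; then
    echo "novnode"    # would map to Cassandra-trunk-dtest-novnode
  elif [ "${OFFHEAP_MEMTABLES:-false}" = "true" ]; then
    echo "offheap"    # would map to Cassandra-trunk-dtest-offheap
  else
    echo "vnode"      # the regular Cassandra-trunk-dtest job
  fi
}

OFFHEAP_MEMTABLES=true
run_label    # prints "offheap"
```

This is why the same test can appear in several jobs' reports, or in none of them.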


More recently we’ve been working on getting CircleCI running. Some sample runs 
from my personal fork can be seen at 
https://circleci.com/gh/mkjellman/cassandra/tree/trunk_circle. I’m personally 
using a paid account to get more CircleCI resources (with 100 containers we can 
actually build the project, run all of the unit tests, and run all of the 
dtests in roughly 28 minutes!). I’m actively working to determine exactly what 
can (and cannot) be executed reliably, routinely, and easily by anyone with 
just a simple free CircleCI account.
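For scale, the back-of-envelope arithmetic on the two wall-clock figures quoted in this thread (15 hours on ASF Jenkins vs. 28 minutes on 100 CircleCI containers) works out to roughly a 32x speedup:

```shell
# Rough speedup from the numbers quoted in this thread:
# ASF Jenkins: ~15 hours; CircleCI with 100 containers: ~28 minutes.
jenkins_minutes=$((15 * 60))
circle_minutes=28
echo "$((jenkins_minutes / circle_minutes))x faster"    # prints "32x faster"
```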

I’m also working on getting scheduled daily CircleCI runs set up against 
trunk/3.0 — more on both of those when we’ve got that story fully baked. Hope 
this answers your question! There are quite a few dtests currently failing, 
and as Jeff mentioned I’ve created JIRAs for a lot of them already, so any 
help (no matter how trivial or annoying it might be or seem) to get everything 
green again would be very welcome.

best,
kjellman


On Nov 27, 2017, at 1:54 PM, Jaydeep Chovatia wrote:

Is there a way to check which tests are failing in trunk currently?
Previously this URL  was giving such results
but is no longer working.

Jaydeep

On Wed, Nov 15, 2017 at 5:44 PM, Jeff Jirsa wrote:

In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.

If you haven't been paying attention to JIRA, you likely didn't notice that
Josh went through and triage/categorized a bunch of issues by adding
components, and Michael took the time to open a bunch of JIRAs for failing
tests.

How many is a bunch? Something like 35 or so just for tests currently
failing on trunk.  If you're a regular contributor, you already know that
dtests are flakey - it'd be great if a few of us can go through and fix a
few. Even incremental improvements are improvements. Here's an easy search
to find them:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC=hide

If you're a new contributor, fixing tests is often a good way to learn a
new 

Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
Hey Jay:

Thanks!! I just took a quick look at the JIRA and noticed that there is a 
“test-cdc” ant target? So, does that mean CDC gets no testing with ant test? 
Do you know any of the history around this?

> On Nov 27, 2017, at 9:44 AM, Jay Zhuang <jay.zhu...@yahoo.com.INVALID> wrote:
> 
> I fixed one CDC uTest, please 
> review:https://issues.apache.org/jira/browse/CASSANDRA-14066
> 
> 
>On Friday, November 17, 2017 6:34 AM, Josh McKenzie <jmcken...@apache.org> 
> wrote:
> 
> 
>> 
>> Do we have any volunteers to fix the broken Materialized Views and CDC
>> DTests?
> 
> I'll try to take a look at the CDC tests next week; looks like one of the
> base unit tests is failing as well.
> 
> On Fri, Nov 17, 2017 at 12:09 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> Quick update re: dtests and off-heap memtables:
>> 
>> I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException:
>> offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)
>> 
>> Looks like we’re gonna need to do some work to test this configuration and
>> right now it’s pretty broken...
>> 
>> Do we have any volunteers to fix the broken Materialized Views and CDC
>> DTests?
>> 
>> best,
>> kjellman
>> 
>> 
>>> On Nov 15, 2017, at 5:59 PM, Michael Kjellman <
>> mkjell...@internalcircle.com> wrote:
>>> 
>>> yes - true - some are flaky, but almost all of the ones i filed fail 100%
>> of the time. i look forward to triaging just the remaining flaky ones
>> (hopefully - without powers combined - by the end of this month!!)
>>> 
>>> appreciate everyone’s help - no matter how small... i already personally
>> did a few “fun” random-python-class-is-missing-return-after-method stuff.
>>> 
>>> we’ve wanted this for a while and now is our time to actually execute
>> and make good on our previous dev list promises.
>>> 
>>> best,
>>> kjellman
>>> 
>>>> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> 
>>>> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>>>> 
>>>> If you haven't been paying attention to JIRA, you likely didn't notice
>> that
>>>> Josh went through and triage/categorized a bunch of issues by adding
>>>> components, and Michael took the time to open a bunch of JIRAs for
>> failing
>>>> tests.
>>>> 
>>>> How many is a bunch? Something like 35 or so just for tests currently
>>>> failing on trunk.  If you're a regular contributor, you already know
>> that
>>>> dtests are flakey - it'd be great if a few of us can go through and fix
>> a
>>>> few. Even incremental improvements are improvements. Here's an easy
>> search
>>>> to find them:
>>>> 
>>>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC=hide
>>>> 
>>>> If you're a new contributor, fixing tests is often a good way to learn a
>>>> new part of the codebase. Many of these are dtests, which live in a
>>>> different repo ( https://github.com/apache/cassandra-dtest ) and are in
>>>> python, but have no fear, the repo has instructions for setting up and
>>>> running dtests(
>>>> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
>>>> 
>>>> Normal contribution workflow applies: self-assign the ticket if you
>> want to
>>>> work on it, click on 'start progress' to indicate that you're working on
>>>> it, mark it 'patch available' when you've uploaded code to be reviewed
>> (in
>>>> a github branch, or as a standalone patch file attached to the JIRA). If
>>>> you have questions, feel free to email the dev list (that's what it's
>> here
>>>> for).
>>>> 
>>>> Many thanks will be given,
>>>> - Jeff
>> 
>> 
> 



Re: CCM dependency in dtests

2017-11-27 Thread Michael Kjellman
thanks for driving this Stefan. this is definitely an issue that I recently 
saw too while trying to get all the dtests passing. having logic you need to 
fix in 3 repos isn’t ideal at all. 

> On Nov 27, 2017, at 4:05 AM, Stefan Podkowinski  wrote:
> 
> Just wanted to bring a recent discussion about how to use ccm from
> dtests to your attention:
> https://github.com/apache/cassandra-dtest/pull/13
> 
> Basically the idea is to not depend on a released ccm artifact, but to
> use a dedicated git branch in the ccm repo instead for executing dtests.
> Motivation and details can be found in the PR, please feel free to comment.
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 




Re: Flakey Dtests

2017-11-17 Thread Michael Kjellman
I’m guessing this was part of 
https://issues.apache.org/jira/browse/CASSANDRA-5?

I see Sylvain left a comment about something that sounds pretty similar… was 
this actually resolved? looks like it was merged as 
https://github.com/pcmanus/ccm/commit/1c0bf62e0b21fc78ee09026882953a5436ccf0f0? 
when do ccm releases get published to PyPI?


On Nov 17, 2017, at 12:18 AM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

I see a ton of upgrade tests right now failing for:


Unexpected error in node1 log, error:
ERROR [main] 2017-11-17 07:57:54,477 CassandraDaemon.java:672 - Exception 
encountered during startup: Invalid yaml. Please remove properties [rpc_port] 
from your cassandra.yaml

I do see that rpc_port is in 3.0 and it seems to have been yanked from trunk, 
so it seems like a legitimate failure. I’m not sure I fully understand how the 
yaml upgrade path works for upgrade test dtests. I’ve taken a look at 
upgrade_tests/upgrade_manifest.py and upgrade_tests/README.md… can anyone shed 
any light on how this is supposed to work? Was handling rpc_port in the 
upgrade dtests just missed when this was removed from trunk?
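Whatever the actual mechanism in upgrade_manifest.py turns out to be, the required behavior is simple to state: before starting a node on a version that removed a config option, that option has to be stripped from cassandra.yaml. A minimal sketch (file contents are illustrative, not a real 3.0 yaml):

```shell
# Sketch: drop an option (rpc_port, per the failure above) from a yaml file
# before handing it to a version that rejects it. Contents are illustrative.
cfg=$(mktemp)
printf 'rpc_port: 9160\nnum_tokens: 256\n' > "$cfg"

# strip the removed option, keeping everything else
grep -v '^rpc_port:' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
cat "$cfg"    # prints only: num_tokens: 256
```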

thanks…


best,
kjellman

On Nov 16, 2017, at 9:09 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

Quick update re: dtests and off-heap memtables:

I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException: 
offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)

Looks like we’re gonna need to do some work to test this configuration and 
right now it’s pretty broken...

Do we have any volunteers to fix the broken Materialized Views and CDC DTests?

best,
kjellman


On Nov 15, 2017, at 5:59 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

yes - true - some are flaky, but almost all of the ones i filed fail 100% of 
the time. i look forward to triaging just the remaining flaky ones (hopefully - 
without powers combined - by the end of this month!!)

appreciate everyone’s help - no matter how small... i already personally did a 
few “fun” random-python-class-is-missing-return-after-method stuff.

we’ve wanted this for a while and now is our time to actually execute and make 
good on our previous dev list promises.

best,
kjellman

On Nov 15, 2017, at 5:45 PM, Jeff Jirsa <jji...@gmail.com> wrote:

In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.

If you haven't been paying attention to JIRA, you likely didn't notice that
Josh went through and triage/categorized a bunch of issues by adding
components, and Michael took the time to open a bunch of JIRAs for failing
tests.

How many is a bunch? Something like 35 or so just for tests currently
failing on trunk.  If you're a regular contributor, you already know that
dtests are flakey - it'd be great if a few of us can go through and fix a
few. Even incremental improvements are improvements. Here's an easy search
to find them:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC=hide

If you're a new contributor, fixing tests is often a good way to learn a
new part of the codebase. Many of these are dtests, which live in a
different repo ( https://github.com/apache/cassandra-dtest ) and are in
python, but have no fear, the repo has instructions for setting up and
running dtests(
https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )

Normal contribution workflow applies: self-assign the ticket if you want to
work on it, click on 'start progress' to indicate that you're working on
it, mark it 'patch available' when you've uploaded code to be reviewed (in
a github branch, or as a standalone patch file attached to the JIRA). If
you have questions, feel free to email the dev list (that's what it's here
for).

Many thanks will be given,
- Jeff





Re: Flakey Dtests

2017-11-16 Thread Michael Kjellman
Quick update re: dtests and off-heap memtables:

I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException: 
offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)

Looks like we’re gonna need to do some work to test this configuration and 
right now it’s pretty broken...

Do we have any volunteers to fix the broken Materialized Views and CDC DTests?

best,
kjellman


> On Nov 15, 2017, at 5:59 PM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> yes - true - some are flaky, but almost all of the ones i filed fail 100% 
> of the time. i look forward to triaging just the remaining flaky ones 
> (hopefully - without powers combined - by the end of this month!!)
> 
> appreciate everyone’s help - no matter how small... i already personally did 
> a few “fun” random-python-class-is-missing-return-after-method stuff. 
> 
> we’ve wanted this for a while and now is our time to actually execute and 
> make good on our previous dev list promises. 
> 
> best,
> kjellman
> 
>> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>> 
>> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>> 
>> If you haven't been paying attention to JIRA, you likely didn't notice that
>> Josh went through and triage/categorized a bunch of issues by adding
>> components, and Michael took the time to open a bunch of JIRAs for failing
>> tests.
>> 
>> How many is a bunch? Something like 35 or so just for tests currently
>> failing on trunk.  If you're a regular contributor, you already know that
>> dtests are flakey - it'd be great if a few of us can go through and fix a
>> few. Even incremental improvements are improvements. Here's an easy search
>> to find them:
>> 
>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC=hide
>> 
>> If you're a new contributor, fixing tests is often a good way to learn a
>> new part of the codebase. Many of these are dtests, which live in a
>> different repo ( https://github.com/apache/cassandra-dtest ) and are in
>> python, but have no fear, the repo has instructions for setting up and
>> running dtests(
>> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
>> 
>> Normal contribution workflow applies: self-assign the ticket if you want to
>> work on it, click on 'start progress' to indicate that you're working on
>> it, mark it 'patch available' when you've uploaded code to be reviewed (in
>> a github branch, or as a standalone patch file attached to the JIRA). If
>> you have questions, feel free to email the dev list (that's what it's here
>> for).
>> 
>> Many thanks will be given,
>> - Jeff



Re: Flakey Dtests

2017-11-15 Thread Michael Kjellman
yes - true - some are flaky, but almost all of the ones i filed fail 100% of 
the time. i look forward to triaging just the remaining flaky ones (hopefully - 
without powers combined - by the end of this month!!)

appreciate everyone’s help - no matter how small... i already personally did a 
few “fun” random-python-class-is-missing-return-after-method stuff. 

we’ve wanted this for a while and now is our time to actually execute and make 
good on our previous dev list promises. 

best,
kjellman

> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa  wrote:
> 
> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
> 
> If you haven't been paying attention to JIRA, you likely didn't notice that
> Josh went through and triage/categorized a bunch of issues by adding
> components, and Michael took the time to open a bunch of JIRAs for failing
> tests.
> 
> How many is a bunch? Something like 35 or so just for tests currently
> failing on trunk.  If you're a regular contributor, you already know that
> dtests are flakey - it'd be great if a few of us can go through and fix a
> few. Even incremental improvements are improvements. Here's an easy search
> to find them:
> 
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC=hide
> 
> If you're a new contributor, fixing tests is often a good way to learn a
> new part of the codebase. Many of these are dtests, which live in a
> different repo ( https://github.com/apache/cassandra-dtest ) and are in
> python, but have no fear, the repo has instructions for setting up and
> running dtests(
> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
> 
> Normal contribution workflow applies: self-assign the ticket if you want to
> work on it, click on 'start progress' to indicate that you're working on
> it, mark it 'patch available' when you've uploaded code to be reviewed (in
> a github branch, or as a standalone patch file attached to the JIRA). If
> you have questions, feel free to email the dev list (that's what it's here
> for).
> 
> Many thanks will be given,
> - Jeff


Re: Integrating vendor-specific code and developing plugins

2017-05-18 Thread Michael Kjellman
That’s epic Jeff. Very cool.

Sent from my iPhone

On May 18, 2017, at 10:28 AM, Jeff Jirsa wrote:

On Mon, May 15, 2017 at 5:25 PM, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:



To me testable means that we can run the tests at the very least for every
release, but ideally they would be run more often than that.  Especially
with the push to not release unless the test board is all passing, we
should not be releasing features that we don’t have a test board for.
Ideally that means we have it in ASF CI.  If there is someone that can
commit to posting results of runs from an outside CI somewhere, then I
think that could work as well, but that gets pretty cumbersome if we have
to check 10 different CI dashboards at different locations before every
release.



It turns out there's a ppc64le jenkins slave @ asf, so I've setup
https://builds.apache.org/view/A-D/view/Cassandra/job/cassandra-devbranch-ppc64le-testall/
for testing.

Like our other devbranch-testall builds, it takes a repo+branch as
parameters, and runs unit tests. While the unit tests aren't passing, this
platform should now be considered testable.


Re: Dropped Mutation and Read messages.

2017-05-11 Thread Michael Kjellman
This discussion should be on the C* user mailing list. Thanks!

best,
kjellman

> On May 11, 2017, at 10:53 AM, Oskar Kjellin  wrote:
> 
> That seems way too low. Depending on what type of disk you have it should be 
> closer to 1-200MB.
> That's probably causing your problems. It would still take a while for you to 
> compact all your data tho 
> 
> Sent from my iPhone
> 
>> On 11 May 2017, at 19:50, varun saluja  wrote:
>> 
>> nodetool getcompactionthroughput
>> 
>> ./nodetool getcompactionthroughput
>> Current compaction throughput: 16 MB/s
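The throughput line above is easy to check programmatically; a sketch that parses the MB/s figure out of the output and flags it when it is below some target (the 64 MB/s threshold here is purely illustrative — pick a value appropriate for your disks):

```shell
# Sketch: extract the MB/s figure from nodetool's output and compare it to a
# target; sample line taken from this thread, 64 MB/s threshold illustrative.
out="Current compaction throughput: 16 MB/s"
rate=$(printf '%s\n' "$out" | awk '{print $(NF-1)}')
if [ "$rate" -lt 64 ]; then
  echo "compaction throughput low: ${rate} MB/s"
fi
```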
>> 
>> Regards,
>> Varun Saluja
>> 
>>> On 11 May 2017 at 23:18, varun saluja  wrote:
>>> Hi,
>>> 
>>> PFB results for same. Numbers are scary here.
>>> 
>>> [root@WA-CASSDB2 bin]# ./nodetool compactionstats
>>> pending tasks: 137
>>>   compaction type         keyspace                 table     completed          total   unit   progress
>>>        Compaction           system                 hints    5762711108   837522028005   bytes      0.69%
>>>        Compaction   walletkeyspace   user_txn_history_v2     101477894     4722068388   bytes      2.15%
>>>        Compaction   walletkeyspace   user_txn_history_v2    1511866634   753221762663   bytes      0.20%
>>>        Compaction   walletkeyspace   user_txn_history_v2    3664734135    18605501268   bytes     19.70%
>>> Active compaction remaining time :  26h32m28s
>>> 
>>> 
>>> 
 On 11 May 2017 at 23:15, Oskar Kjellin  wrote:
 What does nodetool compactionstats show?
 
 I meant compaction throttling. nodetool getcompactionthroughput
 
 
> On 11 May 2017, at 19:41, varun saluja  wrote:
> 
> Hi Oskar,
> 
> Thanks for response.
> 
> Yes, I could see a lot of threads for compaction. Actually we are loading 
> around 400GB of data per node on a 3 node cassandra cluster.
> Throttling was set to write around 7k TPS per node. The job ran fine for 2 
> days and then we started getting Mutation drops, longer GCs, and very high 
> load on the system.
> 
> System log reports:
> Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) 
> off-heap
> 
> The job was stopped 12 hours back, but these failures can still be seen. 
> Can you please let me know how I should proceed further? If possible, 
> please suggest some parameters for high write-intensive jobs.
> 
> 
> Regards,
> Varun Saluja
> 
> 
>> On 11 May 2017 at 23:01, Oskar Kjellin  wrote:
>> Do you have a lot of compactions going on? It sounds like you might've 
>> built up a huge backlog. Is your throttling configured properly?
>> 
>>> On 11 May 2017, at 18:50, varun saluja  wrote:
>>> 
>>> Hi Experts,
>>> 
>>> Seeking your help on a production issue. We were running a high 
>>> write-intensive job on our 3 node cassandra cluster, v2.1.7.
>>> 
>>> TPS on the nodes was high. The job ran for more than 2 days, and 
>>> thereafter the load average on 1 of the nodes increased to a very high 
>>> number, around 29.
>>> 
>>> System log reports:
>>> 
>>> INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 
>>> MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
>>> INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 
>>> MessagingService.java:888 - 2 READ messages dropped in last 5000ms
>>> INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 
>>> MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 
>>> 5000ms
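Those MessagingService lines have a fixed shape, so the dropped-message counts can be pulled out of system.log mechanically when watching a node; a sketch using the sample line from this thread:

```shell
# Sketch: extract the dropped-message count from a system.log line of the
# shape quoted above (sample line taken from this thread).
log='INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms'
dropped=$(printf '%s\n' "$log" \
  | sed -n 's/.* - \([0-9][0-9]*\) MUTATION messages dropped.*/\1/p')
echo "$dropped"    # prints 839
```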
>>> 
>>> The job was stopped due to the heavy load, but still, after 12 hours, we 
>>> can see mutation drop messages and a sudden increase in load average.
>>> 
>>> Are these hintedhandoff mutations? Can we stop these.
>>> Strangely this behaviour is seen only on 2 nodes. Node 1 does not show 
>>> any load or any such activity.
>>> 
>>> Due to heavy load and GC, there are intermittent gossip failures among 
>>> nodes. Can someone please help.
>>> 
>>> PS: Load job was stopped on cluster. Everything ran fine for few hours 
>>> and and Later issue started again like mutation messages drops.
>>> 
>>> Thanks and Regards,
>>> Varun Saluja
>>> 
>>> 
> 
>>> 
>> 





Re: Does partition size limitation still exists in Cassandra 3.10 given there is a B-tree implementation?

2017-05-11 Thread Michael Kjellman
I'm almost done with a rebased trunk patch. Hit a few snags. I want nothing 
more than to finish this thing... The latest issue was due to range tombstones 
and the fact that the deletion time was being stored in the index from 3.0 
onwards. I hope to have everything pushed very shortly. Sorry for the delay, 
I'm doing my best... there are never enough hours in the day. :)

best,
kjellman 

> On May 11, 2017, at 1:48 AM, Kant Kodali  wrote:
> 
> oh this looks like one I am looking for
> https://issues.apache.org/jira/browse/CASSANDRA-9754. Is this in Cassandra
> 3.10 or merged somewhere?
> 
> On Thu, May 11, 2017 at 1:13 AM, Kant Kodali  wrote:
> 
>> Hi DuyHai,
>> 
>> I am trying to see what are the possible things we can do to get over this
>> limitation?
>> 
>> 1. Would this https://issues.apache.org/jira/browse/CASSANDRA-7447 help
>> at all?
>> 2. Can we have Merkle trees built for groups of rows in partition ? such
>> that we can stream only those groups where the hash is different?
>> 3. It would be interesting to see if we can spread a partition across
>> nodes.
>> 
>> I am just trying to validate some ideas that can help potentially get over
>> this 100MB limitation since we may not always fit into a time series model.
>> 
>> Thanks!
>> 
>> On Thu, May 11, 2017 at 12:37 AM, DuyHai Doan 
>> wrote:
>> 
>>> Yes the recommendation still applies
>>> 
>>> Wide partitions have huge impact on repair (over streaming), compaction
>>> and bootstrap
>>> 
>>> On 10 May 2017 at 23:54, "Kant Kodali" wrote:
>>> 
>>> Hi All,
>>> 
>>> The Cassandra community has always recommended 100MB per partition as a
>>> sweet spot. However, does this limitation still exist given there is a
>>> B-tree implementation to identify rows inside a partition?
>>> 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>>> apache/cassandra/db/rows/BTreeRow.java
>>> 
>>> Thanks!
>>> 
>>> 
>>> 
>> 



Re: unsubscribe

2017-04-06 Thread Michael Kjellman
http://apache.org/foundation/mailinglists.html

Sent from my iPhone

On Apr 6, 2017, at 9:57 AM, Nitija Patil wrote:

unsubscribe

On Thu, Apr 6, 2017 at 10:25 PM, Vineet Gadodia wrote:

unsubscribe

On Wed, Apr 5, 2017 at 1:51 AM, Ksawery Glab wrote:

unsubscribe

2017-04-05 9:45 GMT+01:00 Nitija Patil:

unsubscribe

On Wed, Apr 5, 2017 at 2:05 PM, 郑蒙家(蒙家) 
Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Michael Kjellman
Jason has asked for review and feedback many times. Maybe be constructive and 
review his code instead of just complaining (once again)?

Sent from my iPhone

> On Nov 19, 2016, at 1:49 PM, Edward Capriolo  wrote:
> 
> I would say start with a mindset like 'people will run this in production'
> not like 'why would you expect this to work'.
> 
> Now how does this logic effect feature develement? Maybe use gossip 2.0 as
> an example.
> 
> I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
> the logic of 'don't expect it to work': unleash 4.0 onto hordes of newbies
> with a twitter announce of the release and let the bugs trickle in.
> 
> One could also do something comprehensive like test on clusters of 2 to
> 1000 nodes. Test with Jepsen to see what happens during partitions, inject
> things like JVM pauses, and account for the behavior. Log convergence times
> after given events.
> 
> Take a stand and say look "we engineered and beat the crap out of this
> feature. I deployed this release feature at my company and eat my dogfood.
> You are not my crash test dummy."
> 
> 
>> On Saturday, November 19, 2016, Jeff Jirsa  wrote:
>> 
>> Any proposal to solve the problem you describe?
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo wrote:
>>> 
>>> This is especially relevant if people wish to focus on removing things.
>>> 
>>> For example, gossip 2.0 sounds great, but seems geared toward huge
>> clusters
>>> which is not likely a majority of users. For those with a 20 node cluster
>>> are the indirect benefits worth it?
>>> 
>>> Also there seems to be a first push to remove things like compact storage
>>> or thrift. Fine great. But what is the realistic update path for someone.
>>> If the big players are running 2.1 and maintaining backports, the average
>>> shop without a dedicated team is going to be stuck saying (great features
>>> in 4.0 that improve performance, i would probably switch but its not
>> stable
>>> and we have that one compact storage cf and who knows what is going to
>>> happen performance wise when)
>>> 
>>> We really need to lose this "release won't be stable for 6 minor versions"
>>> concept.
>>> 
>>> On Saturday, November 19, 2016, Edward Capriolo wrote:
>>> 
 
 
 On Friday, November 18, 2016, Jeff Jirsa wrote:
 
> We should assume that we’re ditching tick/tock. I’ll post a thread on
> 4.0-and-beyond here in a few minutes.
> 
> The advantage of a prod release every 6 months is fewer incentive to
>> push
> unfinished work into a release.
> The disadvantage of a prod release every 6 months is then we either
>> have
> a very short lifespan per-release, or we have to maintain lots of
>> active
> releases.
> 
> 2.1 has been out for over 2 years, and a lot of people (including us)
>> are
> running it in prod – if we have a release every 6 months, that means
>> we’d
> be supporting 4+ releases at a time, just to keep parity with what we
>> have
> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+
>> year
> old branches.
> 
> 
> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
> Eggleston" wrote:
> 
>>> While stability is important if we push back large "core" changes
> until later we're just setting ourselves up to face the same issues
>> later on
>> 
>> In theory, yes. In practice, when incomplete features are earmarked
>> for
> a certain release, those features are often rushed out, and not always
> fully baked.
>> 
>> In any case, I don’t think it makes sense to spend too much time
> planning what goes into 4.0, and what goes into the next major release
>> with
> so many release strategy related decisions still up in the air. Are we
> going to ditch tick-tock? If so, what will its replacement look like?
> Specifically, when will the next “production” release happen? Without
> knowing that, it's hard to say if something should go in 4.0, or 4.5,
>> or
> 5.0, or whatever.
>> 
>> The reason I suggested a production release every 6 months is because
> (in my mind) it’s frequent enough that people won’t be tempted to rush
> features to hit a given release, but not so frequent that it’s not
> practical to support. It wouldn’t be the end of the world if some of
>> these
> tickets didn’t make it into 4.0, because 4.5 would be fine.
>> 
>> On November 18, 2016 at 1:57:21 PM, kurt Greaves (
>> k...@instaclustr.com )
> wrote:
>> 
>>> On 18 November 2016 at 18:25, Jason Brown wrote:

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Michael Kjellman
Honest question: are you *ever* positive Ed? 

Maybe give it a shot once in a while. It will be good for your mental health. 


Sent from my iPhone

> On Nov 19, 2016, at 11:50 AM, Edward Capriolo  wrote:
> 
> This is especially relevant if people wish to focus on removing things.
> 
> For example, gossip 2.0 sounds great, but seems geared toward huge clusters
> which is not likely a majority of users. For those with a 20 node cluster
> are the indirect benefits worth it?
> 
> Also there seems to be a first push to remove things like compact storage
> or thrift. Fine great. But what is the realistic update path for someone.
> If the big players are running 2.1 and maintaining backports, the average
> shop without a dedicated team is going to be stuck saying (great features
> in 4.0 that improve performance, i would probably switch but its not stable
> and we have that one compact storage cf and who knows what is going to
> happen performance wise when)
> 
> We really need to lose this "release won't be stable for 6 minor versions"
> concept.
> 
> On Saturday, November 19, 2016, Edward Capriolo 
> wrote:
> 
>> 
>> 
>> On Friday, November 18, 2016, Jeff Jirsa wrote:
>> 
>>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>>> 4.0-and-beyond here in a few minutes.
>>> 
>>> The advantage of a prod release every 6 months is fewer incentive to push
>>> unfinished work into a release.
>>> The disadvantage of a prod release every 6 months is then we either have
>>> a very short lifespan per-release, or we have to maintain lots of active
>>> releases.
>>> 
>>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>>> running it in prod – if we have a release every 6 months, that means we’d
>>> be supporting 4+ releases at a time, just to keep parity with what we have
>>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>>> old branches.
>>> 
>>> 
>>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
>>> Eggleston"  wrote:
>>> 
> While stability is important if we push back large "core" changes
>>> until later we're just setting ourselves up to face the same issues later on
 
 In theory, yes. In practice, when incomplete features are earmarked for
>>> a certain release, those features are often rushed out, and not always
>>> fully baked.
 
 In any case, I don’t think it makes sense to spend too much time
>>> planning what goes into 4.0, and what goes into the next major release with
>>> so many release strategy related decisions still up in the air. Are we
>>> going to ditch tick-tock? If so, what will its replacement look like?
>>> Specifically, when will the next “production” release happen? Without
>>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>>> 5.0, or whatever.
 
 The reason I suggested a production release every 6 months is because
>>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>>> features to hit a given release, but not so frequent that it’s not
>>> practical to support. It wouldn’t be the end of the world if some of these
>>> tickets didn’t make it into 4.0, because 4.5 would be fine.
 
 On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com)
>>> wrote:
 
> On 18 November 2016 at 18:25, Jason Brown  wrote:
> 
> #11559 (enhanced node representation) - decided it's *not* something we
> need wrt #7544 storage port configurable per node, so we are punting on
> 
 
 #12344 - Forward writes to replacement node with same address during
>>> replace
 depends on #11559. To be honest I'd say #12344 is pretty important,
 otherwise it makes it difficult to replace nodes without potentially
 requiring client code/configuration changes. It would be nice to get
>>> #12344
 in for 4.0. It's marked as an improvement but I'd consider it a bug and
 thus think it could be included in a later minor release.
 
 Introducing all of these in a single release seems pretty risky. I think
>>> it
> would be safer to spread these out over a few 4.x releases (as they’re
> finished) and give them time to stabilize before including them in an
>>> LTS
> release. The downside would be having to maintain backwards
>>> compatibility
> across the 4.x versions, but that seems preferable to delaying the
>>> release
> of 4.0 to include these, and having another big bang release.
 
 
 I don't think anyone expects 4.0.0 to be stable. It's a major version
 change with lots of new features; in the production world people don't
 normally move to a new major version until it has been out for quite some
 time and several minor releases have passed. Really, most people are only
 

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-11-08 Thread Michael Kjellman
Yes, we hit this as well. We have an internal patch that I wrote to mostly 
revert the behavior back to ByteBuffers with as small an amount of code change as 
possible. Performance of our build is now even with 2.0.x and we've also 
forward ported it to 3.x (although the 3.x patch was even more complicated due 
to Bounds, RangeTombstoneBound, and ClusteringPrefix, which actually increases the 
number of allocations to somewhere between 11 and 13 depending on how I count 
it per indexed block) -- making it even worse than what you're observing in 2.1.

We haven't upstreamed it as 2.1 is obviously not taking any changes at this 
point and the longer term solution is 
https://issues.apache.org/jira/browse/CASSANDRA-9754 (which also includes the 
changes to go back to ByteBuffers and remove as much of the Composites from the 
storage engine as possible.) Also, the solution is a bit of a hack -- although 
it was a blocker for us deploying 2.1 -- so I'm not sure how "hacky" it is if 
it works.
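The allocation count is the crux of the patch described above: a serialized cell name kept as a raw ByteBuffer can be ordered with a plain unsigned byte walk, with no per-name object materialization. A minimal sketch of such a comparator (a hypothetical helper, not the actual internal patch):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: ordering serialized cell names as raw bytes,
// without deserializing them into per-cell Composite objects.
public class ByteBufferCompare
{
    // Unsigned lexicographic comparison; uses absolute gets so neither
    // buffer's position is modified.
    static int compareUnsigned(ByteBuffer a, ByteBuffer b)
    {
        int minLen = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < minLen; i++)
        {
            int cmp = (a.get(a.position() + i) & 0xff) - (b.get(b.position() + i) & 0xff);
            if (cmp != 0)
                return cmp;
        }
        // Shared prefix is equal: the shorter buffer sorts first.
        return a.remaining() - b.remaining();
    }

    public static void main(String[] args)
    {
        ByteBuffer x = ByteBuffer.wrap(new byte[] { 0x01, (byte) 0x80 });
        ByteBuffer y = ByteBuffer.wrap(new byte[] { 0x01, 0x7f });
        // 0x80 compares as 128 unsigned, so x sorts after y.
        System.out.println(compareUnsigned(x, y) > 0); // prints true
    }
}
```

Nothing is allocated per comparison beyond loop locals, which is the property the Composites-era code paths gave up.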

best,
kjellman


On Nov 8, 2016, at 11:31 AM, Dikang Gu 
> wrote:

This is very expensive:

"MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10 
tid=0x7f2fd57e1800 nid=0x1cc510 runnable [0x7f2b971b]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.db.marshal.IntegerType.compare(IntegerType.java:29)
at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:98)
at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:31)
at java.util.TreeMap.put(TreeMap.java:545)
at java.util.TreeSet.add(TreeSet.java:255)
at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:254)
at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:228)
at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:104)
at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:156)
at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)


Checked the git history, it comes from this jira: 
https://issues.apache.org/jira/browse/CASSANDRA-5417

Any thoughts?
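As a rough, hypothetical illustration (not Cassandra code) of why that frame is hot: the NamesQueryFilter deserialization in the trace inserts every requested cell name into a sorted set, so the comparator runs on the order of n log n times per incoming message, and any cost inside compare() is multiplied accordingly:

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Rough illustration: building a sorted set element-by-element invokes
// the comparator O(n log n) times, so an expensive compare() dominates.
public class ComparatorCost
{
    static long countComparisons(int n)
    {
        AtomicLong calls = new AtomicLong();
        Comparator<Integer> counting = (a, b) -> {
            calls.incrementAndGet(); // count every comparator invocation
            return Integer.compare(a, b);
        };
        TreeSet<Integer> set = new TreeSet<>(counting);
        for (int i = 0; i < n; i++)
            set.add(i);
        return calls.get();
    }

    public static void main(String[] args)
    {
        // Each comparison here is cheap; a varint compare on the read path is not.
        System.out.println(countComparisons(1024) + " comparisons for 1024 inserts");
    }
}
```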

On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta 
> wrote:
Haven't seen this before, but perhaps it's related to CASSANDRA-10433? This is 
just a wild guess as it's in a related codepath, but maybe worth trying out the 
patch available to see if it helps anything...

2016-10-28 15:03 GMT-02:00 Dikang Gu 
>:
We are seeing a huge CPU regression when upgrading one of our 2.0.16 clusters to 
2.1.14 as well. The 2.1.14 node is not able to handle the same amount of read 
traffic as the 2.0.16 node, actually, it's less than 50%.

And in the perf results, the first line could go as high as 50%, as we turn up 
the read traffic, which never appeared in 2.0.16.

Any thoughts?
Thanks


Samples: 952K of event 'cycles', Event count (approx.): 229681774560
Overhead  Shared Object  Symbol
   6.52%  perf-196410.map  [.] Lorg/apache/cassandra/db/marshal/IntegerType;.compare in Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   4.84%  libzip.so  [.] adler32
   2.88%  perf-196410.map  [.] Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marshal/IntegerType;.compare
   2.39%  perf-196410.map  [.] Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marshal/IntegerType;.findMostSignificantByte
   2.03%  perf-196410.map  [.] Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/DecoratedKey;.compareTo
   1.65%  perf-196410.map  [.] vtable chunks
   1.44%  perf-196410.map  [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
   1.02%  perf-196410.map  [.] Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   1.00%  snappy-1.0.5.2-libsnappyjava.so  [.] 0x3804
   0.87%  perf-196410.map  [.] Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
   0.82%  

Re: Review of Cassandra actions

2016-11-05 Thread Michael Kjellman
Thanks Jeff for your thoughtful comments. +100

Sent from my iPhone

> On Nov 5, 2016, at 6:26 PM, Jeff Jirsa  wrote:
> 
> I hope the other 7 members of the board take note of this response,
> and other similar reactions on dev@ today.
> 
> When Datastax violated trademark, they acknowledged it and worked to
> correct it. To their credit, they tried to do the right thing.
> When the PMC failed to enforce problems, we acknowledged it and worked
> to correct it. We aren't perfect, but we're trying.
> 
> When a few members the board openly violate the code of conduct, being
> condescending and disrespectful under the auspices of "enforcing the
> rules" and "protecting the community", they're breaking the rules,
> damaging the community, and nobody seems willing to acknowledge it or
> work to correct it. It's not isolated, I'll link examples if it's
> useful.
> 
> In a time when we're all trying to do the right thing to protect the
> project and the community, it's unfortunate that high ranking, long
> time members within the ASF actively work to undermine trust and
> community while flaunting the code of conduct, which requires
> friendliness, empathy, and professionalism, and the rest of the board
> is silent on the matter.
> 
> 
> 
> 
>> On Nov 5, 2016, at 4:08 PM, Dave Brosius  wrote:
>> 
>> I take this response (a second time) as a pompous way to trivialize the 
>> responses of others as to the point of their points being meaningless to 
>> you. So either explain what this means, or accept the fact that you are as 
>> Chris is exactly what people are claiming you to be. Obnoxious bullies more 
>> interested in throwing your weight around and causing havoc, destroying a 
>> community, rather than actually being motivated by improving the ASF.
>> 
>> 
>>> On 11/05/2016 06:16 PM, Jim Jagielski wrote:
>>> How about a nice game of chess?
>>> 
 On Nov 5, 2016, at 1:15 PM, Aleksey Yeschenko  wrote:
 
 I’m sorry, but this statement is so at odds with common sense that I have 
 to call it out.
 
 Of course your position grants your voice extra power. A lot of extra 
 power,
 like it or not (I have a feeling you quite like it, though).
 
 In an ideal world, that power would entail corresponding duties:
 care and consideration in your actions at least.
 Instead, you are being hotheaded, impulsive, antagonising, and immature.
 
 In what possible universe dropping that hammer threat from the ’20% off” 
 email thread,
 then following up with a Game of Thrones youtube clip is alright?
 
 That kind of behaviour is inappropriate for a board member. Frankly, it 
 wouldn’t be
 appropriate for a greeter at Walmart. If you don’t see this, we do indeed 
 have bigger
 problems.
 
 --
 AY
 
 On 5 November 2016 at 14:57:13, Jim Jagielski (j...@jagunet.com) wrote:
 
>> But I love the ability of VP's and Board to simply pretend their 
>> positions carried no weight.
>> 
> I would submit that whatever "weight" someone's position may
> carry, it is due to *who* they are, and not *what* they are.
> 
> If we have people here in the ASF or in PMCs which really think
> that titles matter in discussions like this, when one is NOT
> speaking ex cathedra, then we have bigger problems. :)
>>> 
>> 


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Michael Kjellman
And to add one additional thought to follow up: I generally am personally 
motivated to fix problems and bugs that reduce my chance of getting paged at 
3am. This is important for my mental health but also for the 
perceived stability of our products (obviously). 

Features are important as they provide gateways for adoption of all the other 
code by new customers.

Stability and performance are among those things that don't "sell" well to 
new adopters (but sell very well to existing customers).

Luckily most people are on 2.1 and 3.0 and there are tons of features already 
in releases for people to adopt so we've got the "features" thing under control 
for at least a year in my opinion. 

best,
kjellman

Sent from my iPhone

> On Nov 4, 2016, at 10:33 AM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> "Avalon. The database" yes autocorrect. That's exactly what I wanted. 
> 
> That should read "scaling the database and stability." Sorry. I'm typing this 
> while walking up a big ass hill in San Francisco heading to the office. 
> 
> Sent from my iPhone
> 
>> On Nov 4, 2016, at 10:31 AM, Michael Kjellman <mkjell...@internalcircle.com> 
>> wrote:
>> 
>> Avalon. The database


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Michael Kjellman
"Avalon. The database" yes autocorrect. That's exactly what I wanted. 

That should read "scaling the database and stability." Sorry. I'm typing this 
while walking up a big ass hill in San Francisco heading to the office. 

Sent from my iPhone

> On Nov 4, 2016, at 10:31 AM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> Avalon. The database


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Michael Kjellman
Hi Kelly-

I can't speak to many of your questions as it's not my position to do so. What 
I can say is that at Apple we are doubling down on open source. We have tons of 
code in flight -- really big ones in fact -- many already out for review. Our 
list of enhancements we want to do grows all the time so there is no shortage 
of work to do. We also have a really great team built up with an incredible 
amount of in house knowledge. 

The stuff we work on generally is focused on Avalon. The database and 
stabilizing it. I'm not sure how much "feature" work we will do in comparison 
(although things like SASI obviously is). 

It's unfortunate how things have played out -- but let's remind ourselves this 
is a database and we're in it for the long haul. The last thing we want is to 
have the project stagnate due to infighting. 

For the foreseeable future Apple and The Last Pickle will step up a bit more of 
an active role as much as we can. 

I have no doubt in my mind this will change the project. The rate of releases 
-- what gets worked on -- bandwidth to fix low hanging fruit tickets... but at 
least I see a path forward. 

So let's try to be positive here and lead by example. It's the only thing we 
can do right now. 

best,
kjellman



Sent from my iPhone

> On Nov 4, 2016, at 9:47 AM, Kelly Sommers  wrote:
> 
> I think the community needs some clarification about what's going on.
> There's a really concerning shift going on and the story about why is
> really blurry. I've heard all kinds of wild claims about what's going on.
> 
> I've heard people say the ASF is pushing DataStax out because they don't
> like how much control they have over Cassandra. I've heard other people say
> DataStax and the ASF aren't getting along. I've heard one person who has
> pull with a friend in the ASF complained about a feature not getting
> considered (who also didn't go down the correct path of proposing) kicked
> and screamed and started the ball rolling for control change.
> 
> I don't know what's going on, and I doubt the truth is in any of those, the
> truth is probably somewhere in between. As a former Cassandra MVP and
> builder of some of the larger Cassandra clusters in the last 3 years I'm
> concerned.
> 
> I've been really happy with Jonathan and DataStax's role in the Cassandra
> community. I think they have done a great job at investing time and money
> towards the good interest in the project. I think it is unavoidable a
> single company bootstraps large projects like this into popularity. It's
> those companies investments who give the ability to grow diversity in later
> stages. The committer list in my opinion is the most diverse its ever been,
> hasn't it? Apple is a big player now.
> 
> I don't think reducing DataStax's role for the sake of diversity is smart.
> You grow diversity by opening up new opportunities for others. Grow the
> committer list perhaps. Mentor new people to join that list. You don't kick
> someone to the curb and hope things improve. You add.
> 
> I may be way off on what I'm seeing but there's not much to go by but
> gossip (ahaha :P) and some ASF meeting notes and DataStax blog posts.
> 
> August 17th 2016 ASF changed the Apache Cassandra chair
> https://www.apache.org/foundation/records/minutes/2016/board_minutes_2016_08_17.txt
> 
> "The Board expressed continuing concern that the PMC was not acting
> independently and that one company had undue influence over the project."
> 
> August 19th 2016 Jonathan Ellis steps down as chair
> http://www.datastax.com/2016/08/a-look-back-a-look-forward
> 
> November 2nd 2016 DataStax moves committers to DSE from Cassandra.
> http://www.datastax.com/2016/11/serving-customers-serving-the-community
> 
> I'm really concerned if indeed the ASF is trying to change control and
> diversity  of organizations by reducing DataStax's role. As I said earlier,
> I've been really happy at the direction DataStax and Jonathan has taken the
> project and I would much prefer see additional opportunities along side
> theirs grow instead of subtracting. The ultimate question that's really
> important is whether DataStax and Jonathan have been steering the project
> in the right direction. If the answer is yes, then is there really anything
> broken? Only if the answer is no should change happen, in my opinion.
> 
> Can someone at the ASF please clarify what is going on? The ASF meeting
> notes are very concerning.
> 
> Thank you for listening,
> Kelly Sommers


Re: Moderation

2016-11-04 Thread Michael Kjellman
@Chris: instead of promoting the arguing going on on this thread could you 
please help lead by example and reply to Kelly's questions in her email? 
Thanks. 

I don't enjoy watching a community I care about continue to explode in front of 
my eyes ☹️

best,
kjellman

Sent from my iPhone

> On Nov 4, 2016, at 10:10 AM, Chris Mattmann  wrote:
> 
> I have apmail karma and can add moderators. 
> 
> Jason I can add you - please confirm you would like to be added. Did you file 
> the ticket - if so point me to it. If you haven't yet, no worries I can still 
> add you. Let me know. Thanks.
> 
>> On 2016-11-04 09:54 (-0700), Jason Brown  wrote: 
>> Gary,
>> 
>> I've just started looking into the moderator component due to this thread;
>> I admit I did not know about it before (my fault). Yes, I would like to be
>> added. Apparently, I need to file an INFRA ticket (as per
>> https://www.apache.org/dev/committers.html#mailing-list-moderators), which
>> I will do in the next few minutes.
>> 
>> -Jason
>> 
>>> On Fri, Nov 4, 2016 at 9:51 AM, Gary Dusbabek  wrote:
>>> 
>>> I'm beginning to wonder if I'm the only one with moderator privs. Any other
>>> committer/PMCs interested?
>>> 
>>> Sorry, it's a chore to begin with and I've been traveling this week.
>>> 
>>> Gary.
>>> 
>>> On Fri, Nov 4, 2016 at 3:47 PM, Chris Mattmann 
>>> wrote:
>>> 
 Hi Folks,
 
 Kelly Sommers sent a message to dev@cassandra and I'm trying to figure
 out if it's in moderation.
 
 Can the moderators speak up?
 
 Cheers,
 Chris
 
 
>>> 
>> 


Re: 8099 Storage Format Documentation as used with PRIMARY_INDEX

2016-10-19 Thread Michael Kjellman
Ugh, just finally figured the "header" bit of my question out. Mega lame. :\

> On Oct 18, 2016, at 9:17 AM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> I'm working on writing Birch for trunk and I noticed the following:
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java#L503
> 
> Prior to 3.0 the offset was the literal offset into the data file, yet now we 
> seem to be doing the position encoded with the key (for all rows regardless 
> of if they're > 64kb and thus have an index component) plus the serialized 
> offset. I also see there is now a "header" offset.
> 
> In RowIndexEntry there is:
> 
> 
> /**
> * @return the offset to the start of the header information for this row.
> * For some formats this may not be the start of the row.
> */
> public long headerOffset()
> {
>return 0;
> }
> 
> /**
> * The length of the row header (partition key, partition deletion and static 
> row).
> * This value is only provided for indexed entries and this method will throw
> * {@code UnsupportedOperationException} if {@code !isIndexed()}.
> */
> public long headerLength()
> {
>throw new UnsupportedOperationException();
> }
> 
> 
> In 2.1 we stored the partition key, deletion, but not static row -- but we 
> didn't need or use this so I'm guessing this is actually just to support 
> static rows? Is there any further documentation around the header in other 
> classes that I just haven't come across yet? Any thoughts on position + 
> offset and why this behavior changed? Thanks
> 
> best,
> kjellman



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Sorry, No. Always document your assumptions. I shouldn't need to git blame a 
thousand commits and read thru a billion tickets to maybe understand why 
something was done. Clearly thru the conversations on this topic I've had on 
IRC and the responses so far on this email thread it's not/still not obvious.

best,
kjellman

On Oct 18, 2016, at 10:07 AM, Benedict Elliott Smith 
> wrote:

This is what JIRA is for.



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Yeah, it has been there for years -- that being said most of the community is 
just catching up to 2.1 and 3.0 now where the usage did appear to change over 
2.0-- and I'm more trying to figure out what the intent was in the various 
usages all over the codebase and make sure it's actually doing that. Maybe even 
add some comments about that intent. :)

In 2.1 I saw that we were doing this to get the file descriptor in some cases 
(which obviously will return the wrong file descriptor so most likely would 
have made this even more of a potential no-op than it already was?):

public static int getfd(String path)
{
    RandomAccessFile file = null;
    try
    {
        // As noted above, this opens a *new* descriptor for the path --
        // not the descriptor the caller is actually reading or writing --
        // so any advice applied to it targets the wrong fd.
        file = new RandomAccessFile(path, "r");
        return getfd(file.getFD());
    }
    catch (Throwable t)
    {
        JVMStabilityInspector.inspectThrowable(t);
        // ignore
        return -1;
    }
    finally
    {
        try
        {
            if (file != null)
                file.close();
        }
        catch (Throwable t)
        {
            // ignore
        }
    }
}


On Oct 18, 2016, at 9:34 AM, Jake Luciani 
<jak...@gmail.com> wrote:

Although given we have an in process page cache[1] now this may not be
needed anymore?
This is only for the data file though.  I think it's been years? since we
showed it helped so perhaps someone should show if this is still
working/helping in the real world.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5863


On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

Specifically regarding the behavior in different kernels, from `man
posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
this was interpreted literally as "zero bytes", rather than as meaning "all
bytes through to the end of the file"."
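Given that semantics change, code relying on "len == 0 means to end of file" arguably should gate on the running kernel release. A hypothetical sketch of such a guard (version parsing only -- the class and method names are made up, and no native call is shown):

```java
// Hypothetical guard: only treat posix_fadvise(fd, 0, 0, advice) as
// covering the whole file on kernels >= 2.6.6, where len == 0 gained
// that meaning.
public class KernelVersionCheck
{
    // Returns true if a release string like "4.15.0-20-generic" is at or
    // above major.minor.patch; non-numeric suffixes are ignored.
    static boolean atLeast(String release, int major, int minor, int patch)
    {
        String[] parts = release.split("[.-]");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++)
        {
            try { v[i] = Integer.parseInt(parts[i]); }
            catch (NumberFormatException e) { break; }
        }
        if (v[0] != major) return v[0] > major;
        if (v[1] != minor) return v[1] > minor;
        return v[2] >= patch;
    }

    public static void main(String[] args)
    {
        // On Linux, os.version reports the kernel release.
        String kernel = System.getProperty("os.version");
        System.out.println("len == 0 means whole file: " + atLeast(kernel, 2, 6, 6));
    }
}
```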

On Oct 18, 2016, at 8:57 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

Right, so in SSTableReader#GlobalTidy$tidy it does:
// don't ideally want to dropPageCache for the file until all instances
have been released
CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);

It seems to me every time the reference is released on a new sstable we
would immediately tidy() it and then call posix_fadvise with
POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
thinking is doing so in respect to the API behavior in modern Linux kernel
builds?). Am I reading things correctly here? Sorta hard as there are many
different code paths the reference could have tidy() called.

Why would we want to drop the segment we just wrote from the page cache --
wouldn't that most likely be the most hot data, and even if it turned out
not to be, wouldn't it be better in this case to have the kernel be smart at
what it's best at?

best,
kjellman

On Oct 18, 2016, at 8:50 AM, Jake Luciani 
<jak...@gmail.com> wrote:

The main point is to avoid keeping things in the page cache that are no
longer needed like compacted data that has been early opened elsewhere.

On Oct 18, 2016 11:29 AM, "Michael Kjellman" 
<mkjell...@internalcircle.com>
wrote:

We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the
behavior of providing a length of 0 to posix_fadvise changed in some 2.6
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what
does it do or not do today -- but what problem was it added to solve and
what's the expected behavior regardless of kernel versions.

best,
kjellman

Sent from my iPhone





--
http://twitter.com/tjake



Re: Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
Cool, as I would have assumed they would need to be. Given they were initially 
commented out on 6/30/15 maybe cleanup and removal of that dead code is still 
at least warranted.

On Oct 18, 2016, at 9:15 AM, Oleksandr Petrov 
> wrote:

Unit tests will be completely rewritten I suspect.



8099 Storage Format Documentation as used with PRIMARY_INDEX

2016-10-18 Thread Michael Kjellman
I'm working on writing Birch for trunk and I noticed the following:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java#L503

Prior to 3.0 the offset was the literal offset into the data file, yet now we 
seem to be doing the position encoded with the key (for all rows regardless of 
if they're > 64kb and thus have an index component) plus the serialized offset. 
I also see there is now a "header" offset.

In RowIndexEntry there is:


/**
 * @return the offset to the start of the header information for this row.
 * For some formats this may not be the start of the row.
 */
public long headerOffset()
{
return 0;
}

/**
 * The length of the row header (partition key, partition deletion and static 
row).
 * This value is only provided for indexed entries and this method will throw
 * {@code UnsupportedOperationException} if {@code !isIndexed()}.
 */
public long headerLength()
{
throw new UnsupportedOperationException();
}


In 2.1 we stored the partition key, deletion, but not static row -- but we 
didn't need or use this so I'm guessing this is actually just to support static 
rows? Is there any further documentation around the header in other classes 
that I just haven't come across yet? Any thoughts on position + offset and why 
this behavior changed? Thanks

best,
kjellman


Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Within a single SegmentedFile?

On Oct 18, 2016, at 9:02 AM, Ariel Weisberg 
> wrote:

With compaction there can be hot and cold data mixed together.



Re: Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
Gotcha, I didn't know we were actually bringing them back from the dead! 

That being said, won't the unit tests need to be re-written (or at least 
refactored) after your work? Couldn't we use /* */ comments instead of every 
single line one by one? Given we use source control couldn't we remove the dead 
code and get it from the revision history if we need it in the future?

> On Oct 18, 2016, at 8:18 AM, Oleksandr Petrov <oleksandr.pet...@gmail.com> 
> wrote:
> 
> I'm currently working on actually making Super Columns work in CQL context.
> Currently they do not really work[1].
> 
> It's not a very small piece of work. It was in the pipeline for some time,
> although there most likely were more important things that had to be worked
> on. I understand your disappointment and am sorry you stumbled upon this.
> But for now you may just disregard the commented tests. My branch is going
> to be ready for review soon.
> 
> [1] https://issues.apache.org/jira/browse/CASSANDRA-12373
> 
> 
> On Tue, Oct 18, 2016 at 5:10 PM Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> There were a bunch of tests hastily and messily commented out line by line
>> (*whyy?*) in ColumnFamilyStoreTest with comments that they are pending
>> SuperColumns support post 8099.
>> 
>> Could those responsible please cleanup after themselves? It's been a while
>> since 8099 was committed in the first place and I don't see us adding Super
>> Column support at this point and the unit tests surely will need to be
>> rewritten anyways.
>> 
>> As my mother always said, pick your dirty wet towel up off the floor and put
>> it in the hamper, please.
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone
> 
> -- 
> Alex Petrov



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Right, so in SSTableReader#GlobalTidy$tidy it does:
// don't ideally want to dropPageCache for the file until all instances have 
been released
CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);

It seems to me every time the reference is released on a new sstable we would 
immediately tidy() it and then call posix_fadvise with POSIX_FADV_DONTNEED with 
an offset of 0 and a length of 0 (which I'm thinking relies on the "len == 0 
means whole file" behavior of modern Linux kernels?). Am I reading things correctly 
here? Sorta hard as there are many different code paths the reference could 
have tidy() called.

Why would we want to drop the segment we just wrote from the page cache -- 
wouldn't that most likely be the most hot data, and even if it turned out not 
to be, wouldn't it be better in this case to have the kernel be smart at what it's 
best at?
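The comment in tidy() at least states the intent: defer the cache drop until the last instance lets go. A generic, hypothetical sketch of that release-once pattern (the cleanup Runnable stands in for the trySkipCache calls):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Generic sketch of "tidy only on last release": the cleanup action
// runs exactly once, when the final reference is released.
public class RefCounted
{
    private final AtomicInteger refs = new AtomicInteger(1);
    private final Runnable tidy;

    RefCounted(Runnable tidy)
    {
        this.tidy = tidy;
    }

    RefCounted ref()
    {
        refs.incrementAndGet();
        return this;
    }

    void release()
    {
        if (refs.decrementAndGet() == 0)
            tidy.run(); // last holder gone: now it is safe to drop the cache
    }

    public static void main(String[] args)
    {
        AtomicInteger tidied = new AtomicInteger();
        RefCounted r = new RefCounted(tidied::incrementAndGet);
        r.ref();     // a second holder takes a reference
        r.release(); // first release: tidy must not run yet
        r.release(); // final release: tidy runs
        System.out.println(tidied.get()); // prints 1
    }
}
```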

best,
kjellman

> On Oct 18, 2016, at 8:50 AM, Jake Luciani <jak...@gmail.com> wrote:
> 
> The main point is to avoid keeping things in the page cache that are no
> longer needed like compacted data that has been early opened elsewhere.
> 
> On Oct 18, 2016 11:29 AM, "Michael Kjellman" <mkjell...@internalcircle.com>
> wrote:
> 
>> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
>> fashion no comments were provided.
>> 
>> There is a check the OS is Linux (okay, a start) but it turns out the
>> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
>> kernels. We don't check the kernel version -- or even note it.
>> 
>> What is the *expected* outcome of our use of posix_fadvise -- not what
>> does it do or not do today -- but what problem was it added to solve and
>> what's the expected behavior regardless of kernel versions.
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Sure -- my bad, I aggregated them all up for you:
https://github.com/apache/cassandra/search?q=CLibrary.trySkipCache&type=Code
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/test/unit/org/apache/cassandra/utils/CLibraryTest.java#L34
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/db/commitlog/MemoryMappedSegment.java#L102
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java#L218
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/hints/HintsWriter.java#L292
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/io/util/FileHandle.java#L167
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java#L174
https://github.com/apache/cassandra/blob/f2a354763877cfeaf1dd017b84a7c8ee9eafd885/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2281
https://github.com/apache/cassandra/blob/f2a354763877cfeaf1dd017b84a7c8ee9eafd885/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2282

Or if you use IDEA this should work pretty well too:
[inline screenshot of IDEA usage search omitted]

best,
kjellman


On Oct 18, 2016, at 8:33 AM, Benedict Elliott Smith 
<bened...@apache.org> wrote:

... and continuing in the fashion of behaviours one might like to disabuse
people of, no code link is provided.



On 18 October 2016 at 16:28, Michael Kjellman 
<mkjell...@internalcircle.com<mailto:mkjell...@internalcircle.com>>
wrote:

We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the
behavior of providing a length of 0 to posix_fadvise changed in some 2.6
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what
does it do or not do today -- but what problem was it added to solve and
what's the expected behavior regardless of kernel versions.

best,
kjellman

Sent from my iPhone



Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
We use posix_fadvise in a bunch of places, and in stereotypical Cassandra 
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the behavior 
of providing a length of 0 to posix_fadvise changed in some 2.6 kernels. We 
don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what does it 
do or not do today -- but what problem was it added to solve and what's the 
expected behavior regardless of kernel versions. 

best,
kjellman

Sent from my iPhone
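For reference, a minimal sketch of the kind of kernel-version guard the message says is missing, with a hypothetical class and method names (not Cassandra code). The caveat it encodes is from the Linux posix_fadvise man page: before kernel 2.6.6 a length of 0 meant literally "zero bytes", while from 2.6.6 on it means "all bytes through to the end of the file".

```java
// Hypothetical sketch (not Cassandra code): guard for the posix_fadvise
// len=0 behavior change. Before Linux 2.6.6, len=0 meant "zero bytes";
// from 2.6.6 on it means "through to the end of the file".
public final class KernelVersionCheck
{
    // Parses e.g. "2.6.32-431.el6.x86_64" into {2, 6, 32}
    static int[] parseKernelVersion(String osVersion)
    {
        String[] parts = osVersion.split("[.-]");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++)
            v[i] = Integer.parseInt(parts[i].replaceAll("\\D", ""));
        return v;
    }

    // True if posix_fadvise(fd, 0, 0, ...) covers the whole file on this kernel
    static boolean fadviseLenZeroMeansWholeFile(String osVersion)
    {
        int[] v = parseKernelVersion(osVersion);
        return v[0] > 2 || (v[0] == 2 && (v[1] > 6 || (v[1] == 6 && v[2] >= 6)));
    }

    public static void main(String[] args)
    {
        System.out.println(fadviseLenZeroMeansWholeFile("2.6.5-smp"));         // false
        System.out.println(fadviseLenZeroMeansWholeFile("2.6.32-431.el6"));    // true
        System.out.println(fadviseLenZeroMeansWholeFile("4.15.0-20-generic")); // true
    }
}
```

On Linux the running kernel version is available to the JVM via `System.getProperty("os.version")`, which is what such a guard would be fed in practice.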

Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
There were a bunch of tests hastily and messily commented out line by line 
(*whyy?*) in ColumnFamilyStoreTest with comments that they are pending 
SuperColumns support post 8099. 

Could those responsible please clean up after themselves? It's been a while 
since 8099 was committed in the first place and I don't see us adding Super 
Column support at this point; the unit tests will surely need to be rewritten 
anyways. 

As my mother always said, pick your dirty wet towel up off the floor 
and put it in the hamper please

best,
kjellman

Sent from my iPhone

Re: Question on assert

2016-09-21 Thread Michael Kjellman
Yeah, I understand what you're saying, don't get me wrong.

However, I just spent close to a year total working on and writing CASSANDRA-9754, 
and when you're dealing with IO, sometimes asserts are the right way to go. I 
put them there as sanity checks, mostly to ensure that code changes to other 
parts of the code don't have unexpected interactions with the input bounds 
expected by a method. I think asserts are fine (and correct) in these 
cases.


> On Sep 21, 2016, at 11:16 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> 
> You are essentially arguing, "if you turn off -ea you're screwed" which is a
> symptom of a larger problem that I am pointing out.
> 
> Forget the "5%" thing. I am having a discussion about use of assert.
> 
> You have:
> 1) checked exceptions
> 2) unchecked exceptions
> 3) Error (like ioError which we sometime have to track)
> 
> The common case for assert is to only be used in testing. This is why -ea
> is off by default.
> 
> My point is that using assert as an Apache Cassandra specific "pseudo
> exception" seems problematic. I can point at tickets in the Cassandra Jira
> where this is not trapped properly. It appears to me that having to deal
> with a 4th "pseudo exception" is code smell.
> 
> Sometimes you see assert in place of a bounds check or a null check that
> you would never want to turn off. Other times it is used as a quasi
> IllegalStateException. Other times a class named "estimator" asserts when
> the "estimate" "overflows". This seems far away from the defined purpose of
> assert.
> 
> The glaring issue is that it bubbles through try catch so it hardly makes
> me feel "safe" either on or off.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, Sep 21, 2016 at 1:34 PM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> Asserts have their place as sanity checks. Just like exceptions have their
>> place.
>> 
>> They can both live in harmony and they both serve a purpose.
>> 
>> What doesn't serve a purpose is that comment encouraging n00b users to get
>> a mythical 5% performance increase and then get silent corruption when
>> their disk/io goes sideways and the asserts might have caught things before
>> it went really wrong.
>> 
>> Sent from my iPhone
>> 
>> On Sep 21, 2016, at 10:31 AM, Edward Capriolo <edlinuxg...@gmail.com
>> <mailto:edlinuxg...@gmail.com>> wrote:
>> 
>> " potential 5% performance win when you've corrupted all their data."
>> This is somewhat of my point. Why do assertions that sometimes are trapped
>> "protect my data" better then a checked exception?
>> 
>> On Wed, Sep 21, 2016 at 1:24 PM, Michael Kjellman <
>> mkjell...@internalcircle.com<mailto:mkjell...@internalcircle.com>> wrote:
>> 
>> I hate that comment with a passion. Please please please please do
>> yourself a favor and *always* run with asserts on. `-ea` for life. In
>> practice I'd be surprised if you actually got a reliable 5% performance win
>> and I doubt your customers will care about a potential 5% performance win
>> when you've corrupted all their data.
>> 
>> best,
>> kjellman
>> 
>> On Sep 21, 2016, at 10:21 AM, Edward Capriolo <edlinuxg...@gmail.com
>> <mailto:edlinuxg...@gmail.com>>
>> wrote:
>> 
>> There are a variety of assert usages in Cassandra. You can find
>> several
>> tickets like mine.
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-12643
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-11537
>> 
>> Just to prove that I am not the only one who runs into these:
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-12484
>> 
>> To paraphrase another ticket that I read today and can not find,
>> "The problem is X throws Assertion which is not caught by the Exception
>> handler and it bubbles over and creates a thread death."
>> 
>> The jvm.properties file claims this:
>> 
>> # enable assertions.  disabling this in production will give a modest
>> # performance benefit (around 5%).
>> -ea
>> 
>> If assertions incur a "5% penalty" but are not always trapped, what value
>> do they add?
>> 
>> These are common sentiments about how assert should be used: (not trying
>> to
>> make this a this is what the internet says type debate)
>> 
>> http://stackoverflow.com/questions/2758224/what-does-
>> the-java-assert-keyword-do-and-when-should-it-be-used
>> 
>&

Re: Question on assert

2016-09-21 Thread Michael Kjellman
Asserts have their place as sanity checks. Just like exceptions have their 
place.

They can both live in harmony and they both serve a purpose.

What doesn't serve a purpose is that comment encouraging n00b users to get a 
mythical 5% performance increase and then get silent corruption when their 
disk/io goes sideways and the asserts might have caught things before it went 
really wrong.

Sent from my iPhone

On Sep 21, 2016, at 10:31 AM, Edward Capriolo 
<edlinuxg...@gmail.com<mailto:edlinuxg...@gmail.com>> wrote:

" potential 5% performance win when you've corrupted all their data."
This is somewhat of my point. Why do assertions that sometimes are trapped
"protect my data" better then a checked exception?

On Wed, Sep 21, 2016 at 1:24 PM, Michael Kjellman <
mkjell...@internalcircle.com<mailto:mkjell...@internalcircle.com>> wrote:

I hate that comment with a passion. Please please please please do
yourself a favor and *always* run with asserts on. `-ea` for life. In
practice I'd be surprised if you actually got a reliable 5% performance win
and I doubt your customers will care about a potential 5% performance win
when you've corrupted all their data.

best,
kjellman

On Sep 21, 2016, at 10:21 AM, Edward Capriolo 
<edlinuxg...@gmail.com<mailto:edlinuxg...@gmail.com>>
wrote:

There are a variety of assert usages in Cassandra. You can find
several
tickets like mine.

https://issues.apache.org/jira/browse/CASSANDRA-12643

https://issues.apache.org/jira/browse/CASSANDRA-11537

Just to prove that I am not the only one who runs into these:

https://issues.apache.org/jira/browse/CASSANDRA-12484

To paraphrase another ticket that I read today and can not find,
"The problem is X throws Assertion which is not caught by the Exception
handler and it bubbles over and creates a thread death."

The jvm.properties file claims this:

# enable assertions.  disabling this in production will give a modest
# performance benefit (around 5%).
-ea

If assertions incur a "5% penalty" but are not always trapped, what value do
they add?

These are common sentiments about how assert should be used: (not trying
to
make this a this is what the internet says type debate)

http://stackoverflow.com/questions/2758224/what-does-
the-java-assert-keyword-do-and-when-should-it-be-used

"Assertions
<http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.10>
(by
way of the *assert* keyword) were added in Java 1.4. They are used to
verify the correctness of an invariant in the code. They should never be
triggered in production code, and are indicative of a bug or misuse of a
code path. They can be activated at run-time by way of the -eaoption on
the
java command, but are not turned on by default."

http://stackoverflow.com/questions/1957645/when-to-use-
an-assertion-and-when-to-use-an-exception

"An assertion would stop the program from running, but an exception would
let the program continue running."

I look at how Cassandra uses assert and how it manifests in how the code
operates in production. Assert is something like a semi-unchecked
exception.
All types of internal Util classes might throw it, downstream code is
essentially unaware and rarely specifically handles it. They do not
always
result in the hard death one would expect from an assert.

I know this is a ballpark type figure, but would "5% performance penalty"
be in the ballpark of a checked exception? Given that they tend to bubble
through things uncaught, do they do more harm than good?
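The jvm.properties question above hinges on the fact that the JVM ships with assertions disabled unless -ea is passed. Whether the assert statements in a class are actually live can be probed at runtime; a minimal self-contained sketch (hypothetical class name):

```java
// Probes whether `assert` statements in this class are live.
// With `java -ea AssertStatusDemo` the flag comes back true; with plain
// `java AssertStatusDemo` (the JVM default) every assert is a no-op and
// the flag stays false.
public final class AssertStatusDemo
{
    static boolean assertionsEnabled()
    {
        boolean enabled = false;
        // The assignment only executes when assertions are enabled.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args)
    {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

This is why the "5% penalty" is opt-in: with assertions off, the JVM skips the assert bodies entirely, including any expressions inside them.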




Re: Question on assert

2016-09-21 Thread Michael Kjellman
I hate that comment with a passion. Please please please please do yourself a 
favor and *always* run with asserts on. `-ea` for life. In practice I'd be 
surprised if you actually got a reliable 5% performance win and I doubt your 
customers will care about a potential 5% performance win when you've corrupted 
all their data.

best,
kjellman

> On Sep 21, 2016, at 10:21 AM, Edward Capriolo  wrote:
> 
> There are a variety of assert usages in Cassandra. You can find several
> tickets like mine.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-12643
> 
> https://issues.apache.org/jira/browse/CASSANDRA-11537
> 
> Just to prove that I am not the only one who runs into these:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-12484
> 
> To paraphrase another ticket that I read today and can not find,
> "The problem is X throws Assertion which is not caught by the Exception
> handler and it bubbles over and creates a thread death."
> 
> The jvm.properties file claims this:
> 
> # enable assertions.  disabling this in production will give a modest
> # performance benefit (around 5%).
> -ea
> 
> If assertions incur a "5% penalty" but are not always trapped, what value do
> they add?
> 
> These are common sentiments about how assert should be used: (not trying to
> make this a this is what the internet says type debate)
> 
> http://stackoverflow.com/questions/2758224/what-does-the-java-assert-keyword-do-and-when-should-it-be-used
> 
> "Assertions
>  (by
> way of the *assert* keyword) were added in Java 1.4. They are used to
> verify the correctness of an invariant in the code. They should never be
> triggered in production code, and are indicative of a bug or misuse of a
> code path. They can be activated at run-time by way of the -eaoption on the
> java command, but are not turned on by default."
> 
> http://stackoverflow.com/questions/1957645/when-to-use-an-assertion-and-when-to-use-an-exception
> 
> "An assertion would stop the program from running, but an exception would
> let the program continue running."
> 
> I look at how Cassandra uses assert and how it manifests in how the code
> operates in production. Assert is something like a semi-unchecked exception.
> All types of internal Util classes might throw it, downstream code is
> essentially unaware and rarely specifically handles it. They do not always
> result in the hard death one would expect from an assert.
> 
> I know this is a ballpark type figure, but would "5% performance penalty"
> be in the ballpark of a checked exception? Given that they tend to bubble
> through things uncaught, do they do more harm than good?
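The "bubbles through" complaint above is concrete: AssertionError extends Error, not Exception, so the common catch (Exception e) handler does not stop it and it propagates up to the thread. A minimal demonstration (hypothetical class name):

```java
// AssertionError extends Error, not Exception, so a `catch (Exception e)`
// handler lets it pass straight through to the caller (or kill the thread).
public final class AssertBubbleDemo
{
    static String runTask(Runnable task)
    {
        try
        {
            task.run();
            return "ok";
        }
        catch (Exception e)
        {
            return "caught exception";
        }
        // No catch for Error/Throwable: an AssertionError escapes here.
    }

    public static void main(String[] args)
    {
        // A RuntimeException is trapped by the handler:
        System.out.println(runTask(() -> { throw new RuntimeException("boom"); }));  // caught exception

        // An AssertionError sails past it to the caller:
        try
        {
            runTask(() -> { throw new AssertionError("invariant violated"); });
        }
        catch (AssertionError e)
        {
            System.out.println("escaped to caller: " + e.getMessage());
        }
    }
}
```

A handler that really wants to log and survive a failed assertion has to catch Throwable (or AssertionError explicitly), which is exactly the "4th pseudo exception" burden described above.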



Re: Failing tests 2016-08-24 [cassandra-3.9]

2016-08-25 Thread Michael Kjellman
Awesome Joel

Sent from my iPhone

On Aug 24, 2016, at 8:22 PM, Joel Knighton 
> wrote:

===
testall: All passed!

===
dtest: 2 failures
 scrub_test.TestScrubIndexes.test_standalone_scrub
   CASSANDRA-12337. I've root-caused this; the failure is cosmetic
   but user-facing, so I plan on fixing this soon.

 commitlog_test.TestCommitLog.test_commitlog_replay_on_startup
   CASSANDRA-12213. This is still being analyzed.

===
novnode: All passed!

===
upgrade: All passed!

While it is somewhat due to the stars aligning such that our flaky tests
all didn't fail this run, it is very exciting to see an upgrade test run
with 0 failures. This is 50+ fewer failures than two weeks ago.


Re: Failing tests 2016-08-22 [cassandra-3.9]

2016-08-23 Thread Michael Kjellman
Looks like some very nice progress here! Mucho Exciting!! 

On Aug 22, 2016, at 10:44 PM, Joel Knighton 
> wrote:

===
testall: All passed!

===
dtest: 2 failures
 upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test
   Looks like a new, flaky failure. I'll follow up on this and get a ticket
   created tomorrow.

 materialized_views_test.TestMaterializedViews
 .add_dc_after_mv_network_replication_test
   CASSANDRA-12140. Known issue, still needs to be solved.

===
novnode: 6 failures
 6 failures in cql_tests.SlowQueryTester. This was a test regression
 quickly fixed in CASSANDRA-12514.

===
upgrade: 1 failure
 upgrade_tests.cql_tests
 .TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_3_x
 .bug_5732_test
 CASSANDRA-12457. A fix is in development.



Thanks for all the fish.

2016-08-19 Thread Michael Kjellman
Just wanted to say thank you publicly to Jonathan Ellis for his tireless work 
making this community and software what it is. He's always been level headed 
and I certainly wouldn't be where I am without his leadership.

So, Jonathan, thanks for all the fish.

best,
kjellman

Sent from my iPhone


Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Michael Kjellman
+1

Sent from my iPhone

On Aug 15, 2016, at 6:48 PM, Brandon Williams 
> wrote:

So will I, if that happens, which has never happened in the last ~7 years.

On Mon, Aug 15, 2016 at 4:27 PM, Jeff Jirsa 
>
wrote:


On 8/15/16, 2:15 PM, "Marvin Humphrey" 
> wrote:

Julian Hyde, who made the proposal, is active in the Apache Incubator ...
  I propose that when a JIRA is created, we send an email to both dev@
and
  issues@. This will be an extra 40 emails per month on the dev list.
I am
  really cautious about increasing the number of messages on the dev
list,
  because I think high-volume lists discourage part-time contributors,
but I
  think this change is worthwhile. It will make people aware of
  conversations that are happening and if it helps to channel
conversations
  onto JIRA cases it could possibly even REDUCE the volume on the dev
list.


That's a useful example. However, that's a project with 30-40 issues per
month (1300 over its lifetime) - Cassandra is sitting at 244 in the past 30
days, 12000 over its lifetime.

I think a lot of us part-time contributors appreciate efforts to increase
visibility and certainly welcome growing the project by making it easier to
recruit and retain more contributors, but is the noise of 10 more new email
threads per day going to get into the "high volume lists discourage
part-time contributors" range Julian discussed?

I'm a part time contributor. If this list gets ~10 threads per day with
2-3 replies each, I'm going to have to start filtering it out of necessity
(because I can't keep up with that volume).






Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Michael Kjellman
I get 2500+ emails a day and I don't filter dev as I like to stay engaged. If 
this list becomes too noisy everyone will just filter it into a black hole. Sad.

Sent from my iPhone

On Aug 15, 2016, at 3:05 PM, Russell Bradberry 
<rbradbe...@gmail.com<mailto:rbradbe...@gmail.com>> wrote:

So then what was the point of Ellis’s proposal, and this discussion, if there 
was never a choice in the matter in the first place?


On 8/15/16, 2:03 PM, "Chris Mattmann" 
<mattm...@apache.org<mailto:mattm...@apache.org>> wrote:

   I’m sorry but you are massively confused if you believe that the ASF mailing 
lists
   aren’t the source of truth. They are. That’s not optional. If you are an ASF 
project,
   mailing lists are the source of truth. Period.

   On 8/15/16, 11:01 AM, "Michael Kjellman" 
<mkjell...@internalcircle.com<mailto:mkjell...@internalcircle.com>> wrote:

   I'm a big fan of mailing lists, but google makes issues very findable 
for new people to the project as JIRA gets indexed. They won't be able to find 
the same thing on an email they didn't get -- because they weren't in the 
project in the first place.

   Mailing lists are good for broad discussion or bringing specific issues 
to the attention of the broader community. It should never be the source of 
truth.

   best,
   kjellman

   Sent from my iPhone

   On Aug 15, 2016, at 2:57 PM, Chris Mattmann 
<mattm...@apache.org<mailto:mattm...@apache.org><mailto:mattm...@apache.org>> 
wrote:

   Realize it’s not just about committers and PMC members that are *already*
   on the PMC or that are developing the project. It’s about how to engage 
the
   *entire* community including those that are not yet on the committer or
   PMC roster. That is the future (and current) lifeblood of the project. 
The mailing
    lists aren’t just an unfortunate necessity of being an Apache project. 
They *are*
   the lifeblood of the Apache project.



   On 8/15/16, 10:44 AM, "Brandon Williams" 
<dri...@gmail.com<mailto:dri...@gmail.com><mailto:dri...@gmail.com>> wrote:

  I too, use this method quite a bit, almost every single day.

  On Mon, Aug 15, 2016 at 12:43 PM, Yuki Morishita 
<mor.y...@gmail.com<mailto:mor.y...@gmail.com><mailto:mor.y...@gmail.com>> 
wrote:

   As an active committer, the most important thing for me is to be able
   to *look up* design discussion and decision easily later.

   I often look up the git history or CHANGES.txt for changes that I'm
   interested in, then look up JIRA by following JIRA ticket number
   written to the comment or text.
   If we move to dev mailing list, I would request to post permalink to
   that thread posted to JIRA, which I think is just one extra step that
   isn't necessary if we simply use JIRA.

   So, I'm +1 to just post JIRA link to dev list.


   On Mon, Aug 15, 2016 at 12:35 PM, Chris Mattmann 
<mattm...@apache.org<mailto:mattm...@apache.org><mailto:mattm...@apache.org>>
   wrote:
   This is a good outward flow of info to the dev list. However, there
   needs to be
   inward flow too – having the convo on the dev list will be a good start
   to that.
   I hope to see more inclusivity here.



   On 8/15/16, 10:26 AM, "Aleksey Yeschenko" 
<alek...@apache.org<mailto:alek...@apache.org><mailto:alek...@apache.org>> 
wrote:

  Well, if you read carefully what Jeremiah and I have just proposed,
   it wouldn’t be an issue.

  The notable major changes would start off on dev@ (think, a
   summary, a link to the JIRA, and maybe an attached spec doc).

  No need to follow the JIRA feed. Watch dev@ for those announcements
   and start watching the individual JIRA tickets if interested.

  This creates the least amount of noise: you miss nothing important,
   and at the same time you won’t be receiving mail from
  dev@ for each individual comment - including those on proposals you
   don’t care about.

  We aren’t doing it currently, but we could, and probably should.

  --
  AY

  On 15 August 2016 at 18:22:36, Chris Mattmann 
(mattm...@apache.org<mailto:mattm...@apache.org><mailto:mattm...@apache.org>)
   wrote:

  Discussion belongs on the dev list. Putting discussion in JIRA, is
   fine, but realize,
  there is a lot of noise in that signal and people may or may not be
   watching
  the JIRA list. In fact, I don’t see JIRA sent to the dev list at all
   so you are basically
  forking the conversation to a high noise list by putting it all in
   JIRA.





  On 8/15/16, 10:11 AM, "Aleksey Yeschenko" 
<alek...@apache.org<mailto:alek...@apache.org

Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Michael Kjellman
I'm a big fan of mailing lists, but google makes issues very findable for new 
people to the project as JIRA gets indexed. They won't be able to find the same 
thing on an email they didn't get -- because they weren't in the project in the 
first place.

Mailing lists are good for broad discussion or bringing specific issues to the 
attention of the broader community. It should never be the source of truth.

best,
kjellman

Sent from my iPhone

On Aug 15, 2016, at 2:57 PM, Chris Mattmann 
> wrote:

Realize it’s not just about committers and PMC members that are *already*
on the PMC or that are developing the project. It’s about how to engage the
*entire* community including those that are not yet on the committer or
PMC roster. That is the future (and current) lifeblood of the project. The 
mailing
lists aren’t just an unfortunate necessity of being an Apache project. They *are*
the lifeblood of the Apache project.



On 8/15/16, 10:44 AM, "Brandon Williams" 
> wrote:

   I too, use this method quite a bit, almost every single day.

   On Mon, Aug 15, 2016 at 12:43 PM, Yuki Morishita 
> wrote:

As an active committer, the most important thing for me is to be able
to *look up* design discussion and decision easily later.

I often look up the git history or CHANGES.txt for changes that I'm
interested in, then look up JIRA by following JIRA ticket number
written to the comment or text.
If we move to dev mailing list, I would request to post permalink to
that thread posted to JIRA, which I think is just one extra step that
isn't necessary if we simply use JIRA.

So, I'm +1 to just post JIRA link to dev list.


On Mon, Aug 15, 2016 at 12:35 PM, Chris Mattmann 
>
wrote:
This is a good outward flow of info to the dev list. However, there
needs to be
inward flow too – having the convo on the dev list will be a good start
to that.
I hope to see more inclusivity here.



On 8/15/16, 10:26 AM, "Aleksey Yeschenko" 
> wrote:

   Well, if you read carefully what Jeremiah and I have just proposed,
it wouldn’t be an issue.

   The notable major changes would start off on dev@ (think, a
summary, a link to the JIRA, and maybe an attached spec doc).

   No need to follow the JIRA feed. Watch dev@ for those announcements
and start watching the individual JIRA tickets if interested.

   This creates the least amount of noise: you miss nothing important,
and at the same time you won’t be receiving mail from
   dev@ for each individual comment - including those on proposals you
don’t care about.

   We aren’t doing it currently, but we could, and probably should.

   --
   AY

   On 15 August 2016 at 18:22:36, Chris Mattmann 
(mattm...@apache.org)
wrote:

   Discussion belongs on the dev list. Putting discussion in JIRA, is
fine, but realize,
   there is a lot of noise in that signal and people may or may not be
watching
   the JIRA list. In fact, I don’t see JIRA sent to the dev list at all
so you are basically
   forking the conversation to a high noise list by putting it all in
JIRA.





   On 8/15/16, 10:11 AM, "Aleksey Yeschenko" 
>
wrote:

   I too feel like it would be sufficient to announce those major JIRAs
on the dev@ list, but keep all discussion itself to JIRA, where it
belongs.

   You don’t need to follow every ticket this way, just subscribe to
dev@ and then start watching the select major JIRAs you care about.

   --
   AY

   On 15 August 2016 at 18:08:20, Jeremiah D Jordan (
jeremiah.jor...@gmail.com) wrote:

   I like keeping things in JIRA because then everything is in one
place, and it is easy to refer someone to it in the future.
   But I agree that JIRA tickets with a bunch of design discussion and
POC’s and such in them can get pretty long and convoluted.

   I don’t really like the idea of moving all of that discussion to
email, which makes it harder to point someone to it. Maybe a better idea
would be to have a “design/POC” JIRA and an “implementation” JIRA. That way
we could still keep things in JIRA, but the final decision would be kept
“clean”.

   Though it would be nice if people would send an email to the dev
list when proposing “design” JIRA’s, as not everyone has time to follow
every JIRA ever made to see that a new design JIRA was created that they
might be interested in participating on.

   My 2c.

   -Jeremiah


On Aug 15, 2016, at 9:22 AM, Jonathan Ellis 
>
wrote:

A long time ago, I was a proponent of keeping most development
discussions
on Jira, where tickets can be self contained and the threadless
nature
helps keep discussions from getting sidetracked.

But Cassandra was a lot smaller then, and as we've grown 

Re: Jira down, again?

2016-06-20 Thread Michael Kjellman
Hm... weird, JIRA isn't working again? So bizarre!! 

> On Jun 15, 2016, at 5:38 PM, Michael Kjellman <mkjell...@internalcircle.com> 
> wrote:
> 
> down. again.
> 
>> On Jun 14, 2016, at 11:14 AM, Alex Popescu <al...@datastax.com> wrote:
>> 
>> I've been trying to get to a ticket for the last 2h and I only get service
>> unavailable :-(
>> 
>> On Tue, Jun 14, 2016 at 10:26 AM, Michael Kjellman <
>> mkjell...@internalcircle.com> wrote:
>> 
>>> and, it's down again. :(
>>> 
>>>> On Jun 14, 2016, at 4:48 AM, Dave Brosius <dbros...@apache.org> wrote:
>>>> 
>>>> They are aware of these things
>>>> 
>>>> https://twitter.com/infrabot <https://twitter.com/infrabot>
>>>> 
>>>> On 06/14/2016 05:28 AM, Giampaolo Trapasso wrote:
>>>>> Hi to all,
>>>>> at the moment is the same for me. Is there a way to notify to someone
>>> this
>>>>> situation?
>>>>> 
>>>>> Giampaolo
>>>>> 
>>>>> 2016-06-13 23:27 GMT+02:00 Mahdi Mohammadi <mah...@gmail.com>:
>>>>> 
>>>>>> And when it is not down, it is very slow for me.
>>>>>> 
>>>>>> Do others have the same experience?
>>>>>> 
>>>>>> Best Regards
>>>>>> 
>>>>>> On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams <dri...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Everyone.
>>>>>>> 
>>>>>>> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
>>>>>>> mkjell...@internalcircle.com> wrote:
>>>>>>> 
>>>>>>>> Seems like Apache Jira is 100% down, again, for like the 500th time
>>> in
>>>>>>> the
>>>>>>>> last 2 months. Just me or everyone?
>>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Bests,
>> 
>> Alex Popescu | @al3xandru
>> Sen. Product Manager @ DataStax
>> 
>> <http://cassandrasummit.org/Email_Signature>
>> 
>> » DataStax Enterprise - the database for cloud applications. «
> 



Re: Jira down, again?

2016-06-15 Thread Michael Kjellman
down. again.

> On Jun 14, 2016, at 11:14 AM, Alex Popescu <al...@datastax.com> wrote:
> 
> I've been trying to get to a ticket for the last 2h and I only get service
> unavailable :-(
> 
> On Tue, Jun 14, 2016 at 10:26 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> and, it's down again. :(
>> 
>>> On Jun 14, 2016, at 4:48 AM, Dave Brosius <dbros...@apache.org> wrote:
>>> 
>>> They are aware of these things
>>> 
>>> https://twitter.com/infrabot <https://twitter.com/infrabot>
>>> 
>>> On 06/14/2016 05:28 AM, Giampaolo Trapasso wrote:
>>>> Hi to all,
>>>> at the moment is the same for me. Is there a way to notify to someone
>> this
>>>> situation?
>>>> 
>>>> Giampaolo
>>>> 
>>>> 2016-06-13 23:27 GMT+02:00 Mahdi Mohammadi <mah...@gmail.com>:
>>>> 
>>>>> And when it is not down, it is very slow for me.
>>>>> 
>>>>> Do others have the same experience?
>>>>> 
>>>>> Best Regards
>>>>> 
>>>>> On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams <dri...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Everyone.
>>>>>> 
>>>>>> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
>>>>>> mkjell...@internalcircle.com> wrote:
>>>>>> 
>>>>>>> Seems like Apache Jira is 100% down, again, for like the 500th time
>> in
>>>>>> the
>>>>>>> last 2 months. Just me or everyone?
>>> 
>> 
>> 
> 
> 
> -- 
> Bests,
> 
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
> 
> <http://cassandrasummit.org/Email_Signature>
> 
> » DataStax Enterprise - the database for cloud applications. «



Re: NewBie Question

2016-06-15 Thread Michael Kjellman
This was forwarded to me yesterday... a helpful first step 
https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md

> On Jun 15, 2016, at 9:54 AM, Jonathan Haddad  wrote:
> 
> Maybe some brave soul will document the 3.0 on disk format as part of
> https://issues.apache.org/jira/browse/CASSANDRA-8700.
> 
> On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford 
> wrote:
> 
>> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
>> engine.
>> 
>> 
>> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
>> 
>> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey 
>> wrote:
>> 
 http://wiki.apache.org/cassandra/ArchitectureSSTable
>>> 
>>> Be aware that this page hasn't been updated since 2013, so it doesn't
>>> reflect any changes to the SSTable format since then, including the
>>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
>>> 
>>> That said, I believe the linked Apache wiki page is the best
>>> documentation for the format. Unfortunately, if you want a better or
>>> more current understanding, you'll have to read the code and read some
>>> SSTables.
>>> 
>> 



Re: Jira down, again?

2016-06-14 Thread Michael Kjellman
and, it's down again. :(

> On Jun 14, 2016, at 4:48 AM, Dave Brosius <dbros...@apache.org> wrote:
> 
> They are aware of these things
> 
> https://twitter.com/infrabot <https://twitter.com/infrabot>
> 
> On 06/14/2016 05:28 AM, Giampaolo Trapasso wrote:
>> Hi to all,
>> at the moment is the same for me. Is there a way to notify to someone this
>> situation?
>> 
>> Giampaolo
>> 
>> 2016-06-13 23:27 GMT+02:00 Mahdi Mohammadi <mah...@gmail.com>:
>> 
>>> And when it is not down, it is very slow for me.
>>> 
>>> Do others have the same experience?
>>> 
>>> Best Regards
>>> 
>>> On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams <dri...@gmail.com>
>>> wrote:
>>> 
>>>> Everyone.
>>>> 
>>>> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
>>>> mkjell...@internalcircle.com> wrote:
>>>> 
>>>>> Seems like Apache Jira is 100% down, again, for like the 500th time in
>>>> the
>>>>> last 2 months. Just me or everyone?
> 



Jira down, again?

2016-06-13 Thread Michael Kjellman
Seems like Apache Jira is 100% down, again, for like the 500th time in the last 
2 months. Just me or everyone?

Re: NewBie Question ~ Book for Cassandra

2016-06-13 Thread Michael Kjellman
Bhuvan,

You didn't disrespect anyone, so please don't apologize! Appreciate your 
positive and helpful comment for the OP :) 

best,
kjellman

> On Jun 13, 2016, at 8:50 AM, Bhuvan Rawal  wrote:
> 
> Hi Matt,
> 
> I suggested the resources keeping in mind the ease with which one can
> learn. My idea was not to disrespect Apache or community in any form, it
> was just to facilitate learning of a Newbie.
> While having a good wiki would be amazing and I believe we all agree on
> this Thread that current Documentation has a lot of scope for improvement.
> And I'm completely willing to contribute in whatever way possible to the
> docs and getting it reviewed.
> 
> Best Regards,
> Bhuvan
> 
> On Mon, Jun 13, 2016 at 8:17 PM, Eric Evans 
> wrote:
> 
>> On Mon, Jun 13, 2016 at 8:05 AM, Mattmann, Chris A (3980)
>>  wrote:
>>> However also see that besides the current documentation, there needs to
>> be
>>> a roadmap for making Apache Cassandra and *its* documentation (not
>> *DataStax’s*)
>>> up to par for a basic user to build, deploy and run Cassandra. I don’t
>> think that’s
>>> the current case, is it?
>> 
>> There is CASSANDRA-8700
>> (https://issues.apache.org/jira/browse/CASSANDRA-8700), which is a
>> step in this direction I hope.
>> 
>> One concern I do have though is that changing the tech used to
>> author/publish documentation won't in itself be enough to get good
>> docs.  In fact, moving the docs in-tree raises the barrier to
>> contribution in the sense that instead of mashing 'Edit', you have to
>> put together a patch and have it reviewed.
>> 
>> That said, I also think that we've historically set the bar way too
>> high to committer/PMC, and that this may be an opportunity to change
>> that; There ought to be a path to the PMC for documentation authors
>> and translators (and this is typical in other projects).  So, I will
>> personally do my best to set aside some time each week to review and
>> merge documentation changes, and to champion regular doc contributors
>> for committership.  Hopefully there are others willing to do the same!
>> 
>> 
>> --
>> Eric Evans
>> john.eric.ev...@gmail.com
>> 



Re: Java Driver 3.0 for Apache Cassandra - Documentation Outdated?

2016-06-06 Thread Michael Kjellman
I think it comes down to having full-time tech writers employed and paid. If 
DataStax has the $$ to provide a significant benefit to the community (well 
thought out documentation), that's better than the little or no documentation 
we'd get if it were left only to developers, who most likely won't document or 
will do a poor job of it.

Having some documentation is much better for the community than the alternative 
that "the code is the documentation".

Nothing is free.

On Jun 6, 2016, at 4:50 PM, Chris Mattmann 
<mattm...@apache.org<mailto:mattm...@apache.org>> wrote:

Excellent. Why am I the first person to ask that, why didn’t
a PMC member point that out right away, and why did it take me asking
for someone to point to the Apache docs?

This is what I am talking about in terms of the Apache community..





On 6/6/16, 4:47 PM, "Michael Kjellman" 
<mkjell...@internalcircle.com<mailto:mkjell...@internalcircle.com>> wrote:

http://cassandra.apache.org/doc/cql3/CQL.html

On Jun 6, 2016, at 4:42 PM, Mattmann, Chris A (3980) 
<chris.a.mattm...@jpl.nasa.gov<mailto:chris.a.mattm...@jpl.nasa.gov>> wrote:

Hi,

So, the core documentation for a key part of Cassandra is hosted
at DataStax?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov<mailto:chris.a.mattm...@nasa.gov>
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++










On 6/6/16, 7:32 AM, "Mahdi Mohammadi" 
<mah...@gmail.com<mailto:mah...@gmail.com>> wrote:

Team,

I was checking the documentation for TupleType in DataStax docs here
<https://docs.datastax.com/en/latest-java-driver/java-driver/reference/tupleTypes.html>
and
the code example was like this:

TupleType theType = TupleType.of(DataType.cint(), DataType.text(),
DataType.cfloat());


But in the code, the *TupleType.of* has two additional parameters not
mentioned in the documentation:


*public static TupleType of(ProtocolVersion protocolVersion, CodecRegistry
codecRegistry, DataType... types)*

Maybe I am looking in the wrong place. Could someone please explain how I can
instantiate a *TupleType*?

I have the same question for *Map* type.

Thanks for your help.

===
Best Regards
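For what it's worth, the two extra parameters can simply be passed ahead of the element types. Since the real driver classes aren't available in a standalone snippet, the following self-contained mock only mirrors the *shape* of the signature quoted above — `ProtocolVersion`, `CodecRegistry`, and `DataType` here are invented stand-ins, not the `com.datastax.driver.core` classes:

```java
import java.util.Arrays;
import java.util.List;

// Stand-in stubs mimicking the shape of the driver signature under
// discussion; NOT the real com.datastax.driver.core classes.
public class TupleTypeSketch {
    enum ProtocolVersion { V4 }

    static class CodecRegistry {
        static final CodecRegistry DEFAULT = new CodecRegistry();
    }

    enum DataType { CINT, TEXT, CFLOAT }

    static class TupleType {
        final ProtocolVersion protocolVersion;
        final CodecRegistry codecRegistry;
        final List<DataType> componentTypes;

        private TupleType(ProtocolVersion pv, CodecRegistry cr, DataType... types) {
            this.protocolVersion = pv;
            this.codecRegistry = cr;
            this.componentTypes = Arrays.asList(types);
        }

        // Mirrors: TupleType.of(ProtocolVersion, CodecRegistry, DataType...)
        static TupleType of(ProtocolVersion pv, CodecRegistry cr, DataType... types) {
            return new TupleType(pv, cr, types);
        }
    }

    public static void main(String[] args) {
        // Context parameters come first, element types last (varargs).
        TupleType t = TupleType.of(ProtocolVersion.V4, CodecRegistry.DEFAULT,
                                   DataType.CINT, DataType.TEXT, DataType.CFLOAT);
        System.out.println(t.componentTypes); // [CINT, TEXT, CFLOAT]
    }
}
```

Against the real 3.x driver the analogous call would pass the protocol version and codec registry configured on the `Cluster` (or their defaults) before the `DataType` varargs; check the driver's own reference docs for the exact accessors rather than taking this sketch as authoritative.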





Re: Cassandra Java Driver and DataStax

2016-06-04 Thread Michael Kjellman
No need to argue your point to me anymore. I've already tuned you out.

These are good people who I consider my friends and insulting people just shows 
your arguments really have no merit. 

Good luck with your new driver contribution! I look forward to reviewing the 
code. 

Sent from my iPhone

> On Jun 4, 2016, at 10:10 AM, James Carman  wrote:
> 
> I apologized else-thread about that one.  It was a low blow.  Anyway, to
> answer your question. The Cassandra community wins!  How do we know if they
> won't make you pay for the driver in the future (after all your code is
> written against it)?  It has happened before.  Also, the rest of the
> community can have a say in the direction (because that's the Apache Way).
> The driver can be more intimate with the database, because it's the same
> people developing it.
> 
>> On Sat, Jun 4, 2016 at 1:06 PM Aleksey Yeschenko  wrote:
>> 
>> An eloquent and powerful response, but please, reply to my points instead
>> of resorting to ad hominem arguments.
>> 
>> In practical terms, who would benefit from such a merge, and who is
>> suffering from the current state of affairs?
>> 
>> --
>> AY
>> 
>> On 4 June 2016 at 18:03:05, James Carman (ja...@carmanconsulting.com)
>> wrote:
>> 
>> "Sr. Software Engineer at DataStax", imagine that.
>> 
>> On Sat, Jun 4, 2016 at 1:01 PM Aleksey Yeschenko 
>> wrote:
>> 
>>> As a member of that governing body (Cassandra PMC), I would much prefer
>>> not to deal with the drivers as well.
>>> 
>>> And I’m just as certain that java-driver - and other driver communities -
>>> would much rather prefer to keep their process and organisation instead
>> of
>>> being forced to conform to ours.
>>> 
>>> I’m finding it hard to see a single party that would benefit from such a
>>> merge, and who suffers from the current state of things.
>>> 
>>> --
>>> AY
>>> 
>>> On 4 June 2016 at 17:46:48, James Carman (ja...@carmanconsulting.com)
>>> wrote:
>>> 
>>> How does it add more complexity by having one governing body (the PMC)?
>>> What I am suggesting is that the driver project be somewhat of a
>> subproject
>>> or a "module". It can still have its own life cycle, just like it does
>> now.
>>> 
>>> On Sat, Jun 4, 2016 at 12:44 PM Nate McCall 
>>> wrote:
>>> 
 It doesn't. But then we add complexity in communicating and managing
 versions, releases, etc. to the project. Again, from my experience with
 hector, I just didn't want the hassle of owning that within the project
 confines.
 
 On Sat, Jun 4, 2016 at 11:30 AM, James Carman <
>>> ja...@carmanconsulting.com>
 wrote:
 
> Who said the driver has to be released with the database?
> 
> On Sat, Jun 4, 2016 at 12:29 PM Nate McCall 
> wrote:
> 
>> On Sat, Jun 4, 2016 at 10:05 AM, James Carman <
> ja...@carmanconsulting.com>
>> wrote:
>> 
>>> So why not just donate the Java driver and keep that in house?
> Cassandra
>> is
>>> a Java project. Makes sense to me.
>>> 
>>> 
>> I won't deny there is an argument to be made here, but as a former
 client
>> maintainer (Hector), current ASF committer (Usergrid) and active
> community
>> member since late 2009, my opinion is that this would be a step
> backwards.
>> 
>> Maintaining Hector independently allowed me the freedom to release
 major
>> features with technology that I wanted to use while maintaining
 backwards
>> compatibility without having to be bound to the project's release
>>> cycle
> and
>> process. (And to use a build system that didn't suck).
>> 
>> The initial concern of the use of the word "controls" is *super*
>> not
 cool
>> and I hope that this is being fixed. That said, the reality, from
>> my
>> (external to DataStax) perspective, is that this is not the case. I
 like
>> the current project separation the way it is and don't feel like
>>> there
 is
>> any attempt at "control" of the java driver's direction and
 development.
>> 
>> -Nate
>> 
> 
 
 
 
 --
 -
 Nate McCall
 Austin, TX
 @zznate
 
 CTO
 Apache Cassandra Consulting
 http://www.thelastpickle.com
 
>>> 
>> 


Re: Cassandra 2.0.x OOM during startsup - schema version inconsistency after reboot

2016-05-08 Thread Michael Kjellman
I'd recommend you create a JIRA! That way you can get some traction on the 
issue. Obviously an OOM is never correct, even if your process is wrong in some 
way!

Best,
kjellman 

Sent from my iPhone

> On May 8, 2016, at 8:48 PM, Michael Fong  
> wrote:
> 
> Hi, all,
> 
> 
> Haven't heard any responses so far, and this issue has troubled us for quite 
> some time. Here is another update:
> 
> We have noticed several times that The schema version may change after 
> migration and reboot:
> 
> Here is the scenario:
> 
> 1.   Two node cluster (1 & 2).
> 
> 2.   There are some schema changes, i.e. create a few new columnfamily. 
> The cluster will wait until both nodes have schema version in sync (describe 
> cluster) before moving on.
> 
> 3.   Right before node2 is rebooted, the schema version is consistent; 
> however, after node2 reboots and starts servicing, the MigrationManager would 
> gossip a different schema version.
> 
> 4.   Afterwards, both nodes start exchanging schema messages 
> indefinitely until one of the nodes dies.
> 
> We currently suspect the change of schema is due to replaying old entries in 
> the commit log. We wish to continue digging further, but need experts' help on this.
> 
> I don't know if anyone has seen this before, or if there is anything wrong 
> with our migration flow though..
> 
> Thanks in advance.
> 
> Best regards,
> 
> 
> Michael Fong
> 
> From: Michael Fong [mailto:michael.f...@ruckuswireless.com]
> Sent: Thursday, April 21, 2016 6:41 PM
> To: u...@cassandra.apache.org; dev@cassandra.apache.org
> Subject: RE: Cassandra 2.0.x OOM during bootstrap
> 
> Hi, all,
> 
> Here is some more information on before the OOM happened on the rebooted node 
> in a 2-node test cluster:
> 
> 
> 1.   It seems the schema version has changed on the rebooted node after 
> reboot, i.e.
> Before reboot,
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> 
> After rebooting node 2,
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) 
> Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> 
> 
> 
> 2.   After reboot, both nodes repeatedly send MigrationTasks to each other 
> - we suspect it is related to the schema version (Digest) mismatch after Node 
> 2 rebooted:
> Node 2 keeps submitting the migration task over 100+ times to the other 
> node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node 
> /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) 
> Updating topology for /192.168.88.33
> INFO [GossipStage:1] 2016-04-19 11:18:18,263 StorageService.java (line 1544) 
> Node /192.168.88.33 state jump to normal
> INFO [GossipStage:1] 2016-04-19 11:18:18,264 TokenMetadata.java (line 414) 
> Updating topology for /192.168.88.33
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 
> 102) Submitting migration task for /192.168.88.33
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 
> 102) Submitting migration task for /192.168.88.33
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) 
> Can't send schema pull request: node /192.168.88.33 is down.
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) 
> Can't send schema pull request: node /192.168.88.33 is down.
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 
> 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 
> 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 MigrationManager.java 
> (line 102) Submitting migration task for /192.168.88.33
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 MigrationManager.java 
> (line 102) Submitting migration task for /192.168.88.33
> DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,356 MigrationManager.java 
> (line 102) Submitting migration task for /192.168.88.33
> .
> 
> 
> On the other hand, Node 1 keeps updating its gossip information, followed by 
> receiving and submitting 
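The mismatch loop Michael Fong describes can be sketched abstractly. This is a toy model with invented names, not Cassandra's actual MigrationManager logic: each node gossips a digest of its schema contents, and every gossip round that sees a differing digest submits another migration task, so a digest changed by stale commit-log replay keeps the loop going indefinitely:

```java
import java.util.UUID;

// Toy model of the digest-mismatch loop: if a reboot replays old
// commit-log entries and the schema digest changes, every gossip round
// sees a mismatch and submits another migration task.
public class SchemaGossipSketch {
    // A schema "version" is just a digest of the schema contents.
    static UUID digest(String schemaContents) {
        return UUID.nameUUIDFromBytes(schemaContents.getBytes());
    }

    // Returns how many migration tasks get submitted over `rounds`
    // gossip rounds; one per round while the digests disagree.
    static int gossipRounds(UUID local, UUID remote, int rounds) {
        int tasksSubmitted = 0;
        for (int i = 0; i < rounds; i++) {
            if (!local.equals(remote)) tasksSubmitted++; // "Submitting migration task"
        }
        return tasksSubmitted;
    }

    public static void main(String[] args) {
        UUID node1       = digest("ks1.cf1;ks1.cf2");
        UUID node2Before = digest("ks1.cf1;ks1.cf2");         // in sync before reboot
        UUID node2After  = digest("ks1.cf1;ks1.cf2;ks1.cf1"); // replay re-applied a mutation

        System.out.println("in sync:  " + gossipRounds(node1, node2Before, 100) + " tasks"); // 0 tasks
        System.out.println("mismatch: " + gossipRounds(node1, node2After, 100) + " tasks");  // 100 tasks
    }
}
```

The sketch only illustrates why the "Submitting migration task" lines repeat until one side converges (or dies); the real fix requires finding why the digest differs after replay.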

Re: [Proposal] Mandatory comments

2016-05-05 Thread Michael Kjellman
My vote is to start with BigTableScanner (SSTableScanner)... five iterators that 
all do something different with each other depending on how they're used, with 
zero comments -- in a critical code path. What could go wrong!

> On May 5, 2016, at 11:26 AM, Dave Brosius  wrote:
> 
> A less controversial tack would be to actively solicit input from 
> contributors, etc, about what methods/classes are confusing, and put those 
> classes/methods on a priority list for adding good javadoc. When that list 
> goes to ~0, you've probably done enough.
> 
> The key though is to actively solicit, and make it easy to do so. It's important 
> to differentiate between the list being 0 because you've done a good job, and 0 
> because people didn't know about it, or it was too difficult to ask for.
> 
> --dave
> 
> ---
> 
> 
> On 2016-05-05 11:46, Jack Krupansky wrote:
>> FWIW, I recently wrote up a bunch of notes on Code Quality and published
>> them on Medium. There are notes on comments and consistency and boilerplate
>> buried in there.
>> WARNING: There's a lot of stuff there and it is not for the  faint of heart
>> or those not truly committed to code quality.
>> tl;dr - I'm not a fan of boiler plate just to say you did something, but...
>> I am a fan of consistency, but that doesn't mean every situation is the
>> same, just that similar situations should be treated similarly - unless
>> there is some reasonable reason to do otherwise.
>> See:
>> https://medium.com/@jackkrupansky/code-quality-preamble-932626a3131c#.ynrjbryus
>> https://medium.com/@jackkrupansky/software-and-product-quality-notes-no-1-346ab1d8df24#.xzg1ihuxb
>> https://medium.com/@jackkrupansky/code-quality-notes-no-1-4dc522a5e29c#.cm7tan2zu
>> https://medium.com/@jackkrupansky/code-quality-notes-no-2-7939377b73c6#.zco8oq3dj
>> -- Jack Krupansky
>> On Thu, May 5, 2016 at 10:55 AM, Eric Evans 
>> wrote:
>>> On Wed, May 4, 2016 at 12:14 PM, Jonathan Ellis  wrote:
>>> > On Wed, May 4, 2016 at 2:27 AM, Sylvain Lebresne 
>>> > wrote:
>>> >
>>> >> On Tue, May 3, 2016 at 6:57 PM, Eric Evans 
>>> >> wrote:
>>> >>
>>> >> > On Mon, May 2, 2016 at 11:26 AM, Sylvain Lebresne <
>>> sylv...@datastax.com>
>>> >> > wrote:
>>> >> > > Looking forward to other's opinions and feedbacks on this proposal.
>>> >> >
>>> >> > We might want to leave just a little wiggle room for judgment on the
>>> >> > part of the reviewer, for the very simple cases.  Documenting
>>> >> > something like setFoo(int) with "Sets foo" can get pretty tiresome for
>>> >> > everyone, and doesn't add any value.
>>> >> >
>>> >>
>>> >> I knew someone was going to bring this :). In principle, I don't really
>>> >> disagree. In practice though,
>>> >> I suspect it's sometimes just easier to adhere to such simple rule
>>> somewhat
>>> >> strictly. In particular,
>>> >> I can guarantee that we don't all agree where the border lies between
>>> what
>>> >> warrants a javadoc
>>> >> and what doesn't. Sure, there is a few cases where you're just
>>> paraphrasing
>>> >> the method name
>>> >> (and while it might often be the case for getters and setters, it's
>>> worth
>>> >> noting that we don't really
>>> >> do much of those in C*), but how hard is it to write a one line comment?
>>> Surely that's a negligible
>>> >> part of writing a patch and we're not that lazy.
>>> >>
>>> >
>>> > I'm more concerned that this kind of boilerplate commenting obscures
>>> rather
>>> > than clarifies.  When I'm reading code i look for comments to help me
>>> > understand key points, points that aren't self-evident.  If we institute
>>> a
>>> > boilerplate "comment everything" rule then I lose that signpost.
>>> This.
>>> Additionally you could also probably argue that it obscures the true
>>> purpose to leaving a comment; It becomes a check box to tick, having
>>> some javadoc attached to every method, rather than genuinely looking
>>> for the value that could be added with quality comments (or even
>>> altering the approach so that the code is more obvious in the absence
>>> of them).
>>> The reason I suggested "wiggle room", is that I think everyone
>>> basically agrees that the default should be to leave good comments
>>> (and that that hasn't been the case), that we should start making this
>>> a requirement to successful review, and that we can afford to leave
>>> some room for judgment on the part of the reviewer.  Worse-case is
>>> that we find in doing so that there isn't much common ground on what
>>> constitutes a quality comment versus useless boilerplate, and that we
>>> have to remove any wiggle room and make it 100% mandatory (I don't
>>> think that will (has to) be the case, though).
>>> --
>>> Eric Evans
>>> john.eric.ev...@gmail.com
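The "Sets foo" objection in this thread can be made concrete. Here is an invented example (not from the Cassandra codebase) contrasting boilerplate javadoc that merely restates the method name with a comment that documents a non-obvious contract:

```java
// Invented example: boilerplate javadoc vs. a comment carrying real
// information. The class and fields are illustrative only.
public class CommentExample {
    private int foo;
    private long ttlMillis;

    /** Sets foo. */ // boilerplate: restates the name, adds nothing
    public void setFoo(int foo) { this.foo = foo; }

    public int getFoo() { return foo; }

    /**
     * Time-to-live applied to newly written entries, in milliseconds.
     * A value of 0 disables expiration entirely (entries live forever),
     * which is the non-obvious contract worth documenting.
     */
    public void setTtlMillis(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public long getTtlMillis() { return ttlMillis; }

    public static void main(String[] args) {
        CommentExample c = new CommentExample();
        c.setTtlMillis(5000);
        System.out.println(c.getTtlMillis()); // 5000
    }
}
```

The first javadoc is the kind of check-box comment the thread argues obscures the useful ones; the second records behavior a reader could not recover from the signature alone.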



Re: A short guide on how to contribute patches to Cassandra

2016-02-09 Thread Michael Kjellman
Move to Wiki?

Sent from my iPhone

> On Feb 9, 2016, at 5:59 AM, Aleksey Yeschenko  wrote:
> 
> Hello everyone,
> 
> I’ve compiled a short guide for contributors (who aren’t committers yet) 
> about how to properly contribute Cassandra patches:
> 
> https://docs.google.com/document/d/1d_AzYQo74de9utbbpyXxW2w-b0__sFhC7b24ibiJqjw/edit?usp=sharing
> 
> Following the outlined recommendations makes the lives of committers much 
> easier, without adding much hassle to the contributor process.
> 
> Follow the steps and feel the love.
> 
> -- 
> AY


Re: Versioning policy?

2016-01-16 Thread Michael Kjellman
Correct, this is an open source project. 

If you want an Enterprise support story, DataStax has an Enterprise option for 
you. 

> On Jan 16, 2016, at 11:19 AM, Anuj Wadehra  wrote:
> 
> Hi Jonathan
> 
> It would be really nice if you could share your thoughts on the four points 
> raised regarding the Cassandra EOL process. I think similar things happen for 
> other open source products and it would be really nice if we could streamline 
> such things for Apache Cassandra.
> 
> Thanks,
> Anuj
> 
> Sent from Yahoo Mail on Android 
> 
>  On Thu, 14 Jan, 2016 at 11:28 pm, Anuj Wadehra wrote:
> Hi Jonathan,
> Thanks for the crisp communication regarding the tick tock release & EOL.
> I think its worth considering some points regarding EOL policy and it would 
> be great if you can share your thoughts on below points:
> 1.  EOL of a release should be based on "most stable"/"production ready" 
> version date rather than "GA" date of subsequent major releases.
> 2.  I think we should have "Formal EOL Announcement" on Apache Cassandra 
> website.  
> 3. "Formal EOL Announcement" should come at least 6 months before the EOL, so 
> that users get reasonable time to  upgrade.
> 4. EOL Policy (even if flexible) should be stated on Apache Cassandra website
> 
> The EOL thread on the users mailing list ended with the conclusion of raising a 
> wishlist JIRA, but I think the above points are more about working on policy and 
> processes than just a wish list. 
> 
> Thanks,
> Anuj
> 
> 
> 
> Sent from Yahoo Mail on Android 
> 
>   On Thu, 14 Jan, 2016 at 10:57 pm, Jonathan Ellis wrote:  
> Hi Maciek,
> 
> First let's talk about the tick-tock series, currently 3.x.  This is pretty
> simple: outside of the regular monthly releases, we will release fixes for
> critical bugs against the most recent bugfix release, the way we did
> recently with 3.1.1 for CASSANDRA-10822 [1].  No older tick-tock releases
> will be patched.
> 
> Now, we also have three other release series currently being supported:
> 
> 2.1.x: supported with critical fixes only until 4.0 is released, projected
> in November 2016 [2]
> 2.2.x: maintained until 4.0 is released
> 3.0.x: maintained for 6 months after 4.0, i.e. projected until May 2017
> 
> I will add this information to the releases page [3].
> 
> [1]
> https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201512.mbox/%3CCAKkz8Q3StqRFHfMgCMRYaaPdg+HE5N5muBtFVt-=v690pzp...@mail.gmail.com%3E
> [2] 4.0 will be an ordinary tick-tock release after 3.11, but we will be
> sunsetting deprecated features like Thrift so bumping the major version
> seems appropriate
> [3] http://cassandra.apache.org/download/
> 
>> On Sun, Jan 10, 2016 at 9:29 PM, Maciek Sakrejda  wrote:
>> 
>> There was a discussion recently about changing the Cassandra EOL policy on
>> the users list [1], but it didn't really go anywhere. I wanted to ask here
>> instead to clear up the status quo first. What's the current versioning
>> policy? The tick-tock versioning blog post [2] states in passing that two
>> major releases are maintained, but I have not found this as an official
>> policy stated anywhere. For comparison, the Postgres project lays this out
>> very clearly [3]. To be clear, I'm not looking for any official support,
>> I'm just asking for clarification regarding the maintenance policy: if a
>> critical bug or security vulnerability is found in version X.Y.Z, when can
>> I expect it to be fixed in a bugfix patch to that major version, and when
>> do I need to upgrade to the next major version.
>> 
>> [1]: http://www.mail-archive.com/user@cassandra.apache.org/msg45324.html
>> [2]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
>> [3]: http://www.postgresql.org/support/versioning/
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
> 


Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-11 Thread Michael Kjellman
Last I checked — and I could be wrong — we’ve never had to think about what to 
number a Cassandra version because a single ticket’s scope of changes could 
“impact” our users so dramatically. Food for thought.

love,
kjellman

 On May 11, 2015, at 2:20 PM, Alex Popescu al...@datastax.com wrote:
 
 On Mon, May 11, 2015 at 2:16 PM, Jonathan Haddad j...@jonhaddad.com wrote:
 
 I'm not sure if the complications surrounding the versioning of the drivers
 should be factored into the releases of Cassandra.
 
 
 I agree. If we could come up with a versioning scheme that would also work
 for drivers, that would be
 the ideal case as it will prove quite helpful to our users.
 
 
 I think that 3.0
 signals a massive change and calling the release containing 8099 a .1 would
 be drastically underplaying how big of a release it is - from the
 perspective of the end user it would be a disservice.
 
 
 I see. My last suggestion could work though as it signals both releases
 having significant impact.
 
 
 
 
 On Mon, May 11, 2015 at 2:09 PM Jonathan Ellis jbel...@gmail.com wrote:
 
 I do like 2.2 and 3.0 over 3.0 and 3.1 because going from 2.x to 3.x
 signals that 8099 really is a big change.
 
 On Mon, May 11, 2015 at 3:28 PM, Alex Popescu al...@datastax.com
 wrote:
 
 On Sun, May 10, 2015 at 2:14 PM, Robert Stupp sn...@snazy.de wrote:
 
 Instead of labeling it 2.2, I’d like to propose to label it 3.0 (so
 basically just move 8099 to 3.1).
 In the end it’s ”only a label”. But there are a lot of new
 user-facing
 features in it that justifies a major release.
 
 
 +1 on labeling the proposed 2.2 as 3.0 and moving 8099 to 3.1
 
 1. Tons of new features that feel more than just a 2.2
 2. The majority of features planned for 3.0 are actually ready for this
 version
 3. in order to avoid compatiblity questions (and version compatibility
 matrices), the drivers developed by DataStax have
followed the Cassandra versions so far. The Python and C# drivers
 are
 already at 2.5 as they added some major features.
 
   Renaming the proposed 2.2 as 3.0 would allow us to continue to use
 this
 versioning policy until all drivers are supporting
   the latest Cassandra version and continue to not require a user to
 check
 a compatibility matrix.
 
 
 --
 Bests,
 
 Alex Popescu | @al3xandru
 Sen. Product Manager @ DataStax
 
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced
 
 
 
 
 
 -- 
 Bests,
 
 Alex Popescu | @al3xandru
 Sen. Product Manager @ DataStax



Re: 3.0 and the Cassandra release process

2015-03-18 Thread Michael Kjellman
For most of my life I’ve lived on the software bleeding edge both personally 
and professionally. Maybe it’s a personal weakness, but I guess I get a thrill 
out of the problem solving aspect?

Recently I came to a bit of an epiphany — the closer I keep to the daily build 
— generally the happier I am on a daily basis. Bugs happen, but for the most 
part (aside from show stopper bugs), pain points for myself in a given daily 
build can generally be debugged to 1 or maybe 2 root causes, fixed in ~24 
hours, and then life is better the next day again. In comparison, the old 
waterfall model generally means taking an “official” release at some point and 
waiting for some poor soul (or developer) to actually run the thing. No matter 
how good the QA team is, until it’s actually used in the real world, most bugs 
aren’t found.

If you and your organization can wait 24 hours * number of bugs discovered 
after people actually started using the thing, you end up with a “usable build” 
around the holy-grail minor X.X.5 release of Cassandra.

I love the idea of the LTS model Jonathan describes because it means more code 
can get real testing and “bake” for longer instead of sitting largely unused on 
some git repository in a datacenter far far away. A lot of code has changed 
between 2.0 and trunk today. The code has diverged to the point that if you 
write something for 2.0 (as the most stable major branch currently available), 
merging it forward to 3.0 or after generally means rewriting it. If the only 
thing that comes out of this is a smaller delta of LOC between the deployable 
version/branch and what we can develop against and what QA is focused on I 
think that’s a massive win.

Something like CASSANDRA-8099 will need 2x the baking time of even many of the 
more risky changes the project has made. While I wouldn’t want to run a build 
with CASSANDRA-8099 in it anytime soon, there are now hundreds of other changes 
blocked, most likely many containing new bugs of their own, which have no 
exposure at all to even the most involved C* developers.

I really think this will be a huge win for the project and I’m super thankful 
for Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this change to a 
much more sustainable release model for the entire community.

best,
kjellman

 
 On Mar 18, 2015, at 3:02 PM, Ariel Weisberg ariel.weisb...@datastax.com 
 wrote:
 
 Hi,
 
 Keep in mind it is a bug fix release every month and a feature release every 
 two months.
 
 For development that is really a two month cycle with all bug fixes being 
 backported one release. As a developer if you want to get something in a 
 release you have two months and you should be sizing pieces of large tasks so 
 they ship at least every two months.
 
 Ariel
 On Mar 18, 2015, at 5:58 PM, Terrance Shepherd tscana...@gmail.com wrote:
 
 I like the idea but I agree that every month is a bit aggressive. I have no
 say but:
 
 I would say 4 releases a year instead of 12, with 2 months of new features
 and 1 month of bug squashing per release, with the 4th quarter just bugs.
 
 I would also propose 2-year LTS releases for the releases after the 4th
 quarter. So everyone could get a new feature release every quarter and the
 stability of super major versions for 2 years.
 
 On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius dbros...@mebigfatguy.com
 wrote:
 
 It would seem the practical implications of this is that there would be
 significantly more development on branches, with potentially more
 significant delays on merging these branches. This would imply to me that
 more Jenkins servers would need to be set up to handle auto-testing of more
 branches, as if feature work spends more time on external branches, it is
 then likely to be less tested (even if by accident) as fewer developers
 would be working on that branch. Only when a feature was blessed to make it
 to the release-tracked branch, would it become exposed to the majority of
 developers/testers, etc doing normal running/playing/testing.
 
 This isn't to knock the idea in anyway, just wanted to mention what i
 think the outcome would be.
 
 dave
 
 
 
 
 On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 Cassandra 2.1 was released in September, which means that if we were
 on
 track with our stated goal of six month releases, 3.0 would be done
 about
 now.  Instead, we haven't even delivered a beta.  The immediate cause
 this
 time is blocking for 8099
 https://issues.apache.org/jira/browse/CASSANDRA-8099, but the
 reality
 is
 that nobody should really be surprised.  Something always comes up --
 we've
 averaged about nine months since 1.0, with 2.1 taking an entire year.
 
 We could make theory align with reality by acknowledging, if nine
 months
 is our 'natural' release schedule, then so be it.  But I think we
 can
 do
 better.
 
 Broadly speaking, we have two constituencies with Cassandra releases:
 
 First, we have the users who are building or 

Re: 3.0 and the Cassandra release process

2015-03-17 Thread Michael Kjellman
❤️ it. +1

-kjellman

 On Mar 17, 2015, at 2:06 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
 Cassandra 2.1 was released in September, which means that if we were on
 track with our stated goal of six month releases, 3.0 would be done about
 now.  Instead, we haven't even delivered a beta.  The immediate cause this
 time is blocking for 8099
 https://issues.apache.org/jira/browse/CASSANDRA-8099, but the reality is
 that nobody should really be surprised.  Something always comes up -- we've
 averaged about nine months since 1.0, with 2.1 taking an entire year.
 
 We could make theory align with reality by acknowledging, if nine months
 is our 'natural' release schedule, then so be it.  But I think we can do
 better.
 
 Broadly speaking, we have two constituencies with Cassandra releases:
 
 First, we have the users who are building or porting an application on
 Cassandra.  These users want the newest features to make their job easier.
 If 2.1.0 has a few bugs, it's not the end of the world.  They have time to
 wait for 2.1.x to stabilize while they write their code.  They would like
 to see us deliver on our six month schedule or even faster.
 
 Second, we have the users who have an application in production.  These
 users, or their bosses, want Cassandra to be as stable as possible.
 Assuming they deploy on a stable release like 2.0.12, they don't want to
 touch it.  They would like to see us release *less* often.  (Because that
 means they have to do less upgrades while remaining in our backwards
 compatibility window.)
 
 With our current big release every X months model, these users' needs are
 in tension.
 
 We discussed this six months ago, and ended up with this:
 
 What if we tried a [four month] release cycle, BUT we would guarantee that
 you could do a rolling upgrade until we bump the supermajor version? So 2.0
 could upgrade to 3.0 without having to go through 2.1.  (But to go to 3.1
 or 4.0 you would have to go through 3.0.)
 
 
 Crucially, I added
 
 Whether this is reasonable depends on how fast we can stabilize releases.
 2.1.0 will be a good test of this.
 
 
 Unfortunately, even after DataStax hired half a dozen full-time test
 engineers, 2.1.0 continued the proud tradition of being unready for
 production use, with “wait for .5 before upgrading” once again looking like
 a good guideline.
 
 I’m starting to think that the entire model of “write a bunch of new
 features all at once and then try to stabilize it for release” is broken.
 We’ve been trying that for years and empirically speaking the evidence is
 that it just doesn’t work, either from a stability standpoint or even just
 shipping on time.
 
 A big reason that it takes us so long to stabilize new releases now is
 that, because our major release cycle is so long, it’s super tempting to
 slip in “just one” new feature into bugfix releases, and I’m as guilty of
 that as anyone.
 
 For similar reasons, it’s difficult to do a meaningful freeze with big
 feature releases.  A look at 3.0 shows why: we have 8099 coming, but we
 also have significant work done (but not finished) on 6230, 7970, 6696, and
 6477, all of which are meaningful improvements that address demonstrated
 user pain.  So if we keep doing what we’ve been doing, our choices are to
 either delay 3.0 further while we finish and stabilize these, or we wait
 nine months to a year for the next release.  Either way, one of our
 constituencies gets disappointed.
 
 So, I’d like to try something different.  I think we were on the right
 track with shorter releases with more compatibility.  But I’d like to throw
 in a twist.  Intel cuts down on risk with a “tick-tock” schedule for new
 architectures and process shrinks instead of trying to do both at once.  We
 can do something similar here:
 
 One month releases.  Period.  If it’s not done, it can wait.
 *Every other release only accepts bug fixes.*
 
 By itself, one-month releases are going to dramatically reduce the
 complexity of testing and debugging new releases -- and bugs that do slip
 past us will only affect a smaller percentage of users, avoiding the “big
 release has a bunch of bugs no one has seen before and pretty much everyone
 is hit by something” scenario.  But by adding in the second rule, I think
 we have a real chance to make a quantum leap here: stable, production-ready
 releases every two months.
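 
 As a hypothetical sketch of the even/odd split proposed here (the function
 name and the parity convention are illustrative, not project tooling): odd
 3.x releases would accept new features, even ones only bug fixes.
 
 ```shell
 # Classify a 3.x release under the proposed tick-tock scheme.
 # Assumption: odd minor = feature release, even minor = bugfix-only.
 release_kind() {
   minor=${1#*.}        # strip the "3." prefix
   minor=${minor%%.*}   # drop any patch component (e.g. "3.1.2" -> "1")
   if [ $((minor % 2)) -eq 1 ]; then
     echo "feature"
   else
     echo "bugfix-only"
   fi
 }
 release_kind 3.1   # feature
 release_kind 3.2   # bugfix-only
 ```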
 
 So here is my proposal for 3.0:
 
 We’re just about ready to start serious review of 8099.  When that’s done,
 we branch 3.0 and cut a beta and then release candidates.  Whatever isn’t
 done by then, has to wait; unlike prior betas, we will only accept bug
 fixes into 3.0 after branching.
 
 One month after 3.0, we will ship 3.1 (with new features).  At the same
 time, we will branch 3.2.  New features in trunk will go into 3.3.  The 3.2
 branch will only get bug fixes.  We will maintain backwards compatibility
 for all of 3.x; eventually (no less than a year) we will pick a release to
 be 4.0, and drop 

Re: 2.1 rc3?

2014-07-05 Thread Michael Kjellman
Also putting my 2 cents in for more testing/another release.

 On Jul 5, 2014, at 4:31 PM, Jason Brown jasedbr...@gmail.com wrote:
 
 +1 on more testing. TBH, I was a little scared when I found #7465 as it was
 rather easy to uncover.
 
 
 On Wed, Jul 2, 2014 at 11:32 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:
 
 Pretty sure we got this head of the hydra. Question is if any more will
 spring up in its place.
 
 
 On Wed, Jul 2, 2014 at 7:28 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
 https://issues.apache.org/jira/browse/CASSANDRA-7465 is a pretty big
 one, I'd like to get some more testing with the fix before rolling
 -final.  thoughts?
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced
 


Re: Proposed changes to C* Release Schedule

2014-06-24 Thread Michael Kjellman
Humm — sorry guys — I never got Chris or Jonathan’s responses for some reason.

That being said, sounds like a good compromise Sylvain. Fingers crossed this 
turns into a good experiment! Thanks

best,
kjellman


 On Jun 24, 2014, at 10:09 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 
 On Tue, Jun 24, 2014 at 5:26 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
 
 What if we tried a quicker release cycle, BUT we would guarantee that
 you could do a rolling upgrade until we bump the supermajor version?
 So 2.0 could upgrade to 3.0 without having to go through 2.1.  (But to
 go to 3.1 or 4.0 you would have to go through 3.0.)
 
 
 I was thinking of something along those lines so I'm in favor of giving
 that a try.
 
  More precisely, I was thinking we could lower the release cycle to 4 months
  (less feels hard to achieve) but make a supermajor only every 2 releases (or
  less often, though guaranteeing that you could do a rolling upgrade implies
  that we rigorously test that, and I think aiming for 2 releases in a row
  initially is a good start).
  
  It's worth acknowledging that this will probably involve a tad more merging
  work, but that increase feels reasonable.
 
 --
 Sylvain
 
 
 On Tue, Jun 24, 2014 at 8:27 AM, Chris Burroughs
 chris.burrou...@gmail.com wrote:
 On 06/17/2014 01:16 PM, Michael Kjellman wrote:
 
 That being said I don’t think i’m alone by identifying the problem.
 
 
 FWIW I'm not doing anything wildly unusual and I've been on a fork for as
 long as I've been on 1.2 (and various times before).  Almost everyone
 being
 on 1.2 with 3 other equally weighted active branches seems like an
 obvious
 not-great situation for running cassandra or developing.
 
 I like the idea of shortening the release cycle and LTS style releases
 and
 they feel like the most direct approach.  I'm a little wary of more
 branches
 since that could backfire and make the problem worse.
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced
 



Proposed changes to C* Release Schedule

2014-06-17 Thread Michael Kjellman
Hi Dev@ List—

TL;DR:
I’d love it if we could modify the C* release cycle to include an additional 
“experimental” release branch that straddles the current major releases that 
includes somewhat “untested” or “risky” commits that normally would only go 
into the next major release. Releases based on this branch wouldn’t contain 
any features that require breaking changes or are considered highly “untested” 
or “risky” but would include the many other commits that today are considered 
too unsafe to put into the previous stable branch. This will allow us to run 
code closer to the current stable release branch when we are unable to move 
fully to the new major release branch. Also, during the release cycle of the 
next major release branch the project can get feedback from a subset of the 
total changes that will ultimately make it into that final new major release. 
Also — I’m aware that any additional branches/releases will add additional work 
for any developer that works on C*. It would be great if we could strike a 
balance that hopefully doesn’t add significant additional merging/rebasing/work 
for the team...

The Longer Story:
Last week I had a conversation with a few people regarding a proposed change to 
the current C* release schedule. 

Other than an attempt to make Jonathan and Sylvain’s lives more difficult, it 
would be ideal if we could better sync our internal release schedule with more 
recent Cassandra releases. The current cycle has resulted in currently “active” 
branches for 1.2, 2.0, 2.1, and +3.0. Official stable releases are from 2.0, 
betas/RCs from 2.1, and there is the potential for another out-of-band 
1.2/previous stable release build. We would love to always run the current 
“stable” release in production, but generally/historically it takes time and a 
few minor releases for the current “major” branch to stabilize to a state we 
can accept for use in production. Additionally, as major releases are 
currently used to make “breaking” changes that require a more involved and 
risky upgrade process, it’s a much bigger deal to deploy a new major into 
production than a release without breaking changes. (upgrade-sstables for 
example is required when upgrading to a new major release branch. this 
unavoidable step adds lots of temporary load to the cluster and means 
deploying/upgrading to major releases tends to be a bit more risky than between 
minor releases and a more involved/long running process). This means even 
though there are months worth of stable hard work/awesome improvements in the 
current “stable” major release branch (today this is 2.0), we end up with an 
unavoidable and undesired lag in getting more recent C* changes pushed into 
production. This means we are unable to provide feedback on newer changes 
sooner to the community, stuck and unable to get even a subset of the awesome 
changes as we can’t yet take ALL the changes from the new major release branch, 
and finally if we find an issue in production or want to work on new 
functionality it would be ideal if we can write it against a release that is 
closer to the next major release while also providing us a reasonable way to 
get the feature deployed internally on a branch we are running.

Currently, the project generally tends to include all risky/breaking/more 
“feature” oriented tickets only into the next major release + trunk. However, 
there is a subset of these changes that are “somewhat” riskier but pose 
little risk that the commit will introduce a regression outside the scope of 
the patch/component. Additionally, any changes that depend on other 
higher risk/breaking commits/changes wouldn’t be candidates for this proposed 
release branch. In a perfect world we would love to target a new “interim” or 
“experimental” train of releases which is loosely the most stable current 
release train but also includes a subset of changes from the next major train. 
(While we were discussing we thought about possible parallels to the concept of 
a LTS (Long Term Support) release cycle and what some people have dubbed the 
“tick-tock” release cycle.) This might look something like 1.2 branch + all 
moderately-to-“less”-risky/non-breaking commits which currently would only end 
up in a 2.0 or 2.1 release. (Off the top of my head, immediately bad candidates 
for this build would be for changes to components such as gossip, streaming, or 
any patch that changes the storage format etc). This would enable the project 
to provide builds for more active/risk-averse users looking for a reasonable 
way to get more features and changes into production than with today’s release 
cycle. Additionally, this would hopefully facilitate/increase quicker feedback 
to the project on a subset of the new major release branch and any bugs found 
could be reported against an actual reproducible release instead of some custom 
build with a given number of patches from Jira or git SHAs applied/backported.


Re: Proposed changes to C* Release Schedule

2014-06-17 Thread Michael Kjellman
No, it generally takes months for the current major release to reach a minor 
revision stable enough for most to use in production, even for those who 
“live on the edge.” Generally there ends up being a very low number of 
users who've actively deployed released versions 2.0.0-2.0.5 as they too need 
to evaluate the build and test it in their QA/Staging environments. There are 
plenty of changes that went into 2.0 that are less risky that are good changes 
for the “I’m living on the edge” crowd sans the breaking changes and larger 
risky factors. (I thought I had made this clear — so I apologize I apparently 
didn’t.) Additionally, the longer we have to deploy releases from a particular 
branch (while a new “stable” branch matures), the farther that branch 
deviates from trunk. This makes it very hard for us to target development 
against, especially as we do everything in our power to avoid any sort of 
branching or running of a special or non-released build.

best,
michael


 On Jun 17, 2014, at 12:42 AM, Jacob Rhoden jacob.rho...@me.com wrote:
 
 Isn't this how it works now? Aka
 
 2.0 is the I'm risk averse stable, and
 2.1 is the I'm living on the edge stable 
 
 __
 Sent from iPhone
 
 On 17 Jun 2014, at 5:16 pm, Michael Kjellman mkjell...@internalcircle.com 
 wrote:
 
 Hi Dev@ List—
 
 TL;DR:
 I’d love it if we could modify the C* release cycle to include an additional 
 “experimental” release branch that straddles the current major releases that 
 includes somewhat “untested” or “risky” commits that normally would only go 
 into the next major release. Releases based from this branch wouldn’t 
 contain any features that require breaking changes or are considered highly 
 “untested” or “risky” but would include the many other commits that today 
 are considered too unsafe to put into the previous stable branch. This will 
 allow us to run code closer to the current stable release branch when we are 
 unable to move fully to the new major release branch. Also, during the 
 release cycle of the next major release branch the project can get feedback 
 from a subset of the total changes that will ultimately make it into that 
 final new major release. Also — i’m aware that any additional 
 branches/releases will add additional work for any developer that works on 
 C*. It would be great if we could strike a balance that hopefully doesn’t 
 add significant additional merging/rebasing/work for the team...
 
 The Longer Story:
 Last week I had a conversation with a few people regarding a proposed change 
 to the current C* release schedule. 
 
 Other than an attempt to make Jonathan and Sylvian’s lives more difficult, 
 it would be ideal if we could better sync our internal release schedule with 
 more recent Cassandra releases. The current cycle has resulted in currently 
 “active” branches for 1.2, 2.0, 2.1, and +3.0. Official stable releases are 
 from 2.0, beta’s/RC’s from 2.1, and there is the potential for another 
 out-of-band 1.2/previous stable release build. We would love to always run 
 the current “stable” release in production but generally/historically it 
 takes time and a few minor releases to the current “major” branch stable to 
 get to a state where we can accept for use in production. Additionally, as 
 major releases are currently used to make “breaking” changes that require a 
 more involved and risky upgrade process, it’s a much bigger deal to deploy a 
 new major into production than a release without breaking changes. 
 (upgrade-sstables for example is required when upgrading to a new major 
 release branch. this unavoidable step adds lots of temporary load to the 
 cluster and means deploying/upgrading to major releases tends to be a bit 
 more risky than between minor releases and a more involved/long running 
 process). This means even though there are months worth of stable hard 
 work/awesome improvements in the current “stable” major release branch 
 (today this is 2.0), we end up with an unavoidable and undesired lag in 
 getting more recent C* changes pushed into production. This means we are 
 unable to provide feedback on newer changes sooner to the community, stuck 
 and unable to get even a subset of the awesome changes as we can’t yet take 
 ALL the changes from the new major release branch, and finally if we find an 
 issue in production or want to work on new functionality it would be ideal 
 if we can write it against a release that is closer to the next major 
 release while also providing us a reasonable way to get the feature deployed 
 internally on a branch we are running.
 
 Currently, the project generally tends to include all risky/breaking/more 
 “feature” oriented tickets only into the next major release + trunk. 
 However, there is a subset of these changes that are “somewhat” more risky 
 changes but pose little/less/no risk the commit with introduce a regression 
 outside of the scope of the patch/component

Re: Proposed changes to C* Release Schedule

2014-06-17 Thread Michael Kjellman
It's a bit about features - but it's more an attempt to achieve the goals of 
what might happen with a 4-week release cycle (which, in practice, didn't 
prove to be valid/reasonable).

If something like an executor service for performance is changed (for example) 
it is definitely a more risky change than what would currently go into 1.2 -- 
but most likely we would want to get patches like that into a usable build.

So I guess: a) reduce code drift between branches we run in production b) get 
newer features into production faster where breaking changes aren't required 
for the scope of the patch. 

Additionally - it's also a question of what release we use when we identify an 
issue we want to work on internally. If we are on 1.2 because we can't yet take 
ALL of 2.0 - do we now need to target our work against 1.2? I would rather 
write it against the months worth of changes that have happened since. 

Finally, it's an attempt to make the internal forking not as common as it might 
be today. As you said - this is somewhat of a common process.

 On Jun 17, 2014, at 8:52 AM, Jake Luciani jak...@gmail.com wrote
 
 Hi Michael,
 
 I didn't get to hear the in person conversation so taking a step back.
 The proposal seems to be in response to a common problem.  i.e.  I'm on C*
 version X and I need feature Y which is only available on version Z. Is
 this correct?
 
 The options have been: a) upgrade to version Z or b) fork C* and backport.
  Coming from my previous job where I ran a prod C* cluster, I felt this and I
 expect many others do too.  We did have to fork and backport patches we
 needed and it was hard.
 
 This is specific to features and not bugs, since bugs are fixed in all
 versions affected.
 
 -Jake
 
 
 
 
 
 
 On Tue, Jun 17, 2014 at 3:16 AM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 Hi Dev@ List—
 
 TL;DR:
 I’d love it if we could modify the C* release cycle to include an
 additional “experimental” release branch that straddles the current major
 releases that includes somewhat “untested” or “risky” commits that normally
 would only go into the next major release. Releases based from this branch
 wouldn’t contain any features that require breaking changes or are
 considered highly “untested” or “risky” but would include the many other
 commits that today are considered too unsafe to put into the previous
 stable branch. This will allow us to run code closer to the current stable
 release branch when we are unable to move fully to the new major release
 branch. Also, during the release cycle of the next major release branch the
 project can get feedback from a subset of the total changes that will
 ultimately make it into that final new major release. Also — i’m aware that
 any additional branches/releases will add additional work for any developer
 that works on C*. It would be great if we could strike a balance that
 hopefully doesn’t add significant additional merging/rebasing/work for the
 team...
 
 The Longer Story:
 Last week I had a conversation with a few people regarding a proposed
 change to the current C* release schedule.
 
 Other than an attempt to make Jonathan and Sylvian’s lives more difficult,
 it would be ideal if we could better sync our internal release schedule
 with more recent Cassandra releases. The current cycle has resulted in
 currently “active” branches for 1.2, 2.0, 2.1, and +3.0. Official stable
 releases are from 2.0, beta’s/RC’s from 2.1, and there is the potential for
 another out-of-band 1.2/previous stable release build. We would love to
 always run the current “stable” release in production but
 generally/historically it takes time and a few minor releases to the
 current “major” branch stable to get to a state where we can accept for use
 in production. Additionally, as major releases are currently used to make
 “breaking” changes that require a more involved and risky upgrade process,
 it’s a much bigger deal to deploy a new major into production than a
 release without breaking changes. (upgrade-sstables for example is required
 when upgrading to a new major release branch. this unavoidable step adds
 lots of temporary load to the cluster and means deploying/upgrading to
 major releases tends to be a bit more risky than between minor releases and
 a more involved/long running process). This means even though there are
 months worth of stable hard work/awesome improvements in the current
 “stable” major release branch (today this is 2.0), we end up with an
 unavoidable and undesired lag in getting more recent C* changes pushed into
 production. This means we are unable to provide feedback on newer changes
 sooner to the community, stuck and unable to get even a subset of the
 awesome changes as we can’t yet take ALL the changes from the new major
 release branch, and finally if we find an issue in production or want to
 work on new functionality it would be ideal if we can write it against a
 release that is closer to the next

Re: Proposed changes to C* Release Schedule

2014-06-17 Thread Michael Kjellman
I agree — but there are lots of people who lag in adoption while waiting 
for more “risky” or “breaking” changes to shake out/stabilize, and who want to 
continue deploying newer fixes. No?

 On Jun 17, 2014, at 9:51 AM, Jake Luciani jak...@gmail.com wrote:
 
 I'm not sure many people have the problem you are describing.  This is more
 of a C* developer issue than a C* user issue.
 
 
 Is the below what you are describing we move to?:
 
  1.2 -> 2.0 -> 2.1 -> 3.0 stable
  1.2 -> 2.0 -> 2.1 -> 3.0 experimental
 
  Specific changes would be backported based on the lower riskiness of the
  change, which you are assuming will be constant across versions?
 
 -Jake
 
 
 
 
 On Tue, Jun 17, 2014 at 12:28 PM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 It's a bit about features - but it's more an attempt to achieve the goals
 of what might happen with a 4 week release cycle (but that itself -- in
 practice didn't prove to be valid/reasonable).
 
 If something like an executor service for performance is changed (for
 example) it is definitely a more risky change than what would currently go
 into 1.2 -- but most likely we would want to get patches like that into a
 usable build.
 
 So I guess: a) reduce code drift between branches we run in production b)
 get newer features into production faster where breaking changes aren't
 required for the scope of the patch.
 
 Additionally - it's also a question of what release we use when we
 identify an issue we want to work on internally. If we are on 1.2 because
 we can't yet take ALL of 2.0 - do we now need to target our work against
 1.2? I would rather write it against the months worth of changes that have
 happened since.
 
 Finally, it's an attempt to make the internal forking not as common as it
 might be today. As you said - this is somewhat of a common process.
 
 On Jun 17, 2014, at 8:52 AM, Jake Luciani jak...@gmail.com wrote
 
 Hi Michael,
 
 I didn't get to hear the in person conversation so taking a step back.
 The proposal seems to be in response to a common problem.  i.e.  I'm on
 C*
 version X and I need feature Y which is only available on version Z. Is
 this correct?
 
 The options have been: a) upgrade to version Z or b) fork C* and
 backport.
 Coming my my previous job where I ran a prod C* cluster I felt this and I
 expect many others do too.  We did have to fork and backport patches we
 needed and it was hard.
 
 This is specific to features and not bugs, since bugs are fixed in all
 versions affected.
 
 -Jake
 
 
 
 
 
 
 On Tue, Jun 17, 2014 at 3:16 AM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 Hi Dev@ List—
 
 TL;DR:
 I’d love it if we could modify the C* release cycle to include an
 additional “experimental” release branch that straddles the current
 major
 releases that includes somewhat “untested” or “risky” commits that
 normally
 would only go into the next major release. Releases based from this
 branch
 wouldn’t contain any features that require breaking changes or are
 considered highly “untested” or “risky” but would include the many other
 commits that today are considered too unsafe to put into the previous
 stable branch. This will allow us to run code closer to the current
 stable
 release branch when we are unable to move fully to the new major release
 branch. Also, during the release cycle of the next major release branch
 the
 project can get feedback from a subset of the total changes that will
 ultimately make it into that final new major release. Also — i’m aware
 that
 any additional branches/releases will add additional work for any
 developer
 that works on C*. It would be great if we could strike a balance that
 hopefully doesn’t add significant additional merging/rebasing/work for
 the
 team...
 
 The Longer Story:
 Last week I had a conversation with a few people regarding a proposed
 change to the current C* release schedule.
 
 Other than an attempt to make Jonathan and Sylvian’s lives more
 difficult,
 it would be ideal if we could better sync our internal release schedule
 with more recent Cassandra releases. The current cycle has resulted in
 currently “active” branches for 1.2, 2.0, 2.1, and +3.0. Official stable
 releases are from 2.0, beta’s/RC’s from 2.1, and there is the potential
 for
 another out-of-band 1.2/previous stable release build. We would love to
 always run the current “stable” release in production but
 generally/historically it takes time and a few minor releases to the
 current “major” branch stable to get to a state where we can accept for
 use
 in production. Additionally, as major releases are currently used to
 make
 “breaking” changes that require a more involved and risky upgrade
 process,
 it’s a much bigger deal to deploy a new major into production than a
 release without breaking changes. (upgrade-sstables for example is
 required
 when upgrading to a new major release branch. this unavoidable step adds
 lots of temporary load to the cluster and means deploying

Re: Proposed changes to C* Release Schedule

2014-06-17 Thread Michael Kjellman
Totally agree — “Also — I’m aware that any additional branches/releases will 
add additional work for any developer that works on C*. It would be great if we 
could strike a balance that hopefully doesn’t add significant additional 
merging/rebasing/work for the team…”

That being said, I don’t think I’m alone in identifying the problem. The 
proposed solution was what we came up with in the hour or so we discussed this 
in person. How else can you shrink the release schedule without creating 
another branch? Also — the idea is to only have this branch “active” during the 
overlap when major release branches need to stabilize.

 On Jun 17, 2014, at 10:03 AM, Brandon Williams dri...@gmail.com wrote:
 
  If that's what we want, merging is going to be much more painful.
  Currently we merge:
  
  1.2 -> 2.0 -> 2.1 -> 3.0
  
  If we add an experimental branch for each, we still have to merge the
  stable branch into experimental:
  
  1.2 -> 1.2ex, 2.0 -> 2.0ex, 2.1 -> 2.1ex, 3.0 -> 3.0ex
  
  And then the experimentals into each other:
  
  1.2ex -> 2.0ex, 2.0ex -> 2.1ex, 2.1ex -> 3.0ex
  
  That's quite a lot of merging in the end.
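  
  The forward-merge chain described above can be sketched end to end in a
  throwaway repository; everything below (branch layout, ticket number) is
  illustrative, not actual project tooling. A fix lands on the oldest affected
  branch and is merged forward through each newer branch in order:
  
  ```shell
  # Scratch repo demonstrating the forward-merge workflow.
  set -e
  repo=$(mktemp -d)
  cd "$repo"
  git init -q
  git config user.email "dev@example.com"
  git config user.name "dev"
  echo base > file.txt
  git add file.txt
  git commit -qm "base"
  git branch -m cassandra-1.2      # oldest maintained branch
  git branch cassandra-2.0         # newer branches start at the same point
  git branch cassandra-2.1
  git branch trunk
  echo fix >> file.txt             # the bug fix, committed to 1.2 first
  git commit -qam "fix CASSANDRA-NNNN"
  git checkout -q cassandra-2.0 && git merge -q cassandra-1.2   # 1.2 -> 2.0
  git checkout -q cassandra-2.1 && git merge -q cassandra-2.0   # 2.0 -> 2.1
  git checkout -q trunk         && git merge -q cassandra-2.1   # 2.1 -> trunk
  git log --oneline trunk          # the fix is now on every branch
  ```
  
  With one experimental branch per stable branch, every such fix would need
  roughly twice as many of these merge steps, which is the pain Brandon is
  pointing at.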
 
 
 On Tue, Jun 17, 2014 at 11:51 AM, Jake Luciani jak...@gmail.com wrote:
 
 I'm not sure many people have the problem you are describing.  This is more
 of a C* developer issue than a C* user issue.
 
 
 Is the below what you are describing we move to?:
 
 1.2 - 2.0 - 2.1 - 3.0 stable
 1.2 - 2.0 - 2.1 - 3.0 experimental
 
 Specific changes would be backported based on the less riskyness of the
 change which you are assuming will be constant across versions?
 
 -Jake
 
 
 
 
 On Tue, Jun 17, 2014 at 12:28 PM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 It's a bit about features - but it's more an attempt to achieve the goals
 of what might happen with a 4 week release cycle (but that itself -- in
 practice didn't prove to be valid/reasonable).
 
 If something like an executor service for performance is changed (for
 example) it is definitely a more risky change than what would currently
 go
 into 1.2 -- but most likely we would want to get patches like that into a
 usable build.
 
 So I guess: a) reduce code drift between branches we run in production b)
 get newer features into production faster where breaking changes aren't
 required for the scope of the patch.
 
 Additionally - it's also a question of what release we use when we
 identify an issue we want to work on internally. If we are on 1.2 because
 we can't yet take ALL of 2.0 - do we now need to target our work against
 1.2? I would rather write it against the months worth of changes that
 have
 happened since.
 
 Finally, it's an attempt to make the internal forking not as common as it
 might be today. As you said - this is somewhat of a common process.
 
 On Jun 17, 2014, at 8:52 AM, Jake Luciani jak...@gmail.com wrote
 
 Hi Michael,
 
 I didn't get to hear the in person conversation so taking a step back.
 The proposal seems to be in response to a common problem.  i.e.  I'm on
 C*
 version X and I need feature Y which is only available on version Z. Is
 this correct?
 
 The options have been: a) upgrade to version Z or b) fork C* and
 backport.
 Coming my my previous job where I ran a prod C* cluster I felt this
 and I
 expect many others do too.  We did have to fork and backport patches we
 needed and it was hard.
 
 This is specific to features and not bugs, since bugs are fixed in all
 versions affected.
 
 -Jake
 
 
 
 
 
 
 On Tue, Jun 17, 2014 at 3:16 AM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 Hi Dev@ List—
 
 TL;DR:
 I’d love it if we could modify the C* release cycle to include an
 additional “experimental” release branch that straddles the current
 major
 releases that includes somewhat “untested” or “risky” commits that
 normally
 would only go into the next major release. Releases based from this
 branch
 wouldn’t contain any features that require breaking changes or are
 considered highly “untested” or “risky” but would include the many
 other
 commits that today are considered too unsafe to put into the previous
 stable branch. This will allow us to run code closer to the current
 stable
 release branch when we are unable to move fully to the new major
 release
 branch. Also, during the release cycle of the next major release
 branch
 the
 project can get feedback from a subset of the total changes that will
 ultimately make it into that final new major release. Also — i’m aware
 that
 any additional branches/releases will add additional work for any
 developer
 that works on C*. It would be great if we could strike a balance that
 hopefully doesn’t add significant additional merging/rebasing/work for
 the
 team...
 
 The Longer Story:
 Last week I had a conversation with a few people regarding a proposed
 change to the current C* release schedule.
 
 Other than an attempt to make Jonathan and Sylvian’s lives more
 difficult,
 it would be ideal if we could better sync our internal release
 schedule
 with more recent

Re: Proposed changes to C* Release Schedule

2014-06-17 Thread Michael Kjellman
Agreed. But I think shrinking the release cycle will naturally get features 
into usable releases more quickly, solving (b) as well. :)

 On Jun 17, 2014, at 10:28 AM, Jake Luciani jak...@gmail.com wrote:
 
 So there are two issues this proposal is trying to address:
 
 1. Shrink the release cycle.
 
 2. Backport things to stable releases.
 
 We should discuss these separately since together it's hard to discuss.
 1. is less controversial I would think :)
 
 
 
 
 
 
 On Tue, Jun 17, 2014 at 1:16 PM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 Totally agree — Also — i’m aware that any additional branches/releases
 will add additional work for any developer that works on C*. It would be
 great if we could strike a balance that hopefully doesn’t add significant
 additional merging/rebasing/work for the team…”
 
 That being said I don’t think i’m alone by identifying the problem. The
 proposed solution was what we came up with in the hour or so we discussed
 this in person. How else can you shrink the release schedule without
 creating another branch? Also — the idea is to only have this branch
 “active” during the overlap when major release branches need to stabilize.
 
 On Jun 17, 2014, at 10:03 AM, Brandon Williams dri...@gmail.com wrote:
 
 If that's what we want, merging is going to be much more painful.
 Currently we merge:
 
 1.2 -> 2.0 -> 2.1 -> 3.0
 
 If we add an experimental branch for each, we still have to merge the
 stable branch into experimental:
 
 1.2 -> 1.2ex, 2.0 -> 2.0ex, 2.1 -> 2.1ex, 3.0 -> 3.0ex
 
 And then the experimentals into each other:
 
 1.2ex -> 2.0ex, 2.0ex -> 2.1ex, 2.1ex -> 3.0ex
 
 That's quite a lot of merging in the end.
 
 
 On Tue, Jun 17, 2014 at 11:51 AM, Jake Luciani jak...@gmail.com wrote:
 
 I'm not sure many people have the problem you are describing.  This is
 more
 of a C* developer issue than a C* user issue.
 
 
 Is the below what you are describing we move to?:
 
 1.2 -> 2.0 -> 2.1 -> 3.0 stable
 1.2 -> 2.0 -> 2.1 -> 3.0 experimental
 
 Specific changes would be backported based on the lesser riskiness of the
 change, which you are assuming will be constant across versions?
 
 -Jake
 
 
 
 
 On Tue, Jun 17, 2014 at 12:28 PM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 It's a bit about features - but it's more an attempt to achieve the goals
 of what might happen with a 4-week release cycle (which itself, in
 practice, didn't prove to be valid/reasonable).
 
 If something like an executor service for performance is changed (for
 example) it is definitely a more risky change than what would currently
 go
 into 1.2 -- but most likely we would want to get patches like that
 into a
 usable build.
 
 So I guess: a) reduce code drift between branches we run in production
 b)
 get newer features into production faster where breaking changes
 aren't
 required for the scope of the patch.
 
 Additionally - it's also a question of what release we use when we
 identify an issue we want to work on internally. If we are on 1.2
 because
 we can't yet take ALL of 2.0 - do we now need to target our work
 against
 1.2? I would rather write it against the months worth of changes that
 have
 happened since.
 
 Finally, it's an attempt to make the internal forking not as common as
 it
 might be today. As you said - this is somewhat of a common process.
 
 On Jun 17, 2014, at 8:52 AM, Jake Luciani jak...@gmail.com wrote:
 
 Hi Michael,
 
 I didn't get to hear the in person conversation so taking a step back.
 The proposal seems to be in response to a common problem.  i.e.  I'm
 on
 C*
 version X and I need feature Y which is only available on version Z.
 Is
 this correct?
 
 The options have been: a) upgrade to version Z or b) fork C* and
 backport.
 Coming from my previous job where I ran a prod C* cluster, I felt this and
 I expect many others do too. We did have to fork and backport patches we
 needed, and it was hard.
 
 This is specific to features and not bugs, since bugs are fixed in all
 versions affected.
 
 -Jake
 
 
 
 
 
 
 On Tue, Jun 17, 2014 at 3:16 AM, Michael Kjellman 
 mkjell...@internalcircle.com wrote:
 
 Hi Dev@ List—
 
 TL;DR:
 I’d love it if we could modify the C* release cycle to include an
 additional “experimental” release branch that straddles the current major
 releases and includes somewhat “untested” or “risky” commits that normally
 would only go into the next major release. Releases based from this branch
 wouldn’t contain any features that require breaking changes or are
 considered highly “untested” or “risky”, but would include the many other
 commits that today are considered too unsafe to put into the previous
 stable branch. This will allow us to run code closer to the current stable
 release branch when we are unable to move fully to the new major release
 branch. Also, during the release cycle of the next major release branch,
 the project can get feedback from a subset of the total changes that will
 ultimately make it into that final new major release.

Re: Cannot subscribe to users mailing list

2014-05-12 Thread Michael Kjellman
The ASF recently had catastrophic mail server issues. You should try again as 
some messages were unfortunately lost :(

https://blogs.apache.org/infra/entry/mail_outage

best,
michael
 
On May 12, 2014, at 9:58 AM, Maciej Miklas mac.mik...@gmail.com wrote:

 Hi *,
 
 
 I’ve tried to subscribe to user-subscr...@cassandra.apache.org; it took one
 day to receive the confirmation, and since then nothing - no emails. I’ve
 been waiting for two days. Could someone check it out?
 
 
 Regards,
 Maciej



Re: Timeuuid inserted with now(), how to get the value back in Java client?

2014-03-28 Thread Michael Kjellman
Create a new Date() client-side and insert that instead of using the now() CQL 
function. That way you already have the time you inserted...

Also, this is the type of question for the user list in the future.

Hope that helps and I understood your question correctly. 
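The suggestion above can be sketched in plain Java: build the version-1 (time-based) UUID client-side so you know the primary key before the INSERT. This is only a minimal illustration, not the driver's generator; the epoch-offset constant is the standard RFC 4122 value, while the random clock-sequence/node bits are a simplification assumed for this sketch (a real generator uses a stable node id to avoid collisions):

```java
import java.util.Random;
import java.util.UUID;

public class TimeUuidSketch {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Builds a version-1 (time-based) UUID from a Unix timestamp in ms. */
    static UUID timeBasedUuid(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET; // 100-ns resolution
        long msb = 0L;
        msb |= (ts & 0x00000000FFFFFFFFL) << 32;   // time_low
        msb |= (ts & 0x0000FFFF00000000L) >>> 16;  // time_mid
        msb |= (ts & 0x0FFF000000000000L) >>> 48;  // time_hi
        msb |= 0x0000000000001000L;                // version 1
        // Variant bits "10" plus random clock-seq/node (simplified for the sketch).
        long lsb = 0x8000000000000000L
                 | (new Random().nextLong() & 0x3FFFFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = timeBasedUuid(System.currentTimeMillis());
        // Bind this value in the INSERT instead of now(); you now hold the PK.
        System.out.println(id + " version=" + id.version());
    }
}
```

Since the UUID is generated before the write, there is no need to re-query for the generated key afterwards.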

 On Mar 28, 2014, at 4:46 PM, Andy Atj2 andya...@gmail.com wrote:
 
 I'm writing a Java client to a Cassandra db.
 
 One of the main primary keys is a timeuuid.
 
 I plan to do INSERTs using now() and have Cassandra generate the value of
 the timeuuid.
 
 After the INSERT, I need the Cassandra-generated timeuuid value. Is there
 an easy way to get it, without having to re-query for the record I just
 inserted, hoping to get only one record back? Remember, I don't have the PK.
 
 Eg, in every other db there's a way to get the generated PK back. In sql
 it's @@identity, in oracle its...etc etc.
 
 I know Cassandra is not an RDBMS. All I want is the value Cassandra just
 generated.
 
 Thanks,
 Andy

===

Find out how eSigning generates significant financial benefit.
Read the Barracuda SignNow ROI whitepaper at 
https://signnow.com/l/business/esignature_roi


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-13 Thread Michael Kjellman
Ed-

I understand and respect your enthusiasm for Thrift, but its ship has sailed. 
Yes, if you understand the low-level Thrift API I'm sure you can have a 
rewarding experience, but as someone who wrote a client and had to abstract 
Thrift... I don't have many kind words, and I certainly have less hair on my 
head...

Every line of code ever written has the chance of bugs, and thankfully this 
project has a super dedicated group of people who are very, very responsive at 
fixing those. The sorting and paging bugs might not have happened in Thrift 
because that logic is, and has always been, pushed onto the client (where there 
are also lots of bugs). I like the model where the bug is fixed once for all 
languages and clients, personally...

CQL has worked for me in 9 different sets of application logic as of now, and 
C* is more accessible to others because of it. Application code is simpler, 
client code is simpler, and the learning curve for new users is easier. Win. 
Win. Win.

IMHO, if you put 1/4 of the energy into CQL that you do into fighting for 
Thrift, I'm scared to think how amazing CQL would be.

Best,
Michael

 On Mar 13, 2014, at 5:59 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 
 There was a paging bug in 2.0 and a user just reported a bug sorting a one
 row dataset.
 
 So if you want to argue CQL has surpassed Thrift in all ways, one way it
 clearly has not is correctness.
 
 To demonstrate, search the changelog for CQL bugs that return a wrong
 result. Then do the same search for Thrift bugs that return the wrong
 result and compare.
 
 If newbies to the ML can pick up bugs and performance regressions, it is a
 serious issue.
 
 On Wednesday, March 12, 2014, Jonathan Ellis jbel...@gmail.com wrote:
 I don't know if an IN query already does this without source diving,
 but it could certainly do so without needing extra syntax.
 
 On Wed, Mar 12, 2014 at 7:16 PM, Nicolas Favre-Felix nico...@acunu.com
 wrote:
 If any new use cases
 come to light that can be done with Thrift but not CQL, we will commit
 to supporting those in CQL.
 
 Hello,
 
 (going back to the original topic...)
 
 I just wanted to point out that there is in my opinion an important
 use case that is doable in Thrift but not in CQL, which is to fetch
 several CQL rows from the same partition in a single isolated read. We
 lose the benefit of partition-level isolation if there is no way to
 read rows together.
 Of course we can perform range queries and even scan over
 multi-dimensional clustering keys with CASSANDRA-4851, but we still
 can't fetch rows using a set of clustering keys.
 
 I couldn't find a JIRA for this feature, does anyone know if there is
 one?
 
 Cheers,
 Nicolas
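
For concreteness, the kind of single-partition, multi-row read being described might look like this (the table and names are purely illustrative, and the `IN` on a clustering column is exactly the syntax whose existence the thread is debating, not something it confirms):

```sql
-- Illustrative schema: several CQL rows per partition.
CREATE TABLE events (
    device_id  uuid,
    event_time timeuuid,
    payload    text,
    PRIMARY KEY (device_id, event_time)
);

-- One isolated read of several rows from the same partition,
-- selected by a set of clustering keys:
SELECT * FROM events
WHERE device_id = ?
  AND event_time IN (?, ?, ?);
```

Issuing separate SELECTs per clustering key instead would lose the partition-level isolation Nicolas mentions, since each read could observe a different state of the partition.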
 
 --
 For what it's worth, +1 on freezing Thrift.
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced
 
 -- 
 Sorry this was sent from mobile. Will do less grammar and spell check than
 usual.

===

JOIN US! Live webinar featuring 451 Research & West Windsor Township: 
Best Practices in Security Convergence 
March 18 at 10am PDT.
RSVP at http://www.barracuda.com/451webinar



Re: Product Catalog Data Format Investigation (NOW-18)

2014-02-21 Thread Michael Kjellman
Your link doesn't resolve and I'm 99% sure you have the wrong mailing list.

Best,
Michael

On 2/21/14, 4:11 PM, Paul Cichonski paul.cichon...@lithium.com wrote:

Hey All,

I've documented the results of my investigation into product data
standards here: 
http://confluence.dev.lithium.com/pages/viewpage.action?pageId=47110202.

Unfortunately it does not seem like there is a clear winner, but
hopefully customer input will help drive our choice. Personally, I think
that since our internal schema is likely to be a very small subset of
what these standards express, that it will be feasible to support all of
them out of the box and simply map the data we care about into our
internal representation.

Comments/Ideas are welcome (please add to confluence).

Thanks,
Paul





Re: [VOTE CLOSED] Release Apache Cassandra 1.2.13 (Strike 2)

2013-12-17 Thread Michael Kjellman
Well. I feel stupid now :)

 On Dec 16, 2013, at 10:49 PM, Sylvain Lebresne sylv...@datastax.com wrote:
 
 On Tue, Dec 17, 2013 at 7:43 AM, Michael Kjellman
 mkjell...@barracuda.comwrote:
 
 I've also seen behavior where prepared statements are lost during a
 rolling restart... haven't had a chance to debug/git bisect yet. Anyone else
 seen anything similar?
 
 I'd be somewhat surprised if someone hasn't. We don't persist prepared
 statements over node restarts; it's supposed to be the client's job to
 re-prepare statements when a node
 is restarted.
 
 --
 Sylvain
 
 
 
 On 12/16/13, 10:40 PM, Sylvain Lebresne sylv...@datastax.com wrote:
 
 Alright, this vote is thus closed. I'll re-roll once we've made sure we've
 fixed all the regressions, so we'll see when that is.
 
 --
 Sylvain
 
 
 On Tue, Dec 17, 2013 at 1:01 AM, Brandon Williams dri...@gmail.com
 wrote:
 
 Or rather: https://issues.apache.org/jira/browse/CASSANDRA-6493
 
 I would still prefer a 24h vote after resolution though.
 
 
 On Mon, Dec 16, 2013 at 5:56 PM, Brandon Williams dri...@gmail.com
 wrote:
 
 Changing to -1 as https://issues.apache.org/jira/browse/CASSANDRA-6488
 seems to be a real problem.
 
 
 On Mon, Dec 16, 2013 at 12:12 PM, Brandon Williams dri...@gmail.com
 wrote:
 
 +1
 
 
 On Mon, Dec 16, 2013 at 3:38 AM, Sylvain Lebresne
 sylv...@datastax.com
 wrote:
 
 So now that CASSANDRA-6485 has been committed, I propose the
 following
 artifacts for release as 1.2.13.
 
 sha1: 4be9e6720d9f94a83aa42153c3e71ae1e557d2d9
  Git:
  http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.13-tentative
  Artifacts:
  https://repository.apache.org/content/repositories/orgapachecassandra-050/org/apache/cassandra/apache-cassandra/1.2.13/
  Staging repository:
  https://repository.apache.org/content/repositories/orgapachecassandra-050/
 
 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/
 
 Since it is a re-roll, I propose an expedited vote so the vote will
 be
 open
 for 24 hours (but longer if needed).
 
 [1]: http://goo.gl/j4xUAK (CHANGES.txt)
 [2]: http://goo.gl/iu5Og9 (NEWS.txt)
 
 



Re: [VOTE CLOSED] Release Apache Cassandra 1.2.13 (Strike 2)

2013-12-16 Thread Michael Kjellman
I've also seen behavior where prepared statements are lost during a
rolling restart... haven't had a chance to debug/git bisect yet. Anyone else
seen anything similar?

On 12/16/13, 10:40 PM, Sylvain Lebresne sylv...@datastax.com wrote:

Alright, this vote is thus closed. I'll re-roll once we've made sure we've
fixed all the regressions, so we'll see when that is.

--
Sylvain


On Tue, Dec 17, 2013 at 1:01 AM, Brandon Williams dri...@gmail.com
wrote:

 Or rather: https://issues.apache.org/jira/browse/CASSANDRA-6493

 I would still prefer a 24h vote after resolution though.


 On Mon, Dec 16, 2013 at 5:56 PM, Brandon Williams dri...@gmail.com
 wrote:

  Changing to -1 as https://issues.apache.org/jira/browse/CASSANDRA-6488
 seems to be a real problem.
 
 
  On Mon, Dec 16, 2013 at 12:12 PM, Brandon Williams dri...@gmail.com
 wrote:
 
  +1
 
 
  On Mon, Dec 16, 2013 at 3:38 AM, Sylvain Lebresne
sylv...@datastax.com
 wrote:
 
  So now that CASSANDRA-6485 has been committed, I propose the
following
  artifacts for release as 1.2.13.
 
  sha1: 4be9e6720d9f94a83aa42153c3e71ae1e557d2d9
  Git:
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.13-tentative
  Artifacts:
 https://repository.apache.org/content/repositories/orgapachecassandra-050/org/apache/cassandra/apache-cassandra/1.2.13/
  Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-050/
 
  The artifacts as well as the debian package are also available here:
  http://people.apache.org/~slebresne/
 
  Since it is a re-roll, I propose an expedited vote so the vote will
be
  open
  for 24 hours (but longer if needed).
 
  [1]: http://goo.gl/j4xUAK (CHANGES.txt)
  [2]: http://goo.gl/iu5Og9 (NEWS.txt)
 
 
 
 




