Re: Weekly Cassandra Wrap-Up: Oct 16 Edition

2017-10-16 Thread Jon Haddad
Regarding the stress tests, if you’re willing to share, I’m starting a repo 
where we can keep a bunch of different stress profiles.  I’d like to start 
running them on releases before we agree to push them out.  If anyone has a 
stress test they are willing to share, please get in touch with me!



> On Oct 16, 2017, at 8:37 AM, Jeff Jirsa  wrote:
> 
> I got some feedback last week that I should try this on Monday morning, so
> let's see if we can nudge a few people into action this week.
> 
> 3.0.15 and 3.11.1 are released. This is a dev list, so that shouldn't be a
> surprise to anyone here - you should have seen the votes and release
> notifications. The people working directly ON Cassandra every day are
> probably very aware of the number and nature of fixes in those versions -
> if you're not aware, the Change lists are HUGE, and some of the fixes are
> VERY IMPORTANT. So this week's wrap-up is really a reflection on the size
> of those two release changelogs.
> 
> One of the advantages of the Cassandra project is the size of the user base
> - I don't know if we have accurate counts (and some of the "surveys" are
> laughable), but we know it's on the order of thousands (probably tens of
> thousands) of companies, and some huge number of instances (not willing to
> speculate here, we know it's at least in the hundreds of thousands, may be
> well into the millions). Historically, the best stabilizer of a release was
> people upgrading their unusual use cases, finding bugs that the developers
> hadn't anticipated (and therefore tests didn't exist for those edge cases),
> reporting them, and the next release would be slightly better than the one
> before it. The chicken/egg problem here is pretty obvious, and while a lot
> of us are spending a lot of time making things better, I want to use this
> email to ask a favor (in 3 parts):
> 
> 1) If you haven't tried 3.0 or 3.11 yet, please spin it up on a test
> cluster. 3.11 would be better, 3.0 is ok too. It doesn't need to be a
> thousand node cluster, most of the weird stuff we've seen in the post-3.0
> world deals with data, not cluster size. Grab some of your prod data if you
> can, throw it into a test cluster, add a node/remove a node, tell us if it
> doesn't work.
> 2) Please run a stress workload against that test cluster, even if it's
> 5-10 minutes. Purpose here is two-fold: like #1, it'll help us find some
> edge cases we haven't seen before, but it'll also help us identify holes in
> stress coverage. We have some tickets to add UDTs to stress (
> https://issues.apache.org/jira/browse/CASSANDRA-13260 ) and LWT (
> https://issues.apache.org/jira/browse/CASSANDRA-7960 ). Ideally your stress
> profile should be more than "80% reads 20% writes" - try to actually model
> your schema and query behavior. Do you use static columns? Do you use
> collections?  If you're unable to model your use case because of a
> deficiency in stress, open a JIRA. If things break, open a JIRA. If it
> works perfectly, I'm interested in seeing your stress yaml and results
> (please send it to me privately, don't spam the list).
> 3) If you're somehow not able to run stress because you don't have hardware
> for a spare cluster, profiling your live cluster is also incredibly useful.
> TLP has some notes on how to generate flame graphs -
> https://github.com/thelastpickle/lightweight-java-profiler - I saw one
> example from a cluster that really surprised me. There are versions and use
> cases that we know have been heavily profiled, but there are probably
> versions and use cases where nobody's ever run much in the way of
> profiling. If you're running openjdk in prod, and you're able to SAFELY
> attach a profiler to generate some flame graphs, please send those to me
> (again, privately please, I don't think the whole list needs a copy).
> 
> My hope in all of this is to build up a corpus of real world use cases (and
> real current state via profiling) that we can leverage to make testing and
> performance better going forward. If I get much in the way of response to
> any of these, I'll try to send out a summary in next week's email.
> 
> - Jeff
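
For part 2 of the quoted request, here is a rough sketch of what "model your schema and query behavior" could look like in practice. It is an illustration, not a tested profile: the keyspace, table, column names, query name, file name, and node address are all hypothetical, and the YAML keys follow the cassandra-stress user-mode profile layout as I understand it. The table includes a static column and a collection only because the message asks about them; if your version of stress cannot model them, that is the kind of gap worth a JIRA.

```python
import shlex

# Hypothetical profile for cassandra-stress "user" mode. Every name here
# (stress_demo, sensor_readings, the columns, the "latest" query) is made up
# for illustration and has not been run against any particular release.
PROFILE = """\
keyspace: stress_demo
keyspace_definition: |
  CREATE KEYSPACE stress_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
table: sensor_readings
table_definition: |
  CREATE TABLE sensor_readings (
    sensor_id text,
    reading_time timestamp,
    location text static,
    tags set<text>,
    value double,
    PRIMARY KEY (sensor_id, reading_time)
  );
columnspec:
  - name: sensor_id
    size: fixed(16)
    population: uniform(1..100000)
  - name: reading_time
    cluster: uniform(10..500)
insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
  select: uniform(1..10)/10
queries:
  latest:
    cql: SELECT * FROM sensor_readings WHERE sensor_id = ? LIMIT 20
    fields: samerow
"""


def stress_command(profile_path, duration="10m", nodes=("127.0.0.1",)):
    """Build an argument list for a short mixed insert/read run."""
    return [
        "cassandra-stress", "user",
        f"profile={profile_path}",
        "ops(insert=1,latest=1)",  # one insert for every "latest" query
        f"duration={duration}",
        "-node", ",".join(nodes),
    ]


# Print the command so it can be reviewed before anyone runs it.
print(" ".join(shlex.quote(arg) for arg in stress_command("sensor_profile.yaml")))
```

Save the profile text to the file named in the command, point -node at the test cluster, and adjust the distributions until they resemble your real partition sizes.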





Re: Weekly Cassandra Wrap-Up: Oct 16 Edition

2017-10-16 Thread Jeff Jirsa
Also learned of https://github.com/aragozin/jvm-tools , which can generate
flame graphs easily without requiring a restart with an agent, and works on
both OpenJDK and the Oracle JDK.
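
For anyone who has not looked at how flame graph input is put together: a common text format (the one flamegraph.pl reads) is "folded" stacks, one line per distinct stack with frames joined by semicolons, root first, followed by a sample count. The sketch below shows only that folding step on made-up data. The frame names are hypothetical, and this is not a description of what jvm-tools or any other profiler does internally.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks into folded lines: 'root;...;leaf count'.

    Each sample is a list of frame names with the leaf frame first,
    which is the order a Java thread dump prints them.
    """
    counts = Counter(";".join(reversed(stack)) for stack in samples if stack)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples; real ones would come from repeated thread dumps.
samples = [
    ["Memtable.put", "ColumnFamilyStore.apply", "Mutation.apply"],
    ["Memtable.put", "ColumnFamilyStore.apply", "Mutation.apply"],
    ["SSTableReader.getPosition", "ReadCommand.executeLocally"],
]

for line in fold_stacks(samples):
    print(line)
```

The wider a frame appears in the resulting graph, the more samples contained it, which is why even a few minutes of sampling on a live node can be informative.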





Re: Weekly Cassandra Wrap-up

2017-04-04 Thread Jeff Jirsa
On Sun, Apr 2, 2017 at 7:53 PM, Jeff Jirsa  wrote:

> 3) As a follow-up to #2, I proposed pushing up some CircleCI and Travis
> YML files into the active branches to make testing easier -  (
> https://lists.apache.org/thread.html/48e73ff0d2aff5af3d6feb20af5e7f
> 4318c17379471abb2c16c2dcdf@%3Cdev.cassandra.apache.org%3E /
> https://issues.apache.org/jira/browse/CASSANDRA-13388 ). If anyone has
> strong opinions on Circle/Travis, or is great with setting up yaml files to
> configure parallel CI tests, wouldn't mind having some feedback here (like
> "we can split up nosetests like ", or "let's use  instead because
> it'll be easier", or "just push it as is and we'll iterate on it later" -
> I'm personally leaning towards "push it and iterate later", since it's not
> actually impacting the database at runtime, but other feedback is always
> great)
>
>
^ The CircleCI portion of 13388 is now pushed. Contributors who always
wanted to run unit tests but didn't know how: all you have to do is link
CircleCI to your GitHub account and pull/push to your repo, and it'll kick
off a build.

- Jeff


Re: Weekly Cassandra Wrap-up

2017-04-03 Thread Vinay Chella
Priam could go in the Cassandra cluster management section.

Priam - https://github.com/Netflix/Priam

Thanks,
Vinay Chella


On Mon, Apr 3, 2017 at 2:43 PM, Brandon Williams  wrote:

> Not specific to C*, but still sjk plus a million times over:
>
> https://github.com/aragozin/jvm-tools


Re: Weekly Cassandra Wrap-up

2017-04-03 Thread Chris Lohfink
https://github.com/tolbertam/sstable-tools 

https://github.com/instaclustr/cassandra-sstable-tools 

https://github.com/spotify/cassandra-opstools 



> On Apr 3, 2017, at 3:34 PM, Jonathan Haddad  wrote:
> 
> Tablesnap: https://github.com/JeremyGrosser/tablesnap
> Prometheus JMX exporter: https://github.com/prometheus/jmx_exporter



Re: Weekly Cassandra Wrap-up

2017-04-03 Thread Jeff Jirsa
Monday follow-up:

There are a lot of tools that many people know about and use for
managing clusters. Examples are repair scripts like
- https://github.com/spotify/cassandra-reaper
- https://github.com/BrianGallew/cassandra_range_repair
- https://github.com/thelastpickle/cassandra-reaper

Or cassandra puppet modules:
- https://github.com/locp/cassandra

Or Brian Hess' counter+loader tools:
- https://github.com/brianmhess/cassandra-count
- https://github.com/brianmhess/cassandra-loader

Or Brian Gallew's tools at:
- https://github.com/BrianGallew/cassandra_tools

I'd like to make a more comprehensive list of these third party tools and
include them in the docs - are there any tools you find useful in using
Cassandra that should be listed?




On Sun, Apr 2, 2017 at 7:53 PM, Jeff Jirsa  wrote:

> From the email list:
>
> 1) Some grad students are doing some research on Cassandra and have open
> questions about thread interactions - would be great if someone had time to
> help answer their questions: https://lists.apache.org/thread.html/
> 70ae332f54676352755bcb421b66a56fe27e296f8828eff26727bb58@%
> 3Cdev.cassandra.apache.org%3E
>
> 2) The ongoing discussion of code quality, testing, and coverage continues
> - if you haven't read it yet, but you care about release quality, you
> probably should ( https://lists.apache.org/thread.html/
> 0854341ae3ab41ceed2ae8a03f2486cf2325e4fca6fd800bf4297dd4@%
> 3Cdev.cassandra.apache.org%3E )
>
> 3) As a follow-up to #2, I proposed pushing up some CircleCI and Travis
> YML files into the active branches to make testing easier -  (
> https://lists.apache.org/thread.html/48e73ff0d2aff5af3d6feb20af5e7f
> 4318c17379471abb2c16c2dcdf@%3Cdev.cassandra.apache.org%3E /
> https://issues.apache.org/jira/browse/CASSANDRA-13388 ). If anyone has
> strong opinions on Circle/Travis, or is great with setting up yaml files to
> configure parallel CI tests, wouldn't mind having some feedback here (like
> "we can split up nosetests like ", or "let's use  instead because
> it'll be easier", or "just push it as is and we'll iterate on it later" -
> I'm personally leaning towards "push it and iterate later", since it's not
> actually impacting the database at runtime, but other feedback is always
> great)
>
>
> JIRA stuff:
>
> - https://issues.apache.org/jira/browse/CASSANDRA-13396 - we use SLF4J,
> but really only support logback (though we didn't actively exclude other
> loggers until recently); people with long histories working on the project
> (especially you Datastax and TLP folks), may want to throw in their two
> cents on this ticket. People commenting should endeavor to keep things fact
> based, and not emotional.
>
> Some more patch-available JIRAs but no reviewers:
> - https://issues.apache.org/jira/browse/CASSANDRA-13397 (Return value of
> CountDownLatch.await() not being checked in Repair )
> - https://issues.apache.org/jira/browse/CASSANDRA-13307 (Make CQLSH Great
> Again by making CQLSH downgrade native protocol properly)
> - https://issues.apache.org/jira/browse/CASSANDRA-12962 (SASI needlessly
> rebuilding empty indices)
> - https://issues.apache.org/jira/browse/CASSANDRA-13067 (Huge AWS
> filesystems overflow Long.MAX_INT)
> - https://issues.apache.org/jira/browse/CASSANDRA-12748 (GREP_COLOR
> environment variable breaks things)
> And these two BTree-related patches have been around a while, still with no
> reviewer:
> - https://issues.apache.org/jira/browse/CASSANDRA-9989 &
> https://issues.apache.org/jira/browse/CASSANDRA-9988
>
> - Jeff
>