Rebuild / removenode with MV is inconsistent

2017-03-01 Thread benjamin roth
Hi there, Today I came up with the following thesis: A rebuild / removenode may break the base-table <> MV contract. I'd even claim that a rebuild / removenode requires rebuilding all MVs to guarantee MV consistency. Reason: A node can have base tables with MVs. This is no problem. If these are

Re: Non-zero nodes are marked as down after restarting cassandra process

2017-03-01 Thread benjamin roth
You should always drain nodes before stopping the daemon whenever possible. This avoids commitlog replay on startup. This can take a while. But according to your description commit log replay seems not to be the cause. I once had a similar effect. Some nodes appeared down for some other nodes and

Re: AWS NVMe i3 instances performances

2017-03-01 Thread Romain Hardouin
Thanks for your feedback Daemeon! I'm a bit disappointed and I hope that some system settings will make it possible to leverage NVMe :-/ Which i3 instances did you benchmark? Did you have "preview access" to i3? Or was it available in a specific region before the announcement? Best, Romain On Wednesday 1

Re: AWS NVMe i3 instances performances

2017-03-01 Thread daemeon reiydelle
We did. We found that, even with CentOS and Ubuntu (both used for application compatibility reasons), there is somewhat less IO and better CPU throughput at the price point. At the time my optimization work for that client ended, Amazon was looking at the IO issue, as perhaps the frame configurations

Re: Non-zero nodes are marked as down after restarting cassandra process

2017-03-01 Thread Ben Dalling
Hi Andrew, We were having problems with gossip TCP connections being held open and changed our SOP for stopping cassandra to being:
  nodetool disablegossip
  nodetool drain
  service cassandra stop
This seemed to close down the gossip cleanly (the nodetool drain is advised as well) and meant that
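The stop sequence described in this message could be scripted roughly as follows. This is a minimal sketch, not the poster's actual tooling: the function name `stop_cassandra`, the `dry_run` flag and the `runner` parameter are hypothetical, and it assumes `nodetool` and `service` are on the PATH of the node being stopped. Only the three commands and their order come from the message.

```python
import subprocess
from typing import Callable, List

# Order taken from the message: disable gossip first, then drain
# (flushes memtables and stops accepting writes), then stop the service.
SHUTDOWN_STEPS = [
    ["nodetool", "disablegossip"],
    ["nodetool", "drain"],
    ["service", "cassandra", "stop"],
]


def stop_cassandra(dry_run: bool = True,
                   runner: Callable = subprocess.run) -> List[str]:
    """Run the shutdown steps in order and return them as strings.

    Hypothetical helper: with dry_run=True nothing is executed, the
    steps are only reported. With dry_run=False each step must succeed
    (check=True) before the next one starts.
    """
    executed = []
    for step in SHUTDOWN_STEPS:
        if not dry_run:
            runner(step, check=True)
        executed.append(" ".join(step))
    return executed


if __name__ == "__main__":
    # Dry run only: print what would be executed on the node.
    for line in stop_cassandra(dry_run=True):
        print(line)
```

Running the steps with `check=True` means a failed `nodetool drain` aborts the sequence instead of stopping the service on an undrained node.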

Non-zero nodes are marked as down after restarting cassandra process

2017-03-01 Thread Andrew Jorgensen
Hello, I have a cassandra cluster running on cassandra 3.0.3 and am seeing some strange behavior that I cannot explain when restarting cassandra nodes. The cluster is currently set up in a single datacenter and consists of 55 nodes. I am currently in the process of restarting nodes in the cluster

Re: Resources for fire drills

2017-03-01 Thread Oskar Kjellin
Throttle your compaction so low that it practically stops and then try to save the nodes to simulate not keeping up with compaction Sent from my iPhone > On 1 Mar 2017, at 14:35, Stefan Podkowinski wrote: > > I've just created a page for this topic that we can use to collect

Re: Resources for fire drills

2017-03-01 Thread Stefan Podkowinski
I've just created a page for this topic that we can use to collect some content: https://github.com/spodkowinski/cassandra-collab/blob/docs_firedrill/doc/source/operating/failure_scenarios.rst I've invited both of you, Malte and Benjamin, as collaborators on GitHub, so you can either push changes

Re: Resources for fire drills

2017-03-01 Thread benjamin roth
@Doc: http://cassandra.apache.org/doc/latest/ is built from the git repo. So you can add documentation in doc/source and submit a patch. I personally think that is not the very best place or way to build a knowledge DB but that's what we have. 2017-03-01 13:39 GMT+01:00 Malte Pickhan

Re: Resources for fire drills

2017-03-01 Thread Malte Pickhan
Hi, really cool that this discussion gets attention. You are right, my question was quite open. For me it would already be helpful to compile a list like Ben started with scenarios that can happen to a cluster and what actions/strategies you have to take to resolve the incident without losing

Re: Resources for fire drills

2017-03-01 Thread Stefan Podkowinski
I've been thinking about this for a while, but haven't found a practical solution yet, although the term "fire drill" leaves a lot of room for interpretation. The most basic requirements I'd have for this kind of training would start with automated cluster provisioning for each scenario (either

AWS NVMe i3 instances performances

2017-03-01 Thread Romain Hardouin
Hi all, AWS launched i3 instances a few days ago*. NVMe SSDs seem very promising! Did someone already benchmark an i3 with Cassandra? e.g. i2 vs i3. If yes, with which OS and kernel version? Did you do any system tuning for NVMe? e.g. PCIe IRQ? etc. We plan to make some benchmarks but Debian is not

Re: Resources for fire drills

2017-03-01 Thread benjamin roth
But if you want to do fire-drills you only have to break things on purpose. Examples:
- Cut off a commitlog file at a random position and restart CS
- Overwrite some bytes in an SSTable and read all data from it
- Delete some files in /var/lib/cassandra and try to restore them from backups or
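The first two examples in this message (cutting a file off at a random position, overwriting some bytes) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `truncate_copy_at_random_offset` and `overwrite_random_bytes` are hypothetical, the demo works on a scratch file in a temporary directory rather than on a real commitlog or SSTable, and restarting the node or reading the data back is left to the operator.

```python
import os
import random
import shutil
import tempfile


def truncate_copy_at_random_offset(src_path, dst_path, rng=None):
    """Copy src_path to dst_path and cut the copy off at a random offset.

    Hypothetical helper; the source file is left untouched.
    Returns the offset, which is also the new size of the copy.
    """
    rng = rng or random.Random()
    shutil.copyfile(src_path, dst_path)
    size = os.path.getsize(dst_path)
    offset = rng.randrange(size) if size > 0 else 0
    with open(dst_path, "r+b") as handle:
        handle.truncate(offset)
    return offset


def overwrite_random_bytes(path, count=4, rng=None):
    """Flip all bits of `count` randomly chosen bytes in place.

    Hypothetical helper; XOR with 0xFF guarantees each chosen byte
    really changes. Returns the sorted list of modified offsets.
    """
    rng = rng or random.Random()
    size = os.path.getsize(path)
    if size == 0:
        return []
    offsets = sorted(rng.sample(range(size), min(count, size)))
    with open(path, "r+b") as handle:
        for offset in offsets:
            handle.seek(offset)
            original = handle.read(1)
            handle.seek(offset)
            handle.write(bytes([original[0] ^ 0xFF]))
    return offsets


if __name__ == "__main__":
    # Demo on a scratch file, never on a live data directory.
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "sample.bin")
        with open(source, "wb") as handle:
            handle.write(bytes(range(256)) * 4)
        damaged = os.path.join(workdir, "sample-truncated.bin")
        cut = truncate_copy_at_random_offset(source, damaged)
        print("truncated copy at offset", cut)
        print("flipped bytes at", overwrite_random_bytes(source))
```

Working on copies keeps the drill repeatable: the damaged file can be swapped into a test node's data directory while the pristine original is kept for the restore step.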

Re: Resources for fire drills

2017-03-01 Thread benjamin roth
As far as I know there is no such resource, at least not officially. IMHO things like this can be improved a lot within the CS community. I just proposed on the dev-list to move the official docs out of the repo into an easier to maintain place like a Wiki or sth. This could help the community to

Re: Resources for fire drills

2017-03-01 Thread Malte Pickhan
Yeah, that's the point. What I mean is an overview of basic scenarios for fire drills, so that you can exercise them with your team. Best > On 1 Mar 2017, at 11:01, benjamin roth wrote: > > Could you specify it a little bit? There are really a lot of things that can > go

Re: Resources for fire drills

2017-03-01 Thread benjamin roth
Could you specify it a little bit? There are really a lot of things that can go wrong. 2017-03-01 10:59 GMT+01:00 Malte Pickhan : > Hi Cassandra users, > > I am looking for some resources/guides for firedrill scenarios with apache > cassandra. > > Do you know anything

Resources for fire drills

2017-03-01 Thread Malte Pickhan
Hi Cassandra users, I am looking for some resources/guides for fire drill scenarios with Apache Cassandra. Do you know anything like that? Best, Malte