Re: Dropping down replication factor
nothing in logs on the node that it was streaming from. however, I think I found the issue on the other node in the C rack: ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354 StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5] Streaming error occurred org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /ephemeral/cassandra/data/... I did a 'cat /var/log/cassandra/system.log|grep Corrupt' and it seems it's a single Index.db file and nothing on the other node. I think nodetool scrub or offline sstablescrub might be in order but with the current load I'm not sure I can take it offline for very long. Thanks again for the help. On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa wrote: > Compaction is backed up – that may be normal write load (because of the > rack imbalance), or it may be a secondary index build. Hard to say for > sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack > probably not necessary, streaming is being marked as failed and it’s > turning itself off. Not sure why streaming is marked as failing, though, > anything on the sending sides? > > > > > > From: Brian Spindler > Reply-To: > Date: Saturday, August 12, 2017 at 6:34 PM > To: > Subject: Re: Dropping down replication factor > > Thanks for replying Jeff. > > Responses below. > > On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa wrote: > >> Answers inline >> >> -- >> Jeff Jirsa >> >> >> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote: >> > >> > Hi folks, hopefully a quick one: >> > >> > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch. It's >> all in one region but spread across 3 availability zones. It was nicely >> balanced with 4 nodes in each. 
>> > >> > But with a couple of failures and subsequent provisions to the wrong az >> we now have a cluster with : >> > >> > 5 nodes in az A >> > 5 nodes in az B >> > 2 nodes in az C >> > >> > Not sure why, but when adding a third node in AZ C it fails to stream >> after getting all the way to completion and no apparent error in logs. >> I've looked at a couple of bugs referring to scrubbing and possible OOM >> bugs due to metadata writing at end of streaming (sorry don't have ticket >> handy). I'm worried I might not be able to do much with these since the >> disk space usage is high and they are under a lot of load given the small >> number of them for this rack. >> >> You'll definitely have higher load on az C instances with rf=3 in this >> ratio > > >> Streaming should still work - are you sure it's not busy doing something? >> Like building secondary index or similar? jstack thread dump would be >> useful, or at least nodetool tpstats >> >> Only other thing might be a backup. We do incrementals x1hr and > snapshots x24h; they are shipped to s3 then links are cleaned up. 
The > error I get on the node I'm trying to add to rack C is: > > ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception > encountered during startup > java.lang.RuntimeException: Error during boostrap: Stream failed > at > org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) > [apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) > [apache-cassandra-2.1.15.jar:2.1.15] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) > [apache-cassandra-2.1.15.jar:2.1.15] > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > ~[apache-cassandra-2.1.15.jar:2.1.15] > at > com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) > ~[guava-16.0.jar:na] > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > ~[guava-16.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) > ~[guava-16.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) > ~[guava-16.0.jar:na] > at >
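Since the corruption reported above appears limited to a single Index.db file, the choice is between an online and an offline scrub. A rough sketch of both (keyspace/table names below are placeholders for wherever the corrupt SSTable lives; the offline tool requires Cassandra to be stopped on that node first):

```shell
# Online scrub: runs inside the live daemon; no downtime, but it adds
# compaction-style load. Keyspace/table names are placeholders.
nodetool scrub my_keyspace my_table

# Offline scrub: only if the node can be taken out of service.
# Stop Cassandra on the node first, then:
sstablescrub my_keyspace my_table
# ...and restart Cassandra afterwards.
```

Given the load already on the two rack-C nodes, the online form is likely the only practical option here.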
Re: Dropping down replication factor
Compaction is backed up – that may be normal write load (because of the rack imbalance), or it may be a secondary index build. Hard to say for sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack probably not necessary, streaming is being marked as failed and it’s turning itself off. Not sure why streaming is marked as failing, though, anything on the sending sides? From: Brian Spindler Reply-To: Date: Saturday, August 12, 2017 at 6:34 PM To: Subject: Re: Dropping down replication factor Thanks for replying Jeff. Responses below. On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa wrote: > Answers inline > > -- > Jeff Jirsa > > >> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote: >> > >> > Hi folks, hopefully a quick one: >> > >> > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch. It's all >> in one region but spread across 3 availability zones. It was nicely balanced >> with 4 nodes in each. >> > >> > But with a couple of failures and subsequent provisions to the wrong az we >> now have a cluster with : >> > >> > 5 nodes in az A >> > 5 nodes in az B >> > 2 nodes in az C >> > >> > Not sure why, but when adding a third node in AZ C it fails to stream after >> getting all the way to completion and no apparent error in logs. I've looked >> at a couple of bugs referring to scrubbing and possible OOM bugs due to >> metadata writing at end of streaming (sorry don't have ticket handy). I'm >> worried I might not be able to do much with these since the disk space usage >> is high and they are under a lot of load given the small number of them for >> this rack. > > You'll definitely have higher load on az C instances with rf=3 in this ratio > > Streaming should still work - are you sure it's not busy doing something? Like > building secondary index or similar? jstack thread dump would be useful, or at > least nodetool tpstats > Only other thing might be a backup. 
We do incrementals x1hr and snapshots x24h; they are shipped to s3 then links are cleaned up. The error I get on the node I'm trying to add to rack C is: ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception encountered during startup java.lang.RuntimeException: Error during boostrap: Stream failed at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.15.jar:2.1.15] Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.15.jar:2.1.15] at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na] at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na] at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na] at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na] at 
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na] at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:209) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:185) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:413) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:700) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:661) ~[apache-cassandra-2.1.15.jar:2.1.15] at
Re: Dropping down replication factor
Thanks for replying Jeff. Responses below. On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa wrote: > Answers inline > > -- > Jeff Jirsa > > > > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote: > > > > Hi folks, hopefully a quick one: > > > > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch. It's > all in one region but spread across 3 availability zones. It was nicely > balanced with 4 nodes in each. > > > > But with a couple of failures and subsequent provisions to the wrong az > we now have a cluster with : > > > > 5 nodes in az A > > 5 nodes in az B > > 2 nodes in az C > > > > Not sure why, but when adding a third node in AZ C it fails to stream > after getting all the way to completion and no apparent error in logs. > I've looked at a couple of bugs referring to scrubbing and possible OOM > bugs due to metadata writing at end of streaming (sorry don't have ticket > handy). I'm worried I might not be able to do much with these since the > disk space usage is high and they are under a lot of load given the small > number of them for this rack. > > You'll definitely have higher load on az C instances with rf=3 in this > ratio > Streaming should still work - are you sure it's not busy doing something? > Like building secondary index or similar? jstack thread dump would be > useful, or at least nodetool tpstats > > Only other thing might be a backup. We do incrementals x1hr and snapshots x24h; they are shipped to s3 then links are cleaned up. 
The error I get on the node I'm trying to add to rack C is: ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception encountered during startup java.lang.RuntimeException: Error during boostrap: Stream failed at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.15.jar:2.1.15] Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.15.jar:2.1.15] at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na] at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na] at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na] at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na] at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na] at 
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:209) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:185) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:413) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:700) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:661) ~[apache-cassandra-2.1.15.jar:2.1.15] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179) ~[apache-cassandra-2.1.15.jar:2.1.15] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112] WARN [StorageServiceShutdownHook] 2017-08-12 23:54:51,582 Gossiper.java:1462 - No local state or state is in silent
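On the point quoted above that the az C instances will see higher load with rf=3: with Ec2Snitch each AZ is a rack, and rack-aware placement puts one full replica in each rack, so a node's share of the data is 1/(nodes in its rack). A quick arithmetic sketch of the resulting imbalance:

```shell
# Each rack stores one full replica, so per-node share = 1 / nodes_in_rack.
# Compare a rack-C node (2 nodes) with a rack-A node (5 nodes):
awk 'BEGIN {
  share_a = 1 / 5    # per-node share in AZ A
  share_c = 1 / 2    # per-node share in AZ C
  printf "rack-C node holds %.1fx the data of a rack-A node\n", share_c / share_a
}'
# → rack-C node holds 2.5x the data of a rack-A node
```

which matches the 2.5x figure given elsewhere in the thread.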
Re: Dropping down replication factor
Answers inline -- Jeff Jirsa > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote: > > Hi folks, hopefully a quick one: > > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch. It's all in > one region but spread across 3 availability zones. It was nicely balanced > with 4 nodes in each. > > But with a couple of failures and subsequent provisions to the wrong az we > now have a cluster with : > > 5 nodes in az A > > 5 nodes in az B > > 2 nodes in az C > > Not sure why, but when adding a third node in AZ C it fails to stream after > getting all the way to completion and no apparent error in logs. I've looked > at a couple of bugs referring to scrubbing and possible OOM bugs due to > metadata writing at end of streaming (sorry don't have ticket handy). I'm > worried I might not be able to do much with these since the disk space usage > is high and they are under a lot of load given the small number of them for > this rack. You'll definitely have higher load on az C instances with rf=3 in this ratio Streaming should still work - are you sure it's not busy doing something? Like building secondary index or similar? jstack thread dump would be useful, or at least nodetool tpstats > > Rather than troubleshoot this further, what I was thinking about doing was: > - drop the replication factor on our keyspace to two Repair before you do this, or you'll lose your consistency guarantees > - hopefully this would reduce load on these two remaining nodes It should; rack awareness guarantees one replica per rack if rf==num racks, so right now those 2 c machines have 2.5x as much data as the others. This will drop that requirement and drop the load significantly > - run repairs/cleanup across the cluster > - then shoot these two nodes in the 'c' rack Why shoot the c instances? Why not drop RF and then add 2 more C instances, then increase RF back to 3, run repair, then decom the extra instances in a and b? 
> - run repairs/cleanup across the cluster > > Would this work with minimal/no disruption? The big risk of running rf=2 is that quorum==all - any gc pause or node restarting will make you lose HA or strong consistency guarantees. > Should I update their "rack" before hand or after ? You can't change a node's rack once it's in the cluster, it SHOULD refuse to start if you do that > What else am I not thinking about? > > My main goal atm is to get back to where the cluster is in a clean consistent > state that allows nodes to properly bootstrap. > > Thanks for your help in advance. > - > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
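Jeff's alternative above (drop RF, grow rack C, restore RF, then decommission the surplus) might be sketched as the command sequence below. The keyspace name is a placeholder, and 'us-east' stands in for whatever DC name Ec2Snitch derives from the region; each step should be verified complete before starting the next:

```shell
# 1. Drop RF to 2 so rack C stops owning a full replica
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'us-east': 2};"

# 2. Bootstrap two more nodes into AZ C, then restore RF to 3
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'us-east': 3};"

# 3. Repair so the newly responsible replicas are populated
nodetool repair -pr        # run on every node, one at a time

# 4. Retire the extra instances in AZ A and B
nodetool decommission      # on each surplus node, one at a time
```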
Dropping down replication factor
Hi folks, hopefully a quick one: We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch. It's all in one region but spread across 3 availability zones. It was nicely balanced with 4 nodes in each. But with a couple of failures and subsequent provisions to the wrong az we now have a cluster with : 5 nodes in az A 5 nodes in az B 2 nodes in az C Not sure why, but when adding a third node in AZ C it fails to stream after getting all the way to completion and no apparent error in logs. I've looked at a couple of bugs referring to scrubbing and possible OOM bugs due to metadata writing at end of streaming (sorry don't have ticket handy). I'm worried I might not be able to do much with these since the disk space usage is high and they are under a lot of load given the small number of them for this rack. Rather than troubleshoot this further, what I was thinking about doing was: - drop the replication factor on our keyspace to two - hopefully this would reduce load on these two remaining nodes - run repairs/cleanup across the cluster - then shoot these two nodes in the 'c' rack - run repairs/cleanup across the cluster Would this work with minimal/no disruption? Should I update their "rack" before hand or after ? What else am I not thinking about? My main goal atm is to get back to where the cluster is in a clean consistent state that allows nodes to properly bootstrap. Thanks for your help in advance. - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
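For concreteness, the sequence proposed above could be sketched as the commands below (keyspace and DC names are placeholders; the replies upthread advise repairing before the RF drop, and warn that at rf=2 a QUORUM operation needs both replicas):

```shell
# 1. Repair first, so every replica is consistent before one is dropped
nodetool repair -pr        # on every node, sequentially

# 2. Drop the keyspace to RF 2 (names are placeholders)
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'us-east': 2};"

# 3. Remove data the nodes no longer own
nodetool cleanup           # on every node

# 4. Retire the two rack-C nodes
nodetool decommission      # run on each C node, one at a time

# 5. Repair/cleanup again across the cluster
nodetool repair -pr && nodetool cleanup
```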
Re: Answering the question of Cassandra Summit 2017
Thanks for sending that out Patrick! Really sad not to have a 2017 summit, but looking forward to 2018 summit(s). On Fri, Aug 11, 2017 at 5:27 PM, Patrick McFadin wrote: > Hello Cassandra Community, > > I know this is a hot topic so let me TL;DR for those of you looking for > the short answer. Will there be a Cassandra Summit in 2017? DataStax will > not be holding a Cassandra Summit in 2017, but instead multiple DataStax > Summits in 2018. > > More details. Last year was pretty chaotic for the Cassandra community and > where DataStax fit with the project. I don’t need to re-cap all the drama. > You can go look at the dev and user lists around this time last year if you > want to re-live it. It’s safe to say 2016 is a year I wouldn’t want to do > again for a lot of reasons. Those of us at the Cassandra Summit 2016 knew > it was the end of something, now the question is what’s next and what will > it be? > > Having a place for people to come together and talk about what we as a community > do is really important. Those of you that know me, know how much I live > that. When we started talking summit inside DataStax, we realized it would > be a hot button issue. When I started talking to people in the community, > it was even more of a hot button issue. Having DataStax run the Cassandra > Summit was going to cause a lot of heartache and would further divide the > community with questions and accusations. It’s just much easier to hold a > DataStax Summit so we are just out there plainly and move forward. > > What DataStax will be doing different. > > We will be moving back to a more regional format instead of the big bang > single event in San Jose starting in 2018. Fun fact. Almost 80% of > attendees of Cassandra Summit were from the Bay Area. That means we have > developers and operators from a lot of other places being excluded which > isn’t cool. We will also be inviting talks from the Cassandra Community. 
> You don’t have to be a DataStax customer or partner to get on the speaking > list. > > If there is some new group or company that launches a Cassandra Summit, > DataStax will happily be a sponsor. There are some for-profit, professional > conference companies like the Linux Foundation out there that just may and > if so, I’ll see you there. After being involved in making the Cassandra > Summit happen for years, I can say it’s no small effort. > > There it is. Fire away with your questions, comments. All I ask is keep it > respectful because this is a community of amazing people. You have changed > the world over these years and I know it won’t stop. You know I got a hug > for you wherever we just happen to meet. > > Patrick > >