I would discourage dropping to RF=2: if you're reading or writing at
CL=*QUORUM, a quorum of 2 replicas is still 2 nodes, so the keyspace
won't be able to tolerate even a single node outage.
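
To make the quorum arithmetic concrete, here's a quick sketch in plain
Python (nothing Cassandra-specific, just the standard floor(RF/2) + 1
quorum formula):

    # quorum = floor(RF / 2) + 1; a QUORUM request can survive
    # RF - quorum replica failures
    def quorum(rf: int) -> int:
        return rf // 2 + 1

    for rf in (3, 2):
        q = quorum(rf)
        print(f"RF={rf}: quorum is {q} replicas, tolerates {rf - q} down")

    # RF=3: quorum is 2 replicas, tolerates 1 down
    # RF=2: quorum is 2 replicas, tolerates 0 down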

You mentioned a couple of days ago that there's a corrupted index file
on 10.40.17.114. Could you move the whole sstable set associated with
that corrupt file out of the data directory and try again? That said, I
echo Jeff's comments: I'm concerned you have a hardware issue on that
node, since the OpsCenter tables got corrupted too. The replace method
certainly sounds like a good idea.
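
For "moving out the sstable set", something like this rough Python
sketch is what I have in mind (the paths, keyspace and table names
below are placeholders; you'd want Cassandra stopped, or at least
drained, on that node first):

    import shutil
    from pathlib import Path

    # Placeholder paths/names; substitute the real ones from the
    # corruption error in your logs.
    data_dir = Path("/var/lib/cassandra/data/my_keyspace/my_table")
    corrupt_index = data_dir / "my_keyspace-my_table-ka-1234-Index.db"
    quarantine = Path("/var/lib/cassandra/quarantine")
    quarantine.mkdir(parents=True, exist_ok=True)

    # In 2.1 every component of a single sstable shares the prefix before
    # the final "-<Component>.db", so strip "Index.db" and move everything
    # with the same generation number.
    base = corrupt_index.name.rsplit("-", 1)[0]   # my_keyspace-my_table-ka-1234
    for component in data_dir.glob(base + "-*"):
        print("moving", component.name, "->", quarantine)
        shutil.move(str(component), str(quarantine / component.name))

If streaming to the new node still dies after that, it points even more
strongly at a hardware problem on 10.40.17.114.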

On Sun, Aug 13, 2017 at 7:58 AM, <brian.spind...@gmail.com> wrote:

> Hi folks, hopefully a quick one:
>
> We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.  It's all
> in one region but spread across 3 availability zones.  It was nicely
> balanced with 4 nodes in each.
>
> But after a couple of failures and subsequent re-provisions into the wrong
> AZ, we now have a cluster with:
>
> 5 nodes in az A
> 5 nodes in az B
> 2 nodes in az C
>
> Not sure why, but when we add a third node in AZ C, streaming fails after
> getting all the way to completion, with no apparent error in the logs.
> I've looked at a couple of bugs referring to scrubbing and possible OOM
> bugs caused by metadata writes at the end of streaming (sorry, don't have
> the tickets handy).  I'm worried I might not be able to do much with the
> two existing nodes, since their disk usage is high and they are under a
> lot of load given how few of them are left in this rack.
>
> Rather than troubleshoot this further, what I was thinking about doing was:
> - drop the replication factor on our keyspace to two
> - hopefully this would reduce load on these two remaining nodes
> - run repairs/cleanup across the cluster
> - then shoot these two nodes in the 'c' rack
> - run repairs/cleanup across the cluster
>
> Would this work with minimal/no disruption?
> Should I update their "rack" beforehand or after?
> What else am I not thinking about?
>
> My main goal at the moment is to get the cluster back into a clean,
> consistent state that allows nodes to bootstrap properly.
>
> Thanks for your help in advance.
