Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-28 Thread Fd Habash
Hi all … The DataStax & Apache docs are clear: run ‘nodetool repair’ after you alter a keyspace to change its RF or RS. However, the details are all over the place as to what type of repair it should be and on what nodes it needs to run. None of the above doc authorities are clear, and what you find on the
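A minimal sketch of the sequence the docs agree on, assuming NetworkTopologyStrategy and placeholder keyspace/DC names (my_ks, dc1):

    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
    # then, on every node owning the affected ranges:
    nodetool repair -full my_ks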

Is There a Way To Proactively Monitor Reads Returning No Data Due to Consistency Level?

2019-05-07 Thread Fd Habash
Typically, when a read is submitted to C*, it may complete with … 1. No errors & returns expected data 2. Errors out with UnavailableException 3. No error & returns zero rows on first attempt, but the rows are returned on subsequent runs. The third scenario happens as a result of cluster entropy, especially
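One rough way to probe for scenario 3 proactively, sketched with a hypothetical ops.canary table (id int primary key, ts timeuuid): write a canary row at low consistency, read it back at the application's consistency level, and alert when the read returns nothing:

    cqlsh -e "CONSISTENCY ONE; INSERT INTO ops.canary (id, ts) VALUES (1, now());"
    cqlsh -e "CONSISTENCY LOCAL_QUORUM; SELECT ts FROM ops.canary WHERE id = 1;"
    # an empty result here, followed by rows on retry, is scenario 3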

CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Fd Habash
C*: 2.2.8 Write CL = LQ Kspace RF = 3 Three racks A write gets received by node 1 in rack 1 at above specs. Node 1 (rack1) & node 2 (rack2) acknowledge it to the client. Within some unit of time, node 1 & 2 die. Either …. - Scenario 1: C* process death: Row did not make it to sstable (it is
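How long an acked write can sit only in memory before it is durable on a given replica is governed by the commitlog sync settings; a quick check (yaml path varies by install):

    grep -E "^commitlog_sync" /etc/cassandra/cassandra.yaml
    # commitlog_sync: periodic
    # commitlog_sync_period_in_ms: 10000   # with periodic sync, up to ~10s of
    #                                      # acked writes may not yet be fsynced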

RE: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fd Habash
…consistent data, given you can tolerate a bit of latency until your repair is complete – if you go by the recommendation, i.e. to add one node at a time – you’ll avoid all these nuances. From: Fd Habash [mailto:fmhab...@gmail.com] Sent: Wednesday, May 01, 2019 3:12 PM To: user@cassandra.apache.org Sub

RE: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fd Habash
…not apply, as the replacing node just owns the token ranges of the dead node. I think that’s why the restriction of only replacing one node at a time does not apply in this case. Thanks Alok Dwivedi Senior Consultant https://www.instaclustr.com/platform/ From: Fd Habash Repl

Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-04-30 Thread Fd Habash
Reviewing the documentation & based on my testing, using C* 2.2.8, I was not able to extend the cluster by adding multiple nodes simultaneously. I got an error message … Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true I
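For reference, a sketch of both paths (the JVM flag is the documented escape hatch and trades away the consistent-range-movement guarantee):

    # safe path: add one node, wait for it to go from UJ to UN, then add the next
    nodetool status | grep UJ
    # escape hatch: allow simultaneous bootstraps; in cassandra-env.sh on each joining node:
    JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"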

RE: A keyspace with RF=3, Cluster with 3 RACs, CL=LQ: No Data on First Attempt, but 1 Row Afterwards

2019-04-23 Thread Fd Habash
Any ideas, please? Thank you From: Fd Habash Sent: Tuesday, April 23, 2019 10:38 AM To: user@cassandra.apache.org Subject: A keyspace with RF=3, Cluster with 3 RACs, CL=LQ: No Data on First Attempt, but 1 Row Afterwards Cluster setup … - C* 2.2.8 - Three RACs, one DC - Keyspace

A keyspace with RF=3, Cluster with 3 RACs, CL=LQ: No Data on First Attempt, but 1 Row Afterwards

2019-04-23 Thread Fd Habash
Cluster setup … - C* 2.2.8 - Three RACs, one DC - Keyspace with RF=3 - RS = Network topology At CL=LQ … I get zero rows on first attempt, and one row on the second or third. Once found, I always get the row afterwards. Trying to understand this behavior … First attempt, my read request hits
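Query tracing can show which replicas answered and whether read repair fired; a sketch with placeholder keyspace/table/key:

    cqlsh -e "CONSISTENCY LOCAL_QUORUM; TRACING ON; SELECT * FROM my_ks.my_table WHERE id = 42;"
    # the trace lists the replicas contacted, digest mismatches, and any
    # read-repair activity that would explain 0 rows first, 1 row afterwards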

RE: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-14 Thread Fd Habash
You could have run removenode. You could have run assassinate. It could also be some new bug, but that's much less likely. On Thu, Mar 14, 2019 at 2:50 PM Fd Habash wrote: I have a node which I know for certain was a cluster member last week. It showed in nodetool status as DN. When I attem
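A sketch of those commands, with the host ID and IP as placeholders:

    nodetool status                    # note the Host ID of the DN node
    nodetool removenode <host-id>      # removal with re-replication of its ranges
    nodetool removenode force          # if the removal hangs
    nodetool assassinate 10.xx.xx.xx   # last resort (2.2+): evicts the node from
                                       # gossip without re-replicating its data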

Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-14 Thread Fd Habash
I have a node which I know for certain was a cluster member last week. It showed in nodetool status as DN. When I attempted to replace it today, I got this message ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup java.lang.RuntimeException:

Loss of an Entire AZ in a Three-AZ Cassandra Cluster

2019-03-08 Thread Fd Habash
Assume you have a 30 node cluster distributed across three AZ’s with an RF of 3. Trying to come up with a runbook to manage multi-node failures as a result of … - Loss of an entire AZ1 - Loss of multiple nodes in AZ2 - AZ3 unaffected. No node loss Is this the most optimal plan? Replacing dead
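A sketch of the replace-in-place step such a runbook usually centers on (IP is a placeholder; replace one node at a time and wait for UN before starting the next):

    # on each replacement node, before its first start, in cassandra-env.sh:
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead-node-ip>"
    nodetool status    # from a live node: wait for the newcomer to reach UN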

Migrating to Reaper: Switching From Incremental to Reaper's Full Subrange Repair

2018-06-13 Thread Fd Habash
For those who are using Reaper … Currently, I run repairs via crontab/nodetool using 'repair -pr' on 2.2.8, which defaults to incremental. If I migrate to Reaper, do I have to mark sstables as un-repaired first? Also, out of the box, does Reaper run full parallel repair? If yes, is it not
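If marking sstables un-repaired does turn out to be necessary, the offline tool for it looks roughly like this (stop the node first; data paths are placeholders):

    nodetool drain    # then stop the Cassandra process
    tools/bin/sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db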

RE: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Fd Habash
… Let’s wait for other experts to comment. Can you also check the sstable count for each table, just to be sure they are not extraordinarily high? Sent from my iPhone On Jun 11, 2018, at 10:21 AM, Fd Habash wrote: Yes we did after adding the three nodes back and a full cluster repair as well
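For the sstable-count check suggested here, something like this per table (names are placeholders):

    nodetool cfstats my_ks.my_table | grep -i "sstable count"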

RE: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Fd Habash
Yes, we did, after adding the three nodes back, and a full cluster repair as well. But even if we hadn’t run cleanup, would the fact that some nodes still hold sstables they no longer need have impacted read latency? Thanks Thank you From: Nitan Kainth Sent: Monday,
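For completeness, the cleanup step under discussion, run on every node that remained in the cluster after the topology change (keyspace is a placeholder):

    nodetool cleanup my_ks    # drops sstable data for ranges the node no longer owns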

RE: Cassandra upgrade from 2.2.8 to 3.10

2018-03-28 Thread Fd Habash
Thank you. In regard to my second inquiry: as we plan for C* upgrades, I have not always found NEWS.txt to spell out the possible upgrade paths. Is there a rule of thumb, or maybe an official reference, for upgrade paths? Thank you From: Alexander Dejanovski Sent:

RE: On a 12-node Cluster, Starting C* on a Seed Node Increases Read Latency from 150ms to 1.5 sec.

2018-03-02 Thread Fd Habash
2018-03-02 14:42 GMT+00:00 Fd Habash <fmhab...@gmail.com>: This is a 2.2.8 cluster with three AWS AZs, each with 4 nodes. A few days ago we noticed a single node’s read latency reaching 1.5 secs; there were 8 others with read latencies going up near 900 ms. This single node was a see

On a 12-node Cluster, Starting C* on a Seed Node Increases Read Latency from 150ms to 1.5 sec.

2018-03-02 Thread Fd Habash
This is a 2.2.8 cluster with three AWS AZs, each with 4 nodes. A few days ago we noticed a single node’s read latency reaching 1.5 secs; there were 8 others with read latencies going up near 900 ms. This single node was a seed node and it was running a ‘repair -pr’ at the time. We intervened as
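When a repair is suspected of driving latency like this, a quick triage sketch on the affected node:

    nodetool compactionstats    # 'Validation' tasks indicate an active repair
    nodetool netstats           # repair streaming sessions in flight
    nodetool stop VALIDATION    # halt validation compactions on this node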

RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-22 Thread Fd Habash
Thank you From: Fd Habash Sent: Thursday, February 22, 2018 9:00 AM To: user@cassandra.apache.org Subject: RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster “data was allowed to fully rebalance/repair/drain before the next node

RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-22 Thread Fd Habash
<fmhab...@gmail.com> wrote: One node at a time  On Feb 21, 2018 10:23 AM, "Carl Mueller" <carl.muel...@smartthings.com> wrote: What is your replication factor?  Single datacenter, three availability zones, is that right? You removed one node at a time or three at once? On Wed, Feb 21,

Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Fd Habash
We had a 15-node cluster across three zones, and cluster repairs using ‘nodetool repair -pr’ took about 3 hours to finish. Lately, we shrunk the cluster to 12 nodes. Since then, the same repair job has taken up to 12 hours to finish, and most times it never does. More importantly, at some point
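For context, the job in question amounts to something like this, run serially around the ring (host list and keyspace are placeholders):

    for h in node1 node2 node3; do
        ssh "$h" nodetool repair -pr my_ks    # one node at a time, never in parallel
    done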

RE: When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-14 Thread Fd Habash
-hosts”, how do you identify what specific hosts to repair? Thanks Thank you From: Fd Habash Sent: Thursday, December 7, 2017 12:09 PM To: user@cassandra.apache.org Subject: RE: When Replacing a Node, How to Force a Consistent Bootstrap Thank you. How do I identify what other 2

RE: When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-07 Thread Fd Habash
we don't support). You'll need to repair (and you can repair before you do the replace to avoid the window of time where you violate consistency - use the -hosts option to allow repair with a down host, you'll repair A+C, so when B starts it'll definitely have all of the data). On Tue, D
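A sketch of the -hosts repair described in that reply, with B down and A/C as the live replicas (IPs and keyspace are placeholders):

    # repair only among the listed live endpoints, tolerating the down host:
    nodetool repair -hosts 10.0.0.1 -hosts 10.0.0.3 my_ks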

When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-05 Thread Fd Habash
Assume I have a cluster of 3 nodes (A, B, C). Row x was written with CL=LQ to nodes A and B. Before it was written to C, node B crashes. I replaced B and it bootstrapped data from node C. Now, row x is missing from C and B. If node A crashes, it will be replaced and it will bootstrap from either C

Replacing a Seed Node

2017-08-03 Thread Fd Habash
Hi all … I know there are plenty of docs on how to replace a seed node, but some steps are contradictory, e.g. the need to remove the node from the seed list for the entire cluster. My cluster has 6 nodes with 3 seeds running C* 2.2.8. One seed node was terminated by AWS. I came up with this procedure.
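A condensed sketch of the usual procedure (IP is a placeholder):

    # 1. remove the dead seed's IP from the seeds list in cassandra.yaml cluster-wide;
    #    a node will not bootstrap if it finds itself in its own seed list
    # 2. on the replacement, before first start, in cassandra-env.sh:
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead-seed-ip>"
    # 3. once the new node is UN, add its IP back to the seeds list and do a rolling restart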

Sync Spark Data with Cassandra Using Incremental Data Loading

2017-07-19 Thread Fd Habash
I have a scenario where data has to be loaded into Spark nodes from two data stores: Oracle and Cassandra. We did the initial loading of data and found a way to do daily incremental loading from Oracle to Spark. I’m trying to figure out how to do this from C*. What tools are available in C* to

Constant MemtableFlushWriter Messages Following upgrade from 2.2.5 to 2.2.8

2017-04-12 Thread Fd Habash
We are in the process of upgrading our cluster. Nodes that got upgraded are constantly emitting these messages. No impact, but I wanted to know what they mean and why they appear only after the upgrade. Any feedback will be appreciated. 17-04-10 20:18:11,580 Memtable.java:352 - Writing