Re: Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
A little update on the progress. First: thank you, Thomas. I checked the code in the patch and briefly skimmed through the 3.0.6 code. Yup, it should be fixed. Thank you, Surbhi. At the moment we don't need authentication as the instances are locked down. Now: - Unfortunately the

Re: Read timeouts when performing rolling restart

2018-09-12 Thread Surbhi Gupta
Another thing to notice: system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}. system_auth has a replication factor of 1, so even a single node being down may impact the cluster because of that replication factor. On Wed, 12 Sep 2018 at 09:46, Steinmaurer,
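For reference, a minimal sketch of how the system_auth replication could be raised, assuming a single datacenter named 'datacenter1' and a target replication factor of 3 (both placeholder values), followed by the repair needed to propagate the auth data:

    # DC name and RF are placeholders; adjust to the actual topology
    cqlsh -e "ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};"
    # then, on each node, repropagate the keyspace
    nodetool repair system_auth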

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-12 Thread Jay Zhuang
We have a similar use case: Streamific, the Ingestion Service for Hadoop Big Data at Uber Engineering. We had this data ingestion pipeline built on MySQL/schemaless before using Cassandra. For Cassandra, we used to

RE: Read timeouts when performing rolling restart

2018-09-12 Thread Steinmaurer, Thomas
Hi, I remember something about a client using the native protocol being notified too early that Cassandra is ready, due to the following issue: https://issues.apache.org/jira/browse/CASSANDRA-8236 which looks similar, but the above was marked as fixed in 2.2. Thomas From: Riccardo Ferrari Sent:

Can't replace failed node

2018-09-12 Thread Ian Spence
Hello, ENV: Cassandra: 2.2.9, JRE: 1.8.0_74, CentOS 6/7 We've run into a weird issue as a result of a node dying. A specific server had a catastrophic failure and lost all data on disk. It's since been replaced but during that time we've brought down the cluster a couple of times (as this is a
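Not stated in the message above, but a commonly used way to replace a node that died and lost its data is to boot a fresh node with the replace_address flag pointing at the dead node's IP; a sketch, with a placeholder address:

    # added to cassandra-env.sh on the replacement node before its first start
    # (10.0.0.12 is a placeholder for the dead node's IP)
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"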

Re: Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
Hi Alain, Thank you for chiming in! I was thinking of performing the 'start_native_transport=false' test as well, and indeed the issue is not showing up. Starting the/a node with native transport disabled and letting it cool down led to no timeout exceptions and no dropped messages, simply a crystal

Re: nodetool rebuild

2018-09-12 Thread Surbhi Gupta
Increase 3 throughputs: compaction throughput, stream throughput, and interdcstream throughput (if rebuilding from another DC). Set all of the above to 0 and see if there is any improvement, and later set a concrete value if you can't leave them at 0. On Wed, Sep 12, 2018 at 5:42 AM Vitali Dyachuk wrote:
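A sketch of those settings as nodetool commands, where 0 removes the throttle entirely; concrete values can be set back later if unthrottled proves too aggressive:

    # unthrottle compaction and streaming (0 = unlimited)
    nodetool setcompactionthroughput 0
    nodetool setstreamthroughput 0
    # only relevant when rebuilding from another DC
    nodetool setinterdcstreamthroughput 0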

Re: Anticompaction causing significant increase in disk usage

2018-09-12 Thread Martin Mačura
Hi Alain, thank you for your response. I'm using incremental repair. I'm afraid subrange repair is not a viable alternative, because it's very slow - it takes over a week to complete. I've found at least a partial solution - specifying the '-local' or '-dc' parameter will also disable anticompaction,
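A minimal sketch of that partial workaround, restricting repair to one datacenter; 'DC1' and the keyspace name are placeholders:

    # repair only against replicas in the local datacenter
    nodetool repair -local my_keyspace
    # or name the datacenter explicitly
    nodetool repair -dc DC1 my_keyspace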

nodetool rebuild

2018-09-12 Thread Vitali Dyachuk
Hi, I'm currently streaming data with nodetool rebuild on 2 nodes; each node is streaming from a different location. The problem is that it takes ~7 days to stream 4 TB of data to 1 node, while the speed on each side is ~150 Mbit/s, so it should take around ~2.5 days. Although there are resources on the
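A back-of-envelope check of the ~2.5 day figure, assuming 4 TB means terabytes of data and a sustained 150 Mbit/s:

    # 4 TB = 4,000,000 MB = 32,000,000 Mbit
    echo "4 * 8 * 1000000 / 150 / 3600 / 24" | bc -l
    # ~= 2.47, i.e. roughly 2.5 days of wall-clock streaming time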

Re: Anticompaction causing significant increase in disk usage

2018-09-12 Thread Alain RODRIGUEZ
Hello Martin, How do you perform the repairs? Are you using incremental repairs, or full repairs but without subranges? Alex described issues related to these repairs here: http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html . *tl;dr:* The only way to perform repair
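For context, the two kinds of repair under discussion look roughly like this; a sketch only, since defaults and flags vary by version and the keyspace name is a placeholder:

    # incremental repair (the default on recent versions)
    nodetool repair my_keyspace
    # full (non-incremental) repair
    nodetool repair -full my_keyspace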

Re: Read timeouts when performing rolling restart

2018-09-12 Thread Alain RODRIGUEZ
Hello Riccardo, > How come that a single node is impacting the whole cluster? It sounds weird indeed. > Is there a way to further delay the native transport startup? You can configure 'start_native_transport: false' in 'cassandra.yaml'. (
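A sketch of that approach: start the node with the client-facing native transport off in cassandra.yaml, then open it by hand once the node has settled:

    # in cassandra.yaml on the node being restarted:
    #   start_native_transport: false
    # once the node has warmed up, open the native protocol to clients
    nodetool enablebinary
    nodetool statusbinary    # should now report "running"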

Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
Hi list, We are seeing the following behaviour when performing a rolling restart. On the node I need to restart: * I run 'nodetool drain' * Then 'service cassandra restart'. So far so good. The load increase on the other 5 nodes is negligible. The node is generally out of service just for
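For context, the per-node sequence described above is roughly the following; the service invocation and the verification step are illustrative and depend on the installation:

    # on the node being restarted
    nodetool drain                   # stop accepting writes and flush all tables
    sudo service cassandra restart
    # before moving on to the next node, confirm it is back and serving clients
    nodetool status
    nodetool statusbinary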

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-12 Thread DuyHai Doan
The biggest problem with getting CDC working correctly in C* is the deduplication issue. Having a process to read incoming mutations from the commitlog is not that hard; having to dedup them across N replicas is much harder. The idea is: why don't we generate the CDC event directly at the coordinator

Re: impact/incompatibility of patch backport on Cassandra 3.11.2

2018-09-12 Thread Ahmed Eljami
Got it! Thanks a lot Jeff :)

Anticompaction causing significant increase in disk usage

2018-09-12 Thread Martin Mačura
Hi, we're on Cassandra 3.11.2. During an anticompaction after repair, the TotalDiskSpaceUsed value of one table gradually went from 700 GB to 1180 GB, and then suddenly jumped back to 700 GB. This happened on all nodes involved in the repair. There was no change in PercentRepaired during or after this
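One way to watch the figure in question from the command line while the anticompaction runs; keyspace and table names are placeholders:

    # per-table disk usage as reported by the node
    nodetool tablestats my_keyspace.my_table | grep "Space used"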

Adding datacenter and data verification

2018-09-12 Thread Pradeep Chhetri
Hello, I am running a Cassandra 3.11.3 5-node cluster on AWS with SimpleSnitch. I was testing the process of migrating to GPFS, using the AWS region as the datacenter name and the AWS zone as the rack name, in my preprod environment, and was able to achieve it. But before decommissioning the older datacenter, I
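For reference, assuming GPFS here means GossipingPropertyFileSnitch, the datacenter and rack names described come from cassandra-rackdc.properties on each node; the region and zone values below are placeholders:

    # cassandra-rackdc.properties (placeholder values)
    dc=eu-west-1
    rack=eu-west-1a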