Re: Cassandra Cluster issues

2017-05-08 Thread benjamin roth
Hm, that question is like "My car does not start - what's the problem?".
You have to monitor, monitor, monitor. I'd strongly advise you to
graph as many metrics as you can: read them from the JMX interface,
write them to a TSDB, and visualize them, e.g. with Grafana.
Then read the logs, trace your queries, and check all the system metrics: CPU
consumption, disk IO, network IO, memory usage, and Java GC pauses.

Then you will be able to find the bottleneck.
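The JMX-to-TSDB pipeline described above can be sketched with Graphite's plaintext protocol (one `name value timestamp` line per sample, sent to a Carbon listener on TCP 2003). The metric names and the Graphite host below are illustrative assumptions, not something taken from this thread:

```python
import socket
import time

def graphite_lines(metrics, timestamp=None):
    """Format a dict of metric name -> value as Graphite plaintext lines."""
    ts = int(timestamp if timestamp is not None else time.time())
    return "".join(f"{name} {value} {ts}\n" for name, value in metrics.items())

def push_to_graphite(metrics, host="graphite.example.com", port=2003):
    """Send one batch of samples to a Carbon plaintext listener (assumed reachable)."""
    payload = graphite_lines(metrics).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

# Values as they might be read from Cassandra's JMX interface (names illustrative):
samples = {
    "cassandra.node1.jvm.gc.pause_ms": 212,
    "cassandra.node1.write.latency_p99_us": 1840,
}
print(graphite_lines(samples, timestamp=1494252900), end="")
```

In practice a collector such as the Graphite reporter bundled with metrics libraries would do this loop for you; the point is only that each JMX sample becomes one timestamped line in the TSDB, which Grafana can then graph.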

2017-05-08 15:15 GMT+02:00 Mehdi Bada :

> Dear Cassandra Users,
>
> I have been having some issues for a few days with the following cluster:
>
> - 5 nodes
> - Cassandra 3.7
> - 2 seed nodes
> - 1 keyspace with RF=2, 300 GB per node, WRITE_LEVEL=ONE, READ_LEVEL=ONE
> - 1 enormous table (90% of the keyspace)
> - a TTL on every row inserted
>
> The cluster is write-oriented. All machines are using between 5 and 10 %
> of the CPU and 45 % of the RAM.
>
> The cluster has been very slow since the last repair, and not all writes have
> completed... I don't know how to start debugging my cluster.
>
> Do you have any ideas?
>
>
> Many thanks in advance
>
> Regards
> Mehdi Bada
>
> 
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
> ⇒ dbi services is recruiting Oracle & SQL Server experts! – Join the team
>
>


Cassandra Cluster issues

2017-05-08 Thread Mehdi Bada
Dear Cassandra Users, 

I have been having some issues for a few days with the following cluster: 

- 5 nodes 
- Cassandra 3.7 
- 2 seed nodes 
- 1 keyspace with RF=2, 300 GB per node, WRITE_LEVEL=ONE, READ_LEVEL=ONE 
- 1 enormous table (90% of the keyspace) 
- a TTL on every row inserted 

The cluster is write-oriented. All machines are using between 5 and 10 % of 
the CPU and 45 % of the RAM. 

The cluster has been very slow since the last repair, and not all writes have 
completed... I don't know how to start debugging my cluster. 

Do you have any ideas? 


Many thanks in advance 

Regards 
Mehdi Bada 

 




Re: cluster issues

2013-02-01 Thread aaron morton
For DataStax Enterprise-specific questions, try the support forums:
http://www.datastax.com/support-forums/

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 8:27 AM, S C as...@outlook.com wrote:

 I am using DseDelegateSnitch
 
 Thanks,
 SC
 From: aa...@thelastpickle.com
 Subject: Re: cluster issues
 Date: Tue, 29 Jan 2013 20:15:45 +1300
 To: user@cassandra.apache.org
 
   • We can always be proactive in keeping the time in sync. But is there 
 any way to recover from a time drift (in a reactive manner)? Since it was a 
 lab environment, I dropped the KS (deleted the data directory).
 There is a way to remove future-dated columns, but it is not for the 
 faint-hearted. 
 
 Basically:
 1) Drop the gc_grace_seconds to 0
 2) Delete the column with a timestamp way in the future, so it is guaranteed 
 to be higher than the timestamp of the write you want to delete. 
 3) Flush the CF
 4) Compact all the SSTables that contain the row. The easiest way to do that 
 is a major compaction, but we normally advise not to do that because it 
 creates one big file. You can also do a user defined compaction. 
 
   • Are there any other scenarios that would lead to a cluster looking like 
 the below? Note: the actual topology of the cluster is ONE Cassandra node and 
 TWO Analytics nodes.
 What snitch are you using?
 If you have the property file snitch, do all nodes have the same 
 configuration?
 
 There is a lot of sickness there. If possible I would scrub and start again. 
 
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 29/01/2013, at 6:29 AM, S C as...@outlook.com wrote:
 
 One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I 
 figured this out pretty quickly, I have a few questions that I am looking for 
 answers to.
 
   • We can always be proactive in keeping the time in sync. But is there 
 any way to recover from a time drift (in a reactive manner)? Since it was a 
 lab environment, I dropped the KS (deleted the data directory).
   • Are there any other scenarios that would lead to a cluster looking like 
 the below? Note: the actual topology of the cluster is ONE Cassandra node and 
 TWO Analytics nodes.
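The reactive check asked about in the first bullet can be approximated by querying an NTP server directly and comparing its clock against the local one on each node. A minimal SNTP sketch (the server name is a placeholder; a real deployment would simply run ntpd or chrony):

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between the NTP epoch (1900) and the Unix epoch (1970)

def parse_sntp_transmit(packet: bytes) -> int:
    """Extract the transmit timestamp (whole seconds, Unix epoch) from a 48-byte SNTP response."""
    seconds = struct.unpack("!I", packet[40:44])[0]  # bytes 40-43: transmit timestamp, seconds field
    return seconds - NTP_EPOCH_OFFSET

def clock_drift(server: str = "pool.ntp.org", port: int = 123) -> float:
    """Return local_clock - server_clock in seconds (coarse, whole-second resolution)."""
    request = b"\x1b" + 47 * b"\x00"  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(5)
        sock.sendto(request, (server, port))
        packet, _ = sock.recvfrom(512)
    return time.time() - parse_sntp_transmit(packet)
```

Running `clock_drift()` on each node and alerting when the absolute value exceeds a second or two would have caught the 20-25 second drift described here.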
 
 
 On 192.168.2.100
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Cassandra   rack1   Up Normal  601.34 MB   33.33%  
 0   
 192.168.2.101  Analytics   rack1   Down   Normal  149.75 MB   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Down   Normal  ?   33.33%  
 113427455640312821154458202477256070485   
 
 On 192.168.2.101
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Analytics   rack1   Down   Normal  ?   33.33%  
 0  
 192.168.2.101  Analytics   rack1   Up Normal  158.59 MB   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Down   Normal  ?   33.33%  
 113427455640312821154458202477256070485
 
 On 192.168.2.102
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Analytics   rack1   Down   Normal  ?   33.33%  
 0  
 192.168.2.101  Analytics   rack1   Down   Normal  ?   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Up Normal  117.02 MB   33.33%  
 113427455640312821154458202477256070485 
 
 
 Appreciate your valuable inputs.
 
 Thanks,
 SC



RE: cluster issues

2013-01-30 Thread S C
I am using DseDelegateSnitch
Thanks,
SC
From: aa...@thelastpickle.com
Subject: Re: cluster issues
Date: Tue, 29 Jan 2013 20:15:45 +1300
To: user@cassandra.apache.org

We can always be proactive in keeping the time in sync. But is there any way to 
recover from a time drift (in a reactive manner)? Since it was a lab 
environment, I dropped the KS (deleted the data directory).
There is a way to remove future-dated columns, but it is not for the faint-hearted.

Basically:
1) Drop the gc_grace_seconds to 0
2) Delete the column with a timestamp way in the future, so it is guaranteed 
to be higher than the timestamp of the write you want to delete.
3) Flush the CF
4) Compact all the SSTables that contain the row. The easiest way to do that 
is a major compaction, but we normally advise not to do that because it 
creates one big file. You can also do a user defined compaction.

Are there any other scenarios that would lead to a cluster looking like the 
below? Note: the actual topology of the cluster is ONE Cassandra node and TWO 
Analytics nodes.
What snitch are you using?
If you have the property file snitch, do all nodes have the same configuration?

There is a lot of sickness there. If possible I would scrub and start again.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



On 29/01/2013, at 6:29 AM, S C as...@outlook.com wrote:





One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured 
this out pretty quickly, I have a few questions that I am looking for answers to.

We can always be proactive in keeping the time in sync. But is there any way to 
recover from a time drift (in a reactive manner)? Since it was a lab 
environment, I dropped the KS (deleted the data directory).
Are there any other scenarios that would lead to a cluster looking like the 
below? Note: the actual topology of the cluster is ONE Cassandra node and TWO 
Analytics nodes.

On 192.168.2.100
Address         DC          Rack    Status  State    Load        Owns    Token
                                                                         113427455640312821154458202477256070485
192.168.2.100   Cassandra   rack1   Up      Normal   601.34 MB   33.33%  0
192.168.2.101   Analytics   rack1   Down    Normal   149.75 MB   33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1   Down    Normal   ?           33.33%  113427455640312821154458202477256070485

On 192.168.2.101
Address         DC          Rack    Status  State    Load        Owns    Token
                                                                         113427455640312821154458202477256070485
192.168.2.100   Analytics   rack1   Down    Normal   ?           33.33%  0
192.168.2.101   Analytics   rack1   Up      Normal   158.59 MB   33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1   Down    Normal   ?           33.33%  113427455640312821154458202477256070485

On 192.168.2.102
Address         DC          Rack    Status  State    Load        Owns    Token
                                                                         113427455640312821154458202477256070485
192.168.2.100   Analytics   rack1   Down    Normal   ?           33.33%  0
192.168.2.101   Analytics   rack1   Down    Normal   ?           33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1   Up      Normal   117.02 MB   33.33%  113427455640312821154458202477256070485

Appreciate your valuable inputs.
Thanks,
SC
  

  

cluster issues

2013-01-28 Thread S C



One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured 
this out pretty quickly, I have a few questions that I am looking for answers to.

We can always be proactive in keeping the time in sync. But is there any way to 
recover from a time drift (in a reactive manner)? Since it was a lab 
environment, I dropped the KS (deleted the data directory).
Are there any other scenarios that would lead to a cluster looking like the 
below? Note: the actual topology of the cluster is ONE Cassandra node and TWO 
Analytics nodes.

On 192.168.2.100
Address         DC          Rack    Status  State    Load        Owns    Token
                                                                         113427455640312821154458202477256070485
192.168.2.100   Cassandra   rack1   Up      Normal   601.34 MB   33.33%  0
192.168.2.101   Analytics   rack1   Down    Normal   149.75 MB   33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1   Down    Normal   ?           33.33%  113427455640312821154458202477256070485

On 192.168.2.101
Address         DC          Rack    Status  State    Load        Owns    Token
                                                                         113427455640312821154458202477256070485
192.168.2.100   Analytics   rack1   Down    Normal   ?           33.33%  0
192.168.2.101   Analytics   rack1   Up      Normal   158.59 MB   33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1   Down    Normal   ?           33.33%  113427455640312821154458202477256070485

On 192.168.2.102
Address         DC          Rack    Status  State    Load        Owns    Token
                                                                         113427455640312821154458202477256070485
192.168.2.100   Analytics   rack1   Down    Normal   ?           33.33%  0
192.168.2.101   Analytics   rack1   Down    Normal   ?           33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1   Up      Normal   117.02 MB   33.33%  113427455640312821154458202477256070485

Appreciate your valuable inputs.
Thanks,
SC
  

Re: cluster issues

2013-01-28 Thread aaron morton
 We can always be proactive in keeping the time in sync. But is there any way to 
 recover from a time drift (in a reactive manner)? Since it was a lab 
 environment, I dropped the KS (deleted the data directory)
There is a way to remove future-dated columns, but it is not for the 
faint-hearted. 

Basically:
1) Drop the gc_grace_seconds to 0
2) Delete the column with a timestamp way in the future, so it is guaranteed to 
be higher than the timestamp of the write you want to delete. 
3) Flush the CF
4) Compact all the SSTables that contain the row. The easiest way to do that is 
a major compaction, but we normally advise not to do that because it creates 
one big file. You can also do a user defined compaction. 
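For illustration, on a CQL-era cluster the four steps could look roughly like the runbook below. The keyspace, table, and row key (`myks`, `mytable`, `'bad-row'`) are placeholders, and the `USING TIMESTAMP` value is just an arbitrarily large microsecond timestamp, not something taken from this thread:

```shell
# 1) Let tombstones become purgeable immediately (note the old value first)
cqlsh -e "ALTER TABLE myks.mytable WITH gc_grace_seconds = 0;"

# 2) Issue the delete with a timestamp guaranteed to beat the future-dated write
cqlsh -e "DELETE FROM myks.mytable USING TIMESTAMP 9999999999999999 WHERE key = 'bad-row';"

# 3) Flush the memtable so the tombstone reaches an SSTable
nodetool flush myks mytable

# 4) Major compaction so the tombstone meets every on-disk copy of the row
nodetool compact myks mytable

# Finally, restore gc_grace_seconds (the default is 864000, i.e. 10 days)
cqlsh -e "ALTER TABLE myks.mytable WITH gc_grace_seconds = 864000;"
```

One caution: until the tombstone is compacted away, it also shadows any new writes to that row, since those carry lower timestamps.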

 Are there any other scenarios that would lead to a cluster looking like the 
 below? Note: the actual topology of the cluster is ONE Cassandra node and TWO 
 Analytics nodes.
What snitch are you using?
If you have the property file snitch, do all nodes have the same configuration?
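As a sketch of what "the same configuration" means here: with PropertyFileSnitch, every node ships an identical `cassandra-topology.properties`. The contents below are hypothetical, inferred from the DC/rack names in the ring output quoted in this thread:

```properties
# cassandra-topology.properties -- must be identical on every node
192.168.2.100=Cassandra:rack1
192.168.2.101=Analytics:rack1
192.168.2.102=Analytics:rack1

# default assignment for nodes not listed explicitly
default=Analytics:rack1
```

A mismatch between nodes (e.g. one node placing 192.168.2.100 in DC `Cassandra` while the others call it `Analytics`) would produce exactly the kind of inconsistent ring views shown below.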

There is a lot of sickness there. If possible I would scrub and start again. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 6:29 AM, S C as...@outlook.com wrote:

 One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I 
 figured this out pretty quickly, I have a few questions that I am looking for 
 answers to.
 
 We can always be proactive in keeping the time in sync. But is there any way to 
 recover from a time drift (in a reactive manner)? Since it was a lab 
 environment, I dropped the KS (deleted the data directory).
 Are there any other scenarios that would lead to a cluster looking like the 
 below? Note: the actual topology of the cluster is ONE Cassandra node and TWO 
 Analytics nodes.
 
 
 On 192.168.2.100
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Cassandra   rack1   Up Normal  601.34 MB   33.33%  
 0   
 192.168.2.101  Analytics   rack1   Down   Normal  149.75 MB   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Down   Normal  ?   33.33%  
 113427455640312821154458202477256070485   
 
 On 192.168.2.101
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Analytics   rack1   Down   Normal  ?   33.33%  
 0   
 192.168.2.101  Analytics   rack1   Up Normal  158.59 MB   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Down   Normal  ?   33.33%  
 113427455640312821154458202477256070485
 
 On 192.168.2.102
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Analytics   rack1   Down   Normal  ?   33.33%  
 0   
 192.168.2.101  Analytics   rack1   Down   Normal  ?   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Up Normal  117.02 MB   33.33%  
 113427455640312821154458202477256070485 
 
 
 Appreciate your valuable inputs.
 
 Thanks,
 SC