Hi,

Any suggestions or comments on this approach? What are you doing to keep 
misbehaving clients in check and limit Cassandra load?



Note: We will be moving to the CQL driver, but that will take months.

Anuj


From:"Anuj Wadehra" <anujw_2...@yahoo.co.in>
Date:Wed, 23 Sep, 2015 at 1:36 am
Subject:Throttling Cassandra Load

Hi,

We are using Cassandra 2.0.14 with Hector 1.1.4. Each node in the cluster runs 
a Cassandra instance and an application that uses Hector.

I want suggestions on the approach we are taking for throttling Cassandra load. 

Problem Statement: 
Misbehaving clients can bring down a Cassandra cluster by putting excessive 
load on it. We want to prevent the cluster from being overloaded.

Solution Proposed:
1. Run a test for each application scenario involving Cassandra. Keep 
increasing the number of requests in each scenario until performance starts 
deteriorating, and note the maximum number of connections reached during the 
test, as follows:

For example:
Scenario A = 60
Scenario B = 70
Scenario C = 90

Set rpc_max_threads = max(all scenarios) = 90

2. In Hector, set MaxActive connections per host=90 

3. As Hector maintains connections PER HOST, the number of connections a 
Hector client on a node can open grows with cluster size.

e.g. On a 3-node cluster, each Hector client can open up to 90 * 3 = 270 
connections.
     On a 15-node cluster, each Hector client can open up to 90 * 15 = 1350 
connections.

So, we have set rpc_server_type=hsha to support a large number of client 
connections. We are not sure whether 
https://issues.apache.org/jira/i#browse/CASSANDRA-7309 is a concern.

4. At the application level, we check cluster load by ADDING up the active 
connections created by Hector on EACH node of the cluster. If the total is 
already around 95% of (90 * number of nodes), we reject tasks to prevent 
overload.

5. We see that Hector only closes idle connections when borrowing clients from 
the pool, and immediately after closing an idle connection it creates a new 
one. So once active connections increase, they seldom go down and remain open 
(except in a few exception scenarios). As a result, we can't rely on the 
ThriftClients JMX metric exposed by Cassandra to know the number of ACTIVE 
connections: ThriftClients shows open connections rather than active ones. Is 
there a better way to know the number of active connections on a Cassandra 
node, or to check Cassandra load so that we can hold back tasks when a node is 
already overloaded?
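
To make steps 1 to 4 above concrete, here is a minimal Java sketch of the 
arithmetic and the admission check. The class and method names are 
hypothetical and are not part of Hector or Cassandra. The per-node active 
connection counts are assumed to be collected by the application in whatever 
way it does today, and "around 95%" is assumed to mean "at or above 95%".

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Hector or Cassandra API.
final class LoadThrottleSketch {

    // Step 1: the limit is the highest connection count observed across scenarios.
    static int maxConnections(int... scenarioPeaks) {
        return Arrays.stream(scenarioPeaks).max().orElse(0);
    }

    // Step 3: Hector pools connections per host, so the cap for one client
    // is the per-host limit multiplied by the number of nodes.
    static int clusterCapacity(int perHostLimit, int nodeCount) {
        return perHostLimit * nodeCount;
    }

    // Step 4: reject new tasks once the summed active connections reach the
    // given fraction (e.g. 0.95) of the cluster-wide cap.
    static boolean shouldReject(int[] activePerNode, int perHostLimit,
                                double thresholdFraction) {
        int totalActive = Arrays.stream(activePerNode).sum();
        int capacity = clusterCapacity(perHostLimit, activePerNode.length);
        return totalActive >= thresholdFraction * capacity;
    }

    public static void main(String[] args) {
        int limit = maxConnections(60, 70, 90);              // scenarios A, B, C
        System.out.println("per-host limit = " + limit);
        System.out.println("3 nodes  -> " + clusterCapacity(limit, 3));
        System.out.println("15 nodes -> " + clusterCapacity(limit, 15));
        // Made-up sample counts for a 3-node cluster.
        System.out.println("reject? " + shouldReject(new int[] {88, 85, 86}, limit, 0.95));
    }
}
```

With the example numbers, the threshold on a 3-node cluster is 0.95 * 270 = 
256.5 connections. Because of the behaviour described in step 5, the counts 
fed into this check may overstate the real load.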


I am looking for suggestions on the above approach and more ideas on 
throttling Cassandra load.

Thanks
Anuj
