RE: Query on Cassandra clusters

2017-01-03 Thread SEAN_R_DURITY
A couple thoughts (for after you up/downgrade to one version for all nodes):

-  16 GB of total RAM on a node is the minimum I would use; 32 GB would be 
much better.

-  With a lower amount of memory, I think I would keep memtables on-heap 
in order to keep a tighter rein on how much they use. If you are consistently 
using 75% or more of heap space, you need more memory (either more nodes or 
more memory per node).

-  I would try giving Cassandra 50% of the RAM on the host, and remove 
any client or non-Cassandra processes. Nodes should be dedicated to Cassandra 
(for production).

-  For disk, my rule for size-tiered compaction is that you need 50% free 
space as overhead IF it is primarily a single-table application (90%+ of data 
in one table). Otherwise, I am OK with 35-40% overhead. Just know you can hit 
issues down the road as the sstables get larger.
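
The rules of thumb above can be turned into quick arithmetic. What follows 
is only a sketch: the function names are made up for illustration, the 32 GB 
host and 1000 GB disk are example figures rather than values from this 
thread, and picking 40% for the multi-table case is just the cautious end of 
the 35-40% range.

```python
def suggested_cassandra_ram_gb(total_ram_gb, fraction=0.5):
    """RAM to give Cassandra under the '50% of host RAM' rule of thumb."""
    return total_ram_gb * fraction


def stcs_overhead_fraction(largest_table_share):
    """Free-space fraction to reserve under size-tiered compaction.

    50% if one table holds 90%+ of the data, otherwise 40%
    (an assumed pick from the 35-40% range mentioned above).
    """
    return 0.5 if largest_table_share >= 0.9 else 0.4


def usable_data_gb(disk_gb, overhead_fraction):
    """Data a disk can hold if overhead_fraction of it must stay free."""
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return disk_gb * (1 - overhead_fraction)


if __name__ == "__main__":
    # Example figures only: a 32 GB host with a 1000 GB data disk.
    print(suggested_cassandra_ram_gb(32))
    print(usable_data_gb(1000, stcs_overhead_fraction(0.95)))
    print(usable_data_gb(1000, stcs_overhead_fraction(0.50)))
```

With these example inputs, a single-table workload would be planned at about 
half of the disk, and a multi-table workload at about 60% of it.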


Sean Durity

Re: Query on Cassandra clusters

2016-12-21 Thread Sumit Anvekar
Thank you Alain for the detailed explanation.

To answer your questions on Java version, JVM settings and memory usage: we
are using Java 1.8.0_45, precisely:
>java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

JVM settings are identical on all nodes (cassandra-env.sh is identical).

Further, when I say high memory usage: Cassandra is using its heap
(-Xmx3767M) plus about 6 GB off heap, out of a total system memory of
14.7 GB. Along with this, there are other processes running on this system,
which bring the overall memory usage to >95%. This brings me to another
point: is *heap memory* + *off heap (sum of values of Space used
(total)) from nodetool cfstats* the total memory used by Cassandra on a
node?
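
For reference, the figures above can be checked with a few lines of Python.
This is only a sketch under stated assumptions: 1 GB is taken as 1024 MB, the
function names are made up, and SAMPLE is invented, abbreviated text in the
general shape of nodetool cfstats output, not output from this cluster. As far
as I know, the "Space used (total)" lines in cfstats report on-disk sstable
size, while off-heap memory is reported on the "Off heap memory used (total)"
lines, so the sketch sums each field separately.

```python
import re

HEAP_MB = 3767      # -Xmx3767M, from the message above
OFF_HEAP_GB = 6.0   # "about 6 GB", from the message above
TOTAL_GB = 14.7     # total system memory, from the message above


def cassandra_memory_share(heap_mb, off_heap_gb, total_gb):
    """Return (GB used by Cassandra, fraction of system memory)."""
    used_gb = heap_mb / 1024 + off_heap_gb  # assumes 1 GB = 1024 MB
    return used_gb, used_gb / total_gb


def sum_cfstats_field(cfstats_text, field):
    """Sum the integer values of every '<field>: <n>' line in the text."""
    pattern = re.compile(
        r"^\s*" + re.escape(field) + r":\s*(\d+)\s*$", re.MULTILINE
    )
    return sum(int(m.group(1)) for m in pattern.finditer(cfstats_text))


# Invented sample, for illustration only.
SAMPLE = """
Keyspace: ks1
    Table: t1
        Space used (live): 1000
        Space used (total): 1200
        Off heap memory used (total): 300
    Table: t2
        Space used (live): 500
        Space used (total): 500
        Off heap memory used (total): 200
"""

if __name__ == "__main__":
    used_gb, share = cassandra_memory_share(HEAP_MB, OFF_HEAP_GB, TOTAL_GB)
    print(f"Cassandra: {used_gb:.2f} GB, {share:.0%} of system memory")
    print(sum_cfstats_field(SAMPLE, "Off heap memory used (total)"))
    print(sum_cfstats_field(SAMPLE, "Space used (total)"))
```

With the numbers quoted above, heap plus off heap comes to roughly 9.7 GB, or
about 66% of the 14.7 GB on the box.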

Also, on the disk front, what is a good amount of empty space to leave
unused on the partition (should it be ~50%?), considering we use
SizeTieredCompactionStrategy?


Re: Query on Cassandra clusters

2016-12-21 Thread Alain RODRIGUEZ
Hi Sumit,

> 1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
> version 3.0.3 and then newer 5 nodes have 3.6.0 version.


I strongly recommend that you:


   - Stick with one version of Apache Cassandra per cluster.
   - Always stay as close as possible to the latest minor release of the
   Cassandra version in use.


So you *really should* not be using 3.0.3 *AND* 3.6.0 but rather 3.0.10 *OR*
3.7 (currently). Note that Cassandra 3.X (with X > 0) uses a tick-tock
release cycle, where odd releases are bug fixes only and even releases
introduce new features as well.
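
The odd/even rule above can be written down as a tiny helper. This is just a
sketch of the rule as stated in this message; the function name is made up
and it does nothing more than look at the minor number of a 3.X version
string.

```python
def tick_tock_kind(version):
    """Classify a Cassandra version string by the odd/even rule above.

    Returns "bug-fix" for odd 3.X releases, "feature" for even 3.X
    releases, and "not tick-tock" for 3.0.x or any other major version.
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major != 3 or minor == 0:
        return "not tick-tock"
    return "bug-fix" if minor % 2 == 1 else "feature"


if __name__ == "__main__":
    for v in ("3.0.10", "3.6.0", "3.7"):
        print(v, tick_tock_kind(v))
```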

Running multiple versions for a long period can induce errors. Cassandra is
built to handle multiple versions only to give operators the time to run
a rolling restart. No streaming (adding / removing / repairing nodes)
should happen during this period. Also, I have seen cases in the past
where changing the schema while running multiple versions led to schema
disagreements.

> Due to this scenario, a couple boxes are running very high on memory (95%
> usage) whereas some of the older version nodes have just 60-70% memory
> usage.


Hard to say if this is related to the multiple versions of Cassandra, but it
could be. Are you sure the nodes are using the same JVM / GC options
(cassandra-env.sh) and Java version?

Also, what exactly is "high on memory 95%"? Are we talking about heap or
native memory? Isn't the memory being used as page cache (which would still
be available to the system)?

> 2. To counter #1, I am planning to upgrade system configuration of the
> nodes where there is higher memory usage. But the question is, will it be a
> problem if we have a Cassandra cluster, where in a couple of nodes have
> double the system configuration than other nodes in the cluster.

It is not a problem per se to have distinct configurations on distinct
nodes. Cassandra handles it very well, and it is frequently done to test a
configuration change on a canary node, to prevent it from impacting the
whole service.

Yet, all the nodes should be doing the same work (unless you have some
heterogeneous hardware and are using a distinct number of vnodes on each
node). Keeping things homogeneous allows the operator to easily compare how
nodes are doing, and it makes reasoning about Cassandra, as well as
troubleshooting issues, much easier.
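
For the heterogeneous case, one way to picture "a distinct number of vnodes
on each node" is to scale num_tokens (the cassandra.yaml setting) with each
node's relative capacity. The sketch below is only an illustration: the
function name, the node names, the capacity figures and the base of 256
tokens are all assumptions, and scaling against the smallest node is just one
possible choice.

```python
def proportional_num_tokens(capacities, base_tokens=256):
    """Map node -> num_tokens, scaled by capacity relative to the smallest node.

    capacities: dict of node name -> relative capacity (any positive unit).
    base_tokens: num_tokens given to the smallest node (assumed value).
    """
    if not capacities:
        return {}
    smallest = min(capacities.values())
    if smallest <= 0:
        raise ValueError("capacities must be positive")
    return {
        node: round(base_tokens * cap / smallest)
        for node, cap in capacities.items()
    }


if __name__ == "__main__":
    # Hypothetical cluster: node-b has double the capacity of node-a.
    print(proportional_num_tokens({"node-a": 1.0, "node-b": 2.0}))
```

A node with double the capacity ends up with double the token count, so it
would be expected to own roughly double the data.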

So I would:

- Fully upgrade / downgrade asap to a chosen version (3.X is known to be
not yet stable, but going back to 3.0.X might be more painful).
- Make sure nodes are well balanced and using the same number of ranges
('nodetool status').
- Make sure the nodes are using the same Java version and JVM settings.
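
The checklist above can be sketched as a small consistency check. This is an
illustration only: it assumes you have already collected a few facts per node
by hand (for example from 'nodetool version', 'java -version' and the Tokens
column of 'nodetool status'), and the node names, dictionary keys and
function names are all made up.

```python
from collections import defaultdict


def group_by(nodes, key):
    """Group node names by the value they report for one key."""
    groups = defaultdict(list)
    for name, facts in nodes.items():
        groups[facts[key]].append(name)
    return dict(groups)


def inconsistent_keys(nodes, keys=("cassandra", "java", "tokens")):
    """Return {key: {value: [nodes]}} for every key with more than one value."""
    report = {}
    for key in keys:
        groups = group_by(nodes, key)
        if len(groups) > 1:
            report[key] = groups
    return report


if __name__ == "__main__":
    # Hypothetical two-node sample mirroring the mixed versions in this thread.
    nodes = {
        "n1": {"cassandra": "3.0.3", "java": "1.8.0_45", "tokens": 256},
        "n2": {"cassandra": "3.6.0", "java": "1.8.0_45", "tokens": 256},
    }
    print(inconsistent_keys(nodes))
```

An empty result means every node reports the same value for each checked key.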

Hope that helps,

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



Query on Cassandra clusters

2016-12-20 Thread Sumit Anvekar
I have a couple questions.

1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
version 3.0.3 and then newer 5 nodes have 3.6.0 version. It had been running
fine, but recently I am seeing a higher amount of data residing on the newer
boxes. The configuration file (YAML file) is exactly the same on all nodes
(except for the node host names). I am wondering if the version has something
to do with this scenario. Due to this scenario, a couple of boxes are running
very high on memory (95% usage) whereas some of the older version nodes
have just 60-70% memory usage.

2. To counter #1, I am planning to upgrade the system configuration of the
nodes where there is higher memory usage. But the question is, will it be a
problem if we have a Cassandra cluster in which a couple of nodes have
double the system configuration of the other nodes in the cluster?

Appreciate any comment on the same.

Sumit.