Re: Inconsistent behavior during read

2015-06-26 Thread Aditya Shetty

If this problem were caused by data inconsistencies, it should be very
rare. However, I am seeing it happen very often (almost 50% of the
time). Statistically, that would be very unlikely if the number of
replication failures is small.

On Thu, Jun 25, 2015 at 11:55 PM, Tyler Hobbs wrote:

 On Thu, Jun 25, 2015 at 1:00 PM, Robert Coli wrote:

 [1] or read repair set to 100% combined with a full scan of all data...
 which no one does...

 And this is only true if full scan means reading every partition
 individually.  Reads of partition ranges (or a range slice, in old Thrift
 terms) don't do read repair.

 Tyler Hobbs


Aditya Shetty
*Lead Engineer*

*M*: +91 7022423545, *T*: 080 46603000 *EXT*: 4417



Re: Restore Snapshots

2015-06-26 Thread Jean Tremblay
Good morning,
Alain, thank you so much. This is exactly what I needed.

In my test I had a node whose data directory had, for whatever reason, become
corrupted. I keep my snapshots in a separate folder.

Here are the steps I took to recover my sick node:

0) Cassandra is stopped on my sick node.
1) I wiped out my data directory. My snapshots were kept outside this directory.
2) I modified my cassandra.yaml: I added auto_bootstrap: false. This is to make
sure that my node does not sync with the others.
3) I restarted Cassandra. This step created a basic structure for my new data
directory.
4) I ran the command: nodetool resetlocalschema. This recreated all the folders
for my column families.
5) I stopped Cassandra on my node.
6) I copied my snapshots to the right location. I actually hard-linked them,
which is very fast.
7) I restarted Cassandra.

That's it.
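
The hard-linking in step 6 could look roughly like the sketch below. It is only an illustration: the function name link_snapshot and both directory arguments are hypothetical, and it assumes the snapshot folder and the table directory sit on the same filesystem, since hard links cannot cross filesystems. Cassandra should be stopped while it runs, as in steps 5 to 7.

```python
import os
from pathlib import Path


def link_snapshot(snapshot_dir, table_dir):
    """Hard-link every file in snapshot_dir into table_dir; return the linked names."""
    snapshot_dir, table_dir = Path(snapshot_dir), Path(table_dir)
    table_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in sorted(snapshot_dir.iterdir()):
        if not src.is_file():
            continue  # skip sub-directories
        dst = table_dir / src.name
        if dst.exists():
            continue  # never overwrite a file that is already in place
        os.link(src, dst)  # hard link: same inode, no data is copied
        linked.append(src.name)
    return linked
```

Because only directory entries are created, the run time does not depend on the size of the SSTables, which matches the "very fast" observation above.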

Thank you SO MUCH ALAIN for your support. You really helped me a lot.

On 25 Jun 2015, at 18:37, Alain RODRIGUEZ wrote:

Hi Jean,

Answers inline, to be sure to be exhaustive:

- how can I restore the data directory structure in order to copy my snapshots 
at the right position?
-- Making a script to do it and testing it, I would say. Basically, under any 
table directory you have a snapshots/snapshot_name directory (snapshot_name is 
a timestamp if not specified, off the top of my head) and then your sstables.

- is it possible to recreate the schema on one node?
-- The easiest way that comes to my mind is to set auto_bootstrap: false on a 
node not already in the ring. If you have trouble with the schema of a node in 
the ring, run nodetool resetlocalschema.

- how can I prevent the node from streaming from the other nodes?
-- See above (auto_bootstrap: false). BTW, the option might not be present at 
all; just add it.

- must I also have the snapshot of the system tables in order to restore a node 
from only the snapshot of my tables?
-- Just your user tables. Yet remember that a snapshot is per node, and as such 
you will have just the part of the data this node used to hold. This means that 
if the new node has different tokens, there will be unused data plus missing 
data for sure.

Basically, when a node is down I usually remove it, repair the cluster, and 
bootstrap it (auto_bootstrap: true). Streams are part of Cassandra. I accept 
that. Another solution would be to replace the node --



2015-06-25 17:07 GMT+02:00 Jean Tremblay

I am testing snapshot restore procedures in case of a major catastrophe on our 
cluster. I'm using Cassandra 2.1.7 with RF:3

The scenario that I am trying to solve is how to quickly get one node back to 
work after its disk failed and lost all its data assuming that the only thing I 
have is its snapshots.

The procedure that I'm following is the one explained here:

I can take a snapshot; that is straightforward.
My problem is in the restore of the snapshot.

If I restart Cassandra with an empty data directory, the node will bootstrap.
Bootstrap is very nice, since it recreates the schema and reloads the data from 
its neighbours.
But this generates quite heavy traffic and is quite a slow process.

My questions are:

- how can I restore the data directory structure in order to copy my snapshots 
at the right position?
- is it possible to recreate the schema on one node?
- how can I prevent the node from streaming from the other nodes?
- must I also have the snapshot of the system tables in order to restore a node 
from only the snapshot of my tables?

Thanks for your comments.


Re: Restore Snapshots

2015-06-26 Thread Alain RODRIGUEZ
Hi Jean,

Glad to hear it worked this way.

Other people have provided (and continue to provide) similar help to me;
I am just trying to give back to the community as much as I have received from it.

See you around.




Cassandra stuck at DataSink running on cluster

2015-06-26 Thread Susanne Bülow


I am trying to write into Cassandra via the CqlBulkOutputFormat from an
Apache Flink program. The program writes to a Cassandra cluster successfully
when it runs locally on my PC.

However, when I run the program on the cluster, it seems to get stuck at
SSTableSimpleUnsortedWriter.put(), waiting for the Diskwriter-Thread, which
is no longer running.


I am using Cassandra version 1.5 and Apache Flink version 0.9.0.


Attached is the full stacktrace.


Thanks in advance,


2015-06-26 11:15:35
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode):

JMX server connection timeout 68 - Thread t@68
   java.lang.Thread.State: TIMED_WAITING
at java.lang.Object.wait(Native Method)
- waiting on 117d0002 (a [I)

   Locked ownable synchronizers:
- None

RMI TCP Connection(4)- - Thread t@67
   java.lang.Thread.State: RUNNABLE
at Method)
at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(
at sun.reflect.misc.Trampoline.invoke(
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(
at sun.reflect.misc.MethodUtil.invoke(
at com.sun.jmx.mbeanserver.PerInterface.invoke(
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(
at sun.rmi.server.UnicastServerRef.dispatch(
at sun.rmi.transport.Transport$
at sun.rmi.transport.Transport$
at Method)
at sun.rmi.transport.Transport.serviceCall(

   Locked ownable synchronizers:
- locked 4f733919 (a java.util.concurrent.ThreadPoolExecutor$Worker)

RMI Scheduler(0) - Thread t@66
   java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for 258b8c46 (a 

Re: Slow reads on C* 2.0.15 using Spark Cassandra

2015-06-26 Thread Nate McCall
 We notice incredibly slow reads (600 MB in an hour); we are using quorum
LOCAL_ONE reads.
 The load_one of Cassandra increases from 1 to 60! There is no CPU wait,
only user & nice.

Without seeing the code and query, it's hard to tell, but I noticed
something similar when we had a client incorrectly using the 'take' method
for a result count like so:
val resultCount = query.take(count).length

'take' can call limit under the hood. The docs for the latter say:
"The limit will be applied for each created Spark partition. In other
words, unless the data are fetched from a single Cassandra partition the
number of results is unpredictable." [0]

Removing that line (it wasn't necessary for the use case) and just relying
on a simple ' got performance back
to where it should be. Per the docs, limit (and therefore take) works fine
as long as the partition key is used as a predicate in the where clause
(WHERE test_id = somevalue in your example).
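
To make the per-partition behaviour concrete, here is a small plain-Python model of it. It does not use Spark or the connector at all; limit_per_partition is a hypothetical helper that simply applies the same limit to every partition separately, which is what the quoted documentation describes.

```python
def limit_per_partition(partitions, n):
    """Apply `limit n` to each partition separately, then concatenate the rows."""
    rows = []
    for part in partitions:
        rows.extend(part[:n])
    return rows


# The same nine rows, split across three partitions or held in one.
three_partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
one_partition = [[1, 2, 3, 4, 5, 6, 7, 8, 9]]

print(len(limit_per_partition(three_partitions, 2)))  # 6
print(len(limit_per_partition(one_partition, 2)))     # 2
```

The row count depends on how the data happens to be split, which is why the result is only predictable when everything comes from a single Cassandra partition.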


Nate McCall
Austin, TX

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting

Re: sstableloader Could not retrieve endpoint ranges

2015-06-26 Thread Mitch Gitman
I want to follow up on this thread to describe what I was able to get
working. My goal was to switch a cluster to vnodes, in the process
preserving the data for a single table, endpoints.endpoint_messages.
Otherwise, I could afford to start from a clean slate. As should be
apparent, I could also afford to do this within a maintenance window where
the cluster was down. In other words, I had the luxury of not having to add
a new data center to a live cluster per DataStax's documented procedure to
enable vnodes:

What I got working relies on the nodetool snapshot command to create
various SSTable snapshots under
endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME. The snapshots
represent the data being backed up and restored from. The backup and
restore do not work directly against the original SSTables in the various
endpoints/endpoint_messages/ directories.

   - endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME/: These SSTables
   are being copied off and restored from.
   - endpoints/endpoint_messages/: These SSTables are obviously the source
   of the snapshots but are not being copied off and restored from.

Instead of using sstableloader to load the snapshots into the
re-initialized Cassandra cluster, I used the JMX StorageService.bulkLoad
command after establishing a JConsole session to each node. I copied off
the snapshots to load to a directory path that ends with
endpoints/endpoint_messages/ to give the bulk-loader a path it expects. The
directory path that is the destination for nodetool snapshot and the source
for StorageService.bulkLoad is on the same host as the Cassandra node but
outside the purview of the Cassandra node.
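
Copying a snapshot to a path that ends with the keyspace and table names could be sketched as follows. This is an assumption-laden illustration: stage_snapshot and its arguments are hypothetical names, and the staging root is whatever directory outside Cassandra's data directories is used for the backup.

```python
import shutil
from pathlib import Path


def stage_snapshot(snapshot_dir, staging_root, keyspace, table):
    """Copy snapshot files to <staging_root>/<keyspace>/<table>/ and return that path."""
    target = Path(staging_root) / keyspace / table
    target.mkdir(parents=True, exist_ok=True)
    for src in Path(snapshot_dir).iterdir():
        if src.is_file():
            shutil.copy2(src, target / src.name)  # copy2 also preserves timestamps
    return target
```

The returned path is the one that would then be handed to the bulk loader, for example stage_snapshot(snapshot_dir, "/backup", "endpoints", "endpoint_messages").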

This procedure can be summarized as follows:
1. For each node, create a snapshot of the endpoint_messages table as a
2. Stop the cluster.
3. On each node, wipe all the data, i.e. the contents of
data_files_directories, commitlog, and saved_caches.
4. Deploy the cassandra.yaml configuration that makes the switch to vnodes
and restart the cluster to apply the vnodes change.
5. Re-create the endpoints keyspace.
6. On each node, bulk-load the snapshots for that particular node.

This summary can be reduced even further:
1. On each node, export the data to preserve.
2. On each node, wipe the data.
3. On all nodes, switch to vnodes.
4. On each node, import back in the exported data.

I'm sure this process could have been streamlined.

One caveat for anyone looking to emulate this: Our situation might have
been a little easier to reason about because our original endpoint_messages
table had a replication factor of 1. We used the vnodes switch as an
opportunity to up the RF to 3.

I can only speculate as to why what I was originally attempting wasn't
working. But what I was originally attempting wasn't precisely the use case
I care about. What I'm following up with now was.

On Fri, Jun 19, 2015 at 8:22 PM, Mitch Gitman wrote:

 I checked the system.log for the Cassandra node that I did the jconsole
 JMX session against and which had the data to load. Lots of log output
 indicating that it's busy loading the files. Lots of stacktraces indicating
 a broken pipe. I have no reason to believe there are connectivity issues
 between the nodes, but verifying that is beyond my expertise. What's
 indicative is this last bit of log output:
  INFO [Streaming to /] 2015-06-19 21:20:45,441 (line 44) Successfully sent
 to /
  INFO [Streaming to /] 2015-06-19 21:20:45,457 (line 42) Streaming session to / failed
 ERROR [Streaming to /] 2015-06-19 21:20:45,458 (line 253) Exception in thread Thread[Streaming to /,5,RMI Runtime]
 java.lang.RuntimeException: Broken pipe
 Caused by: Broken pipe
 at Method)

Mixing incremental repair with sequential

2015-06-26 Thread Carl Hu
Dear colleagues,

We are using incremental repair and have noticed that every few repairs,
the cluster experiences pauses.

We run the repair with the following command: nodetool repair -par -inc

I have tried to run it not in parallel, but get the following error:
It is not possible to mix sequential repair and incremental repairs.

Does anyone have any suggestions?

Many thanks in advance,

Re: Cassandra stuck at DataSink running on cluster

2015-06-26 Thread Nathan Bijnens
I strongly disagree with the recommendation to use version 2.1.x. It only very
recently became more or less stable; anything before 2.1.5 was unusable.
You might be better off with a recent 2.0.n version.

Best regards,

On Fri, Jun 26, 2015 at 3:36 PM Marcos Ortiz wrote:

  Regards, Susanne.
 Which version of Java are you using here?
 Have you tested this with more recent versions of Cassandra?

 These newer versions have a lot of improvements related to SSTable reading
 and writing, and much more.

 I recommend that you use at least a 2.1.x version.

 Marcos Ortiz, Sr. Product Manager (Data
 Infrastructure) at UCI



Re: [MASSMAIL]Cassandra stuck at DataSink running on cluster

2015-06-26 Thread Susanne Bülow


I am using Java 7. 

The cassandra version I use is actually 2.1.5, not 1.5. Sorry for the
I also tried cassandra 2.1.6, but the problem stays the same.


Best regards,



From: Marcos Ortiz
Sent: Friday, 26 June 2015 15:34
Subject: Re: [MASSMAIL]Cassandra stuck at DataSink running on cluster


Regards, Susanne.
Which version of Java are you using here?
Have you tested this with more recent versions of Cassandra?

These newer versions have a lot of improvements related to SSTable reading and
writing, and much more.

I recommend that you use at least a 2.1.x version.

Marcos Ortiz , Sr. Product Manager (Data
Infrastructure) at UCI

Re: Mixing incremental repair with sequential

2015-06-26 Thread Alain RODRIGUEZ
It is not possible to mix sequential repair and incremental repairs.

I guess that is a system limitation, although I am not sure of it (I have
not used C* 2.1 yet).

I would focus on tuning your repair by:
- Monitoring performance / logs (to see why the cluster hangs)
- Using range repairs (as a workaround to the Merkle tree 32K limit) or at
least running it per table (
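
As an illustration of the range-repair idea, the sketch below splits one token range into smaller pieces and builds a nodetool repair command for each piece using the -st and -et options. The helper names and the example tokens are hypothetical. It assumes the range passed in is one that the node actually owns and that it does not wrap around the ring; how subrange repair interacts with -inc is not covered here.

```python
def split_range(start, end, n):
    """Split the token range (start, end] into n contiguous subranges."""
    if n < 1 or start >= end:
        raise ValueError("need n >= 1 and a non-wrapping range with start < end")
    span = end - start
    bounds = [start + (span * i) // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, start, end, n):
    """Build one `nodetool repair -st ... -et ...` argument list per subrange."""
    return [
        ["nodetool", "repair", "-st", str(s), "-et", str(e), keyspace]
        for s, e in split_range(start, end, n)
    ]


# Hypothetical owned range, split in two.
for cmd in repair_commands("my_keyspace", 0, 100, 2):
    print(" ".join(cmd))
```

Each subrange produces a smaller Merkle tree comparison, which is the point of the workaround; the commands are only built here, not executed.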

Without knowing the root issue that makes your cluster hang, it is hard
to help you.

- If CPU is a limit, then some tuning around compactions or GC might be
needed (or a few more things)
- If you have disk I/O limitations, you might want to add machines or tune
compaction throughput
- If your network is the issue, there are commands to tune the bandwidth
used by streams.

You need to troubleshoot this and give us more information. I hope you have
a monitoring tool up and running and an easy way to detect errors in your
logs.


2015-06-26 16:26 GMT+02:00 Carl Hu

 Dear colleagues,

 We are using incremental repair and have noticed that every few repairs,
 the cluster experiences pauses.

 We run the repair with the following command: nodetool repair -par -inc

 I have tried to run it not in parallel, but get the following error:
 It is not possible to mix sequential repair and incremental repairs.

 Does anyone have any suggestions?

 Many thanks in advance,

Re: Slow reads on C* 2.0.15 using Spark Cassandra

2015-06-26 Thread Nathan Bijnens
Thanks for the suggestion, will take a look.

Our code looks like this:

val rdd = sc.cassandraTable[EventV0](keyspace, test)

val transformed = rdd.map { e => EventV1(e.testId, e.ts, /* argument lost in the archive */
e.groups, e.event) }
transformed.saveToCassandra(keyspace, test_v1)

Not sure if this code might translate to limits.

The total data in this table is roughly 2 GB on disk; the total data for each
node is around 290 GB.

On Fri, Jun 26, 2015 at 7:01 PM Nate McCall wrote:

  We notice incredibly slow reads (600 MB in an hour); we are using quorum
 LOCAL_ONE reads.
  The load_one of Cassandra increases from 1 to 60! There is no CPU wait,
 only user & nice.

 Without seeing the code and query, it's hard to tell, but I noticed
 something similar when we had a client incorrectly using the 'take' method
 for a result count like so:
 val resultCount = query.take(count).length

 'take' can call limit under the hood. The docs for the latter say:
 "The limit will be applied for each created Spark partition. In other
 words, unless the data are fetched from a single Cassandra partition the
 number of results is unpredictable." [0]

 Removing that line (it wasn't necessary for the use case) and just relying
 on a simple ' got performance back
 to where it should be. Per the docs, limit (and therefore take) works fine
 as long as the partition key is used as a predicate in the where clause
 (WHERE test_id = somevalue in your example).


 Nate McCall
 Austin, TX

 Co-Founder & Sr. Technical Consultant
 Apache Cassandra Consulting

Re: Is it okay to use a small t2.micro instance for OpsCenter and use m3.medium instances for the actual Cassandra nodes?

2015-06-26 Thread arun sirimalla
Hi Sid,

I would recommend using either c3 or m3 instances for OpsCenter; for the
Cassandra nodes it depends on your use case.
You can go with either c3 or i2 instances for the Cassandra nodes, but I would
recommend running performance tests before selecting the instance type.
If your use case requires more CPU, I would recommend c3.

On Fri, Jun 26, 2015 at 1:20 PM, Sid Tantia wrote:

  Hello, I haven't been able to find any documentation for best practices
 on this. Is it okay to set up OpsCenter as a smaller node than the rest of
 the cluster?

 For instance, on AWS can I have 3 m3.medium nodes for Cassandra and 1
 t2.micro node for OpsCenter?

Senior Hadoop/Cassandra Engineer

2014 Data Impact Award Winner (Cloudera)

Slow reads on C* 2.0.15 using Spark Cassandra

2015-06-26 Thread Nathan Bijnens
We are using the Spark Cassandra driver, version 1.2.0 (Spark 1.2.1)
connecting to a 6 node bare metal (16 GB RAM, Xeon E3-1270 (8 core), 4x 7.2k
SATA disks) Cassandra cluster. Spark runs on a separate Mesos cluster.

We are running a transformation job, where we read the complete contents of
a table into Spark, do some transformations and write them back to C*. We
are using Spark to do a data migration in C*.

Before we run the job, the load on Cassandra is very low.

We notice incredibly slow reads (600 MB in an hour); we are using quorum
LOCAL_ONE reads.
The load_one of Cassandra increases from 1 to 60! There is no CPU wait,
only user & nice.

The table & cassandra.yaml:

Does anyone have any idea?


Re: Is it okay to use a small t2.micro instance for OpsCenter and use m3.medium instances for the actual Cassandra nodes?

2015-06-26 Thread Jonathan Haddad
It doesn't need to be the same size. It's not part of the cluster.

On Fri, Jun 26, 2015 at 1:34 PM Sid Tantia wrote:

  Hello, I haven't been able to find any documentation for best practices
 on this. Is it okay to set up OpsCenter as a smaller node than the rest of
 the cluster?

 For instance, on AWS can I have 3 m3.medium nodes for Cassandra and 1
 t2.micro node for OpsCenter?

Re: Mixing incremental repair with sequential

2015-06-26 Thread Carl Hu
Thank you, Alain, for the response. We're using 2.1 indeed. I've lowered
compaction throughput from 18 to 10 MB/s. Will see what happens.

  I hope you have a monitoring tool up and running and an easy way to
detect errors in your logs.

We do not have this. What do you use for this?

Thank you,

On Fri, Jun 26, 2015 at 11:26 AM, Alain RODRIGUEZ

 It is not possible to mix sequential repair and incremental repairs.

 I guess that is a system limitation, although I am not sure of it (I have
 not used C* 2.1 yet).

 I would focus on tuning your repair by:
 - Monitoring performance / logs (to see why the cluster hangs)
 - Using range repairs (as a workaround to the Merkle tree 32K limit) or at
 least running it per table (

 Without knowing the root issue that makes your cluster hang, it is hard
 to help you.

 - If CPU is a limit, then some tuning around compactions or GC might be
 needed (or a few more things)
 - If you have disk I/O limitations, you might want to add machines or tune
 compaction throughput
 - If your network is the issue, there are commands to tune the bandwidth
 used by streams.

 You need to troubleshoot this and give us more information. I hope you
 have a monitoring tool up and running and an easy way to detect errors in
 your logs.




Is it okay to use a small t2.micro instance for OpsCenter and use m3.medium instances for the actual Cassandra nodes?

2015-06-26 Thread Sid Tantia
Hello, I haven't been able to find any documentation for best practices on 
this. Is it okay to set up OpsCenter as a smaller node than the rest of the 
cluster?

For instance, on AWS can I have 3 m3.medium nodes for Cassandra and 1 t2.micro 
node for OpsCenter?

Re: Mixing incremental repair with sequential

2015-06-26 Thread Alain RODRIGUEZ
Here is something I wrote some time ago:

Monitoring is absolutely necessary to understand what is happening in the
system. There is no magic in there, and if you find bottlenecks, you can
think about how to alleviate things. I would say it matters at least as much
as the design of your data models.

I've lowered compaction throughput from 18 to 10 MB/s. Will see what
happens.

If you have no SSD and compactions are creating a bottleneck at the disk,
this looks reasonable as long as the pending compactions metric remains low
enough.
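
One way to keep an eye on the pending compactions while throttling is to read the count out of the text printed by nodetool compactionstats. The sketch below only parses a string; it assumes the output contains a line of the form "pending tasks: N", which should be checked against the Cassandra version in use, and pending_compactions is a hypothetical helper name.

```python
import re


def pending_compactions(compactionstats_output):
    """Return the number after 'pending tasks:', or None if no such line exists."""
    match = re.search(r"^pending tasks:\s*(\d+)", compactionstats_output, re.MULTILINE)
    return int(match.group(1)) if match else None
```

Sampling this value at intervals after changing the throughput shows whether compactions are keeping up or the backlog is growing.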

If it is a CPU issue and you have many cores, I would advise you to try
lowering the concurrent_compactors number (by default, 1 compactor per core).

Once again, it will depend on where the pressure is. Anyway, you might want
to try any change on one node only first. Also, change one option at a time
(or a couple that you believe would have a synergy), and monitor how things
evolve.



2015-06-26 21:30 GMT+02:00 Carl Hu

 Thank you, Alain, for the response. We're using 2.1 indeed. I've lowered
 compaction throughput from 18 to 10 MB/s. Will see what happens.

   I hope you have a monitoring tool up and running and an easy way to
 detect errors in your logs.

 We do not have this. What do you use for this?

 Thank you,


Re: Mixing incremental repair with sequential

2015-06-26 Thread Carl Hu

Reducing the compaction throughput is having a significant impact for us,
lowering response time, especially at the 90th percentile level.

For the record, we are using AWS's i2.2xl instance types (these are SSD).
We were running compaction_throughput_mb_per_sec at 18. Now we are running
at 10. Latency variation for reads is hugely reduced. This is very

Thanks, Alain.


On Fri, Jun 26, 2015 at 7:40 PM, Alain RODRIGUEZ wrote:

 Here is something I wrote some time ago:

 Monitoring is absolutely necessary to understand what is happening in the
 system. There is no magic in there, and if you find bottlenecks, you can
 think about how to alleviate things. I would say it matters at least as much
 as the design of your data models.

 I've lowered compaction throughput from 18 to 10 MB/s. Will see what
 happens.

 If you have no SSD and compactions are creating a bottleneck at the disk,
 this looks reasonable as long as the pending compactions metric remains low
 enough.

 If it is a CPU issue and you have many cores, I would advise you to try
 lowering the concurrent_compactors number (by default, 1 compactor per
 core).

 Once again, it will depend on where the pressure is. Anyway, you might want
 to try any change on one node only first. Also, change one option at a time
 (or a couple that you believe would have a synergy), and monitor how things
 evolve.




Re: Is it okay to use a small t2.micro instance for OpsCenter and use m3.medium instances for the actual Cassandra nodes?

2015-06-26 Thread Robert Coli
On Fri, Jun 26, 2015 at 1:20 PM, Sid Tantia

  For instance, on AWS can I have 3 m3.medium nodes for Cassandra and 1
 t2.micro node for OpsCenter?

m3.medium is below the minimum size I would use for Cassandra doing
anything meaningful, for the record.


Re: [MASSMAIL]Cassandra stuck at DataSink running on cluster

2015-06-26 Thread Marcos Ortiz

Regards, Susanne.
Which version of Java are you using here?
Have you tested this with more recent versions of Cassandra?

These newer versions have a lot of improvements related to SSTable reading 
and writing, and much more.

I recommend that you use at least a 2.1.x version.

Marcos Ortiz, Sr. Product Manager (Data 
Infrastructure) at UCI


On 26/06/15 08:21, Susanne Bülow wrote:


I am trying to write into Cassandra via the CqlBulkOutputFormat from 
an Apache Flink program. The program writes to a Cassandra cluster 
successfully when it runs locally on my PC.

However, when I run the program on the cluster, it seems to get stuck 
at SSTableSimpleUnsortedWriter.put(), waiting for the Diskwriter-Thread, 
which is no longer running.

I am using Cassandra version 1.5 and Apache Flink version 0.9.0.

Attached is the full stacktrace.

Thanks in advance,
