Re: Question about replica and replication factor

2016-09-19 Thread Ben Slater
“replica” here means “a node that has a copy of the data for a given
partition”. The scenario being discussed here is CL > 1. In this case,
rather than using up network and processing capacity sending all the data
from all the nodes required to meet the consistency level, Cassandra gets
the full data from one replica and checksums from the others. Only if the
checksums don’t match the full data does Cassandra need to get the full data
from all the relevant replicas.

I think the other point here is that, conceptually, you should think of the
coordinator as splitting up any query that hits multiple partitions into a
set of queries, one per partition (there might be some optimisations that
make this not quite physically correct, but conceptually it’s about right).
Discussions such as the one you quote above tend to consider a single
partition read (which is the most common kind of read in most uses of
Cassandra).
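
For what it’s worth, one quick way to see this is to turn tracing on in cqlsh
before a single-partition read (a rough sketch; the keyspace, table and key
below are placeholders, and the exact trace wording varies by version):

cqlsh> CONSISTENCY QUORUM;
cqlsh> TRACING ON;
cqlsh> SELECT * FROM my_keyspace.my_table WHERE id = 42;

In the trace that follows you should see events along the lines of "reading
data from /10.0.0.1" for one replica and "reading digest from /10.0.0.2" for
the others.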

Cheers
Ben

On Tue, 20 Sep 2016 at 15:18 Jun Wu  wrote:

>
>
> Yes, I think for my case, at least two nodes need to be contacted to get
> the full set of data.
>
> But another thing comes up about the dynamic snitch. It's a wrapper around
> the configured snitch, enabled by default, and it'll choose the
> fastest/closest node to read data from. Another post is about this.
>
> http://www.datastax.com/dev/blog/dynamic-snitching-in-cassandra-past-present-and-future
>
>
> The thing is, why does it still emphasize reading data from only one replica?
> Below is from the post:
>
> To begin, let’s first answer the most obvious question: what is dynamic
> snitching? To understand this, we’ll first recall what a snitch does. A
> snitch’s function is to determine which datacenters and racks are both
> written to and read from. So, why would that be ‘dynamic?’ This comes into
> play on the read side only (there’s nothing to be done for writes since we
> send them all and then block until the consistency level is achieved.)
> When doing reads however, Cassandra only asks one node for the actual data,
> and, depending on consistency level and read repair chance, it asks the
> remaining replicas for checksums only. This means that it has a choice of
> however many replicas exist to ask for the actual data, and this is where
> the dynamic snitch goes to work.
>
> Since only one replica is sending the full data we need, we need to choose
> the best possible replica to ask, since if all we get back is checksums we
> have nothing useful to return to the user. The dynamic snitch handles this
> task by monitoring the performance of reads from the various replicas and
> choosing the best one based on this history.
>
> Sent from my iPad
> On Sep 20, 2016, at 00:03, Ben Slater  wrote:
>
> If your read operation requires data from multiple partitions and the
> partitions are spread across multiple nodes then the coordinator has the
> job of contacting the multiple nodes to get the data and return to the
> client. So, in your scenario, if you did a select * from table (with no
> where clause) the coordinator would need to contact and execute a read on
> at least one other node to satisfy the query.
>
> Cheers
> Ben
>
> On Tue, 20 Sep 2016 at 14:50 Jun Wu  wrote:
>
>> Hi Ben,
>>
>> Thanks for the quick response.
>>
>> It's clear about the example for single row/partition. However,
>> normally data are not single row. Then for this case, I'm still confused.
>> http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsRead_c.html
>>
>> The link above gives an example of a 10-node cluster with RF = 3. But
>> the figure and the words in the post show that the coordinator only
>> contacts/reads data from one replica, and performs read repair for the
>> remaining replicas.
>>
>> Also, how could a read go across all nodes in the cluster?
>>
>> Thanks!
>>
>> Jun
>>
>>
>> From: ben.sla...@instaclustr.com
>> Date: Tue, 20 Sep 2016 04:18:59 +
>> Subject: Re: Question about replica and replication factor
>> To: user@cassandra.apache.org
>>
>>
>> Each individual read (where a read is a single row or single partition)
>> will read from one node (ignoring read repairs) as each partition will be
>> contained entirely on a single node. To read the full set of data,  reads
>> would hit at least two nodes (in practice, reads would likely end up being
>> distributed across all the nodes in your cluster).
>>
>> Cheers
>> Ben
>>
>> On Tue, 20 Sep 2016 at 14:09 Jun Wu  wrote:
>>
>> Hi there,
>>
>> I have a question about the replica and replication factor.
>>
>> For example, I have a cluster of 6 nodes in the same data center.
>> Replication factor RF is set to 3  and the consistency level is default 1.
>> According to this calculator http://www.ecyrd.com/cassandracalculator/,
>> every node will store 50% of the data.
>>
>> When I want to read all data from the cluster, how many nodes should
>> I read from, 2 or 1? Is it 2, because each node has half the data? But the
>> calculator shows 1: You are really reading from 1 node every time.

Re: Question about replica and replication factor

2016-09-19 Thread Jun Wu


Yes, I think for my case, at least two nodes need to be contacted to get the 
full set of data.

But another thing comes up about the dynamic snitch. It's a wrapper around the
configured snitch, enabled by default, and it'll choose the fastest/closest node
to read data from. Another post is about this.
http://www.datastax.com/dev/blog/dynamic-snitching-in-cassandra-past-present-and-future
 

The thing is, why does it still emphasize reading data from only one replica?
Below is from the post:

To begin, let’s first answer the most obvious question: what is dynamic 
snitching? To understand this, we’ll first recall what a snitch does. A 
snitch’s function is to determine which datacenters and racks are both written 
to and read from. So, why would that be ‘dynamic?’ This comes into play on the 
read side only (there’s nothing to be done for writes since we send them all 
and then block until the consistency level is achieved.) When doing reads
however, Cassandra only asks one node for the actual data, and, depending on 
consistency level and read repair chance, it asks the remaining replicas for 
checksums only. This means that it has a choice of however many replicas exist 
to ask for the actual data, and this is where the dynamic snitch goes to work.

Since only one replica is sending the full data we need, we need to choose the
best possible replica to ask, since if all we get back is checksums we have 
nothing useful to return to the user. The dynamic snitch handles this task by 
monitoring the performance of reads from the various replicas and choosing the 
best one based on this history.


Sent from my iPad
> On Sep 20, 2016, at 00:03, Ben Slater  wrote:
> 
> If your read operation requires data from multiple partitions and the 
> partitions are spread across multiple nodes then the coordinator has the job 
> of contacting the multiple nodes to get the data and return to the client. 
> So, in your scenario, if you did a select * from table (with no where clause) 
> the coordinator would need to contact and execute a read on at least one 
> other node to satisfy the query.
> 
> Cheers
> Ben
> 
>> On Tue, 20 Sep 2016 at 14:50 Jun Wu  wrote:
>> Hi Ben,
>> 
>> Thanks for the quick response. 
>> 
>> It's clear about the example for single row/partition. However, normally 
>> data are not single row. Then for this case, I'm still confused. 
>> http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsRead_c.html
>> 
>> The link above gives an example of a 10-node cluster with RF = 3. But the
>> figure and the words in the post show that the coordinator only
>> contacts/reads data from one replica, and performs read repair for the
>> remaining replicas.
>> 
>> Also, how could a read go across all nodes in the cluster?
>> 
>> Thanks!
>> 
>> Jun
>> 
>> 
>> From: ben.sla...@instaclustr.com
>> Date: Tue, 20 Sep 2016 04:18:59 +
>> Subject: Re: Question about replica and replication factor
>> To: user@cassandra.apache.org
>> 
>> 
>> Each individual read (where a read is a single row or single partition) will 
>> read from one node (ignoring read repairs) as each partition will be 
>> contained entirely on a single node. To read the full set of data,  reads 
>> would hit at least two nodes (in practice, reads would likely end up being 
>> distributed across all the nodes in your cluster).
>> 
>> Cheers
>> Ben
>> 
>> On Tue, 20 Sep 2016 at 14:09 Jun Wu  wrote:
>> Hi there,
>> 
>> I have a question about the replica and replication factor. 
>> 
>> For example, I have a cluster of 6 nodes in the same data center. 
>> Replication factor RF is set to 3  and the consistency level is default 1. 
>> According to this calculator http://www.ecyrd.com/cassandracalculator/, 
>> every node will store 50% of the data.
>> 
>> When I want to read all data from the cluster, how many nodes should I 
>> read from, 2 or 1? Is it 2, because each node has half data? But in the 
>> calculator it shows 1: You are really reading from 1 node every time.
>> 
>>Any suggestions? Thanks!
>> 
>> Jun
>> -- 
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
> 
> -- 
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798


RE: Question about replica and replication factor

2016-09-19 Thread Jun Wu
Hi Ben,
Thanks for the quick response. 
It's clear about the example for single row/partition. However, normally 
data are not single row. Then for this case, I'm still confused. 
http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsRead_c.html
The link above gives an example of a 10-node cluster with RF = 3. But the
figure and the words in the post show that the coordinator only contacts/reads
data from one replica, and performs read repair for the remaining replicas.
Also, how could a read go across all nodes in the cluster?
Thanks!
Jun


From: ben.sla...@instaclustr.com
Date: Tue, 20 Sep 2016 04:18:59 +
Subject: Re: Question about replica and replication factor
To: user@cassandra.apache.org

Each individual read (where a read is a single row or single partition) will 
read from one node (ignoring read repairs) as each partition will be contained 
entirely on a single node. To read the full set of data,  reads would hit at 
least two nodes (in practice, reads would likely end up being distributed 
across all the nodes in your cluster).
Cheers
Ben
On Tue, 20 Sep 2016 at 14:09 Jun Wu  wrote:



Hi there,
I have a question about the replica and replication factor. 
For example, I have a cluster of 6 nodes in the same data center. 
Replication factor RF is set to 3  and the consistency level is default 1. 
According to this calculator http://www.ecyrd.com/cassandracalculator/, every 
node will store 50% of the data.
When I want to read all data from the cluster, how many nodes should I read 
from, 2 or 1? Is it 2, because each node has half data? But in the calculator 
it shows 1: You are really reading from 1 node every time.
   Any suggestions? Thanks!
Jun
-- 
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798

Re: Question about replica and replication factor

2016-09-19 Thread Ben Slater
Each individual read (where a read is a single row or single partition)
will read from one node (ignoring read repairs) as each partition will be
contained entirely on a single node. To read the full set of data,  reads
would hit at least two nodes (in practice, reads would likely end up being
distributed across all the nodes in your cluster).

Cheers
Ben

On Tue, 20 Sep 2016 at 14:09 Jun Wu  wrote:

> Hi there,
>
> I have a question about the replica and replication factor.
>
> For example, I have a cluster of 6 nodes in the same data center.
> Replication factor RF is set to 3  and the consistency level is default 1.
> According to this calculator http://www.ecyrd.com/cassandracalculator/,
> every node will store 50% of the data.
>
> When I want to read all data from the cluster, how many nodes should I
> read from, 2 or 1? Is it 2, because each node has half data? But in the
> calculator it shows 1: You are really reading from 1 node every time.
>
>Any suggestions? Thanks!
>
> Jun
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Question about replica and replication factor

2016-09-19 Thread Jun Wu
Hi there,
I have a question about the replica and replication factor. 
For example, I have a cluster of 6 nodes in the same data center.
Replication factor RF is set to 3 and the consistency level is the default of 1.
According to this calculator http://www.ecyrd.com/cassandracalculator/, every 
node will store 50% of the data.
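
For reference, that setup corresponds roughly to the following (the keyspace name
is a placeholder, and a real multi-datacenter cluster would normally use
NetworkTopologyStrategy rather than SimpleStrategy):

cqlsh> CREATE KEYSPACE my_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
cqlsh> CONSISTENCY ONE;
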
When I want to read all data from the cluster, how many nodes should I read 
from, 2 or 1? Is it 2, because each node has half the data? But in the calculator
it shows 1: You are really reading from 1 node every time.
   Any suggestions? Thanks!
Jun   

RE: Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Michael Laws
I put together a shell wrapper around nodetool/sstableloader that I’ve been
running for the past few years –
https://github.com/AppliedInfrastructure/cassandra-snapshot-tools

Always seemed to work well for these kinds of scenarios…  Never really had
to think about where SSTables were on the filesystem, etc.



Mike



*From:* Justin Sanciangco [mailto:jsancian...@blizzard.com]
*Sent:* Monday, September 19, 2016 6:20 PM
*To:* user@cassandra.apache.org
*Subject:* RE: Export/Importing keyspace from a different sized cluster



I am running



cqlsh 5.0.1 | Cassandra 2.1.11.969 | DSE 4.8.3 | CQL spec 3.2.1 |



Doing the below command seemed to work

sstableloader -d  



Thanks for the help!





*From:* Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com
]
*Sent:* Monday, September 19, 2016 5:49 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Export/Importing keyspace from a different sized cluster



Something like that, depending on your version (which you didn’t specify).



Note, though, that sstableloader is notoriously picky about the path to
sstables. In particular, it really really really wants a directory
structure that matches the directory structure on disk, and wants you to be
at the equivalent of the parent/data_files_directory (so if you dump your
sstables at /path/to/data/keyspace/table/, you’d want to run sstableloader
from /path/to/data/ and provide keyspace/table/ as the location).







*From: *Justin Sanciangco 
*Reply-To: *"user@cassandra.apache.org" 
*Date: *Monday, September 19, 2016 at 5:44 PM
*To: *"user@cassandra.apache.org" 
*Subject: *RE: Export/Importing keyspace from a different sized cluster



> So if I rsync the sstables, say from source node 1 and source node 2, to
target node 1. Would I just run the command like this?



From target host

sstableloader -d  



*From:* Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com
]
*Sent:* Monday, September 19, 2016 4:45 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Export/Importing keyspace from a different sized cluster



You can ship the sstables to the destination (or any other server with
Cassandra binary tools installed) via ssh/rsync and run sstableloader on
the destination cluster as well.





*From: *Justin Sanciangco 
*Reply-To: *"user@cassandra.apache.org" 
*Date: *Monday, September 19, 2016 at 2:49 PM
*To: *"user@cassandra.apache.org" 
*Subject: *Export/Importing keyspace from a different sized cluster



Hello,



Assuming I can’t get ports opened from source to target cluster to run
sstableloader, what methods can I use to load a single keyspace from one
cluster to another cluster of different size?



Appreciate the help…



Thanks,

Justin


RE: Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Justin Sanciangco
I am running

cqlsh 5.0.1 | Cassandra 2.1.11.969 | DSE 4.8.3 | CQL spec 3.2.1 |

Doing the below command seemed to work
sstableloader -d  

Thanks for the help!


From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Monday, September 19, 2016 5:49 PM
To: user@cassandra.apache.org
Subject: Re: Export/Importing keyspace from a different sized cluster

Something like that, depending on your version (which you didn’t specify).

Note, though, that sstableloader is notoriously picky about the path to 
sstables. In particular, it really really really wants a directory structure 
that matches the directory structure on disk, and wants you to be at the 
equivalent of the parent/data_files_directory (so if you dump your sstables at 
/path/to/data/keyspace/table/, you’d want to run sstableloader from 
/path/to/data/ and provide keyspace/table/ as the location).



From: Justin Sanciangco 
>
Reply-To: "user@cassandra.apache.org" 
>
Date: Monday, September 19, 2016 at 5:44 PM
To: "user@cassandra.apache.org" 
>
Subject: RE: Export/Importing keyspace from a different sized cluster

So if I rsync the sstables, say from source node 1 and source node 2, to
target node 1. Would I just run the command like this?

From target host
sstableloader -d  

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Monday, September 19, 2016 4:45 PM
To: user@cassandra.apache.org
Subject: Re: Export/Importing keyspace from a different sized cluster

You can ship the sstables to the destination (or any other server with 
Cassandra binary tools installed) via ssh/rsync and run sstableloader on the 
destination cluster as well.


From: Justin Sanciangco 
>
Reply-To: "user@cassandra.apache.org" 
>
Date: Monday, September 19, 2016 at 2:49 PM
To: "user@cassandra.apache.org" 
>
Subject: Export/Importing keyspace from a different sized cluster

Hello,

Assuming I can’t get ports opened from source to target cluster to run 
sstableloader, what methods can I use to load a single keyspace from one 
cluster to another cluster of different size?

Appreciate the help…

Thanks,
Justin



Re: Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Jeff Jirsa
Something like that, depending on your version (which you didn’t specify).

 

Note, though, that sstableloader is notoriously picky about the path to 
sstables. In particular, it really really really wants a directory structure 
that matches the directory structure on disk, and wants you to be at the 
equivalent of the parent/data_files_directory (so if you dump your sstables at 
/path/to/data/keyspace/table/, you’d want to run sstableloader from 
/path/to/data/ and provide keyspace/table/ as the location).
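
As a rough illustration of that layout (host names and paths here are
hypothetical):

# copy the sstables over, keeping the keyspace/table directory structure
rsync -av source-node:/var/lib/cassandra/data/my_keyspace/my_table/ /tmp/load/my_keyspace/my_table/

# run the loader from the parent directory, pointing -d at a node of the target cluster
cd /tmp/load
sstableloader -d target-node-ip my_keyspace/my_table/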

 

 

 

From: Justin Sanciangco 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, September 19, 2016 at 5:44 PM
To: "user@cassandra.apache.org" 
Subject: RE: Export/Importing keyspace from a different sized cluster

 

So if I rsync the sstables, say from source node 1 and source node 2, to
target node 1. Would I just run the command like this?

 

From target host

sstableloader -d  

 

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] 
Sent: Monday, September 19, 2016 4:45 PM
To: user@cassandra.apache.org
Subject: Re: Export/Importing keyspace from a different sized cluster

 

You can ship the sstables to the destination (or any other server with 
Cassandra binary tools installed) via ssh/rsync and run sstableloader on the 
destination cluster as well.

 

 

From: Justin Sanciangco 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, September 19, 2016 at 2:49 PM
To: "user@cassandra.apache.org" 
Subject: Export/Importing keyspace from a different sized cluster

 

Hello,

 

Assuming I can’t get ports opened from source to target cluster to run 
sstableloader, what methods can I use to load a single keyspace from one 
cluster to another cluster of different size? 

 

Appreciate the help…

 

Thanks,

Justin

 





RE: Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Justin Sanciangco
So if I rsync the sstables, say from source node 1 and source node 2, to
target node 1. Would I just run the command like this?

From target host
sstableloader -d  

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Monday, September 19, 2016 4:45 PM
To: user@cassandra.apache.org
Subject: Re: Export/Importing keyspace from a different sized cluster

You can ship the sstables to the destination (or any other server with 
Cassandra binary tools installed) via ssh/rsync and run sstableloader on the 
destination cluster as well.


From: Justin Sanciangco 
>
Reply-To: "user@cassandra.apache.org" 
>
Date: Monday, September 19, 2016 at 2:49 PM
To: "user@cassandra.apache.org" 
>
Subject: Export/Importing keyspace from a different sized cluster

Hello,

Assuming I can’t get ports opened from source to target cluster to run 
sstableloader, what methods can I use to load a single keyspace from one 
cluster to another cluster of different size?

Appreciate the help…

Thanks,
Justin



Re: Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Jeff Jirsa
You can ship the sstables to the destination (or any other server with 
Cassandra binary tools installed) via ssh/rsync and run sstableloader on the 
destination cluster as well.

 

 

From: Justin Sanciangco 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, September 19, 2016 at 2:49 PM
To: "user@cassandra.apache.org" 
Subject: Export/Importing keyspace from a different sized cluster

 

Hello,

 

Assuming I can’t get ports opened from source to target cluster to run 
sstableloader, what methods can I use to load a single keyspace from one 
cluster to another cluster of different size? 

 

Appreciate the help…

 

Thanks,

Justin

 





Re: Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Ben Slater
CQLSH COPY FROM / COPY TO? There are some significant performance
improvements in recent versions:
https://issues.apache.org/jira/browse/CASSANDRA-11053
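
A rough sketch of that approach (keyspace, table, hosts and file paths are
placeholders, and the target table needs to exist with the same schema first):

cqlsh source-host -e "COPY my_keyspace.my_table TO '/tmp/my_table.csv' WITH HEADER = TRUE"
cqlsh target-host -e "COPY my_keyspace.my_table FROM '/tmp/my_table.csv' WITH HEADER = TRUE"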

On Tue, 20 Sep 2016 at 07:49 Justin Sanciangco 
wrote:

> Hello,
>
>
>
> Assuming I can’t get ports opened from source to target cluster to run
> sstableloader, what methods can I use to load a single keyspace from one
> cluster to another cluster of different size?
>
>
>
> Appreciate the help…
>
>
>
> Thanks,
>
> Justin
>
>
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: large system hint partition

2016-09-19 Thread Graham Sanderson
The reason for large partitions is that the partition key is just the uuid of 
the target node

More recent versions (I think 2.2+) don't have this problem since they write
hints to the file system, much like the commit log.

Sadly, the large partitions make things worse when you are hinting, hence
presumably already under stress.
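
If it helps, a quick way on 2.1 / early 2.2 (where hints still live in the
system.hints table) to see which node a big hint partition belongs to is to list
the partition keys and match them against the host IDs (a hedged sketch):

cqlsh -e "SELECT DISTINCT target_id FROM system.hints;"
nodetool status    # the Host ID column shows each node's uuid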

Sent from my iPhone

> On Sep 16, 2016, at 6:13 PM, Nicolas Douillet  
> wrote:
> 
> Hi Ezra,
> 
> Have you a dead node in your cluster?
> Because the coordinator stores a hint about dead replicas in the local 
> system.hints when a node is dead or didn't respond to a write request.
> 
> --
> Nicolas
> 
> 
> 
>> Le sam. 17 sept. 2016 à 00:12, Ezra Stuetzel  a 
>> écrit :
>> What would be the likely causes of large system hint partitions? Normally 
>> large partition warnings are for user defined tables which they are writing 
>> large partitions to. In this case, it appears C* is writing large partitions 
>> to the system.hints table. Gossip is not backed up.
>> 
>> version: C* 2.2.7
>> WARN  [MemtableFlushWriter:134] 2016-09-16 04:27:39,220 
>> BigTableWriter.java:184 - Writing large partition 
>> system/hints:7ce838aa-f30f-494a-8caa-d44d1440e48b (128181097 bytes)
>> 
>> 
>> 
>> Thanks,
>> 
>> Ezra


Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Justin Sanciangco
Hello,

Assuming I can't get ports opened from source to target cluster to run 
sstableloader, what methods can I use to load a single keyspace from one 
cluster to another cluster of different size?

Appreciate the help...

Thanks,
Justin



Re: How does Local quorum consistency work ?? response from fastest node?

2016-09-19 Thread Nicolas Douillet
Hi Pranay,

I'll try to answer as precisely as I can.

Note that what I'm going to explain is valid only for reads, write requests
work differently.
I'm assuming you have only one DC.

   1. The coordinator gets a list of sorted live replicas. Replicas are
   sorted by proximity.
   (I'm not sure enough how it works to explain it here, by snitch I guess).

   2. By default *the coordinator keeps the exact list of nodes necessary*
   to ensure the desired consistency (2 nodes for RF=3),
   but, according to the read repair chance provided on each column family
   (10% of the requests by default), *it might keep all the replicas* (if
   one DC).

   3. The coordinator checks if enough nodes are alive before trying any
   request. If not, no need to go further.
   You'll have a slightly different error message :

*Live nodes  do not satisfy ConsistencyLevel (2 required) *
   4. And in substance the coordinator waits for the exact number of
   responses to achieve the consistency.
   To be more specific, the coordinator is not requesting the same to each
   involved replicas (to one or two, the closest, a full data read, and for
   the others only a digest), and is waiting for the exact number of responses
   to achieve the consistency with at least one full data present.
   (There is of course more to explain, if the digests do not match for
   example ...)

   So you're right when you talk about the fastest responses, but only
   under certain conditions and if additional replicas are requested.
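
As a side note, the read repair chance mentioned in point 2 is a per-table
setting; a minimal sketch of tuning it, with placeholder names:

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH read_repair_chance = 0.1 AND dclocal_read_repair_chance = 0.1;"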


I'm certainly missing some points.
Is that clear enough?

--
Nicolas



Le lun. 19 sept. 2016 à 22:16, Pranay akula  a
écrit :

>
>
> i always have this doubt when a cassandra node got a read request for
> local quorum consistency does coordinator node asks all nodes with replicas
> in that DC for response or just the fastest responding nodes to it who's
> count satisfy the local quorum.
>
> In this case RF is 3 Cassandra timeout during read query at consistency
> LOCAL_QUORUM (2 responses were required but only 1 replica responded)
> does this mean coordinator asked only two replicas with fastest response
> for data and 1 out of 2 timed out  or  coordinator asked all nodes with
> replicas which means all three (3)  and 2 out of 3 timed out as i only got
> single response back.
>
>
>
> Thanks
>
> Pranay
>


Re: Problems with schema creation

2016-09-19 Thread Cody Yancey
Hi Josh,
I too have had this issue on several clusters I manage, particularly when
making schema changes. The worst part is, those nodes don't restart, and
the tables can't be dropped. Basically you have to rebuild your whole
cluster which often takes down time. Others have seen this on 3.0.x and it
has been documented here:

https://issues.apache.org/jira/browse/CASSANDRA-12131

If you have a good repro case I'm sure that would be a great help towards
helping this bug get some much needed attention.

Thanks,
Cody

On Mon, Sep 19, 2016 at 1:22 PM, Josh Smith 
wrote:

> I have an automated tool we created which will create a keyspace, its
> tables, and add indexes in solr.  But when I run the tool even for a new
> keyspace I end up getting ghost tables with the name “”.  If I look in
> system_schema.tables I see a bunch of tables all named
> (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00). Am I
> creating the tables and schema too fast or is something else wrong? Has
> anyone else run into this problem before? I have searched the mailing list
> and google but I have not found anything.  I am running DSE 5.0 (C*3.0.2)
> on m4.4xl 5 nodes currently.  Any help would be appreciated.
>
>
>
> Josh Smith
>


How does Local quorum consistency work ?? response from fastest node?

2016-09-19 Thread Pranay akula
I always have this doubt: when a Cassandra node gets a read request at local
quorum consistency, does the coordinator node ask all nodes with replicas in
that DC for a response, or just the fastest-responding nodes whose count
satisfies the local quorum?

In this case RF is 3 and I got "Cassandra timeout during read query at consistency
LOCAL_QUORUM (2 responses were required but only 1 replica responded)".
Does this mean the coordinator asked only the two replicas with the fastest
response for data and 1 out of 2 timed out, or did the coordinator ask all nodes
with replicas, meaning all three (3), and 2 out of 3 timed out, since I only got
a single response back?



Thanks

Pranay


Problems with schema creation

2016-09-19 Thread Josh Smith
I have an automated tool we created which will create a keyspace, its tables, 
and add indexes in solr.  But when I run the tool even for a new keyspace I end 
up getting ghost tables with the name “”.  If I look in system_schema.tables I 
see a bunch of tables all named 
(\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00). Am I 
creating the tables and schema too fast or is something else wrong? Has anyone 
else run into this problem before? I have searched the mailing list and google 
but I have not found anything.  I am running DSE 5.0 (C*3.0.2) on m4.4xl 5 
nodes currently.  Any help would be appreciated.

Josh Smith


Re: How many vnodes should I use for each node in my cluster?

2016-09-19 Thread Li, Guangxing
Thanks for the input. I just kicked off another repair for one keyspace.
Per the log, there are 1536 ranges to repair through. This makes sense:
there are 6 nodes in the cluster, each having 256 token ranges, so 6*256 =
1536. So far, it is averaging 1 range per minute, so repairing the keyspace
will take more than a day at this rate. I guess the only thing I can do is
to upgrade to 2.1 and start using incremental repair?
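
If you do go that route, the command itself is simple (a sketch, assuming the
2.1-era flags, where incremental repair is opt-in via -inc and is combined with
parallel mode):

nodetool repair -par -inc my_keyspace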

Thanks.

George.

On Fri, Sep 16, 2016 at 3:03 PM, Dor Laor  wrote:

> On Fri, Sep 16, 2016 at 11:29 AM, Li, Guangxing 
> wrote:
>
>> Hi,
>>
>> I have a 3 nodes cluster, each with less than 200 GB data. Currently all
>> nodes have the default 256 value for num_tokens. My colleague told me that
>> with the data size I have (less than 200 GB on each node), I should change
>> num_tokens to something like 32 to get better performance, especially speed
>> up the repair time. Do any of you guys have experience on
>>
>
> It's not enough to know the volume size, it's important to know the amount
> of keys which affect the merkle tree. I wouldn't change it, I doubt you'll
> see a significant difference in repair speed and if you'll grow the cluster
> you would want to have enough vnodes.
>
>
>> this? I am running Cassandra Community version 2.0.9. The cluster resides
>> in AWS. All keyspaces have RC 3.
>>
>> Thanks.
>>
>> George.
>>
>
>


Re: High load on few nodes in a DC.

2016-09-19 Thread Pranay akula
I was able to see the most-used partitions, but the nodes with less load are
serving more read and write requests for those particular partitions when
compared to the nodes with high load. How can I find out whether these nodes
are serving as coordinators for those read and write requests? And how can I
find the token range for these particular partitions and which node is the
primary for them?
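
A few commands that may help with that (a rough sketch; the keyspace, table and
key values are placeholders):

# which replicas own a given partition key
nodetool getendpoints my_keyspace my_table 'some_partition_key'

# token ranges per node for the keyspace
nodetool describering my_keyspace

# sample the hottest partitions on a node over 60s, as Jeff suggested
nodetool toppartitions my_keyspace my_table 60000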


Thanks

On Mon, Sep 19, 2016 at 11:04 AM, Pranay akula 
wrote:

> Hai Jeff,
>
> Thank, we are using RF 3 and cassandra version 2.1.8.
>
> Thanks
> Pranay.
>
> On Mon, Sep 19, 2016 at 10:55 AM, Jeff Jirsa 
> wrote:
>
>> Is your replication_factor 2? Or is it 3?  What version are you using?
>>
>>
>>
>> The most likely answer is some individual partition that’s either being
>> written/read more than others, or is somehow impacting the cluster (wide
>> rows are a natural candidate).
>>
>>
>>
>> You don’t mention your version, but most modern versions of Cassandra
>> ship with ‘nodetool toppartitions’, which will help you identify frequently
>> written/read partitions – perhaps you can use that to identify a hotspot
>> due to some external behavior (some partition being read thousands of
>> times, over and over could certainly drive up load).
>>
>>
>>
>> -  Jeff
>>
>>
>>
>> *From: *Pranay akula 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Monday, September 19, 2016 at 7:53 AM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *High load on few nodes in a DC.
>>
>>
>>
>> when our cluster was under load  i am seeing  1 or 2 nodes are on more
>> load consistently when compared to others in dc i am not seeing any GC
>> pauses or wide partitions  is this can be those nodes are continuously
>> serving as coordinators ?? how can  i find what is the reason for high load
>> on those two nodes ?? We are using Vnode.
>>
>>
>>
>>
>>
>> Thanks
>>
>> Pranay.
>>
>
>


Re: How Fast Does Information Spread With Gossip?

2016-09-19 Thread Eric Evans
On Wed, Sep 14, 2016 at 1:49 PM, jerome  wrote:
> I was curious if anyone had any kind of statistics or ballpark figures on
> how long it takes information to propagate through a cluster with Gossip?
> I'm particularly interested in how fast information about the liveness of a
> node spreads. For example, in an n-node cluster the median amount of time it
> takes for all nodes to learn that a node went down is f(n) seconds. Is a
> minute a reasonable upper bound for most clusters? Too high, too low?

Dahlia Malkhi gave a talk on gossip protocols at the Papers We Love
conference last Thursday (http://pwlconf.org/dahlia-malkhi/), and she
answered this better than I ever could.  The video of her presentation
hasn't been posted yet, I'm told it should be as early as later today
though.  You can look for it on the Papers We Love YouTube channel
(https://www.youtube.com/channel/UCoj4eQh_dZR37lL78ymC6XA), and it'll
be announced on the website (http://paperswelove.org/).

-- 
Eric Evans
john.eric.ev...@gmail.com


Re: High load on few nodes in a DC.

2016-09-19 Thread Pranay akula
Hai Jeff,

Thank, we are using RF 3 and cassandra version 2.1.8.

Thanks
Pranay.

On Mon, Sep 19, 2016 at 10:55 AM, Jeff Jirsa 
wrote:

> Is your replication_factor 2? Or is it 3?  What version are you using?
>
>
>
> The most likely answer is some individual partition that’s either being
> written/read more than others, or is somehow impacting the cluster (wide
> rows are a natural candidate).
>
>
>
> You don’t mention your version, but most modern versions of Cassandra ship
> with ‘nodetool toppartitions’, which will help you identify frequently
> written/read partitions – perhaps you can use that to identify a hotspot
> due to some external behavior (some partition being read thousands of
> times, over and over could certainly drive up load).
>
>
>
> -  Jeff
>
>
>
> *From: *Pranay akula 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, September 19, 2016 at 7:53 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *High load on few nodes in a DC.
>
>
>
> when our cluster was under load  i am seeing  1 or 2 nodes are on more
> load consistently when compared to others in dc i am not seeing any GC
> pauses or wide partitions  is this can be those nodes are continuously
> serving as coordinators ?? how can  i find what is the reason for high load
> on those two nodes ?? We are using Vnode.
>
>
>
>
>
> Thanks
>
> Pranay.
>


Re: Inconsistent results with Quorum at different times

2016-09-19 Thread Alain RODRIGUEZ
Hi Jaydeep.


> Now when I read using quorum then sometimes it returns data D1 and
> sometimes it returns empty results. After tracing I found that when N1 and
> N2 are chosen then we get empty data, when (N1/N2) and N3 are chosen then
> D1 data is returned.


This is an acceptable situation (i.e. a node might not have received the
delete), and the inconsistencies are indeed not supposed to happen when reading
at quorum. If the tombstone is younger than the data and you read data +
tombstone, Cassandra should return an empty result.

Does your tracing confirm you are actually reading using QUORUM?

N3:
> SSTable: Partition key K1 is valid and has data D1 with lower time-stamp
> T1 (T1 < T2)
>

Have you checked the timestamps using sstable2json / sstabledump?
Sometimes clock drift can have this kind of weird effect and might have
produced a T1 > T2.
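
A rough sketch of that check (paths and the key are placeholders; sstable2json is
the 2.1-era tool, sstabledump its replacement from 3.0 on, and depending on the
key type -k may need the key in its serialized form):

# find which sstables contain the partition, then dump it with its timestamps
nodetool getsstables my_keyspace my_table 'K1'
sstable2json /path/to/my_keyspace-my_table-ka-1-Data.db -k 'K1'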

Can you reproduce it easily? If so, this would indeed deserve some
attention and possibly a JIRA. What is expected is the Last Write Wins
algorithm to apply (LWW), in your situation it should guarantee you
consistency as long as you have the tombstones on the 2 other nodes, so at
least for 10 days (default). After that, consistency will depend if the
tombstone made its way to the last node as well or not, in which case
Zombie data would reappear, as mentioned by Jaydeep.

I wrote a detailed blog post about tombstones and consistency issues; it
might be useful. I think your understanding is correct though.

thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-09-17 1:46 GMT+02:00 Nicolas Douillet :

> Hi Jaydeep,
>
> Yes, dealing with tombstones in Cassandra is very tricky.
>
> Cassandra keeps tombstones to mark deleted columns and distribute (hinted
> handoff, full repair, read repair ...) to the other nodes that missed the
> initial remove request. But Cassandra can't afford to keep those
> tombstones lifetime and has to wipe them. The tradeoff is that after a
> time, GCGraceSeconds, configured on each column family, the tombstones are
> fully dropped during compactions and are not distributed to the other nodes
> anymore.
> If one node didn't have the chance to receive this tombstone during this
> period, and kept and old column value, then the deleted column will
> reappear.
>
> So I guess in your case that the time T2 is older than this GCGraceSeconds
> ?
>
> The best way to avoid all those phantom columns to come back from death is
> to run a full repair on your cluster at least once every GCGraceSeconds.
> Did you try this?
>
> --
> Nicolas
>
>
> Le sam. 17 sept. 2016 à 00:05, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> a écrit :
>
>> Hi,
>>
>> We have three node (N1, N2, N3) cluster (RF=3) and data in SSTable as
>> following:
>>
>> N1:
>> SSTable: Partition key K1 is marked as tombstone at time T2
>>
>> N2:
>> SSTable: Partition key K1 is marked as tombstone at time T2
>>
>> N3:
>> SSTable: Partition key K1 is valid and has data D1 with lower time-stamp
>> T1 (T1 < T2)
>>
>>
>> Now when I read using quorum then sometimes it returns data D1 and
>> sometimes it returns empty results. After tracing I found that when N1 and
>> N2 are chosen then we get empty data, when (N1/N2) and N3 are chosen then
>> D1 data is returned.
>>
>> My point is when we read with Quorum then our results have to be
>> consistent, here same query give different results at different times.
>>
>> Isn't this a big problem with Cassandra @QUORUM (with tombstone)?
>>
>>
>> Thanks,
>> Jaydeep
>>
>


Re: High load on few nodes in a DC.

2016-09-19 Thread Jeff Jirsa
Is your replication_factor 2? Or is it 3?  What version are you using? 

 

The most likely answer is some individual partition that’s either being 
written/read more than others, or is somehow impacting the cluster (wide rows 
are a natural candidate).

 

You don’t mention your version, but most modern versions of Cassandra ship with 
‘nodetool toppartitions’, which will help you identify frequently written/read 
partitions – perhaps you can use that to identify a hotspot due to some 
external behavior (some partition being read thousands of times, over and over 
could certainly drive up load).

 

-  Jeff

 

From: Pranay akula 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, September 19, 2016 at 7:53 AM
To: "user@cassandra.apache.org" 
Subject: High load on few nodes in a DC.

 

when our cluster was under load  i am seeing  1 or 2 nodes are on more load 
consistently when compared to others in dc i am not seeing any GC pauses or 
wide partitions  is this can be those nodes are continuously serving as 
coordinators ?? how can  i find what is the reason for high load on those two 
nodes ?? We are using Vnode. 

 

 

Thanks

Pranay. 





High load on few nodes in a DC.

2016-09-19 Thread Pranay akula
when our cluster was under load  i am seeing  1 or 2 nodes are on more load
consistently when compared to others in dc i am not seeing any GC pauses or
wide partitions  is this can be those nodes are continuously serving as
coordinators ?? how can  i find what is the reason for high load on those
two nodes ?? We are using Vnode.


Thanks
Pranay.


Re: Upgrade cassandra 2.1.14 to 3.0.7

2016-09-19 Thread Paulo Motta
> If you do not feel ready for incremental repairs, just adding the '-full'
option to your 'nodetool repair' command should be enough to continue
repairing as you currently are once using 3.0.7.

This is not entirely true on 2.2+ after CASSANDRA-7586, since
anti-compaction is always executed after full repairs, so it will be more
expensive on 2.2+ than on 2.1. So after 2.2+ it's much cheaper to run
incremental repairs since repaired/unrepaired data is already being
seggregated anyway by anti-compaction after the first full repair.

Please note that this does not apply to subrange repair, which skips
anti-compaction entirely (CASSANDRA-10422), so it has the same cost as
previously.

2016-09-19 11:02 GMT-03:00 Alain RODRIGUEZ :

> Hi Jean,
>
> Our concern is the repair, in 3.0.7 repairs inc are by default. Then it
>> means that once we do the upgrade to 3.0.7 we must follow the migration
>> process of repairs inc for all our data in order to mark the sstables as
>> repaired ?
>
>
> If you do not feel ready for incremental repairs, just adding the '-full'
> option to your 'nodetool repair' command should be enough to continue
> repairing as you currently are once using 3.0.7. 'nodetool repair -inc'
> becomes the default 'nodetool repair' indeed, but you're not forced to use
> incremental repair because you will be using 3.0.7, that's why '-full'
> option was added. You did the hardest part in noticing this change and
> start wondering about it.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-09-12 13:55 GMT+02:00 Paulo Motta :
>
>> Migration procedure is no longer required for incremental repair as of
>> 2.1.4 since CASSANDRA-8004, which was the reason why the migration
>> procedure was required for LCS before. The migration procedure is only
>> useful now to skip validation on already repaired sstables in the first
>> incremental repair run by marking them as repaired before running the first
>> incremental repair, otherwise anti-compaction will mark them as repaired in
>> the first run since CASSANDRA-7586 on 2.2+.
>>
>> 2016-09-06 5:26 GMT-03:00 Jean Carlo :
>>
>>> Hello guys
>>>
>>> We are planning to upgrade cassandra soon to the version 3.0.7 from
>>> 2.1.14. Our concern is the repair, in 3.0.7 repairs inc are by default.
>>> Then it means that once we do the upgrade to 3.0.7 we must follow the
>>> migration process of repairs inc for all our data in order to mark the
>>> sstables as repaired ? or we can just run directly the repair command
>>> without need to mark the sstables previously?
>>>
>>> My first test with ccm tells me that we don't need to mark the sstables
>>> because the repair in the 3.0.7 do it for you, but I want to ask if someone
>>> has done this migration and confirm my assumption
>>>
>>> Best regards.
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>
>>
>


Re: Upgrade cassandra 2.1.14 to 3.0.7

2016-09-19 Thread Alain RODRIGUEZ
Hi Jean,

Our concern is the repair, in 3.0.7 repairs inc are by default. Then it
> means that once we do the upgrade to 3.0.7 we must follow the migration
> process of repairs inc for all our data in order to mark the sstables as
> repaired ?


If you do not feel ready for incremental repairs, just adding the '-full'
option to your 'nodetool repair' command should be enough to continue
repairing as you currently are once using 3.0.7. 'nodetool repair -inc'
becomes the default 'nodetool repair' indeed, but you're not forced to use
incremental repair because you will be using 3.0.7, that's why '-full'
option was added. You did the hardest part in noticing this change and
starting to wonder about it.
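
In other words (keyspace name as a placeholder):

nodetool repair -full my_keyspace   # 3.0.7: explicitly ask for a full repair, as before
nodetool repair my_keyspace         # 3.0.7: now incremental by default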

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-12 13:55 GMT+02:00 Paulo Motta :

> Migration procedure is no longer required for incremental repair as of
> 2.1.4 since CASSANDRA-8004, which was the reason why the migration
> procedure was required for LCS before. The migration procedure is only
> useful now to skip validation on already repaired sstables in the first
> incremental repair run by marking them as repaired before running the first
> incremental repair, otherwise anti-compaction will mark them as repaired in
> the first run since CASSANDRA-7586 on 2.2+.
>
> 2016-09-06 5:26 GMT-03:00 Jean Carlo :
>
>> Hello guys
>>
>> We are planning to upgrade cassandra soon to the version 3.0.7 from
>> 2.1.14. Our concern is the repair, in 3.0.7 repairs inc are by default.
>> Then it means that once we do the upgrade to 3.0.7 we must follow the
>> migration process of repairs inc for all our data in order to mark the
>> sstables as repaired ? or we can just run directly the repair command
>> without need to mark the sstables previously?
>>
>> My first test with ccm tells me that we don't need to mark the sstables
>> because the repair in the 3.0.7 do it for you, but I want to ask if someone
>> has done this migration and confirm my assumption
>>
>> Best regards.
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>
>


Re: A question to sstable2json

2016-09-19 Thread Alain RODRIGUEZ
Hi,

Have you solved this issue? Sorry we did not answer you earlier.

Is it a bug of Cassandra 2.1.11


Not that I am aware of.

or I misused this command?


What does your command look like? I could try it locally if that helps.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-09-08 5:02 GMT+02:00 Lu, Boying :

> Hi, All,
>
>
>
> We use Cassandra 2.1.11 in our product and I tried its sstable2json to
> dump some sstable file like this:
>
> sstable2json full-path-to-sstable-file (e.g. xxx-Data.db).
>
>
>
> But I got an assert error at  “assert initialized ||
> keyspaceName.equals(SYSTEM_KS);” (Keyspace.java:97).
>
> The 'keyspaceName' is our keyspace, but the SYSTEM_KS is "system" (defined
> inside Keyspace class).
>
>
>
> This error is related to the following statement in SSTableExport.java:
>
> Keyspace keyspace = Keyspace.open(descriptor.ksname); (
> SSTableExport.java:432)
>
>
>
> Adding "Keyspace.setInitialized()" before this statement solves the issue.
>
>
>
> Is it a bug of Cassandra 2.1.11 or I misused this command?
>
>
>
> Thanks
>
>
>
> Boying
>
>
>


Partition size estimation formula in 3.0

2016-09-19 Thread Jérôme Mainaud
Hello,

Until 3.0, we had a nice formula to estimate partition size :

  sizeof(partition keys)
+ sizeof(static columns)
+ countof(rows) * sizeof(regular columns)
+ countof(rows) * countof(regular columns) * sizeof(clustering columns)
+ 8 * count(values in partition)

With the 3.0 storage engine, the size is supposed to be smaller.
And I'm looking for the new formula.

I reckon the formula becomes:

  sizeof(partition keys)
+ sizeof(static columns)
+ countof(rows) * sizeof(regular columns)
+ countof(rows) * sizeof(clustering columns)
+ 8 * count(values in partition)

That is, the clustering column values are no longer repeated for each regular
column in the row.
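
As a purely made-up illustration (assuming 10,000 rows per partition, a 20-byte
partition key, no static columns, 4 regular columns of ~30 bytes each, clustering
columns totalling ~16 bytes per row, and one value per regular column per row):

old:  20 + 0 + 10,000 * 120 + 10,000 * 4 * 16 + 8 * 40,000  ≈  2.16 MB
new:  20 + 0 + 10,000 * 120 + 10,000 * 16     + 8 * 40,000  ≈  1.68 MB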

Could anyone confirm that new formula for me, or am I missing something?

Thank you,

-- 
Jérôme Mainaud
jer...@mainaud.com


Re: Nodetool repair

2016-09-19 Thread Alain RODRIGUEZ
Hi Lokesh,

Repair is a regular, very common and yet non-trivial operation in
Cassandra. A lot of people struggle with it.

Some good talks were done about repairs during the summit, you might want
to have a look in the Datastax youtube channel in a few days :-).
https://www.youtube.com/user/DataStaxMedia

Is there a way to know in advance the ETA of manual repair before
> triggering it
>

There is not such a thing. And it is probably because the duration of the
repair is going to depend on:

- The size of your data
- The number of vnodes
- The compaction throughput
- The streaming throughput
- The hardware available
- The load of the cluster
- ...

So the best thing to do is to benchmark it in your own environment. You can
track repairs using logs. I used something like that in the past:

for i in $(echo "SELECT columnfamily_name FROM system.schema_columns WHERE
keyspace_name = 'my_keyspace';" | cqlsh | uniq | tail -n +4 | head -n -2);
do echo Sessions synced for $i: $(grep -i "$i is fully synced"
/var/log/cassandra/system.log* | wc -l); done

Depending on your version of Cassandra - and the path to your logs - this
might work or not, you might need to adjust it. The number of "sessions"
depends on the number of nodes and of vnodes. But the number of sessions
will be the same for all the tables, from all the nodes, if you are using the
same number of vnodes.

So you will soon have a good idea on how long it takes to repair a table /
a keyspace and some informations about the completeness of the repairs (be
aware of the rotations in the logs and of the previous repairs logs if
using the command above).

How fast repair can go will also depend on the options and techniques you
are using:

- Subranges: https://github.com/BrianGallew/cassandra_range_repair ?
- Incremental / Full repairs ?
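
To make the subrange option concrete, a minimal sketch (tokens and keyspace are
placeholders; nodetool describering gives you the ranges, and the
cassandra_range_repair script linked above automates the splitting):

nodetool describering my_keyspace
nodetool repair -st <start_token> -et <end_token> my_keyspace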

I believe repair performs following operations -
>
> 1) Major compaction
> 2) Exchange of merkle trees with neighbouring nodes.
>

 AFAIK, a repair doesn't trigger a major compaction, but I might be wrong
> here.


Jens is right, no major compaction in there. This is how repairs (roughly)
work. There are 2 main steps:

- Compare / exchange merkle trees (done through a VALIDATION compaction,
like a compaction, but without the write phase)
- Streaming: Any mismatch detected in the previous validation is fixed by
streaming a larger block of data (read more about that:
http://www.datastax.com/dev/blog/advanced-repair-techniques)

To monitor those operations use

- validation: nodetool compactionstats -H (Look for "VALIDATION COMPACTION"
off the top of my head)
- streaming: watch -d 'nodetool netstats -H | grep -v 100%'

You should think about what would be a good repair strategy according to
your use case and workload (run repairs by night ? Use subranges ?). Keep
in mind that "nodetool repair" is useful to reduce entropy in your cluster,
and so reduces the risk of inconsistencies. Repair also prevents deleted
data from reappearing (Zombies) as long as it is run cluster-wide within
gc_grace_seconds (per table option).
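
For reference, gc_grace_seconds is set per table; a sketch with a placeholder
table name and the default value of 10 days:

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 864000;"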

What if I kill the process in the middle?


This is safe; some parts of the data will simply not be repaired on this node,
that's it. You can either restart the node or find the right JMX command.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-19 11:18 GMT+02:00 Jens Rantil :

> Hi Lokesh,
>
> Which version of Cassandra are you using? Which compaction strategy are
> you using?
>
> AFAIK, a repair doesn't trigger a major compaction, but I might be wrong
> here.
>
> What you could do is to run a repair for a subset of the ring (see `-st`
> and `-et` `nodetool repair` parameters). If you repair 1/1000 of the ring,
> repairing the whole ring will take ~1000x longer than your sample.
>
> Also, you might want to look at incremental repairs.
>
> If you kill the process in the middle the repair will not start again. You
> will need to reissue it.
>
> Cheers,
> Jens
>
> On Sun, Sep 18, 2016 at 2:58 PM Lokesh Shrivastava <
> lokesh.shrivast...@gmail.com> wrote:
>
>> Hi,
>>
>> I tried to run nodetool repair command on one of my keyspaces and found
>> that it took lot more time than I anticipated. Is there a way to know in
>> advance the ETA of manual repair before triggering it? I believe repair
>> performs following operations -
>>
>> 1) Major compaction
>> 2) Exchange of merkle trees with neighbouring nodes.
>>
>> Is there any other operation performed during manual repair? What if I
>> kill the process in the middle?
>>
>> Thanks.
>> Lokesh
>>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>


Re: Nodetool repair

2016-09-19 Thread Jens Rantil
Hi Lokesh,

Which version of Cassandra are you using? Which compaction strategy are you
using?

AFAIK, a repair doesn't trigger a major compaction, but I might be wrong
here.

What you could do is to run a repair for a subset of the ring (see `-st`
and `-et` `nodetool repair` parameters). If you repair 1/1000 of the ring,
repairing the whole ring will take ~1000x longer than your sample.

Also, you might want to look at incremental repairs.

If you kill the process in the middle the repair will not start again. You
will need to reissue it.

Cheers,
Jens

On Sun, Sep 18, 2016 at 2:58 PM Lokesh Shrivastava <
lokesh.shrivast...@gmail.com> wrote:

> Hi,
>
> I tried to run nodetool repair command on one of my keyspaces and found
> that it took lot more time than I anticipated. Is there a way to know in
> advance the ETA of manual repair before triggering it? I believe repair
> performs following operations -
>
> 1) Major compaction
> 2) Exchange of merkle trees with neighbouring nodes.
>
> Is there any other operation performed during manual repair? What if I
> kill the process in the middle?
>
> Thanks.
> Lokesh
>
-- 

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.