Re: Repair taking long time

2014-09-30 Thread Ben Bromhead
use https://github.com/BrianGallew/cassandra_range_repair
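
As I understand it, the idea behind that script is to break a token range into smaller pieces and repair them one at a time with nodetool's start/end token options. A minimal sketch of the idea, not the script's actual code: `split_range` and `repair_commands` are hypothetical helper names, the keyspace, table and token values are placeholders, and it assumes a non-wrapping range (end > start).

```python
# Sketch only: split one token range into equal steps and build the
# matching `nodetool repair -st/-et` command lines (nothing is executed).

def split_range(start, end, steps):
    """Return `steps` contiguous (start, end) subranges covering start..end."""
    if steps < 1 or end <= start:
        raise ValueError("need steps >= 1 and end > start")
    width = end - start
    bounds = [start + width * i // steps for i in range(steps)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, table, start, end, steps):
    """Build one nodetool command per subrange (hypothetical helper)."""
    return [
        ["nodetool", "repair", "-st", str(s), "-et", str(e), keyspace, table]
        for s, e in split_range(start, end, steps)
    ]


if __name__ == "__main__":
    # Placeholder range and names, for illustration only.
    for cmd in repair_commands("my_keyspace", "my_table", -9000, 9000, 4):
        print(" ".join(cmd))
```

Smaller subranges mean each validation pass covers less data, so a failed step can be retried without starting the whole repair again.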



On 30 September 2014 05:24, Ken Hancock ken.hanc...@schange.com wrote:


 On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:


 As an aside, you just lose with vnodes and clusters of this size. I
 presume you plan to grow over appx 9 nodes per DC, in which case you
 probably do want vnodes enabled.


 I typically only see discussion on vnodes vs. non-vnodes, but it seems to
 me that it might be more important to discuss the number of vnodes per node.
 A small cluster having 256 vnodes/node is unwise given some of the
 sequential operations that are still done.  Even if operations were done in
 parallel, having a 256x increase in parallelization seems an equally bad
 choice.

 I've never seen any discussion on how many vnodes per node might be an
 appropriate answer based on a planned cluster size -- does such a thing exist?

 Ken







-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
http://twitter.com/instaclustr | +61 415 936 359


Re: Repair taking long time

2014-09-29 Thread Robert Coli
On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com
wrote:

  I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and
 4 in another.



 Running a repair on a large column family seems to be moving much slower
 than I expect.


Unfortunately, as others have mentioned, the slowness/broken-ness of repair
is a long running (groan!) issue and therefore currently expected.

At this time, I do not recommend upgrading to 2.1 in production to attempt
to fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.

One can increase gc_grace_seconds to 34 days [1] and repair once a month,
which should help make repair slightly more tractable.
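
For reference, gc_grace_seconds is set per table and expressed in seconds, so 34 days works out to 2,937,600. A small sketch of the conversion and the resulting CQL statement; the keyspace and table names are placeholders, and `gc_grace_statement` is a hypothetical helper:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gc_grace_statement(keyspace, table, days):
    """Build an ALTER TABLE statement setting gc_grace_seconds to `days` days."""
    seconds = days * SECONDS_PER_DAY
    return "ALTER TABLE {}.{} WITH gc_grace_seconds = {};".format(
        keyspace, table, seconds
    )


if __name__ == "__main__":
    # Placeholder names; run the printed statement in cqlsh per table.
    print(gc_grace_statement("my_keyspace", "my_table", 34))
```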

For now you should probably evaluate which of your column families you
*absolutely must* repair (because you do DELETE-like operations in them,
etc.) and only repair those.

As an aside, you just lose with vnodes and clusters of this size. I
presume you plan to grow over appx 9 nodes per DC, in which case you
probably do want vnodes enabled.

One note:

  Looking at nodetool compactionstats, it indicates the Validation phase
 is running and that the total bytes is 4.5T (4505336278756).


This is the uncompressed size; I'm betting your actual on-disk size is
closer to 2T? Even though 2.0 has improved performance for nodes with lots
of data, 2T per node is still relatively fat for a Cassandra node.


=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5850


Re: Repair taking long time

2014-09-29 Thread Rahul Neelakantan
What is the recommended value for the number of tokens (num_tokens)? I am
asking because of the issue with repairs running sequentially, token range
after token range.

Rahul Neelakantan



Re: Repair taking long time

2014-09-29 Thread Ken Hancock
On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:


 As an aside, you just lose with vnodes and clusters of this size. I
 presume you plan to grow over appx 9 nodes per DC, in which case you
 probably do want vnodes enabled.


I typically only see discussion on vnodes vs. non-vnodes, but it seems to
me that it might be more important to discuss the number of vnodes per node.
A small cluster having 256 vnodes/node is unwise given some of the
sequential operations that are still done.  Even if operations were done in
parallel, having a 256x increase in parallelization seems an equally bad
choice.

I've never seen any discussion on how many vnodes per node might be an
appropriate answer based on a planned cluster size -- does such a thing exist?
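
To put rough numbers on this: with vnodes, each node owns num_tokens token ranges, so the ring holds about nodes x num_tokens ranges in total. A quick sketch for the 9-node cluster in this thread; the num_tokens values other than 256 are arbitrary points of comparison, not recommendations, and `total_token_ranges` is a hypothetical helper:

```python
def total_token_ranges(nodes, num_tokens):
    """Total token ranges in the ring when every node uses `num_tokens`."""
    return nodes * num_tokens


if __name__ == "__main__":
    # 9 nodes as in this thread; num_tokens values are illustrative only.
    for num_tokens in (1, 16, 64, 256):
        print(num_tokens, total_token_ranges(9, num_tokens))
```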

Ken


Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
Are you using Cassandra 2.0 with vnodes?  If so, repair takes forever.
This problem is addressed in 2.1.

On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
gene.robich...@match.com wrote:
 I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in
 another.



 Running a repair on a large column family seems to be moving much slower
 than I expect.



 Looking at nodetool compactionstats, it indicates the Validation phase is
 running and that the total bytes is 4.5T (4505336278756).



 This is a very large CF. The process has been running for 2.5 hours and has
 processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
 rate it will take 158 hours, just shy of 1 week.
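
The estimate above can be reproduced from the raw byte counts. A minimal sketch, assuming the validation rate stays constant and using decimal gigabytes; `estimate_repair` is a hypothetical helper. Working from the exact byte counts rather than the rounded 71G gives a slightly different figure from the one quoted, but the same ballpark of roughly a week.

```python
def estimate_repair(total_bytes, done_bytes, elapsed_hours):
    """Return (rate in GB/hour, total hours, remaining hours) at a constant rate."""
    rate_bytes_per_hour = done_bytes / elapsed_hours
    total_hours = total_bytes / rate_bytes_per_hour
    remaining_hours = (total_bytes - done_bytes) / rate_bytes_per_hour
    return rate_bytes_per_hour / 1e9, total_hours, remaining_hours


if __name__ == "__main__":
    # Byte counts and elapsed time as reported by nodetool in this message.
    rate, total, remaining = estimate_repair(4505336278756, 71950433062, 2.5)
    print("rate: %.1f GB/hour" % rate)
    print("total: %.0f hours (%.1f days)" % (total, total / 24))
    print("remaining: %.0f hours" % remaining)
```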



 Is this reasonable? This is my first large repair and I am wondering if this
 is normal for a CF of this size. Seems like a long time to me.



 Is it possible to tune this process to speed it up? Is there something in my
 configuration that could be causing this slow performance? I am running
 HDDs, not SSDs in a JBOD configuration.







 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue | Suite 800 | Dallas, TX  75225

 Phone: 214-576-3273





-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Repair taking long time

2014-09-26 Thread Brice Dutheil
Unfortunately DSE 4.5.0 is still on 2.0.x

-- Brice




Re: Repair taking long time

2014-09-26 Thread Bryan Talbot
With a 4.5 TB table and just 4 nodes, repair will likely take forever for
any version.

-Bryan





RE: Repair taking long time

2014-09-26 Thread Gene Robichaux
I am on DSE 4.0.3 which is 2.0.7.

If 4.5.1 is NOT 2.1, I guess an upgrade will not buy me much...

The bad thing is that this table is not our largest... :(


Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
Phone: 214-576-3273




Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
If you're using DSE you might want to contact Datastax support, rather
than the ML.


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


RE: Repair taking long time

2014-09-26 Thread Gene Robichaux
Using their community edition... no support (yet!) :(

Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
Phone: 214-576-3273



Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
Well, in that case, you may want to roll your own script for doing
constant repairs of your cluster, and extend gc_grace_seconds so
you can repair the whole cluster before the tombstones are cleared.
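
As a rough way to check whether a repair cycle fits inside the gc_grace_seconds window, here is a minimal sketch. It assumes nodes are repaired one after another and that each takes about the same time, which is a simplification; `full_cycle_fits` is a hypothetical helper, and the per-node hours are illustrative.

```python
def full_cycle_fits(nodes, hours_per_node, gc_grace_days):
    """Return (cycle_hours, window_hours, fits) for one-node-at-a-time repair."""
    cycle_hours = nodes * hours_per_node
    window_hours = gc_grace_days * 24
    return cycle_hours, window_hours, cycle_hours < window_hours


if __name__ == "__main__":
    # 9 nodes and ~158 hours per node come from this thread; 10 days is the
    # default gc_grace_seconds (864000), 30 days is an arbitrary extension.
    for days in (10, 30):
        print(days, full_cycle_fits(9, 158, days))
```

If the cycle does not fit, the options are the ones already mentioned: a longer gc_grace_seconds, or repairing fewer column families.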


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade