Re: repair performance
I would zero in on network throughput, especially inter-rack trunks.

Sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On Mar 17, 2017 2:07 PM, "Roland Otta" wrote:
> [...]
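A quick way to sanity-check the raw network path before blaming Cassandra is an iperf3 run between two nodes (a sketch, assuming iperf3 is installed on both hosts; the hostname is a placeholder):

```shell
# On the receiving node: start an iperf3 server (listens on TCP 5201 by default)
iperf3 -s

# On the sending node: run a 30-second throughput test against the peer
# ("node2.example.com" is a placeholder for the other node's address)
iperf3 -c node2.example.com -t 30
```

If iperf3 reports multi-Gbit/s while repair streaming crawls at ~50 Mbit/s, the bottleneck is more likely in Cassandra's throttles or session concurrency than in the trunks.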
Re: repair performance
Good point! I did not (so far). I will do that - especially because I often see all compaction threads being used during repair (according to compactionstats).

Thank you also for your link recommendations. I will go through them.

On Sat, 2017-03-18 at 16:54, Thakrar, Jayesh wrote:
> [...]
Re: repair performance
You changed compaction_throughput_mb_per_sec, but did you also increase concurrent_compactors?

In reference to the reaper and some other info I received on the user forum to my question on "nodetool repair", here are some useful links/slides:

https://www.datastax.com/dev/blog/repair-in-cassandra
https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/
http://www.slideshare.net/DataStax/real-world-tales-of-repair-alexander-dejanovski-the-last-pickle-cassandra-summit-2016
http://www.slideshare.net/DataStax/real-world-repairs-vinay-chella-netflix-cassandra-summit-2016

From: Roland Otta <roland.o...@willhaben.at>
Date: Friday, March 17, 2017 at 5:47 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: repair performance

> [...]
Re: repair performance
Did not recognize that so far. Thank you for the hint. I will definitely give it a try.

On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote:
> [...]
Re: repair performance
The fork from thelastpickle is (compatible with 3.0+). I'd recommend giving it a try over pure nodetool.

2017-03-17 22:30 GMT+01:00 Roland Otta:
> [...]
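For what it's worth, reaper's slicing can also be approximated by hand with sub-range repairs: split the full Murmur3 token ring into slices and repair each with `nodetool repair -st/-et`. A minimal sketch (the keyspace name and slice count are made-up examples; note that sub-range repairs run as full, not incremental, repairs):

```python
# Split the Murmur3Partitioner token range into equal slices and print one
# "nodetool repair" command per slice. Many small sub-range repairs keep each
# Merkle-tree/streaming session short and easy to retry after a timeout.

MIN_TOKEN = -2**63         # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1      # Murmur3Partitioner maximum token

def token_slices(num_slices):
    """Yield (start, end) token pairs that cover the full ring contiguously."""
    width = (MAX_TOKEN - MIN_TOKEN) // num_slices
    start = MIN_TOKEN
    for i in range(num_slices):
        # Last slice absorbs the rounding remainder so the ring is fully covered.
        end = MAX_TOKEN if i == num_slices - 1 else start + width
        yield (start, end)
        start = end

if __name__ == "__main__":
    for st, et in token_slices(100):
        # -local restricts the repair to the local datacenter, as in the thread;
        # "mykeyspace" is a placeholder.
        print(f"nodetool repair -local -st {st} -et {et} mykeyspace")
```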
Re: repair performance
... maybe I should just try increasing the job threads with --job-threads. Shame on me.

On Fri, 2017-03-17 at 21:30, Roland Otta wrote:
> [...]
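For reference, the flag goes on the repair invocation; 4 is just an example value (and, if I recall correctly, nodetool caps job threads at 4):

```shell
# Repair only the local datacenter, processing up to 4 column families in
# parallel instead of the default 1.
nodetool repair -local -j 4
```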
Re: repair performance
Forgot to mention the version we are using: we are on 3.0.7, so I guess we should have incremental repairs by default. It also prints incremental: true when starting a repair:

INFO [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting repair command #7, repairing keyspace xxx with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as I could figure out, it's not compatible with 3.0+.

On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
> [...]
Re: repair performance
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, whatever.)
- You can use incremental repairs to speed things up for regular repairs.
- You can use "reaper" to schedule repairs and run them sliced, automated, failsafe.

The time repairs actually take may vary a lot depending on how much data has to be streamed and how inconsistent your cluster is.

50 Mbit/s is really a bit low! The actual performance depends on many factors, like your CPU, RAM, HDD/SSD, concurrency settings, and the load on the "old nodes" of the cluster. This is quite an individual problem you have to track down individually.

2017-03-17 22:07 GMT+01:00 Roland Otta:
> Hello,
>
> We are quite inexperienced with Cassandra at the moment and are playing
> around with a new cluster we built to get familiar with Cassandra and
> its possibilities.
>
> While getting familiar with that topic, we recognized that repairs in
> our cluster take a long time. To give an idea of our current setup,
> here are some numbers:
>
> Our cluster currently consists of 4 nodes (replication factor 3).
> These nodes are all on dedicated physical hardware in our own
> datacenter. All of the nodes have:
>
> 32 cores @ 2.9 GHz
> 64 GB RAM
> 2 SSDs (RAID 0), 900 GB each, for data
> 1 separate HDD for OS + commitlogs
>
> Current dataset:
> approx. 530 GB per node
> 21 tables (the biggest one has more than 200 GB per node)
>
> I already tried setting compaction throughput + streaming throughput to
> unlimited for testing purposes ... but that did not change anything.
>
> When checking system resources, I cannot see any bottleneck (CPUs are
> pretty idle and we have no iowaits).
>
> When issuing a repair via "nodetool repair -local" on a node, the
> repair takes longer than a day. Is this normal, or could we normally
> expect a faster repair?
>
> I also recognized that initializing new nodes in the datacenter was
> really slow (approx. 50 Mbit/s). Here, too, I expected much better
> performance - could those 2 problems be somehow related?
>
> br//
> roland
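Since the slow ~50 Mbit/s bootstrap points at streaming rather than compaction, the streaming throttles are worth double-checking at runtime as well (a sketch; 0 means unthrottled, use with care):

```shell
# Show the current streaming cap (in megabits/s)
nodetool getstreamthroughput

# Remove the cap for intra-datacenter streaming ...
nodetool setstreamthroughput 0

# ... and for cross-datacenter streaming
nodetool setinterdcstreamthroughput 0
```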