Re: repair performance

Thakrar, Jayesh Mon, 20 Mar 2017 09:12:27 -0700

Another thing.
Based on what I see in our system, especially when I was changing from STCS to 
LCS compaction strategy,
compaction does cause quite a bit of memory churn, and it helps to increase heap 
memory to a certain extent.
You can check heap sizes using nodetool info to gauge your usage and high-water 
mark (hwm).
Enabling GC logging also helps to see the impact.
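
As a rough illustration of the nodetool info check above, here is a minimal Python sketch that pulls the heap figures out of captured output. The sample text and its numbers are made up for the example, and the exact line layout is an assumption that may differ between Cassandra versions; heap_usage is a hypothetical helper, not part of any Cassandra tooling.

```python
import re

# Assumed excerpt of `nodetool info` output; the values are invented for illustration.
SAMPLE_INFO = """\
Load                   : 530.12 GiB
Heap Memory (MB)       : 5120.50 / 8192.00
Off Heap Memory (MB)   : 812.33
"""


def heap_usage(info_text):
    """Return (used_mb, total_mb) from the 'Heap Memory (MB)' line, or None if absent."""
    match = re.search(
        r"^Heap Memory \(MB\)\s*:\s*([\d.]+)\s*/\s*([\d.]+)",
        info_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


used, total = heap_usage(SAMPLE_INFO)
print(f"heap used: {used:.0f} MB of {total:.0f} MB ({100 * used / total:.1f}%)")
```

Sampling this periodically during a repair or compaction run would give a crude picture of heap usage over time; GC logs remain the more precise source.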

From: Roland Otta <roland.o...@willhaben.at>
Date: Monday, March 20, 2017 at 1:53 AM
To: Conversant <jthak...@conversantmedia.com>, "user@cassandra.apache.org" 
<user@cassandra.apache.org>
Subject: Re: repair performance

good point! i did not (so far). i will do that - especially because i often see 
all compaction threads being used during repair (according to compactionstats).

thank you also for your link recommendations. i will go through them.

On Sat, 2017-03-18 at 16:54 +0000, Thakrar, Jayesh wrote:
You changed compaction_throughput_mb_per_sec, but did you also increase 
concurrent_compactors?

Regarding the reaper and some other info I received on the user forum in 
response to my question on "nodetool repair", here are some useful links/slides:

https://www.datastax.com/dev/blog/repair-in-cassandra

https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/

http://www.slideshare.net/DataStax/real-world-tales-of-repair-alexander-dejanovski-the-last-pickle-cassandra-summit-2016

http://www.slideshare.net/DataStax/real-world-repairs-vinay-chella-netflix-cassandra-summit-2016

From: Roland Otta <roland.o...@willhaben.at>
Date: Friday, March 17, 2017 at 5:47 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: repair performance

did not notice that so far.

thank you for the hint. i will definitely give it a try

On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote:
The fork from thelastpickle is (compatible with 3.0+). I'd recommend giving it 
a try over pure nodetool.

2017-03-17 22:30 GMT+01:00 Roland Otta <roland.o...@willhaben.at>:

forgot to mention the version we are using:

we are using 3.0.7 - so i guess we should have incremental repairs by default.
it also prints out incremental:true when starting a repair
INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting 
repair command #7, repairing keyspace xxx with repair options (parallelism: 
parallel, primary range: false, incremental: true, job threads: 1, 
ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)
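
To check what a repair was actually started with, the option list in a log line like the one above can be pulled apart with a short script. This is only a sketch: parse_repair_options is a hypothetical helper, and the regular expressions assume the "repair options (...)" layout shown in this one log line, which may differ in other Cassandra versions.

```python
import re

# The log line quoted above, rejoined into a single string.
LOG_LINE = (
    "INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - "
    "Starting repair command #7, repairing keyspace xxx with repair options "
    "(parallelism: parallel, primary range: false, incremental: true, "
    "job threads: 1, ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], "
    "# of ranges: 1758)"
)


def parse_repair_options(line):
    """Return the 'key: value' pairs inside 'repair options (...)' as a dict of strings."""
    section = re.search(r"repair options \((.*)\)\s*$", line)
    if section is None:
        return {}
    # A value is either a bracketed list (which may contain commas) or text up to the next comma.
    pairs = re.findall(r"([^,:]+):\s*(\[[^\]]*\]|[^,]+)", section.group(1))
    return {key.strip(): value.strip() for key, value in pairs}


options = parse_repair_options(LOG_LINE)
for name, value in options.items():
    print(f"{name} = {value}")
```

For this line, the parsed options confirm incremental: true, parallel mode, and a single job thread across 1758 ranges.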

3.0.7 is also the reason why we are not using reaper ... as far as i could 
figure out it's not compatible with 3.0+

On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, 
whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated, 
failsafe

The time repairs actually take may vary a lot depending on how much data has to 
be streamed or how inconsistent your cluster is.

50 Mbit/s is really a bit low! The actual performance depends on many factors, 
like your CPU, RAM, HD/SSD, concurrency settings, and the load on the "old nodes" 
of the cluster.
This is quite an individual problem that you have to track down for your own setup.

2017-03-17 22:07 GMT+01:00 Roland Otta <roland.o...@willhaben.at>:

hello,

we are quite inexperienced with cassandra at the moment and are playing
around with a new cluster we built up for getting familiar with
cassandra and its possibilities.

while getting familiar with that topic we noticed that repairs in
our cluster take a long time. To get an idea of our current setup here
are some numbers:

our cluster currently consists of 4 nodes (replication factor 3).
these nodes are all on dedicated physical hardware in our own
datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB RAM
2 SSDs (raid0) 900 GB each for data
1 separate HDD for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)

i already tried setting compactionthroughput + streamingthroughput to
unlimited for testing purposes ... but that did not change anything.

when checking system resources i cannot see any bottleneck (cpus are
pretty idle and we have no iowaits).

when issuing a repair via

nodetool repair -local

on a node, the repair takes longer than a day.
is this normal or could we normally expect a faster repair?

i also noticed that initializing new nodes in the datacenter was
really slow (approx 50 Mbit/s). also here i expected a much better
performance - could those 2 problems be somehow related?
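
For a sense of scale, here is a back-of-the-envelope calculation of how long streaming one node's data would take at the observed rate. It is a sketch under stated assumptions: that a new node receives roughly the 530 GB per-node dataset mentioned above, that GB means 10^9 bytes, and that the 50 Mbit/s rate is sustained; transfer_hours is a hypothetical helper.

```python
def transfer_hours(size_gb, rate_mbit_per_s):
    """Hours needed to move size_gb (decimal gigabytes) at rate_mbit_per_s (megabits per second)."""
    bits = size_gb * 1e9 * 8                    # gigabytes -> bits
    seconds = bits / (rate_mbit_per_s * 1e6)    # megabits/s -> bits/s
    return seconds / 3600


hours = transfer_hours(530, 50)
print(f"530 GB at 50 Mbit/s: about {hours:.1f} hours")
```

That works out to a little under a day, the same order of magnitude as the repair duration. This does not prove that the two problems share a cause, but it is consistent with a common limit on streaming.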

br//
roland
