Re: From SimpleStrategy to DCs approach

2017-09-15 Thread kurt greaves
You can add a tiny node with 3 tokens. It will own a very small amount of
data, be responsible for replicas of that data, and thus be included in
quorum queries for that data. What is the use case? This won't give you any
real improvement in meeting consistency.


RE: [EXTERNAL] Re: Cassandra repair process in Low Bandwidth Network

2017-09-15 Thread Mohapatra, Kishore
Hi Jeff,
  Thanks for your reply.
In fact, I have tried all of the following options:

  1.  We use Cassandra Reaper for our repairs, which does subrange repair.
  2.  I have also developed a shell script which does exactly the same thing as 
Reaper, but it can control how many repair sessions will run concurrently.
  3.  Also tried a full repair.
  4.  Tried running repair in two DCs at a time. Repair between DC1 and DC2 goes 
fine, but repair between DC1 and DC3 or between DC2 and DC3 fails.

So I will try setting the inter-DC stream throughput to 20 Mbps and see how that goes.
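
(For reference, a hedged sketch of how that throttle can be applied; the value is
the one mentioned above, the yaml setting requires a restart, and the nodetool
change is per-node and not persistent - check that your version exposes it:)

    # temporary, per-node, takes effect immediately (value in Mbit/s)
    nodetool setinterdcstreamthroughput 20

    # persistent, in cassandra.yaml (requires restart)
    # inter_dc_stream_throughput_outbound_megabits_per_sec: 20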

Is there anything else that could be done in this case?

Thanks

Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : kishore.mohapa...@nuance.com


From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, September 15, 2017 10:27 AM
To: cassandra 
Subject: [EXTERNAL] Re: Cassandra repair process in Low Bandwidth Network

Hi Kishore,

Just to make sure we're all on the same page, I presume you're doing full 
repairs using something like 'nodetool repair -pr', which repairs all data for 
a given token range across all of your hosts in all of your dcs. Is that a 
correct assumption to start?

In addition to throttling inter-dc stream throughput (which you should be able 
to set quite low - perhaps as low as 20 Mbps), you may also want to consider 
smaller ranges (using a concept we call subrange repair, where instead of using 
-pr, you pass -st and -et - which is what tools like 
http://cassandra-reaper.io/
 do ) - this will keep streams smaller (in terms of total bytes transferred per 
streaming session, though you'll have more sessions). Finally, you can use 
-host and -dc options to limit repair so that sessions don't always hit all 3 
DCs - for example, you could do a repair between DC1 and DC2 using -dc, then do 
a repair of DC1 and DC3 using -dc - it requires a lot more coordination, but 
likely helps cut down on the traffic over your VPN link.
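
(For reference, a hedged sketch of the invocations described above; the token
values, keyspace name and DC names are placeholders, and exact option names can
vary slightly between releases - check 'nodetool help repair':)

    # subrange repair of one small slice of the ring (what Reaper automates)
    nodetool repair -st -9223372036854775808 -et -9200000000000000000 my_keyspace

    # limit a repair session to two datacenters at a time
    nodetool repair -dc DC1 -dc DC2 my_keyspace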


On Fri, Sep 15, 2017 at 9:09 AM, Mohapatra, Kishore 
> wrote:

Hi,
   we have a Cassandra cluster with 7 nodes each in 3 datacenters. We are 
using C* version 2.1.15.4.
Network bandwidth between DC1 and DC2 is very good (10 Gbit/s) and a dedicated 
link. However, the network pipe between DC1 and DC3 and between DC2 and DC3 is 
very poor, has only 100 Mbit/s, and also goes through a VPN. Each node contains 
about 100 GB of data, and the RF is 3. Whenever we run the repair, it fails 
with streaming errors and never completes. I have already tried setting the 
streaming timeout parameter to a very high value, but it did not help. I can 
repair either just the local DC or just the first two DCs, but cannot repair 
DC3 when I combine it with the other two DCs.

So how can I successfully repair the keyspace in this kind of environment?

I see that there is a parameter to throttle the inter-DC stream throughput, which 
defaults to 200 Mbit/s. So what is the minimum value I could set it to without 
affecting the cluster?

Is there any other way to work in this kind of environment?
I would appreciate your feedback and help on this.


Thanks

Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : kishore.mohapa...@nuance.com





Re: Cassandra repair process in Low Bandwidth Network

2017-09-15 Thread Jeff Jirsa
Hi Kishore,

Just to make sure we're all on the same page, I presume you're doing full
repairs using something like 'nodetool repair -pr', which repairs all data
for a given token range across all of your hosts in all of your dcs. Is
that a correct assumption to start?

In addition to throttling inter-dc stream throughput (which you should be
able to set quite low - perhaps as low as 20 Mbps), you may also want to
consider smaller ranges (using a concept we call subrange repair, where
instead of using -pr, you pass -st and -et - which is what tools like
http://cassandra-reaper.io/ do ) - this will keep streams smaller (in terms
of total bytes transferred per streaming session, though you'll have more
sessions). Finally, you can use -host and -dc options to limit repair so
that sessions don't always hit all 3 DCs - for example, you could do a
repair between DC1 and DC2 using -dc, then do a repair of DC1 and DC3 using
-dc - it requires a lot more coordination, but likely helps cut down on
the traffic over your VPN link.



On Fri, Sep 15, 2017 at 9:09 AM, Mohapatra, Kishore <
kishore.mohapa...@nuance.com> wrote:

> Hi,
>we have a cassandra cluster with 7 nodes each in 3 datacenters. We
> are using C* 2.1.15.4 version.
> Network bandwidth between DC1 and DC2 is very good (10 Gbit/s) and a
> dedicated link. However, the network pipe between DC1 and DC3 and between DC2 and
> DC3 is very poor, has only 100 Mbit/s, and also goes through a VPN.
> Each node contains about 100 GB of data, and the RF is 3. Whenever we run
> the repair, it fails with streaming errors and never completes. I have
> already tried setting the streaming timeout parameter to a very high value, but it
> did not help. I can repair either just the local DC or just the first
> two DCs, but cannot repair DC3 when I combine it with the other two DCs.
>
> So how can I successfully repair the keyspace in this kind of
> environment?
>
> I see that there is a parameter to throttle the inter-DC stream throughput,
> which defaults to 200 Mbit/s. So what is the minimum value I could
> set it to without affecting the cluster?
>
> Is there any other way to work in this kind of environment?
> I would appreciate your feedback and help on this.
>
>
>
>
>
> Thanks
>
>
>
> *Kishore Mohapatra*
>
> Principal Operations DBA
>
> Seattle, WA
>
> Email : kishore.mohapa...@nuance.com
>
>
>
>
>


RE: Compaction in cassandra

2017-09-15 Thread Steinmaurer, Thomas
Hi,

usually automatic minor compactions are fine, but you may need much more free 
disk space to reclaim space via automatic minor compactions, especially in a 
time series use case with the size-tiered compaction strategy (possibly with 
leveled as well, I'm not familiar with that strategy type). We are in the time 
series / STCS combination and currently plan to run a major compaction every X 
weeks. Although not perfect, this is currently our only way to effectively get 
rid of outdated data on disk without the extra storage cost we would otherwise 
need, because it takes a long time before the delete markers (tombstones) 
written according to our retention policy actually get minor-compacted together 
with the potentially large SSTables. Mind you, pre-2.2 a major compaction 
results in a single (large) SSTable again, so the whole disk usage trouble 
starts again. With 2.2+ there is an option to end up with SSTables at 50%, 25%, 
etc. of the size per column family / table, so this might be useful.
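
(For reference, a hedged sketch of the manually triggered major compaction being
described; the keyspace/table names are placeholders, and -s / --split-output is
the 2.2+ option mentioned above - check 'nodetool help compact' on your version:)

    # pre-2.2 style: merges everything into one large SSTable
    nodetool compact my_keyspace my_table

    # 2.2+: split the output into progressively smaller SSTables (~50%, 25%, ...)
    nodetool compact -s my_keyspace my_table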

If you have a time series use case you may want to look at the new time window 
compaction strategy (TWCS) introduced in 3.0, but it relies on TTL-based time 
series data only. We tested it and it works great, but unfortunately we can't 
use it, because we may have different TTL/retention policies in a single column 
family, even varying retention configurations per customer over time, so TWCS is 
not really an option here, unfortunately.
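
(For reference, a hedged sketch of how TWCS is typically enabled on a table; the
keyspace/table names and the one-day window are placeholders, not values from
this thread:)

    cqlsh -e "ALTER TABLE my_keyspace.my_table
              WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                                 'compaction_window_unit': 'DAYS',
                                 'compaction_window_size': 1};"

It works best when all writes carry the same TTL, e.g. via a table-level
default_time_to_live, which matches the limitation described above.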

Thomas

From: Akshit Jain [mailto:akshit13...@iiitd.ac.in]
Sent: Donnerstag, 14. September 2017 08:50
To: user@cassandra.apache.org
Subject: Compaction in cassandra

Is it helpful to run nodetool compact in Cassandra,
or is automatic compaction just fine?
Regards



Cassandra repair process in Low Bandwidth Network

2017-09-15 Thread Mohapatra, Kishore
Hi,
   we have a Cassandra cluster with 7 nodes each in 3 datacenters. We are 
using C* version 2.1.15.4.
Network bandwidth between DC1 and DC2 is very good (10 Gbit/s) and a dedicated 
link. However, the network pipe between DC1 and DC3 and between DC2 and DC3 is 
very poor, has only 100 Mbit/s, and also goes through a VPN. Each node contains 
about 100 GB of data, and the RF is 3. Whenever we run the repair, it fails 
with streaming errors and never completes. I have already tried setting the 
streaming timeout parameter to a very high value, but it did not help. I can 
repair either just the local DC or just the first two DCs, but cannot repair 
DC3 when I combine it with the other two DCs.

So how can I successfully repair the keyspace in this kind of environment?

I see that there is a parameter to throttle the inter-DC stream throughput, which 
defaults to 200 Mbit/s. So what is the minimum value I could set it to without 
affecting the cluster?

Is there any other way to work in this kind of environment?
I would appreciate your feedback and help on this.


Thanks

Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : kishore.mohapa...@nuance.com




Re: From SimpleStrategy to DCs approach

2017-09-15 Thread Cogumelos Maravilha
I've finished the migration to NetworkTopologyStrategy using
GossipingPropertyFileSnitch.

Now I have 4 nodes in zone a (rack1) and another 4 nodes in zone b
(rack2), in a single DC; there's no zone c in Frankfurt.

Can I get QUORUM consistency for reading (for writing I'm using ANY)
by adding a tiny node with only num_tokens = 3 somewhere else, or must it
be a node like the others with num_tokens = 256?

I only make inserts and queries; there are no updates or direct deletes,
only deletes made by the TTL.

Thanks in advance.
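
(For reference, a hedged sketch of what the post-migration settings could look
like; the DC name, rack names and replication factor are assumptions for
illustration, not values confirmed in this thread:)

    # cassandra-rackdc.properties (GossipingPropertyFileSnitch), per node:
    #   dc=dc1
    #   rack=rack1     # rack2 on the zone-b nodes

    # keyspace replication after moving away from SimpleStrategy
    cqlsh -e "ALTER KEYSPACE test WITH replication =
              {'class': 'NetworkTopologyStrategy', 'dc1': 2};"

    # then verify token/data ownership across both racks
    nodetool status test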


On 05-09-2017 13:41, kurt greaves wrote:
> data will be distributed amongst racks correctly, however only if you
> are using a snitch that understands racks and also
> NetworkTopologyStrategy. SimpleStrategy doesn't understand racks or
> DCs. You should use a snitch that understands racks and then
> transition to a 2 rack cluster, keeping only 1 DC. The whole DC per
> rack thing isn't necessary and will make your clients overly complicated.
>
> On 5 Sep. 2017 21:01, "Cogumelos Maravilha"
> > wrote:
>
> Hi list,
>
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '2'}  AND durable_writes = true;
>
> I'm using C* 3.11.0 with 8 nodes at aws, 4 nodes at zone a and the
> other
> 4 nodes at zone b. The idea is to keep the cluster alive if zone a
> or b
> goes dark and keep QUORUM for reading. For writing I'm using ANY.
>
> Using getendpoints I can see that lots of keys are in the same
> zone. As
> far as I understand, a rack solution does not guarantee full data
> replication between racks.
>
> My idea to reach this goal is:
>
> - Change replication_factor to 1
>
> - Start decommissioning nodes one by one in one zone.
>
> - When only 4 nodes are up and running in one zone, change the keyspace
> configuration to use DCs, with the current data as DC1 and the other 4 nodes
> as DC2
>
>
> Is this the best approach?
>
>
> Thanks in advance.
>
>
>
>



RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-15 Thread Steinmaurer, Thomas
Hi Jeff,

We are using native (CQL3) via the Java DataStax driver (3.1). We also have 
OpsCenter running (to be removed soon) via Thrift, if I remember correctly.

As said, the write request latency for our keyspace hasn't really changed, so 
perhaps another keyspace (system-related, OpsCenter ...?) is affected, or perhaps 
the JMX metric is reporting something differently now. ☺ So hopefully not a real 
issue for now, just something popping up in our monitoring that we are wondering about.

Regarding the compression metadata memory usage drop: right, the storage engine 
rewrite could be a reason. Thanks.

Still wondering about the GC/CPU increase.

Thanks!

Thomas



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Freitag, 15. September 2017 13:14
To: user@cassandra.apache.org
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Most people find 3.0 slightly slower than 2.1. The only thing that really 
stands out in your email is the huge change in 95% latency - that's atypical. 
Are you using Thrift or native (9042)? The decrease in compression metadata 
offheap usage is likely due to the increased storage efficiency of the storage 
engine (see CASSANDRA-8099).


--
Jeff Jirsa


On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas 
> 
wrote:
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto 
deploying our software on a daily basis and attach constant load across all 
deployments. Basically to allow us to detect any regressions in our software on 
a daily basis.

On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G heap, 
CMS. The environment has also been upgraded from Cassandra 2.1.18 to 3.0.14 at 
a certain point in time. Without running upgradesstables so far. We have not 
made any additional JVM/GC configuration change when going from 2.1.18 to 
3.0.14 on our own, thus, any self-made configuration changes (e.g. new gen heap 
size) for 2.1.18 are also in place with 3.0.14.

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
likely correlating):

· CPU: ~ 12% => ~ 17%

· GC Suspension: ~ 1,7% => 3,29%

In this environment this is not a big deal, but relatively we have a CPU increase 
of ~50% (with increased GC most likely contributing). It is something we will have 
to deal with when going into production (going into larger, multi-node load test 
environments first though).

Beside the CPU/GC shift, we also monitor the following noticeable changes 
(don’t know if they somehow correlate with the CPU/GC shift above):

· Increased AVG Write Client Requests Latency (95th Percentile), 
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, but 
almost constant (no change in) write client request latency for our particular 
keyspace only, org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

· Compression metadata memory usage drop, 
org.apache.cassandra.metrics.Keyspace.XXX. 
CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?

I know this all looks a bit vague, but perhaps someone else has seen something 
similar when upgrading to 3.0.14 and can share their thoughts/ideas. The 
(relative) CPU/GC increase especially is something we are curious about.

Thanks a lot.

Thomas


Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-15 Thread Jeff Jirsa
Most people find 3.0 slightly slower than 2.1. The only thing that really 
stands out in your email is the huge change in 95% latency - that's atypical. 
Are you using Thrift or native (9042)? The decrease in compression metadata 
offheap usage is likely due to the increased storage efficiency of the storage 
engine (see CASSANDRA-8099).
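
(For reference, a hedged sketch of commands that can help cross-check these JMX
numbers from the node itself; keyspace/table names are placeholders, and on 2.1
the table* commands are still named cfstats/cfhistograms:)

    nodetool proxyhistograms                       # coordinator read/write latency percentiles
    nodetool tablestats my_keyspace.my_table       # includes compression metadata off-heap memory used
    nodetool tablehistograms my_keyspace my_table  # local latency and partition size percentiles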


-- 
Jeff Jirsa


> On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas 
>  wrote:
> 
> Hello,
>  
> we have a test (regression) environment hosted in AWS, which is used for auto 
> deploying our software on a daily basis and attach constant load across all 
> deployments. Basically to allow us to detect any regressions in our software 
> on a daily basis.
>  
> On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G 
> heap, CMS. The environment has also been upgraded from Cassandra 2.1.18 to 
> 3.0.14 at a certain point in time. Without running upgradesstables so far. We 
> have not made any additional JVM/GC configuration change when going from 
> 2.1.18 to 3.0.14 on our own, thus, any self-made configuration changes (e.g. 
> new gen heap size) for 2.1.18 are also in place with 3.0.14.
>  
> What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
> some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
> likely correlating):
> · CPU: ~ 12% => ~ 17%
> · GC Suspension: ~ 1,7% => 3,29%
>  
> In this environment this is not a big deal, but relatively we have a CPU increase
> of ~50% (with increased GC most likely contributing). It is something we will have
> to deal with when going into production (going into larger, multi-node load test
> environments first though).
>  
> Beside the CPU/GC shift, we also monitor the following noticeable changes 
> (don’t know if they somehow correlate with the CPU/GC shift above):
> · Increased AVG Write Client Requests Latency (95th Percentile), 
> org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, 
> but almost constant (no change in) write client request latency for our 
> particular keyspace only, 
> org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency
> · Compression metadata memory usage drop, 
> org.apache.cassandra.metrics.Keyspace.XXX. 
> CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?
>  
> I know this all looks a bit vague, but perhaps someone else has seen something 
> similar when upgrading to 3.0.14 and can share their thoughts/ideas. 
> The (relative) CPU/GC increase especially is something we are curious about.
>  
> Thanks a lot.
>  
> Thomas


Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Jeff Jirsa
A few notes:
- In 3.0 the default changed to incremental repair, which will have to 
anticompact SSTables to allow you to repair the primary ranges you've specified.
- Since you're starting the repair on all nodes at the same time, you end up 
with overlapping anticompactions.

Generally you should stagger your repairs, and if you want it to be more in 
line with the 2.1 behavior, you can pass -full to force full (non-incremental) 
repairs.
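
(For reference, a hedged sketch of what a staggered, 2.1-style repair could look
like; the keyspace name is a placeholder:)

    # run on one node, wait until it finishes, then move to the next node
    nodetool repair -full -pr my_keyspace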



-- 
Jeff Jirsa


> On Sep 14, 2017, at 11:42 PM, Steinmaurer, Thomas 
>  wrote:
> 
> Hello,
>  
> we are currently in the process of upgrading from 2.1.18 to 3.0.14. After 
> upgrading a few test environments, we start to see some suspicious log 
> entries regarding repair issues.
>  
> We have a cron job on all nodes basically executing the following repair call 
> on a daily basis:
>  
> nodetool repair -pr 
>  
> This gets started on all nodes at the same time. While this has worked with 
> 2.1.18 (at least we haven’t seen anything suspicious in Cassandra log), with 
> 3.0.14 we get something like the following on all nodes (see below; IP 
> addresses and KS/CF faked).
>  
> Any pointers are appreciated. Thanks.
> Thomas
>  
>  
> INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 - [repair 
> #071f81e0-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
> /FAKE.35.153, /FAKE.34.171 on range 
> [(8195393703879512303,8196334842725538685], 
> (8166975326273137878,8182604850967732931], 
> (-7246799942440641887,-7227869626613009045], 
> (-8371707510273823988,-8365977215604569699], 
> (-141862581573028594,-140310864869418908], 
> (3732113975108886193,3743105867152786342], 
> (4998127507903069087,5008922734235607550], 
> (-5115827291264930140,-5111054924035590372], 
> (-2475342271852943287,-2447285553369030332], 
> (-8318606053827235336,-8308721754886697230], 
> (-5208900659917654871,-5202385837264015269], 
> (6618737991399272130,6623100721269775102], 
> (-4650650128572424858,-4650260492494258461], 
> (1886545362164970333,1886646959491599822], 
> (-4511817721998311568,-4507491187192881115], 
> (8114903118676615937,8132992506844206601], 
> (6224957219376301858,6304379125732293904], 
> (-3460547504877234383,-3459262416082517136], 
> (-167838948111369123,-141862581573028594], 
> (481579232521229473,491242114841289497], 
> (4052464144722307684,4059745901618136723], 
> (1659668187498418295,1679582585970705122], 
> (-1118922763210109192,-1093766915505652874], 
> (7504365235878319341,752615210185292], 
> (-79866884352549492,-77667207866300333], 
> (8151204058820798561,8154760186218662205], 
> (-1040398370287131739,-1033770179677543189], 
> (3767057277953758442,3783780844370292025], 
> (-6491678058233994892,-6487797181789288329], 
> (-916868210769480248,-907141794196269524], 
> (-9005441616028750657,-9002220258513351832], 
> (8183526518331102304,8186908810225025483], 
> (-5685737903527826627,-5672136154194382932], 
> (4976122621177738811,4987871287137312689], 
> (6051670147160447042,6051686987147911650], 
> (-1161640137086921883,-1159172734746043158], 
> (6895951547735922309,6899152466544114890], 
> (-3357667382515377172,-3356304907368646189], 
> (-5370953856683870319,-5345971445444542485], 
> (3824272999898372667,3829315045986248983], 
> (8132992506844206601,8149858096109302285], 
> (3975126143101303723,3980729378827590597], 
> (-956691623200349709,-946602525018301692], 
> (-82499927325251331,-79866884352549492], 
> (3952144214544622998,3955602392726495936], 
> (8154760186218662205,8157079055586089583], 
> (3840595196718778916,3866458971850198755], 
> (-1066905024007783341,-1055954824488508260], 
> (-7252356975874511782,-7246799942440641887], 
> (-810612946397276081,-792189809286829222], 
> (4964519403172053705,4970446606512414858], 
> (-5380038118840759647,-5370953856683870319], 
> (-3221630728515706463,-3206856875356976885], 
> (-1193448110686154165,-1161640137086921883], 
> (-3356304907368646189,-3346460884208327912], 
> (3466596314109623830,346814432669172], 
> (-9050241313548454460,-9005441616028750657], 
> (402227699082311580,407458511300218383]] for XXX.[YYY, ZZZ]
> INFO  [Repair#1:1] 2017-09-15 03:00:28,419 RepairJob.java:172 - [repair 
> #071f81e0-99c2-11e7-91dc-6132f5fe5fb0] Requesting merkle trees for YYY (to 
> [/FAKE.35.153, /FAKE.34.171, /FAKE.33.64])
> INFO  [Thread-2941] 2017-09-15 03:00:28,434 RepairSession.java:224 - [repair 
> #075d2720-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
> /FAKE.35.57, /FAKE.34.171 on range 
> [(-5410955131843184047,-5390722609201388849], 
> (-2429793939970389370,-2402273315769352748], 
> (8085575576842594575,8086965740279021106], 
> (-8802193901675845653,-8790472027607832351], 
> (-3900412470120874591,-3892641480459306647], 
> (5455804264750818305,5465037357825542970], 
> (4930767198829659527,4939587074207662799], 
> (8086965740279021106,8087442741329154201], 
> (-8933201045321260661,-8926445549049070674], 
> 

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Alex,

thanks again! We will switch back to the 2.1 behavior for now.

Thomas

From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Freitag, 15. September 2017 11:30
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Right, you should indeed add the "--full" flag to perform full repairs, and you 
can then keep the "-pr" flag.

I'd advise to monitor the status of your SSTables as you'll probably end up 
with a pool of SSTables marked as repaired, and another pool marked as 
unrepaired which won't be compacted together (hence the suggestion of running 
subrange repairs).
Use sstablemetadata to check on the "Repaired at" value for each. 0 means 
unrepaired and any other value (a timestamp) means the SSTable has been 
repaired.
I've had behaviors in the past where running "-pr" on the whole cluster would 
still not mark all SSTables as repaired, but I can't say if that behavior has 
changed in latest versions.

Having separate pools of SStables that cannot be compacted means that you might 
have tombstones that don't get evicted due to partitions living in both states 
(repaired/unrepaired).

To sum up the recommendations :
- Run a full repair with both "--full" and "-pr" and check that SSTables are 
properly marked as repaired
- Use a tight repair schedule to avoid keeping partitions for too long in both 
repaired and unrepaired state
- Switch to subrange repair if you want to fully avoid marking SSTables as 
repaired (which you don't need anyway since you're not using incremental 
repairs). If you wish to do this, you'll have to mark back all your sstables to 
unrepaired, using nodetool 
sstablerepairedset.

Cheers,

On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas 
> 
wrote:
Hi Alex,

thanks a lot. Somehow missed that incremental repairs are the default now.

We have been happy with full repair so far, because the data we currently 
invoke repairs for manually is small (~1 GB or even smaller).

So I guess with full repairs across all nodes, we can still stick with the 
partition range (-pr) option, but with 3.0 we additionally have to provide the 
--full option, right?

Thanks again,
Thomas

From: Alexander Dejanovski 
[mailto:a...@thelastpickle.com]
Sent: Freitag, 15. September 2017 09:45
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Hi Thomas,

in 2.1.18, the default repair mode was full repair while since 2.2 it is 
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't trigger 
the same operation.

Incremental repair cannot run on more than one node at a time on a cluster, 
because you risk to have conflicts with sessions trying to anticompact and run 
validation compactions on the same SSTables (which will make the validation 
phase fail, like your logs are showing).
Furthermore, you should never use "-pr" with incremental repair because it is 
useless in that mode, and won't properly perform anticompaction on all nodes.

If you were happy with full repairs in 2.1.18, I'd suggest to stick with those 
in 3.0.14 as well because there are still too many caveats with incremental 
repairs that should hopefully be fixed in 4.0+.
Note that full repair will also trigger anticompaction and mark SSTables as 
repaired in your release of Cassandra; full subrange repairs are the only 
flavor that will skip anticompaction.

You will need some tooling to help with subrange repairs though, and I'd 
recommend to use Reaper which handles automation for you : 
http://cassandra-reaper.io/

If you decide to stick with incremental repairs, first perform a rolling 
restart of your cluster to make sure no repair session still runs, and run 
"nodetool repair" on a single node at a time. Move on to the next node only 
when nodetool or the logs show that repair is over (which will include the 
anticompaction phase).

Cheers,



On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas 
> 
wrote:
Hello,

we are currently in the process of upgrading from 2.1.18 to 3.0.14. After 
upgrading a few test environments, we start to see some suspicious log entries 
regarding repair issues.

We have a cron job on all nodes basically executing the following repair call 
on a daily basis:

nodetool repair -pr 

This gets started on all nodes at the same time. While this has worked with 
2.1.18 (at least we haven’t seen anything suspicious in Cassandra log), with 
3.0.14 we get something like the following on all nodes (see below; IP addresses 
and KS/CF faked).

Any pointers are appreciated. Thanks.
Thomas


INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 - [repair 

GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-15 Thread Steinmaurer, Thomas
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto 
deploying our software on a daily basis and attach constant load across all 
deployments. Basically to allow us to detect any regressions in our software on 
a daily basis.

On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G heap, 
CMS. The environment has also been upgraded from Cassandra 2.1.18 to 3.0.14 at 
a certain point in time. Without running upgradesstables so far. We have not 
made any additional JVM/GC configuration change when going from 2.1.18 to 
3.0.14 on our own, thus, any self-made configuration changes (e.g. new gen heap 
size) for 2.1.18 are also in place with 3.0.14.
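
(For reference, a hedged sketch of how one might check for and rewrite old-format
SSTables after such an upgrade; the data path is a placeholder, and rewriting
SSTables will of course add compaction/IO load of its own:)

    # SSTables still on the 2.1 on-disk format carry the older version (e.g. "ka") in their file names
    ls /var/lib/cassandra/data/my_keyspace/my_table-*/ | head

    # rewrite SSTables to the current (3.0) format; -a also rewrites those already current
    nodetool upgradesstables -a my_keyspace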

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
likely correlating):

* CPU: ~ 12% => ~ 17%

* GC Suspension: ~ 1,7% => 3,29%

In this environment this is not a big deal, but relatively we have a CPU increase 
of ~50% (with increased GC most likely contributing). It is something we will have 
to deal with when going into production (going into larger, multi-node load test 
environments first though).

Beside the CPU/GC shift, we also monitor the following noticeable changes 
(don't know if they somehow correlate with the CPU/GC shift above):

* Increased AVG Write Client Requests Latency (95th Percentile), 
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, but 
almost constant (no change in) write client request latency for our particular 
keyspace only, org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

* Compression metadata memory usage drop, 
org.apache.cassandra.metrics.Keyspace.XXX. 
CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?

I know this all looks a bit vague, but perhaps someone else has seen something 
similar when upgrading to 3.0.14 and can share their thoughts/ideas. The 
(relative) CPU/GC increase especially is something we are curious about.

Thanks a lot.

Thomas


Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Alexander Dejanovski
Right, you should indeed add the "--full" flag to perform full repairs, and
you can then keep the "-pr" flag.

I'd advise to monitor the status of your SSTables as you'll probably end up
with a pool of SSTables marked as repaired, and another pool marked as
unrepaired which won't be compacted together (hence the suggestion of
running subrange repairs).
Use sstablemetadata to check on the "Repaired at" value for each. 0 means
unrepaired and any other value (a timestamp) means the SSTable has been
repaired.
I've had behaviors in the past where running "-pr" on the whole cluster
would still not mark all SSTables as repaired, but I can't say if that
behavior has changed in latest versions.

Having separate pools of SStables that cannot be compacted means that you
might have tombstones that don't get evicted due to partitions living in
both states (repaired/unrepaired).

To sum up the recommendations :
- Run a full repair with both "--full" and "-pr" and check that SSTables
are properly marked as repaired
- Use a tight repair schedule to avoid keeping partitions for too long in
both repaired and unrepaired state
- Switch to subrange repair if you want to fully avoid marking SSTables as
repaired (which you don't need anyway since you're not using incremental
repairs). If you wish to do this, you'll have to mark back all your
sstables to unrepaired, using nodetool sstablerepairedset.
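
(For reference, a hedged sketch of the two tools mentioned; the SSTable path is a
placeholder, and in most packages both utilities live under tools/bin rather than
being nodetool subcommands:)

    # 0 means unrepaired, any timestamp means the SSTable is marked repaired
    sstablemetadata /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db | grep "Repaired at"

    # with the node stopped, mark SSTables back to unrepaired before switching to subrange repairs
    sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db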

Cheers,

On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hi Alex,
>
>
>
> thanks a lot. Somehow missed that incremental repairs are the default now.
>
>
>
> We have been happy with full repair so far, because the data we currently
> invoke repairs for manually is small (~1 GB or even smaller).
>
>
>
> So I guess with full repairs across all nodes, we can still stick with the
> partition range (-pr) option, but with 3.0 we additionally have to provide
> the --full option, right?
>
>
>
> Thanks again,
>
> Thomas
>
>
>
> *From:* Alexander Dejanovski [mailto:a...@thelastpickle.com]
> *Sent:* Freitag, 15. September 2017 09:45
> *To:* user@cassandra.apache.org
> *Subject:* Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> Hi Thomas,
>
>
>
> in 2.1.18, the default repair mode was full repair while since 2.2 it is
> incremental repair.
>
> So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't
> trigger the same operation.
>
>
>
> Incremental repair cannot run on more than one node at a time on a
> cluster, because you risk to have conflicts with sessions trying to
> anticompact and run validation compactions on the same SSTables (which will
> make the validation phase fail, like your logs are showing).
>
> Furthermore, you should never use "-pr" with incremental repair because it
> is useless in that mode, and won't properly perform anticompaction on all
> nodes.
>
>
>
> If you were happy with full repairs in 2.1.18, I'd suggest to stick with
> those in 3.0.14 as well because there are still too many caveats with
> incremental repairs that should hopefully be fixed in 4.0+.
>
> Note that full repair will also trigger anticompaction and mark SSTables
> as repaired in your release of Cassandra; full subrange repairs are the
> only flavor that will skip anticompaction.
>
>
>
> You will need some tooling to help with subrange repairs though, and I'd
> recommend to use Reaper which handles automation for you :
> http://cassandra-reaper.io/
>
>
>
> If you decide to stick with incremental repairs, first perform a rolling
> restart of your cluster to make sure no repair session still runs, and run
> "nodetool repair" on a single node at a time. Move on to the next node only
> when nodetool or the logs show that repair is over (which will include the
> anticompaction phase).
>
>
>
> Cheers,
>
>
>
>
>
>
>
> On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
> Hello,
>
>
>
> we are currently in the process of upgrading from 2.1.18 to 3.0.14. After
> upgrading a few test environments, we start to see some suspicious log
> entries regarding repair issues.
>
>
>
> We have a cron job on all nodes basically executing the following repair
> call on a daily basis:
>
>
>
> nodetool repair -pr 
>
>
>
> This gets started on all nodes at the same time. While this has worked
> with 2.1.18 (at least we haven’t seen anything suspicious in Cassandra
> log), with 3.0.14 we get something like the following on all nodes (see
> below; IP addresses and KS/CF faked).
>
>
>
> Any pointers are appreciated. Thanks.
>
> Thomas
>
>
>
>
>
> INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 -
> [repair #071f81e0-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync
> /FAKE.33.64, /FAKE.35.153, /FAKE.34.171 on range
> [(8195393703879512303,8196334842725538685],
> (8166975326273137878,8182604850967732931],
> 

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Hi Alex,

thanks a lot. Somehow missed that incremental repairs are the default now.

We have been happy with full repair so far, because the data we currently 
invoke repairs for manually is small (~1 GB or even smaller).

So I guess with full repairs across all nodes, we can still stick with the 
partition range (-pr) option, but with 3.0 we additionally have to provide the 
--full option, right?

Thanks again,
Thomas

From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Freitag, 15. September 2017 09:45
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Hi Thomas,

in 2.1.18, the default repair mode was full repair while since 2.2 it is 
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't trigger 
the same operation.

Incremental repair cannot run on more than one node at a time on a cluster, 
because you risk to have conflicts with sessions trying to anticompact and run 
validation compactions on the same SSTables (which will make the validation 
phase fail, like your logs are showing).
Furthermore, you should never use "-pr" with incremental repair because it is 
useless in that mode, and won't properly perform anticompaction on all nodes.

If you were happy with full repairs in 2.1.18, I'd suggest to stick with those 
in 3.0.14 as well because there are still too many caveats with incremental 
repairs that should hopefully be fixed in 4.0+.
Note that full repair will also trigger anticompaction and mark SSTables as 
repaired in your release of Cassandra; full subrange repairs are the only 
flavor that will skip anticompaction.

You will need some tooling to help with subrange repairs though, and I'd 
recommend to use Reaper which handles automation for you : 
http://cassandra-reaper.io/

If you decide to stick with incremental repairs, first perform a rolling 
restart of your cluster to make sure no repair session still runs, and run 
"nodetool repair" on a single node at a time. Move on to the next node only 
when nodetool or the logs show that repair is over (which will include the 
anticompaction phase).

Cheers,



On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas 
> 
wrote:
Hello,

we are currently in the process of upgrading from 2.1.18 to 3.0.14. After 
upgrading a few test environments, we start to see some suspicious log entries 
regarding repair issues.

We have a cron job on all nodes basically executing the following repair call 
on a daily basis:

nodetool repair -pr 

This gets started on all nodes at the same time. While this has worked with 
2.1.18 (at least we haven’t seen anything suspicious in Cassandra log), with 
3.0.14 we get something like the following on all nodes (see below; IP addresses 
and KS/CF faked).

Any pointers are appreciated. Thanks.
Thomas


INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 - [repair 
#071f81e0-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
/FAKE.35.153, /FAKE.34.171 on range [(8195393703879512303,8196334842725538685], 
(8166975326273137878,8182604850967732931], 
(-7246799942440641887,-7227869626613009045], 
(-8371707510273823988,-8365977215604569699], 
(-141862581573028594,-140310864869418908], 
(3732113975108886193,3743105867152786342], 
(4998127507903069087,5008922734235607550], 
(-5115827291264930140,-5111054924035590372], 
(-2475342271852943287,-2447285553369030332], 
(-8318606053827235336,-8308721754886697230], 
(-5208900659917654871,-5202385837264015269], 
(6618737991399272130,6623100721269775102], 
(-4650650128572424858,-4650260492494258461], 
(1886545362164970333,1886646959491599822], 
(-4511817721998311568,-4507491187192881115], 
(8114903118676615937,8132992506844206601], 
(6224957219376301858,6304379125732293904], 
(-3460547504877234383,-3459262416082517136], 
(-167838948111369123,-141862581573028594], 
(481579232521229473,491242114841289497], 
(4052464144722307684,4059745901618136723], 
(1659668187498418295,1679582585970705122], 
(-1118922763210109192,-1093766915505652874], 
(7504365235878319341,752615210185292], 
(-79866884352549492,-77667207866300333], 
(8151204058820798561,8154760186218662205], 
(-1040398370287131739,-1033770179677543189], 
(3767057277953758442,3783780844370292025], 
(-6491678058233994892,-6487797181789288329], 
(-916868210769480248,-907141794196269524], 
(-9005441616028750657,-9002220258513351832], 
(8183526518331102304,8186908810225025483], 
(-5685737903527826627,-5672136154194382932], 
(4976122621177738811,4987871287137312689], 
(6051670147160447042,6051686987147911650], 
(-1161640137086921883,-1159172734746043158], 
(6895951547735922309,6899152466544114890], 
(-3357667382515377172,-3356304907368646189], 
(-5370953856683870319,-5345971445444542485], 
(3824272999898372667,3829315045986248983], 
(8132992506844206601,8149858096109302285], 
(3975126143101303723,3980729378827590597], 

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Alexander Dejanovski
Hi Thomas,

in 2.1.18, the default repair mode was full repair while since 2.2 it is
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't
trigger the same operation.

Incremental repair cannot run on more than one node at a time on a cluster,
because you risk to have conflicts with sessions trying to anticompact and
run validation compactions on the same SSTables (which will make the
validation phase fail, like your logs are showing).
Furthermore, you should never use "-pr" with incremental repair because it
is useless in that mode, and won't properly perform anticompaction on all
nodes.

If you were happy with full repairs in 2.1.18, I'd suggest to stick with
those in 3.0.14 as well because there are still too many caveats with
incremental repairs that should hopefully be fixed in 4.0+.
Note that full repair will also trigger anticompaction and mark SSTables as
repaired in your release of Cassandra; full subrange repairs are the only
flavor that will skip anticompaction.

You will need some tooling to help with subrange repairs though, and I'd
recommend to use Reaper which handles automation for you :
http://cassandra-reaper.io/

If you decide to stick with incremental repairs, first perform a rolling
restart of your cluster to make sure no repair session still runs, and run
"nodetool repair" on a single node at a time. Move on to the next node only
when nodetool or the logs show that repair is over (which will include the
anticompaction phase).
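
(For reference, a hedged sketch of staggering the existing daily cron jobs instead
of starting repair everywhere at once; the hours and keyspace name are
placeholders:)

    # node 1 crontab
    0 1 * * * nodetool repair my_keyspace
    # node 2 crontab
    0 3 * * * nodetool repair my_keyspace
    # node 3 crontab
    0 5 * * * nodetool repair my_keyspace

A fixed offset is only a rough guard - it still assumes each session finishes
before the next node starts, which is why tools like Reaper (or checking the logs
before moving on) are the safer option.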

Cheers,



On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
>
>
> we are currently in the process of upgrading from 2.1.18 to 3.0.14. After
> upgrading a few test environments, we start to see some suspicious log
> entries regarding repair issues.
>
>
>
> We have a cron job on all nodes basically executing the following repair
> call on a daily basis:
>
>
>
> nodetool repair -pr 
>
>
>
> This gets started on all nodes at the same time. While this has worked
> with 2.1.18 (at least we haven’t seen anything suspicious in Cassandra
> log), with 3.0.14 we get something like the following on all nodes (see
> below; IP addresses and KS/CF faked).
>
>
>
> Any pointers are appreciated. Thanks.
>
> Thomas
>
>
>
>
>
> INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 -
> [repair #071f81e0-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync
> /FAKE.33.64, /FAKE.35.153, /FAKE.34.171 on range
> [(8195393703879512303,8196334842725538685],
> (8166975326273137878,8182604850967732931],
> (-7246799942440641887,-7227869626613009045],
> (-8371707510273823988,-8365977215604569699],
> (-141862581573028594,-140310864869418908],
> (3732113975108886193,3743105867152786342],
> (4998127507903069087,5008922734235607550],
> (-5115827291264930140,-5111054924035590372],
> (-2475342271852943287,-2447285553369030332],
> (-8318606053827235336,-8308721754886697230],
> (-5208900659917654871,-5202385837264015269],
> (6618737991399272130,6623100721269775102],
> (-4650650128572424858,-4650260492494258461],
> (1886545362164970333,1886646959491599822],
> (-4511817721998311568,-4507491187192881115],
> (8114903118676615937,8132992506844206601],
> (6224957219376301858,6304379125732293904],
> (-3460547504877234383,-3459262416082517136],
> (-167838948111369123,-141862581573028594],
> (481579232521229473,491242114841289497],
> (4052464144722307684,4059745901618136723],
> (1659668187498418295,1679582585970705122],
> (-1118922763210109192,-1093766915505652874],
> (7504365235878319341,752615210185292],
> (-79866884352549492,-77667207866300333],
> (8151204058820798561,8154760186218662205],
> (-1040398370287131739,-1033770179677543189],
> (3767057277953758442,3783780844370292025],
> (-6491678058233994892,-6487797181789288329],
> (-916868210769480248,-907141794196269524],
> (-9005441616028750657,-9002220258513351832],
> (8183526518331102304,8186908810225025483],
> (-5685737903527826627,-5672136154194382932],
> (4976122621177738811,4987871287137312689],
> (6051670147160447042,6051686987147911650],
> (-1161640137086921883,-1159172734746043158],
> (6895951547735922309,6899152466544114890],
> (-3357667382515377172,-3356304907368646189],
> (-5370953856683870319,-5345971445444542485],
> (3824272999898372667,3829315045986248983],
> (8132992506844206601,8149858096109302285],
> (3975126143101303723,3980729378827590597],
> (-956691623200349709,-946602525018301692],
> (-82499927325251331,-79866884352549492],
> (3952144214544622998,3955602392726495936],
> (8154760186218662205,8157079055586089583],
> (3840595196718778916,3866458971850198755],
> (-1066905024007783341,-1055954824488508260],
> (-7252356975874511782,-7246799942440641887],
> (-810612946397276081,-792189809286829222],
> (4964519403172053705,4970446606512414858],
> (-5380038118840759647,-5370953856683870319],
> (-3221630728515706463,-3206856875356976885],
> (-1193448110686154165,-1161640137086921883],
> 

Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Hello,

we are currently in the process of upgrading from 2.1.18 to 3.0.14. After 
upgrading a few test environments, we start to see some suspicious log entries 
regarding repair issues.

We have a cron job on all nodes basically executing the following repair call 
on a daily basis:

nodetool repair -pr 

This gets started on all nodes at the same time. While this has worked with 
2.1.18 (at least we haven't seen anything suspicious in Cassandra log), with 
3.0.14 we get something like the following on all nodes (see below; IP addresses 
and KS/CF faked).

Any pointers are appreciated. Thanks.
Thomas


INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 - [repair 
#071f81e0-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
/FAKE.35.153, /FAKE.34.171 on range [(8195393703879512303,8196334842725538685], 
(8166975326273137878,8182604850967732931], 
(-7246799942440641887,-7227869626613009045], 
(-8371707510273823988,-8365977215604569699], 
(-141862581573028594,-140310864869418908], 
(3732113975108886193,3743105867152786342], 
(4998127507903069087,5008922734235607550], 
(-5115827291264930140,-5111054924035590372], 
(-2475342271852943287,-2447285553369030332], 
(-8318606053827235336,-8308721754886697230], 
(-5208900659917654871,-5202385837264015269], 
(6618737991399272130,6623100721269775102], 
(-4650650128572424858,-4650260492494258461], 
(1886545362164970333,1886646959491599822], 
(-4511817721998311568,-4507491187192881115], 
(8114903118676615937,8132992506844206601], 
(6224957219376301858,6304379125732293904], 
(-3460547504877234383,-3459262416082517136], 
(-167838948111369123,-141862581573028594], 
(481579232521229473,491242114841289497], 
(4052464144722307684,4059745901618136723], 
(1659668187498418295,1679582585970705122], 
(-1118922763210109192,-1093766915505652874], 
(7504365235878319341,752615210185292], 
(-79866884352549492,-77667207866300333], 
(8151204058820798561,8154760186218662205], 
(-1040398370287131739,-1033770179677543189], 
(3767057277953758442,3783780844370292025], 
(-6491678058233994892,-6487797181789288329], 
(-916868210769480248,-907141794196269524], 
(-9005441616028750657,-9002220258513351832], 
(8183526518331102304,8186908810225025483], 
(-5685737903527826627,-5672136154194382932], 
(4976122621177738811,4987871287137312689], 
(6051670147160447042,6051686987147911650], 
(-1161640137086921883,-1159172734746043158], 
(6895951547735922309,6899152466544114890], 
(-3357667382515377172,-3356304907368646189], 
(-5370953856683870319,-5345971445444542485], 
(3824272999898372667,3829315045986248983], 
(8132992506844206601,8149858096109302285], 
(3975126143101303723,3980729378827590597], 
(-956691623200349709,-946602525018301692], 
(-82499927325251331,-79866884352549492], 
(3952144214544622998,3955602392726495936], 
(8154760186218662205,8157079055586089583], 
(3840595196718778916,3866458971850198755], 
(-1066905024007783341,-1055954824488508260], 
(-7252356975874511782,-7246799942440641887], 
(-810612946397276081,-792189809286829222], 
(4964519403172053705,4970446606512414858], 
(-5380038118840759647,-5370953856683870319], 
(-3221630728515706463,-3206856875356976885], 
(-1193448110686154165,-1161640137086921883], 
(-3356304907368646189,-3346460884208327912], 
(3466596314109623830,346814432669172], 
(-9050241313548454460,-9005441616028750657], 
(402227699082311580,407458511300218383]] for XXX.[YYY, ZZZ]
INFO  [Repair#1:1] 2017-09-15 03:00:28,419 RepairJob.java:172 - [repair 
#071f81e0-99c2-11e7-91dc-6132f5fe5fb0] Requesting merkle trees for YYY (to 
[/FAKE.35.153, /FAKE.34.171, /FAKE.33.64])
INFO  [Thread-2941] 2017-09-15 03:00:28,434 RepairSession.java:224 - [repair 
#075d2720-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
/FAKE.35.57, /FAKE.34.171 on range 
[(-5410955131843184047,-5390722609201388849], 
(-2429793939970389370,-2402273315769352748], 
(8085575576842594575,8086965740279021106], 
(-8802193901675845653,-8790472027607832351], 
(-3900412470120874591,-3892641480459306647], 
(5455804264750818305,5465037357825542970], 
(4930767198829659527,4939587074207662799], 
(8086965740279021106,8087442741329154201], 
(-8933201045321260661,-8926445549049070674], 
(-4841328524165418854,-4838895482794593338], 
(628107265570603622,682509946926464280], 
(7043245467621414187,7055126022831789025], 
(624871765540463735,627374995781897409], 
(9219228482330263660,9221294940422311559], 
(-2335215188301493066,-2315034243278984017], 
(-6216599212198827632,-6211460136507414133], 
(-3276490559558850323,-3273110814046238767], 
(7204991007334459472,7214826985711309418], 
(1815809811279373566,1846961604192445001], 
(8743912118048160970,8751518028513315549], 
(-9204701745739426439,-9200185935622985719], 
(7926527126882050773,7941554683778488797], 
(-1307707180308444994,-1274682085495751899], 
(8354147540115782875,8358523989614737607], 
(-5418282332713406631,-541509309282099], 
(2436459402559272117,2441988676982099299], 

Re: Rebalance a cassandra cluster

2017-09-15 Thread Anthony Grasso
As Kurt mentioned, you definitely need to pick a partition key that ensures
data is uniformly distributed.

If you want to redistribute the data in the cluster and move tokens
around, you could decommission the node with the tokens you want to
redistribute and then bootstrap a new node into the cluster. However, be
careful, because if there are unbalanced partitions in the cluster
redistributing the tokens will just move the problem partition to another
node. In this case, the same problem will occur on the node that picks up
the problem partition key and you will be back in the same situation again.

Regards,
Anthony
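
(For reference, a hedged sketch of commands that can help check ownership and
spot oversized partitions before moving tokens around; keyspace/table names are
placeholders:)

    nodetool status my_keyspace                    # per-node ownership for that keyspace
    nodetool tablestats my_keyspace.my_table       # look at "Compacted partition maximum bytes"
    nodetool tablehistograms my_keyspace my_table  # partition size / cell count percentiles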

On 13 September 2017 at 20:09, kurt greaves  wrote:

> You should choose a partition key that enables you to have a uniform
> distribution of partitions amongst the nodes and refrain from having too
> many wide rows/a small number of wide partitions. If your tokens are
> already uniformly distributed, recalculating in order to achieve a better
> data load balance is probably going to be an effort in futility, plus not
> really a good idea from a maintenance and scaling perspective.
>