Truncate data from a single node

2017-07-11 Thread Kevin O'Connor
This might be an interesting question - but is there a way to truncate data from just a single node or two as a test, instead of truncating from the entire cluster? We have time-series data we don't mind having gaps in, but it's taking up a huge amount of space and we're looking to

Re: Truncate data from a single node

2017-07-11 Thread Patrick McFadin
Hey Kevin, I would be wary of a truncate operation. It can quietly destroy all your data very efficiently. One thing you should know is that a snapshot is automatically created when you issue a truncate. Yes, an undelete if you screw up. Just don't be surprised when you find it.

Re: Unbalanced cluster

2017-07-11 Thread Jonathan Haddad
Awesome utility Avi! Thanks for sharing. On Tue, Jul 11, 2017 at 10:57 AM Avi Kivity wrote: > There is now a readme with some examples and a build file. > > On 07/11/2017 11:53 AM, Avi Kivity wrote: > > Yeah, posting a github link carries an implied undertaking to write a >

Re: "nodetool repair -dc"

2017-07-11 Thread vasu gunja
Hi, my question is specific to the -dc option. Do we need to run this on all nodes that belong to that DC? Or only on one of the nodes in that DC, and it will then repair all of them? On Sat, Jul 8, 2017 at 10:56 PM, Varun Gupta wrote: > I do not see the need to run

reduced num_token = improved performance ??

2017-07-11 Thread ZAIDI, ASAD A
Hi Folks, pardon me if I'm missing something obvious. I'm still using apache-cassandra 2.2 and planning an upgrade to 3.x. I came across this JIRA [https://issues.apache.org/jira/browse/CASSANDRA-7032], which suggests that reducing num_tokens may improve the general performance of Cassandra like

Re: Unbalanced cluster

2017-07-11 Thread Avi Kivity
There is now a readme with some examples and a build file. On 07/11/2017 11:53 AM, Avi Kivity wrote: Yeah, posting a github link carries an implied undertaking to write a README file and make it easily buildable. I'll see what I can do. On 07/11/2017 06:25 AM, Nate McCall wrote: You

c* updates not getting reflected.

2017-07-11 Thread techpyaasa .
Hi, we have a table with the following schema: CREATE TABLE ks1.cf1 ( pid bigint, cid bigint, resp_json text, status int, PRIMARY KEY (pid, cid) ) WITH CLUSTERING ORDER BY (cid ASC), with the LCS compaction strategy. We make very frequent updates to this table with a query like UPDATE ks1.cf1 SET status

Re: "nodetool repair -dc"

2017-07-11 Thread Anuj Wadehra
Hi, I have not used DC-local repair specifically, but generally repair syncs all local tokens of the node with other replicas (full repair) or a subset of local tokens (-pr and subrange). A full repair with the -dc option should only sync data for the tokens present on the node where the command

Re: c* updates not getting reflected.

2017-07-11 Thread Carlos Rolo
What consistency are you using on those queries? On 11 Jul 2017 19:09, "techpyaasa ." wrote: > Hi, > > We have a table with following schema: > > CREATE TABLE ks1.cf1 ( pid bigint, cid bigint, resp_json text, status int, > PRIMARY KEY (pid, cid) ) WITH CLUSTERING ORDER BY
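The consistency-level question matters here because an update written at a weak level may simply not be visible to a subsequent read. A minimal sketch of the usual R + W > RF visibility rule, in plain Python with a hypothetical replication factor (the level names mirror Cassandra's, but this is arithmetic only, not driver code):

```python
# Strong-consistency check: a read is guaranteed to observe a prior
# write when replicas written plus replicas read exceed the
# replication factor (the read and write replica sets must overlap).

def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge for a given consistency level."""
    return {"ONE": 1, "TWO": 2, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

def read_sees_write(write_cl: str, read_cl: str, rf: int) -> bool:
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf

rf = 3  # hypothetical replication factor
print(read_sees_write("ONE", "ONE", rf))        # False: 1 + 1 <= 3
print(read_sees_write("QUORUM", "QUORUM", rf))  # True: 2 + 2 > 3
```

If the updates in the original question are written and read at ONE with RF=3, stale reads like the one described are expected behavior, not a bug.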

Re: Unbalanced cluster

2017-07-11 Thread Avi Kivity
Yeah, posting a github link carries an implied undertaking to write a README file and make it easily buildable. I'll see what I can do. On 07/11/2017 06:25 AM, Nate McCall wrote: You wouldnt have a build file laying around for that, would you? On Tue, Jul 11, 2017 at 3:23 PM, Nate McCall

Re: Unbalanced cluster

2017-07-11 Thread Loic Lambiel
Thanks for the hint and tool! By the way, what does the --shards parameter mean? Thanks Loic On 07/10/2017 05:20 PM, Avi Kivity wrote: > 32 tokens is too few for 33 nodes. I have a sharding simulator [1] and > it shows > > > $ ./shardsim --vnodes 32 --nodes 33 --shards 1 > 33 nodes, 32

Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-11 Thread qiang zhang
Thanks for your explanation! > It's taking a full minute to sync your memtable to disk. This is either an indication that your disk is broken, or your JVM is pausing for GC. The disk is OK, but the long JVM pauses happen many times. I didn't disable the paging file on Windows; maybe that's the

Re: Unbalanced cluster

2017-07-11 Thread Avi Kivity
It is ScyllaDB specific. Scylla divides data not only among nodes, but also internally within a node among cores (=shards in our terminology). In the past we had problems with shards being over- and under-utilized (just like your cluster), so this simulator was developed to validate the
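A toy version of what such a per-core simulator measures (this is a simplified sketch in Python, not Avi's actual shardsim or Scylla's real assignment algorithm): split the ring into vnode ranges, hand each range to a shard, and compare the worst shard's ownership to the ideal even split.

```python
# Toy shard-ownership simulator: random vnode boundaries, ranges dealt
# to shards round-robin, then measure how unevenly ownership lands.
import random

def shard_ownership(vnodes: int, shards: int, seed: int = 42) -> list[float]:
    rng = random.Random(seed)
    ring = sorted(rng.random() for _ in range(vnodes))  # token positions in [0, 1)
    owned = [0.0] * shards
    prev = 0.0
    for i, tok in enumerate(ring):
        owned[i % shards] += tok - prev  # width of this vnode range
        prev = tok
    owned[0] += 1.0 - prev  # wrap-around range closes the ring
    return owned

own = shard_ownership(vnodes=32, shards=8)
print(max(own) / (1 / 8))  # overcommit factor: worst shard vs. the ideal 1/8
```

With few vnodes the random range widths vary a lot, so some shards end up overcommitted, which is exactly the over-/under-utilization described above.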

Re: reduced num_token = improved performance ??

2017-07-11 Thread Justin Cameron
Hi, Using fewer vnodes means you'll have a higher chance of hot spots in your cluster. Hot spots in Cassandra are nodes that, by random chance, are responsible for a higher percentage of the token space than others. This means they will receive more data and also more traffic/load than other
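The hot-spot effect can be made concrete with a small simulation (a sketch with made-up parameters, using naive random token placement rather than Cassandra's newer allocation algorithm): with few vnodes per node, the luck of the draw leaves some nodes owning far more of the ring than others; many vnodes per node average the imbalance out.

```python
# Sketch: random-token vnode placement. Returns the worst node's share
# of the ring relative to a perfectly even split (1.0 = perfectly even).
import random

def worst_node_share(nodes: int, vnodes: int, seed: int = 1) -> float:
    rng = random.Random(seed)
    tokens = []  # (ring position, owning node) pairs
    for node in range(nodes):
        tokens += [(rng.random(), node) for _ in range(vnodes)]
    tokens.sort()
    owned = [0.0] * nodes
    prev = 0.0
    for pos, node in tokens:
        owned[node] += pos - prev
        prev = pos
    owned[tokens[0][1]] += 1.0 - prev  # wrap-around range
    return max(owned) * nodes

print(worst_node_share(nodes=33, vnodes=4))    # typically well above 1.0
print(worst_node_share(nodes=33, vnodes=256))  # typically much closer to 1.0
```

This is the same trade-off behind the num_tokens question earlier in the thread: fewer tokens help repair and streaming, but raise the variance in data and load per node unless token placement is done carefully.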