Re: Cassandra snapshot restore with VNODES missing some data

2017-08-30 Thread kurt greaves
Does the source cluster also use vnodes? You will need to ensure you use the same tokens for each node as the snapshots used in the source (and also ensure same tokens apply to same racks).

Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread kurt greaves
For that many nodes mixed with vnodes you probably want a lower RF than N per datacenter. 5 or 7 would be reasonable. The only downside is that auth queries may take slightly longer as they will often have to go to other nodes to be resolved, but in practice this is likely not a big deal as the
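
In case it helps, here's a rough sketch of that keyspace change from the 3.x Java driver (any CQL client or cqlsh works just as well). The contact point, DC name "dc1" and RF of 5 are placeholders, and you still need to repair system_auth afterwards so the new replicas actually get the auth data:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class AlterAuthRf {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Fixed RF per datacenter rather than RF == number of nodes
                session.execute("ALTER KEYSPACE system_auth WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'dc1': 5}");
                // Then run `nodetool repair system_auth` on each node.
            }
        }
    }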

Re: Rack Awareness

2017-08-29 Thread kurt greaves
Cassandra understands racks based on the configured snitch and the rack assigned to each node (for example in cassandra-rackdc.properties if using GossipingPropertyFileSnitch). If you have racks configured, to perform a "rack-aware" repair you would simply need to run repair on only one rack. Note

Re: Working With Prepared Statements

2017-08-29 Thread kurt greaves
From memory, prepared statements are not automatically marked as idempotent, so if you are using prepared statements that you know are idempotent you should make sure to set the idempotent flag on them yourself. For the Java driver see https://github.com/datastax/java-driver/tree/3.x/manual/idempotence
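
For a concrete reference, a minimal 3.x Java driver sketch of setting the flag yourself (the contact point, keyspace and table are made up for illustration):

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class IdempotentPrepared {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                PreparedStatement ps = session.prepare(
                        "UPDATE demo.users SET name = ? WHERE id = ?");
                // Not flagged automatically, even though this statement is idempotent
                ps.setIdempotent(true);
                // Bound statements inherit the flag, so the driver can safely retry them
                BoundStatement bound = ps.bind("kurt", 1);
                session.execute(bound);
            }
        }
    }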

Re: Cassandra and OpenJDK

2017-08-28 Thread kurt greaves
OpenJDK is fine.

Re: timeouts on counter tables

2017-08-27 Thread kurt greaves
If every node is a replica it sounds like you've got hardware issues. Have you compared iostat to the "normal" nodes? I assume there is nothing different in the logs on this one node? Also sanity check, you are using DCAwareRoundRobinPolicy?
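
For reference, wiring up DCAwareRoundRobinPolicy in the 3.x Java driver looks roughly like this (the contact point and local DC name "dc1" are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class DcAwareClient {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1")
                    // Keep coordination in the local DC; token awareness routes to replicas
                    .withLoadBalancingPolicy(new TokenAwarePolicy(
                            DCAwareRoundRobinPolicy.builder().withLocalDc("dc1").build()))
                    .build();
                 Session session = cluster.connect()) {
                session.execute("SELECT release_version FROM system.local");
            }
        }
    }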

Re: timeouts on counter tables

2017-08-27 Thread kurt greaves
What is your RF? Also, as a side note RAID 1 shouldn't be necessary if you have >1 RF and would give you worse performance

Re: Bootstrapping a node fails because of compactions not keeping up

2017-08-23 Thread kurt greaves
> But if it also streams, it means I'd still be under-pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SSTables at the same time, and not because of my current write load.
Ah yeah I wasn't thinking about the capacity

Re: C* 3 node issue -Urgent

2017-08-23 Thread kurt greaves
Common trap. It's an unfortunate default that is not so easy to change.

Re: Bootstrapping a node fails because of compactions not keeping up

2017-08-23 Thread kurt greaves
> 1) You mean restarting the node in the middle of the bootstrap with join_ring=false? Would this option require me to issue a nodetool bootstrap resume, correct? I didn't know you could instruct the join via JMX. Would it be the same as the nodetool bootstrap command?
write_survey is

Re: C* 3 node issue -Urgent

2017-08-23 Thread kurt greaves
The cassandra user requires QUORUM consistency to be achieved for authentication. Normal users only require ONE. I suspect your system_auth keyspace has an RF of 1, and the node that owns the cassandra user's data is down. Steps to recover:
1. Turn off authentication on all the nodes
2. Restart

Re: Cassandra isn't compacting old files

2017-08-23 Thread kurt greaves
Ignore me, I was getting the major compaction for LCS mixed up with STCS. Estimated droppable tombstones tends to be fairly accurate. If your SSTables in level 2 have that many tombstones I'd say that's definitely the reason L3 isn't being compacted. As for how you got here in the first place,

Re: Bootstrapping a node fails because of compactions not keeping up

2017-08-23 Thread kurt greaves
Well, that sucks. I'd be interested if you could find out whether any of the streamed SSTables are retaining their levels. To answer your questions: 1) No. However, you could set your nodes to join in write_survey mode, which will stop them from joining the ring, and you can initiate the join over JMX when
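
For the JMX part, something along these lines should work; this is a sketch that assumes the default JMX port 7199 and the org.apache.cassandra.db:type=StorageService MBean's joinRing operation (nodetool join is the command-line equivalent):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JoinRingViaJmx {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://10.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
                // Tell a node started in write_survey / join_ring=false mode to join the ring
                mbs.invoke(ss, "joinRing", new Object[0], new String[0]);
            }
        }
    }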

Re: Cassandra isn't compacting old files

2017-08-22 Thread kurt greaves
LCS major compaction on 2.2 should compact each level to have a single SSTable. It seems more likely to me that you are simply not generating enough data to require compactions in L3 and most data is TTL'ing before it gets there. Out of curiosity, what does sstablemetadata report for Estimated

Re: Bootstrapping a node fails because of compactions not keeping up

2017-08-22 Thread kurt greaves
What version are you running? 2.2 has an improvement that will retain levels when streaming and this shouldn't really happen. If you're on 2.1 best bet is to upgrade

Re: Cassandra crashes....

2017-08-22 Thread kurt greaves
sounds like Cassandra is being killed by the oom killer. can you check dmesg to see if this is the case? sounds a bit absurd with 256g of memory but could be a config problem.

Re: Cassandra 3.11 is compacting forever

2017-08-21 Thread kurt greaves
Why are you adding new nodes? If you're upgrading you should upgrade the existing nodes first and then add nodes.

Re: Moving all LCS SSTables to a repaired state

2017-08-21 Thread kurt greaves
Is there any specific reason you are trying to achieve this? It shouldn't really matter if you have a few SSTables in the unrepaired pool.

Re: Moving all LCS SSTables to a repaired state

2017-08-20 Thread kurt greaves
Correction: Full repairs do mark SSTables as repaired in 2.2 (CASSANDRA-7586). My mistake, I thought that was only introduced in 3.0. Note that if mixing full and incremental repairs you probably want to be using at least 2.2.10 because of

Re: Moving all LCS SSTables to a repaired state

2017-08-20 Thread kurt greaves
Pretty much, I wouldn't set your heart on having 0 unrepaired SSTables.

Re: Getting all unique keys

2017-08-18 Thread kurt greaves
You can SELECT DISTINCT in CQL, however I would recommend against such a pattern as it is very unlikely to be efficient, and prone to errors. A distinct query will search every partition for the first live cell, which could be buried behind a lot of tombstones. It's safe to say at some point you
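
If you do go down the SELECT DISTINCT route anyway, a small 3.x Java driver sketch with a modest fetch size (keyspace, table and column names are invented for illustration):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class DistinctKeys {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Small pages keep each request cheap; the driver pages through the
                // whole token range transparently.
                Statement stmt = new SimpleStatement("SELECT DISTINCT id FROM ks.events")
                        .setFetchSize(500);
                for (Row row : session.execute(stmt)) {
                    System.out.println(row.getObject("id"));
                }
            }
        }
    }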

Re: Moving all LCS SSTables to a repaired state

2017-08-18 Thread kurt greaves
You need to run an incremental repair for SSTables to be marked repaired. However, only if all of the data in that SSTable is repaired during the repair will you end up with it being marked repaired; otherwise an anticompaction will occur and split the unrepaired data into its own SSTable. It's

Re: Migrate from DSE (Datastax) to Apache Cassandra

2017-08-15 Thread kurt greaves
Haven't done it for 5.1 but went smoothly for earlier versions. If you're not using any of the additional features of DSE, it should be OK. Just change any custom replication strategies before migrating and also make sure your yaml options are compatible.

Re: Attempted to write commit log entry for unrecognized table

2017-08-15 Thread kurt greaves
what does nodetool describecluster show? stab in the dark but you could try nodetool resetlocalschema or a rolling restart of the cluster if it's schema issues.

Re: Dropping down replication factor

2017-08-13 Thread kurt greaves
On 14 Aug. 2017 00:59, "Brian Spindler" wrote: Do you think with the setup I've described I'd be ok doing that now to recover this node? The node died trying to run the scrub; I've restarted it but I'm not sure it's going to get past a scrub/repair, this is why I

Re: rebuild constantly fails, 3.11

2017-08-11 Thread kurt greaves
cc'ing user back in... On 12 Aug. 2017 01:55, "kurt greaves" <k...@instaclustr.com> wrote:
> How much memory do these machines have? Typically we've found that G1 isn't worth it until you get to around 24G heaps, and even at that it's not really better than CMS. You

Re: Questions on time series use case, tombstones, TWCS

2017-08-09 Thread kurt greaves
> With STCS, estimated droppable tombstones being always 0.0 (thus also no automatic single SSTable compaction may happen): Is this a matter of not writing with TTL? If yes, would enabling TTL with STCS improve the disk reclaim situation, cause then single SSTable compactions will kick in?

Re: rebuild constantly fails, 3.11

2017-08-08 Thread kurt greaves
If the error is reproducible can you upload the logs to a gist from the same time period as when the error occurs?

Re: Creating a copy of a C* cluster

2017-08-07 Thread kurt greaves
The most effective way to "divorce" it is to remove connectivity between the datacentres. I would put in place firewall rules between the DC's to stop them from communicating, and then rolling restart one of the DC's. You should be left with 2 datacentres that see each other as down, and on each

Re: Data Loss irreparabley so

2017-08-02 Thread kurt greaves
You should run repairs every GC_GRACE_SECONDS. If a node is overloaded/goes down, you should run repairs. LOCAL_QUORUM will somewhat maintain consistency within a DC, but certainly doesn't mean you can get away without running repairs. You need to run repairs even if you are using QUORUM or ONE.

Re: Bootstrapping a new Node with Consistency=ONE

2017-08-02 Thread kurt greaves
only in this one case might that work (RF==N)

Re: Bootstrapping a new Node with Consistency=ONE

2017-08-02 Thread kurt greaves
Can you not just add a new DC and then tell your clients to connect to the new one (after migrating all the data to it, obviously)? If you can't achieve that you should probably use GossipingPropertyFileSnitch. Your best plan is to have the desired RF/redundancy from the start. Changing RF in production

Re: UndeclaredThrowableException, C* 3.11

2017-08-02 Thread kurt greaves
If the repair command failed, repair also failed. Regarding % repaired, no it's unlikely you will see 100% repaired after a single repair. Maybe after a few consecutive repairs with no data load you might get it to 100%.

Re: Bootstrapping a new Node with Consistency=ONE

2017-08-02 Thread kurt greaves
If you want to change RF on a live system your best bet is through DC migration (add another DC with the desired # of nodes and RF), and migrate your clients to use that DC. There is a way to boot a node and not join the ring, however I don't think it will work for new nodes (have not confirmed),

Re: Is it possible to delete system_auth keyspace.

2017-08-01 Thread kurt greaves
You should be able to create it yourself prior to enabling auth without issues. Alternatively you could just add an extra node with auth on, or switch one node to have auth on and then change the RF.

Re: How to minimize side effects induced by tombstones when using deletion?

2017-08-01 Thread kurt greaves
> Also, if we repaired once successfully, will the next repair process take a more reasonable time? Depends on if there was a lot of inconsistent data to repair in the first place. Also full repairs or incremental? Repairs are complicated and tricky to get working efficiently. If you're using

Re: Cassandra isn't compacting old files

2017-08-01 Thread kurt greaves
Seeing as there aren't even 100 SSTables in L2, LCS should be gradually trying to compact L3 with L2. You could search the logs for "Adding high-level (L3)" to check if this is happening.

Re: Cassandra isn't compacting old files

2017-07-31 Thread kurt greaves
How long is your ttl and how much data do you write per day (ie, what is the difference in disk usage over a day)? Did you always TTL? I'd say it's likely there is live data in those older sstables but you're not generating enough data to push new data to the highest level before it expires.

Re: nodetool repair failure

2017-07-30 Thread kurt greaves
You need check the node that failed validation to find the relevant error. The IP should be in the logs of the node you started repair on. You shouldn't run multiple repairs on the same table from multiple nodes unless you really know what you're doing and not using vnodes. The failure you are

Re: Maximum and recommended storage per node

2017-07-28 Thread kurt greaves
There are many different recommendations floating around, typically the limit depends on how well you know Cassandra and your workload. If your workload is CPU bound, you should go for more, less dense nodes. If not, you can sustain higher data density per node. Typically I'd say the usable range

Re: Re: Re: tolerate how many nodes down in the cluster

2017-07-27 Thread kurt greaves
Note that if you use more racks than RF you lose some of the operational benefit. e.g: you'll still only be able to take out one rack at a time (especially if using vnodes), despite the fact that you have more racks than RF. As Jeff said this may be desirable, but really it comes down to what your

Re: read/write request counts and write size of each write

2017-07-25 Thread kurt greaves
Looks like you can collect MutationSizeHistogram for each write as well from the coordinator, in regards to write request size. See the Write request section under https://cassandra.apache.org/doc/latest/operating/metrics.html#client-request-metrics
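
A rough sketch of reading those coordinator metrics over JMX from Java (default JMX port 7199 assumed; the attribute names follow the usual metrics-JMX convention, so double-check them against your version):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class WriteMetrics {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Coordinator write count
                ObjectName writes = new ObjectName(
                        "org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency");
                System.out.println("writes: " + mbs.getAttribute(writes, "Count"));
                // Per-write mutation size histogram (bytes)
                ObjectName sizes = new ObjectName(
                        "org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=MutationSizeHistogram");
                System.out.println("mean mutation size: " + mbs.getAttribute(sizes, "Mean"));
            }
        }
    }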

Re: Re: tolerate how many nodes down in the cluster

2017-07-25 Thread kurt greaves
Keep in mind that you shouldn't just enable multiple racks on an existing cluster (this will lead to massive inconsistencies). The best method is to migrate to a new DC as Brooke mentioned.

Re: performance penalty of add column in CQL3

2017-07-25 Thread kurt greaves
If by "offline" you mean with no reads going to the nodes, then yes that would be a *potentially* safe time to do it, but it's still not advised. You should avoid doing any ALTERs on 3.x versions earlier than 3.0.14 or 3.11 if possible. Adding/dropping a column does not require a re-write of the

Re: read/write request counts and write size of each write

2017-07-25 Thread kurt greaves
You will need to use JMX to collect write/read related metrics. Not aware of anything that measures write size, but if there isn't it should be easily measured on your client. There are quite a few existing solutions for monitoring Cassandra out there, you should find some easily with a quick

Re: Data Loss irreparabley so

2017-07-25 Thread kurt greaves
Cassandra doesn't do any automatic repairing. It can tell if your data is inconsistent, however it's really up to you to manage consistency through repairs and choice of consistency level for queries. If you lose a node, you have to manually repair the cluster after replacing the node, but really

Re: 1 node doing compaction all the time in 6-node cluster (C* 2.2.8)

2017-07-24 Thread kurt greaves
Have you checked system logs/dmesg? I'd suspect it's an instance problem too, maybe you'll see some relevant errors in those logs.

Re: Re: tolerate how many nodes down in the cluster

2017-07-24 Thread kurt greaves
I've never really understood why Datastax recommends against racks. In those docs they make it out to be much more difficult than it actually is to configure and manage racks. The important thing to keep in mind when using racks is that your # of racks should be equal to your RF. If you have

Re: 1 node doing compaction all the time in 6-node cluster (C* 2.2.8)

2017-07-24 Thread kurt greaves
Just to rule out a simple problem, are you using a load balancing policy?

Re: Understanding gossip and seeds

2017-07-21 Thread kurt greaves
Haven't checked the code but pretty sure it's because it will always use the known state stored in the system tables. The seeds in the yaml are mostly for initial setup, used to discover the rest of the nodes in the ring. Once that's done there is little reason to refer to them again, unless

Re: write time for nulls is not consistent

2017-07-18 Thread kurt greaves
Can you try "select a, writetime(b) from test.t"? I heard of an issue recently where cqlsh reports null incorrectly if you query a column twice; wondering if it extends to this case with writetime.

Re: adding nodes to a cluster and changing rf

2017-07-14 Thread kurt greaves
Increasing RF will result in nodes that previously didn't have a replica of the data now being responsible for it. This means that a repair is required after increasing the RF. Until the repair completes you will suffer from inconsistencies in data. For example, in a 3 node cluster with RF 2,

Re: Unbalanced cluster

2017-07-10 Thread kurt greaves
The reason for the default of 256 vnodes is that at that many tokens the random distribution of tokens is enough to balance out each node's token allocation almost evenly. Any less and some nodes will get far more unbalanced, as Avi has shown. In 3.0 there is a new token allocation algorithm

Re: ALL range query monitors failing frequently

2017-06-28 Thread kurt greaves
I'd say that no, a range query probably isn't the best for monitoring, but it really depends on how important it is that the range you select is consistent. From those traces it does seem that the bulk of the time spent was waiting for responses from the replicas, which may indicate a network

Re: Restore Snapshot

2017-06-28 Thread kurt greaves
Hm, I did recall seeing a ticket for this particular use case, which is certainly useful, I just didn't think it had been implemented yet. Turns out it's been in since 2.0.7, so you should be receiving writes with join_ring=false. If you confirm you aren't receiving writes then we have an issue.

Re: Restore Snapshot

2017-06-28 Thread kurt greaves
There are many scenarios where it can be useful, but to address what seems to be your main concern; you could simply restore and then only read at ALL until your repair completes. If you use snapshot restore with commitlog archiving you're in a better state, but granted the case you described can

Re: ALL range query monitors failing frequently

2017-06-28 Thread kurt greaves
You're correct in that the timeout is only driver side. The server will have its own timeouts configured in the cassandra.yaml file. I suspect either that you have a node down in your cluster (or 4), or your queries are gradually getting slower. This kind of aligns with the slow query statements

Re: Question: Large partition warning

2017-06-15 Thread kurt greaves
fyi ticket already existed for this, I've submitted a patch that fixes this specific issue but it looks like there are a few other properties that will suffer from the same. As I said on the ticket, we should probably fix these up even though setting things this high is generally bad practice. If

Re: Question: Large partition warning

2017-06-14 Thread kurt greaves
Looks like you've hit a bug (not the first time I've seen this in relation to C* configs). compaction_large_partition_warning_threshold_mb resolves to an int, and in the codebase is represented in bytes. 4096 * 1024 * 1024 and you've got some serious overflow. Granted, you should have this warning
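
The overflow is easy to reproduce in plain Java, since the multiplication happens in 32-bit int arithmetic:

    public class ThresholdOverflow {
        public static void main(String[] args) {
            int thresholdMb = 4096;
            // 4096 * 1024 * 1024 = 2^32, which wraps to 0 as an int
            int asInt = thresholdMb * 1024 * 1024;
            // Widening to long before multiplying gives the intended value
            long asLong = thresholdMb * 1024L * 1024L;
            System.out.println(asInt);  // 0
            System.out.println(asLong); // 4294967296
        }
    }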

Re: Repairs on 2.1.12

2017-05-11 Thread kurt greaves
To clarify: what exactly was your repair command, and in reference to a ring did you mean the DC or the cluster? And has the repair been running for 2 weeks, or is that in reference to the "ring"? It would be helpful if you provided the relevant logs as well, and also the Cassandra version you are

Re: Repairs on 2.1.12

2017-05-10 Thread kurt greaves
never seen a repair loop, seems very unlikely. when you say "on a ring" what do you mean? what arguments are you passing to repair? On 10 May 2017 03:22, "Mark Furlong" wrote: I have a large cluster running a -dc repair on a ring which has been running for nearly two

Re: Materialize View in production

2017-05-08 Thread kurt greaves
Generally we still don't consider them stable and you should avoid using them for the moment. As you can see on my favourite search, the list of open bugs for MV's is not small, and there are some scary ones in there: https://issues.apache.org/jira/browse/CASSANDRA-13127?filter=12340733 On 8 May

Re: Smart Table creation for 2D range query

2017-05-08 Thread kurt greaves
Note that will not give you the desired range queries of 0 <= x <= 1 and 0 <= y <= 1. Something akin to Jon's solution could give you those range queries if you made the x and y components part of the clustering key. For example, a space of (1,1) could contain all x,y coordinates where x and y
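
A rough sketch of that idea (the schema and names are invented for illustration): bucket the plane into coarse cells used as the partition key, and make x and y clustering columns. Note CQL only allows a range restriction on the first restricted clustering column, so the y filter still has to happen client side or via finer bucketing:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class GridTable {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                session.execute("CREATE KEYSPACE IF NOT EXISTS geo WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                session.execute("CREATE TABLE IF NOT EXISTS geo.points ("
                        + " cell_x int, cell_y int,"   // coarse grid cell = partition key
                        + " x double, y double,"       // fine coordinates = clustering columns
                        + " payload text,"
                        + " PRIMARY KEY ((cell_x, cell_y), x, y))");
                // All points in cell (0,0) with 0 <= x <= 1; y is filtered client side
                session.execute("SELECT x, y, payload FROM geo.points "
                        + "WHERE cell_x = 0 AND cell_y = 0 AND x >= 0 AND x <= 1");
            }
        }
    }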

Re: [Cassandra] nodetool compactionstats not showing pending task.

2017-05-02 Thread kurt greaves
I believe this is a bug with the estimation of tasks, however I'm not aware of any JIRA that covers the issue. On 28 April 2017 at 06:19, Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in> wrote:
> Hi, I will try with JMX but I try with tpstats. In tpstats its showing pending

Re: Running Cassandra in Integration Tests

2017-04-28 Thread kurt greaves
Use ccmlib. https://github.com/pcmanus/ccm On 28 April 2017 at 12:59, Matteo Moci wrote:
> Sorry for bumping this old thread, but what would be your suggestion for programmatically starting/stopping nodes in a cluster? I'd like to make some experiments and perform QUORUM writes

Re: Streaming errors during bootstrap

2017-04-20 Thread kurt greaves
Did this error persist? What was the expected outcome? Did you drop this CF and now expect it to no longer exist? On 12 April 2017 at 01:26, Jai Bheemsen Rao Dhanwada wrote:
> Hello, I am seeing streaming errors while adding new nodes (in the same DC) to the cluster.

Re: Very odd & inconsistent results from SASI query

2017-03-20 Thread kurt greaves
As secondary indexes are stored individually on each node what you're suggesting sounds exactly like a consistency issue. the fact that you read 0 cells on one query implies the node that got the query did not have any data for the row. The reason you would sometimes see different behaviours is

Re: Change the IP of a live node

2017-03-15 Thread kurt greaves
Cassandra uses the IP address for more or less everything. It's possible to change it through some hackery, however it's probably not a great idea. The node's system tables will still reference the old IP, which is likely your problem here. On 14 March 2017 at 18:58, George Sigletos

Re: changing compaction strategy

2017-03-15 Thread kurt greaves
The rogue pending task is likely a non-issue. If your jmx command went through without errors and you got the log message you can assume it worked. It won't show in the schema unless you run the ALTER statement which affects the whole cluster. If you were switching from STCS then you wouldn't
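
For completeness, the cluster-wide change is just a schema ALTER; a sketch via the 3.x Java driver with placeholder keyspace/table names (cqlsh works just as well):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SwitchToLcs {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Propagates through the schema to every node, unlike the per-node JMX change
                session.execute("ALTER TABLE ks.events WITH compaction = "
                        + "{'class': 'LeveledCompactionStrategy'}");
            }
        }
    }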

Re: Internal Security - Authentication & Authorization

2017-03-15 Thread kurt greaves
Jacob, seems you are on the right track however my understanding is that only the user that was auth'd has their permissions/roles/creds cached. Also. Cassandra will query at QUORUM for the "cassandra" user, and at LOCAL_ONE for *all* other users. This is the same for creating users/roles.

Re: High disk io read load

2017-02-24 Thread kurt greaves
How many CFs are we talking about here? Also, did the script also kick off the scrubs or was this purely from changing the schemas?

Re: Read exceptions after upgrading to 3.0.10

2017-02-24 Thread kurt greaves
That stacktrace generally implies your clients are resetting connections. The reconnection policy probably handles the issue automatically, however worth investigating. I don't think it normally causes statuslogger output however, what were the log messages prior to the stacktrace? On 24 February

Re: Which compaction strategy when modeling a dumb set

2017-02-24 Thread kurt greaves
Probably LCS although what you're implying (read before write) is an anti-pattern in Cassandra. Something like this is a good indicator that you should review your model.

Re: Count(*) is not working

2017-02-17 Thread kurt greaves
really... well that's good to know. it still almost never works though. i guess every time I've seen it it must have timed out due to tombstones. On 17 Feb. 2017 22:06, "Sylvain Lebresne" <sylv...@datastax.com> wrote: On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves <k...@ins

Re: lots of connection timeouts around same time every day

2017-02-17 Thread kurt greaves
typically when I've seen that gossip issue it requires more than just restarting the affected node to fix. if you're not getting query related errors in the server log you should start looking at what is being queried. are the queries that time out each day the same?

Re: High disk io read load

2017-02-17 Thread kurt greaves
what's the Owns % for the relevant keyspace from nodetool status?

Re: lots of connection timeouts around same time every day

2017-02-17 Thread kurt greaves
have you tried a rolling restart of the entire DC?

Re: Count(*) is not working

2017-02-17 Thread kurt greaves
if you want a reliable count, you should use spark. performing a count (*) will inevitably fail unless you make your server read timeouts and tombstone fail thresholds ridiculous On 17 Feb. 2017 04:34, "Jan" wrote: > Hi, > > could you post the output of nodetool cfstats for the

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-02-13 Thread kurt greaves
are people actually trying to imply that Google is less evil than oracle? what is this shill fest On 12 Feb. 2017 8:24 am, "Kant Kodali" wrote: Saw this one today... https://news.ycombinator.com/item?id=13624062 On Tue, Jan 3, 2017 at 6:27 AM, Eric Evans

Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread kurt greaves
Marketing never lies. Ever

Re: UnknownColumnFamilyException after removing all Cassandra data

2017-02-07 Thread kurt greaves
The node is trying to communicate with another node, potentially streaming data, and is receiving files/data for an "unknown column family". That is, it doesn't know about the CF with the id e36415b6-95a7-368c-9ac0-ae0ac774863d. If you deleted some columnfamilies but not all the system keyspace

Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-30 Thread kurt greaves
On 30 January 2017 at 04:43, Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in> wrote:
> But how will I tell the rebuild command the source DC if I have more than 2 DCs?
You will need to rebuild the new DC from at least one DC for every keyspace present on the new DC and the old DCs.

Re: Time series data model and tombstones

2017-01-29 Thread kurt greaves
Your partitioning key is text. If you have multiple entries per id you are likely hitting older cells that have expired. Descending only affects how the data is stored on disk, if you have to read the whole partition to find whichever time you are querying for you could potentially hit tombstones

Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-27 Thread kurt greaves
What Dikang said: in your original email you are passing -dc to rebuild. This is incorrect. Simply run nodetool rebuild from each of the nodes in the new DC. On 28 Jan 2017 07:50, "Dikang Gu" wrote:
> Have you run 'nodetool rebuild dc_india' on the new nodes? On Tue,

Re: Re : Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16

2017-01-27 Thread kurt greaves
we've seen this issue on a few clusters, including on 2.1.7 and 2.1.8. pretty sure it is an issue in gossip that's known about. in later versions it seems to be fixed. On 24 Jan 2017 06:09, "sai krishnam raju potturi" wrote: > In the Cassandra versions 2.1.11 - 2.1.16,

Re: Unreliable JMX metrics

2017-01-19 Thread kurt Greaves
Yes. You likely will still be able to see the nodes in nodetool gossipinfo

Re: Cassandra cluster performance

2017-01-05 Thread kurt Greaves
You should try switching to async writes and then perform the test. Sync writes won't make much difference on a single node, but across multiple nodes there should be a massive difference. On 4 Jan 2017 10:05, "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <bjano...@cisco.com> wrote:
> Hi,

Re: How to change Replication Strategy and RF

2016-12-29 Thread kurt Greaves
If you're already using the cluster in production and require no downtime you should perform a datacenter migration first to change the RF to 3. Rough process would be as follows:
1. Change keyspace to NetworkTopologyStrategy with RF=1. You shouldn't increase RF here as you will receive

Re: Cassandra cluster performance

2016-12-23 Thread kurt Greaves
Branislav, are you doing async writes?

Re: iostat -like tool to parse 'nodetool cfstats'

2016-12-20 Thread kurt Greaves
Anything in cfstats you should be able to retrieve through the metrics MBeans. See https://cassandra.apache.org/doc/latest/operating/metrics.html On 20 December 2016 at 23:04, Richard L. Burton III wrote:
> I haven't seen anything like that myself. It would be nice to have

Re: Incremental repair for the first time

2016-12-20 Thread kurt Greaves
No workarounds, your best/only option is to upgrade (plus you get the benefit of loads of other bug fixes). On 16 December 2016 at 21:58, Kathiresan S wrote: > Thank you! > > Is any work around available for this version? > > Thanks, > Kathir > > > On Friday,

Re: Join_ring=false Use Cases

2016-12-20 Thread kurt Greaves
It seems that you're correct in saying that writes don't propagate to a node that has join_ring set to false, so I'd say this is a flaw. In reality I can't see many actual use cases in regards to node outages with the current implementation. The main usage I'd think would be to have additional

Re: Cassandra 2.x Stability

2016-11-30 Thread kurt Greaves
Latest release in 2.2. 2.1 is borderline EOL and from my experience 2.2 is quite stable and has some handy bugfixes that didn't actually make it into 2.1 On 30 November 2016 at 10:41, Shalom Sagges wrote: > Hi Everyone, > > I'm about to upgrade our 2.0.14 version to a

Re: Which version is stable enough for production environment?

2016-11-30 Thread kurt Greaves
Yes Benjamin, no one said it wouldn't. We're actively backporting things as we get time, if you find something you'd like backported raise an issue and let us know. We're well aware of the issues affecting MVs, but they haven't really been solved anywhere yet. On 30 November 2016 at 07:54,

Re: Cassandra Upgrade

2016-11-29 Thread kurt Greaves
Why would you remove all the data? That doesn't sound like a good idea. Just upgrade the OS and then go through the normal upgrade flow of starting C* with the next version and upgrading sstables. Also, *you will need to go from 2.0.14 -> 2.1.16 -> 2.2.8* and upgrade sstables at each stage of the

Re: lots of DigestMismatchException in cassandra3

2016-11-22 Thread kurt Greaves
> insert, update, delete on the same record at the same time, is it a possibility?
> Regards, Adeline
> *From:* kurt Greaves [mailto:k...@instaclustr.com]
> *Sent:* Wednesday, November 23, 2016 6:51 AM
> *To

Re: lots of DigestMismatchException in cassandra3

2016-11-22 Thread kurt Greaves
deta/not all replicas receiving all writes. You should run a repair and see if the number of mismatches is reduced. Kurt Greaves k...@instaclustr.com www.instaclustr.com On 22 November 2016 at 06:30, <adeline@thomsonreuters.com> wrote: > Hi Kurt, > > Thank you for

Re: Is it *safe* to issue multiple replace-node at the same time?

2016-11-21 Thread kurt Greaves
is assuming RF<=# of racks as well (and NTS). Kurt Greaves www.instaclustr.com

Re: lots of DigestMismatchException in cassandra3

2016-11-21 Thread kurt Greaves
Actually, just saw the error message in those logs and what you're looking at is probably https://issues.apache.org/jira/browse/CASSANDRA-12694 Kurt Greaves k...@instaclustr.com www.instaclustr.com On 21 November 2016 at 08:59, kurt Greaves <k...@instaclustr.com> wrote: > That'

RE: lots of DigestMismatchException in cassandra3

2016-11-21 Thread kurt Greaves
That's a debug message. From the sound of it, it's triggered on read where there is a digest mismatch between replicas. As to whether it's normal, well that depends on your cluster. Are the nodes reporting lots of dropped mutations and are you writing at
