Re: where does c* store the schema?
Rahul, none of that is true at all. Each node stores schema locally in a non-replicated system table. Schema changes are disseminated directly to live nodes (not via the write path), and the schema version is gossiped to other nodes. If a node misses a schema update, it will figure this out when it notices that its local schema version is behind the one being gossiped by the rest of the cluster, and will pull the updated schema from the other nodes in the cluster.

From: Rahul Singh
Date: Tuesday, April 17, 2018 at 4:13 PM
Subject: Re: where does c* store the schema?

It uses an "everywhere" replication strategy and it's recommended to do all alter / create / drop statements with consistency level all — meaning it wouldn't make the change to the schema if the nodes are up. -- Rahul Singh rahul.si...@anant.us Anant Corporation

On Apr 17, 2018, 12:31 AM -0500, Jinhua Luo wrote: Yes, I know it must be in the system schema. But how does c* replicate the user-defined schema to all nodes? If it applies the same RWN model to them, then what are the R and W? And when a failed node comes back to the cluster, how does it recover the schema updates it may have missed during the outage?

2018-04-16 17:01 GMT+08:00 DuyHai Doan: There is a system_schema keyspace to store all the schema information https://docs.datastax.com/en/cql/3.3/cql/cql_using/useQuerySystem.html#useQuerySystem__table_bhg_1bw_4v

On Mon, Apr 16, 2018 at 10:48 AM, Jinhua Luo wrote: Hi All, Does c* use predefined keyspaces/tables to store the user-defined schema? If so, what's the RWN of those meta schema? And what's the procedure to update them?

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
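The catch-up behavior described above can be sketched as a small simulation — this is illustrative pseudologic, not Cassandra source; in the real system the schema version is a UUID carried in gossip state and the pull happens over the messaging service:

```python
# Sketch (not Cassandra source): a node detects a missed schema update by
# comparing its local schema version against the version gossiped by a
# peer, then pulls the newer schema. All names here are illustrative.

class Node:
    def __init__(self, name, schema_version, schema):
        self.name = name
        self.schema_version = schema_version  # a UUID in real Cassandra
        self.schema = schema                  # local, non-replicated copy

    def gossip_check(self, peer):
        """If the peer gossips a newer schema version, pull its schema."""
        if peer.schema_version > self.schema_version:
            self.schema = dict(peer.schema)
            self.schema_version = peer.schema_version

# A node that missed a schema change catches up via gossip:
up_to_date = Node("n1", schema_version=2, schema={"ks.t": ["id", "v", "v2"]})
lagging = Node("n2", schema_version=1, schema={"ks.t": ["id", "v"]})
lagging.gossip_check(up_to_date)
assert lagging.schema_version == 2
assert lagging.schema == {"ks.t": ["id", "v", "v2"]}
```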
Re: Why Cassandra need full repair after incremental repair
Because in theory, corruption of your repaired dataset is possible, which incremental repair won't fix. In practice, pre-4.0 incremental repair has some flaws that can bring deleted data back to life in some cases, which this would address. You should also evaluate whether pre-4.0 incremental repair is saving you time. The same flaws can cause *a lot* of over streaming, which may negate the benefit of repairing only the unrepaired data.

> On Nov 2, 2017, at 2:17 AM, dayu wrote:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesWhen.html
>
> So you mean I was misled by that statement, and full repair is only needed after node failure + replacement, or adding a datacenter. Right?
>
> At 2017-11-02 15:54:49, "kurt greaves" wrote:
> Where are you seeing this? If your incremental repairs work properly, full repair is only needed in certain situations, like after node failure + replacement, or adding a datacenter.
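The point about corruption in the repaired dataset can be made concrete with a toy model — this is an illustrative sketch, not how repair is implemented: incremental repair only compares unrepaired sstables across replicas, so divergence inside the already-repaired set is invisible to it, while a full repair compares everything:

```python
# Sketch: why incremental repair can't fix corruption of repaired data.
# Each replica is modeled as two sets of keys: "repaired" and "unrepaired".
# Repair returns the keys that differ within the scope it compares.

def repair(replica_a, replica_b, incremental):
    """Return keys that differ between the sets each repair style compares."""
    scope = ("unrepaired",) if incremental else ("repaired", "unrepaired")
    a = set().union(*(replica_a[s] for s in scope))
    b = set().union(*(replica_b[s] for s in scope))
    return a ^ b  # symmetric difference: data one replica is missing

a = {"repaired": {"k1", "k2"}, "unrepaired": {"k3"}}
b = {"repaired": {"k1"}, "unrepaired": {"k3"}}  # k2 lost from repaired set

assert repair(a, b, incremental=True) == set()    # divergence is invisible
assert repair(a, b, incremental=False) == {"k2"}  # full repair detects it
```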
Re: Cqlsh unable to switch keyspace after Cassandra upgrade.
Looks like a bug, could you open a jira?

> On Nov 2, 2017, at 2:08 AM, Mikhail Tsaplin wrote:
>
> Hi,
> I've upgraded Cassandra from 2.1.6 to 3.0.9 on a three-node cluster. After the upgrade, cqlsh shows the following error when trying to run the "use {keyspace};" command:
> 'ResponseFuture' object has no attribute 'is_schema_agreed'
>
> The actual upgrade was done on Ubuntu 16.04 by running the "apt-get upgrade cassandra" command.
> The apt repository is deb http://debian.datastax.com/community stable main.
> The following parameters were migrated from the former cassandra.yaml:
> cluster_name, num_tokens, data_file_directories, commit_log_directory, saved_caches_directory, seeds, listen_address, rpc_address, initial_token, auto_bootstrap.
>
> Later I did an additional test - fetched the 3.0.15 binary distribution from cassandra.apache.org and tried to run cassandra from this distribution - same error:
> $ ./bin/cqlsh
> Connected to cellwize.cassandra at 172.31.17.42:9042.
> [cqlsh 5.0.1 | Cassandra 3.0.15 | CQL spec 3.4.0 | Native protocol v4]
> Use HELP for help.
> cqlsh> use listener ;
> 'ResponseFuture' object has no attribute 'is_schema_agreed'
> cqlsh>
>
> What could be the reason?
Re: Need help with incremental repair
Ah cool, I didn't realize reaper did that.

On October 30, 2017 at 1:29:26 PM, Paulo Motta (pauloricard...@gmail.com) wrote:

> This is also the case for full repairs, if I'm not mistaken. Assuming I'm not missing something here, that should mean that he shouldn't need to mark sstables as unrepaired?

That's right, but he mentioned that he is using reaper, which uses subrange repair if I'm not mistaken, which doesn't do anticompaction. So in that case he should probably mark data as unrepaired when no longer using incremental repair.

2017-10-31 3:52 GMT+11:00 Blake Eggleston <beggles...@apple.com>:

>> Once you run incremental repair, your data is permanently marked as repaired
>
> This is also the case for full repairs, if I'm not mistaken. I'll admit I'm not as familiar with the quirks of repair in 2.2, but prior to 4.0/CASSANDRA-9143, any global repair ends with an anticompaction that marks sstables as repaired. Looking at the RepairRunnable class, this does seem to be the case. Assuming I'm not missing something here, that should mean that he shouldn't need to mark sstables as unrepaired?
Re: Need help with incremental repair
> Once you run incremental repair, your data is permanently marked as repaired This is also the case for full repairs, if I'm not mistaken. I'll admit I'm not as familiar with the quirks of repair in 2.2, but prior to 4.0/CASSANDRA-9143, any global repair ends with an anticompaction that marks sstables as repaired. Looking at the RepairRunnable class, this does seem to be the case. Assuming I'm not missing something here, that should mean that he shouldn't need to mark sstables as unrepaired?
Re: Need help with incremental repair
Hey Aiman,

Assuming the situation is just "we accidentally ran incremental repair", you shouldn't have to do anything. It's not going to hurt anything. Pre-4.0 incremental repair has some issues that can cause a lot of extra streaming, and inconsistencies in some edge cases, but as long as you're running full repairs before gc grace expires, everything should be ok.

Thanks,

Blake

On October 28, 2017 at 1:28:42 AM, Aiman Parvaiz (ai...@steelhouse.com) wrote:

Hi everyone, we seek your help with an issue we are facing on our 2.2.8 version. We have a 24-node cluster spread over 3 DCs. Initially, when the cluster was in a single DC, we were using The Last Pickle reaper 0.5 to repair it with incremental repair set to false. We added 2 more DCs. Now the problem is that, accidentally, on one of the newer DCs we ran nodetool repair without realizing that for 2.2 the default option is incremental. I am not seeing any errors in the logs so far, but wanted to know what would be the best way to handle this situation. To make things a little more complicated, the node on which we triggered this repair is almost out of disk and we had to restart C* on it. I can see a bunch of "anticompaction after repair" under Opscenter Activities across various nodes in the 3 DCs. Any help or suggestion would be appreciated. Thanks
Materialized Views marked experimental
Hi user@,

Following a discussion on dev@, the materialized view feature is being retroactively classified as experimental, and not recommended for new production uses. The next patch releases of 3.0, 3.11, and 4.0 will include CASSANDRA-13959, which will log warnings when materialized views are created, and introduce a yaml setting that will allow operators to disable their creation.

Concerns about MVs' suitability for production are not uncommon, and this just formalizes the advice often given to people considering materialized views. That is: materialized views have shortcomings that can make them unsuitable for the general use case. If you're not familiar with their shortcomings and confident they won't cause problems for your use case, you shouldn't use them.

The shortcomings I'm referring to are:
* There's no way to determine if a view is out of sync with the base table.
* If you do determine that a view is out of sync, the only way to fix it is to drop and rebuild the view.
* Even in the happy path, there isn't an upper bound on how long it will take for updates to be reflected in the view.

There is also concern that the materialized view design isn't known to be 'correct'. In other words, it's a distributed system design that hasn't been extensively modeled and simulated, and there have been no formal proofs about its properties. You should be aware that manually denormalizing your tables has these same limitations in most cases, so you may not be losing any guarantees by using materialized views in your use case.

The reason we're doing this is that users expect correctness from features built into a database. It's not unreasonable for users to think that a database feature will more or less "just work", which is not necessarily the case for materialized views. If a feature is considered experimental, users are more likely to spend the time understanding its shortcomings before using it in their application.
Thanks, Blake The dev@ thread can be found here: https://www.mail-archive.com/dev@cassandra.apache.org/msg11511.html
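For operators who want to disable MV creation once they're on a release containing CASSANDRA-13959, the yaml knob looks roughly like the fragment below. The property name is taken from that ticket and should be verified against the cassandra.yaml bundled with your release:

```yaml
# cassandra.yaml fragment (assumed property name from CASSANDRA-13959;
# check your release's bundled cassandra.yaml before relying on it).
# true (the default) allows CREATE MATERIALIZED VIEW; false rejects it.
enable_materialized_views: false
```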
Re: What is a node's "counter ID?"
I believe that's just referencing a counter implementation detail. If I remember correctly, there was a fairly large improvement to the implementation of counters in 2.1, and the assignment of the id would basically be a format migration.

> On Oct 20, 2017, at 9:57 AM, Paul Pollack wrote:
>
> Hi,
>
> I was reading the doc page for nodetool cleanup https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCleanup.html because I was planning to run it after replacing a node in my counter cluster, and the sentence "Cassandra assigns a new counter ID to the node" gave me pause. I can't find any other reference to a node's counter ID in the docs and was wondering if anyone here could shed light on what this means, and how it would affect the data being stored on a node that had its counter ID changed?
>
> Thanks,
> Paul
Re: Does NTP affects LWT's ballot UUID?
Since the UUID is used as the ballot in a paxos instance, if it goes backwards in time, it will be rejected by the other replicas (if there is a more recent instance), and the proposal will fail. However, after the initial rejection, the coordinator will try again with the most recently seen ballot +1, which should succeed (unless another coordinator has started a proposal with a higher ballot in the meantime).

On October 10, 2017 at 1:04:22 AM, Daniel Woo (daniel.y@gmail.com) wrote:

Hi DuyHai, thanks, and that's exactly what I am asking: what happens if NTP goes backwards. Actually, NTP often does that, because clock drift is inevitable.

On Tue, Oct 10, 2017 at 3:13 PM, DuyHai Doan wrote:

The ballot UUID is obtained using QUORUM agreement between replicas for a given partition key, and we use this TimeUUID ballot as the write time for the mutation. The only scenario where I can see a problem is NTP going backwards in time on a QUORUM of replicas, which would break the contract of monotonicity. I don't know how likely this event is ...

On Tue, Oct 10, 2017 at 9:07 AM, Daniel Woo wrote:

Hi guys, the ballot UUID should be monotonically increasing on each coordinator, but the UUID in cassandra is version 1 (timestamp based). What happens if the NTP service adjusts the system clock while a two-phase paxos prepare/commit is in progress?
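The reject-then-retry behavior described above can be sketched as follows — an illustrative model using plain integers as ballots rather than the TimeUUID comparison Cassandra actually performs:

```python
# Sketch: paxos ballot handling when a coordinator's clock goes backwards.
# A replica only promises ballots higher than the most recent one it has
# promised; on rejection it reports what it has seen, and the coordinator
# retries with (highest seen ballot + 1). Illustrative, not Cassandra code.

class Replica:
    def __init__(self):
        self.promised = 0  # highest ballot promised so far

    def prepare(self, ballot):
        """Promise the ballot if it's newer; else reject, reporting the max seen."""
        if ballot > self.promised:
            self.promised = ballot
            return True, ballot
        return False, self.promised

replica = Replica()
ok, _ = replica.prepare(100)       # normal proposal: promised
assert ok
ok, seen = replica.prepare(90)     # clock went backwards: rejected
assert not ok and seen == 100
ok, _ = replica.prepare(seen + 1)  # retry with highest-seen + 1 succeeds
assert ok
```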
Re: table repair question
Incremental repairs should also update the percentage, although I'd recommend not using incremental repair before 4.0. Just want to point out that running repairs based on repaired % isn't necessarily a bad thing, but it should be a secondary consideration. The important thing is to repair data more frequently than your gc grace period.

On October 4, 2017 at 1:33:57 PM, Javier Canillas (javier.canil...@gmail.com) wrote:

That percentage will only be updated if you do a full repair. If you do repairs on the local dc or with -pr, that percentage will not be updated. I scripted a regular repair on each node based on whether this percentage is below some threshold. It has been running fine for several months now.

2017-10-04 12:46 GMT-03:00 Blake Eggleston <beggles...@apple.com>:

Not really, no. There's a repaired % in nodetool tablestats if you're using incremental repair (and you probably shouldn't be before 4.0 comes out), but I wouldn't make any decisions based on its value.

On October 4, 2017 at 8:05:44 AM, ZAIDI, ASAD A (az1...@att.com) wrote:

Hello folks, I'm wondering if there is a way to find out the list of table(s) which need repair, OR if there is a way to find out what percentage of data would need to be repaired on a table. Is such information available from the Cassandra db engine through some other means? TIA~ Asad
Re: table repair question
Not really, no. There's a repaired % in nodetool tablestats if you're using incremental repair (and you probably shouldn't be before 4.0 comes out), but I wouldn't make any decisions based on its value.

On October 4, 2017 at 8:05:44 AM, ZAIDI, ASAD A (az1...@att.com) wrote:

Hello folks, I'm wondering if there is a way to find out the list of table(s) which need repair, OR if there is a way to find out what percentage of data would need to be repaired on a table. Is such information available from the Cassandra db engine through some other means? TIA~ Asad
Re: Materialized views stability
Hi Hannu,

There are more than a few committers that don't think MVs are currently suitable for production use. I'm not involved with MV development, so this may not be 100% accurate, but the problems as I understand them are: there's no way to determine if a view is out of sync with the base table; if you do determine that a view is out of sync, the only way to fix it is to drop and rebuild the view; and there are liveness issues with updates being reflected in the view.

Any one of these issues makes it difficult to recommend for general application development. I'd say that unless you're very familiar with their shortcomings and confident your use case fits within them, you're probably better off not using them.

Thanks,

Blake

On October 2, 2017 at 6:55:52 AM, Hannu Kröger (hkro...@gmail.com) wrote:

Hello, I have seen some discussions around Materialized Views and the stability of that functionality. There are some open issues around repairs: https://issues.apache.org/jira/browse/CASSANDRA-13810?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20Testing%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20component%20%3D%20%22Materialized%20Views%22 Is it the case that the current problems are mostly related to incremental repairs, or are there also other major concerns why some people don't consider them safe for production use? Cheers, Hannu
Re: Nodetool repair -pr
It will on 2.2 and higher, yes. Also, just want to point out that it would be worth it for you to compare how long incremental repairs take vs full repairs in your cluster. There are some problems (which are fixed in 4.0) that can cause significant overstreaming when using incremental repair.

On September 28, 2017 at 11:46:47 AM, Dmitry Buzolin (dbuz5ga...@gmail.com) wrote:

Hi All, can someone confirm whether "nodetool repair -pr -j2" runs with -inc too? I see the docs mention -inc is set by default, but I am not sure if it is enabled when the -pr option is used. Thanks!
Re: How to check if repair is actually successful
If nodetool repair doesn't return an error, and doesn't hang, the repair completed successfully. On September 1, 2017 at 5:50:53 AM, Akshit Jain (akshit13...@iiitd.ac.in) wrote: Hi, I am performing repair on cassandra cluster. After getting repair status as successful, How to figure out if it is successful actually? Is there any way to test it?
Re: nodetool gossipinfo question
That's the value version. Gossip uses versioned values to work out which piece of data is the most recent. Each node has its own highest version, so I don't think it's unusual for that to be different for different nodes. When you say the node crashes, do you mean the process dies?

On August 29, 2017 at 4:20:30 PM, Gopal, Dhruva (dhruva.go...@aspect.com) wrote:

Hi – I have a question on the significance of a particular attribute in the output of 'nodetool gossipinfo': RELEASE_VERSION. We have a node that is periodically crashing with nothing of significance in the logs, and we're trying to ascertain whether it's an OS issue or something to do with Cassandra. The output of 'nodetool gossipinfo' for this node was different from all other nodes in the cluster – we're using 2.1.11 in this cluster and it was set to RELEASE_VERSION:6:2.1.11. In a subsequent run, after another crash, while attempting a repair, it was set to RELEASE_VERSION:7:2.1.11. All other nodes output RELEASE_VERSION:4:2.1.11. What is the significance of that first digit (the rest appears to be the version of Cassandra running on each node)?

Regards, Dhruva

This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.
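The versioned-value mechanism described above can be sketched with a toy model — illustrative only, not gossip's actual implementation: each node stamps its own state changes with its own monotonically increasing counter, and peers keep whichever copy of a value carries the higher version, which is why the first digit differs from node to node:

```python
# Sketch: gossip's versioned values. A node's own version counter is bumped
# every time it changes any of its local state values (STATUS, LOAD,
# RELEASE_VERSION, ...), so the number attached to RELEASE_VERSION reflects
# that node's history of state changes, not the Cassandra version itself.

class GossipState:
    def __init__(self):
        self.version = 0  # this node's own version counter
        self.values = {}  # key -> (version, value)

    def set_local(self, key, value):
        self.version += 1
        self.values[key] = (self.version, value)

    def merge_remote(self, key, version, value):
        """Keep whichever copy of the value has the higher version."""
        if key not in self.values or version > self.values[key][0]:
            self.values[key] = (version, value)

node = GossipState()
node.set_local("RELEASE_VERSION", "2.1.11")  # stored with version 1
node.set_local("STATUS", "NORMAL")           # counter bumps to 2
node.set_local("RELEASE_VERSION", "2.1.11")  # restart re-announces: version 3
assert node.values["RELEASE_VERSION"] == (3, "2.1.11")
node.merge_remote("RELEASE_VERSION", 2, "stale")  # older copy is ignored
assert node.values["RELEASE_VERSION"] == (3, "2.1.11")
```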
Re: Question about nodetool repair
Specifying a dc will only repair the data in that dc. If you leave out the dc flag, it will repair data in both dcs. You probably shouldn't be restricting repair to one dc without a good rationale for doing so.

On August 31, 2017 at 8:56:24 AM, Harper, Paul (paul.har...@aspect.com) wrote:

Hello All, I have a 6-node ring with 3 nodes in DC1 and 3 nodes in DC2. I ssh'ed into node5 on DC2, which was in a "DN" state. I ran "nodetool repair". I've had this situation before and ran "nodetool repair -dc DC2". I'm trying to understand what, if anything, is different between those commands. What are they actually doing? Thanks
Re: Missing results during range query paging
That does sound troubling. You mentioned you're reading at local quorum. Did you write these control records at quorum, or from the same dc at local quorum? What CL/DC are the other records written at?

On May 17, 2017 at 10:16:42 AM, Dominic Chevalier (dccheval...@gmail.com) wrote:

Hi Folks, I've been noticing some missing rows, anywhere from 20-40% missing, while executing paging queries over my cluster. Basically the query is to hit every row, subdividing the entire token range into a few tens of token ranges to parallelize the work; there is no wraparound involved, at local_quorum: select * from cf where token(primaryKey) > minimum and token(primaryKey) < maximum; I have inserted a test-control data set of 100,000 records among billions of live records. The control data set does not change, does not TTL, and queries for individual rows at local_quorum return nearly all of the data, so it's very strange that paging queries consistently return 60-80% of what I expect. In the past, paging queries have returned almost all of the control data set, and still do in smaller test clusters. My suspicion is that something in the cluster state is impacting these results, but I have yet to pinpoint anything. Nor have I been able to pinpoint what in the past led from consistently 100% paging coverage to consistently a lot less than 100% coverage. My cluster is Apache Cassandra 2.1.15, with approximately 100 nodes in the local data center. Java driver version 3.1.0. Thank you, Dominic
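One thing worth double-checking in a setup like this is the subrange math itself. A sketch of splitting the full Murmur3 token range into parallel scan ranges — illustrative code, not the poster's actual tool; note that using strict `>` and `<` on both ends of each subrange (as in the query above) silently drops rows whose tokens land exactly on the range boundaries:

```python
# Sketch: splitting the full Murmur3 token range into n subranges for a
# parallel full-table scan. Ranges are treated as (start, end] so adjacent
# subranges share a boundary and the whole ring is covered without gaps.

MIN_TOKEN = -2**63      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1   # Murmur3Partitioner maximum token

def split_ring(n):
    """Return n (start, end] subranges covering the full token range."""
    step = (MAX_TOKEN - MIN_TOKEN) // n
    bounds = [MIN_TOKEN + i * step for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds, bounds[1:]))

ranges = split_ring(16)
# Each range maps to: SELECT * FROM cf WHERE token(pk) > ? AND token(pk) <= ?
assert ranges[0][0] == MIN_TOKEN and ranges[-1][1] == MAX_TOKEN
# Adjacent ranges share a boundary, so (start, end] semantics leave no gaps:
assert all(a[1] == b[0] for a, b in zip(ranges, ranges[1:]))
```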
Re: LCS, range tombstones, and eviction
The start and end points of a range tombstone are basically stored as special purpose rows alongside the normal data in an sstable. As part of a read, they're reconciled with the data from the other sstables into a single partition, just like the other rows. The only difference is that they don't contain any 'real' data, and, of course, they prevent 'deleted' data from being returned in the read. It's a bit more complicated than that, but that's the general idea. On May 12, 2017 at 6:23:01 AM, Stefano Ortolani (ostef...@gmail.com) wrote: Thanks a lot Blake, that definitely helps! I actually found a ticket re range tombstones and how they are accounted for: https://issues.apache.org/jira/browse/CASSANDRA-8527 I am wondering now what happens when a node receives a read request. Are the range tombstones read before scanning the SStables? More interestingly, given that a single partition might be split across different levels, and that some range tombstones might be in L0 while all the rest of the data in L1, are all the tombstones prefetched from _all_ the involved SStables before doing any table scan? Regards, Stefano On Thu, May 11, 2017 at 7:58 PM, Blake Eggleston <beggles...@apple.com> wrote: Hi Stefano, Based on what I understood reading the docs, if the ratio of garbage collectable tomstones exceeds the "tombstone_threshold", C* should start compacting and evicting. If there are no other normal compaction tasks to be run, LCS will attempt to compact the sstables it estimates it will be able to drop the most tombstones from. It does this by estimating the number of tombstones an sstable has that have passed the gc grace period. Whether or not a tombstone will actually be evicted is more complicated. Even if a tombstone has passed gc grace, it can't be dropped if the data it's deleting still exists in another sstable, otherwise the data would appear to return. 
So, a tombstone won't be dropped if there is data for the same partition in other sstables that is older than the tombstone being evaluated for eviction. I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how? From what I can tell, each end of the range tombstone is counted as a single tombstone. So a range tombstone effectively contributes '2' to the count of tombstones for an sstable. I'm not 100% sure, but I haven't seen any sstable writing logic that tracks open tombstones and counts covered cells as tombstones. So, it's likely that the effect of range tombstones covering many rows is under-represented in the droppable tombstone estimate. I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job continuously anticompacting the data and thus creating SStables, what is the likelihood of the default interval (10 days) actually being hit? It will be hit, but probably only in the repaired data. Once the data is marked repaired, it shouldn't be anticompacted again, and should get old enough to pass the compaction interval. That shouldn't be an issue though, because you should be running repair often enough that data is repaired before it can ever get past the gc grace period. Otherwise you'll have other problems. Also, keep in mind that tombstone eviction is a part of all compactions, it's just that occasionally a compaction is run specifically for that purpose. Finally, you probably shouldn't run incremental repair on data that is deleted. There is a design flaw in the incremental repair used in pre-4.0 of cassandra that can cause consistency issues. 
It can also cause a *lot* of over streaming, so you might want to take a look at how much streaming your cluster is doing with full repairs, and incremental repairs. It might actually be more efficient to run full repairs. Hope that helps, Blake

On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostef...@gmail.com) wrote:

Hi all, I am trying to wrap my head around how C* evicts tombstones when using LCS. Based on what I understood reading the docs, if the ratio of garbage collectable tombstones exceeds the "tombstone_threshold", C* should start compacting and evicting. I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how? I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job continuously anticompacting the data and thus creating SStables, what is the likelihood of the default interval (10 days) actually being hit?
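The "each end counts as one tombstone" estimate discussed in this thread can be sketched as a one-liner — illustrative accounting, not Cassandra's actual metadata code:

```python
# Sketch of the droppable-tombstone estimate described above: each bound of
# a range tombstone counts as a single tombstone, so a range tombstone
# contributes 2 to an sstable's count regardless of how many rows it covers.

def droppable_tombstone_count(cell_tombstones, range_tombstones):
    """cell_tombstones: number of single-cell/row tombstones.
    range_tombstones: list of (start, end) ranges, each contributing 2."""
    return cell_tombstones + 2 * len(range_tombstones)

# One range tombstone covering a million rows still only counts as 2,
# which is how wide deletions get under-represented in the estimate:
assert droppable_tombstone_count(3, [(0, 1_000_000)]) == 5
```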
Re: AWS Cassandra backup/Restore tools
OpsCenter 6.0 and up don't work with open-source Apache Cassandra.

On May 11, 2017 at 12:31:08 PM, cass savy (casss...@gmail.com) wrote:

AWS Backup/Restore process/tools for C*/DSE C*: has anyone used the OpsCenter 6.1 backup tool to back up/restore data for larger datasets online? If yes, did you run into issues using that tool to back up/restore data in PROD that caused performance or any other impact to the cluster? If no, what other tools have people used or recommended for backup and restore of Cassandra keyspaces? Please advise.
Re: LCS, range tombstones, and eviction
Hi Stefano, Based on what I understood reading the docs, if the ratio of garbage collectable tombstones exceeds the "tombstone_threshold", C* should start compacting and evicting. If there are no other normal compaction tasks to be run, LCS will attempt to compact the sstables it estimates it will be able to drop the most tombstones from. It does this by estimating the number of tombstones an sstable has that have passed the gc grace period. Whether or not a tombstone will actually be evicted is more complicated. Even if a tombstone has passed gc grace, it can't be dropped if the data it's deleting still exists in another sstable, otherwise the data would appear to return. So, a tombstone won't be dropped if there is data for the same partition in other sstables that is older than the tombstone being evaluated for eviction. I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how? From what I can tell, each end of the range tombstone is counted as a single tombstone. So a range tombstone effectively contributes '2' to the count of tombstones for an sstable. I'm not 100% sure, but I haven't seen any sstable writing logic that tracks open tombstones and counts covered cells as tombstones. So, it's likely that the effect of range tombstones covering many rows is under-represented in the droppable tombstone estimate. I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job continuously anticompacting the data and thus creating SStables, what is the likelihood of the default interval (10 days) actually being hit? It will be hit, but probably only in the repaired data. 
Once the data is marked repaired, it shouldn't be anticompacted again, and should get old enough to pass the compaction interval. That shouldn't be an issue though, because you should be running repair often enough that data is repaired before it can ever get past the gc grace period. Otherwise you'll have other problems. Also, keep in mind that tombstone eviction is a part of all compactions, it's just that occasionally a compaction is run specifically for that purpose. Finally, you probably shouldn't run incremental repair on data that is deleted. There is a design flaw in the incremental repair used in pre-4.0 of cassandra that can cause consistency issues. It can also cause a *lot* of over streaming, so you might want to take a look at how much streaming your cluster is doing with full repairs, and incremental repairs. It might actually be more efficient to run full repairs. Hope that helps, Blake

On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostef...@gmail.com) wrote:

Hi all, I am trying to wrap my head around how C* evicts tombstones when using LCS. Based on what I understood reading the docs, if the ratio of garbage collectable tombstones exceeds the "tombstone_threshold", C* should start compacting and evicting. I am quite puzzled however by what might happen when dealing with range tombstones. In that case a single tombstone might actually stand for an arbitrary number of normal tombstones. In other words, do range tombstones contribute to the "tombstone_threshold"? If so, how? I am also a bit confused by the "tombstone_compaction_interval". If I am dealing with a big partition in LCS which is receiving new records every day, and a weekly incremental repair job continuously anticompacting the data and thus creating SStables, what is the likelihood of the default interval (10 days) actually being hit? Hopefully somebody will be able to shed some light here! Thanks in advance! Stefano
Re: massive spikes in read latency
That’s a good point. CPU steal time is very low, but I haven’t observed internode ping times during one of the peaks, I’ll have to check that out. Another thing I’ve noticed is that cassandra starts dropping read messages during the spikes, as reported by tpstats. This indicates that there’s too many queries for cassandra to handle. However, as I mentioned earlier, the spikes aren’t correlated to an increase in reads. On Jan 5, 2014, at 3:28 PM, Blake Eggleston bl...@shift.com wrote: Hi, I’ve been having a problem with 3 neighboring nodes in our cluster having their read latencies jump up to 9000ms - 18000ms for a few minutes (as reported by opscenter), then come back down. We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with cassandra reading and writing to 2 raided ssds. I’ve added 2 nodes to the struggling part of the cluster, and aside from the latency spikes shifting onto the new nodes, it has had no effect. I suspect that a single key that lives on the first stressed node may be being read from heavily. The spikes in latency don’t seem to be correlated to an increase in reads. The cluster’s workload is usually handling a maximum workload of 4200 reads/sec per node, with writes being significantly less, at ~200/sec per node. Usually it will be fine with this, with read latencies at around 3.5-10 ms/read, but once or twice an hour the latencies on the 3 nodes will shoot through the roof. The disks aren’t showing serious use, with read and write rates on the ssd volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra process is maintaining 1000-1100 open connections. GC logs aren’t showing any serious gc pauses. Any ideas on what might be causing this? Thanks, Blake
massive spikes in read latency
Hi, I’ve been having a problem with 3 neighboring nodes in our cluster having their read latencies jump up to 9000ms - 18000ms for a few minutes (as reported by opscenter), then come back down. We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with cassandra reading and writing to 2 raided ssds. I’ve added 2 nodes to the struggling part of the cluster, and aside from the latency spikes shifting onto the new nodes, it has had no effect. I suspect that a single key that lives on the first stressed node may be being read from heavily. The spikes in latency don’t seem to be correlated to an increase in reads. The cluster’s workload is usually handling a maximum workload of 4200 reads/sec per node, with writes being significantly less, at ~200/sec per node. Usually it will be fine with this, with read latencies at around 3.5-10 ms/read, but once or twice an hour the latencies on the 3 nodes will shoot through the roof. The disks aren’t showing serious use, with read and write rates on the ssd volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra process is maintaining 1000-1100 open connections. GC logs aren’t showing any serious gc pauses. Any ideas on what might be causing this? Thanks, Blake
Re: get all row keys of a table using CQL3
Hi Jimmy,

Check out the token function: http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results

You can use it to page through your rows.

Blake

On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote:

hi, I want to fetch all the row keys of a table using CQL3, e.g.

select id from mytable limit 999

#1 For this query, does the node need to wait for all rows to return from all other nodes before returning the data to the client (I am using astyanax)? In other words, will this operation create a lot of load on the initial node receiving the request?

#2 If my table is big, I have to make sure the limit is set to a big enough number so that I can get all the results. It seems like I have to do a count(*) to be sure. Is there any alternative (that always returns all the rows)?

#3 If my id is a timeuuid, is it better to combine the results of a couple of queries like the following to obtain all keys? e.g.

select id from mytable where id < minTimeuuid('2013-02-02 10:00+') limit 2
+
select id from mytable where id > maxTimeuuid('2013-02-02 10:00+') limit 2

thanks
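For what it’s worth, a minimal sketch of the token-paging pattern the linked doc describes, using the table and column names from the question above (the <last_token> placeholder stands for the token(id) value of the final row of the previous page):

```cql
-- First page: start from the beginning of the token range.
SELECT id, token(id) FROM mytable LIMIT 999;

-- Each subsequent page: resume just past the last token already seen.
-- Comparing on token(id) is what makes range scans over a partition key
-- legal under a non-ordered (random/murmur3) partitioner.
SELECT id, token(id) FROM mytable WHERE token(id) > <last_token> LIMIT 999;
```

You keep issuing the second query, feeding in the last token returned, until a page comes back with fewer rows than the limit. This avoids having to guess one big-enough LIMIT up front (question #2), since each page is bounded on its own.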
columns disappearing intermittently
Hi All,

We're having a problem with our cassandra cluster and are at a loss as to the cause. We have what appear to be columns that disappear for a little while, then reappear. The rest of the row is returned normally during this time. This is, of course, very disturbing, and is wreaking havoc with our application.

A bit more info about what's happening: we are repeatedly executing the same query against our cluster. Every so often, one of the columns will disappear from the row and will remain gone for some time. Then, after continually executing the same query, the column will come back.

The queries are being executed against a 3 node cluster, with a replication factor of 3, and all reads and writes are done at quorum consistency level. We upgraded from cassandra 1.1.12 to 1.2.6 last week, but only started seeing issues this morning.

Has anyone had a problem like this before, or have any idea what might be causing it?
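One way to narrow down whether the replicas actually disagree is to re-run the read at different consistency levels from cqlsh while the column is missing. This is only a diagnostic sketch; the keyspace, table, column, and key names below are placeholders, not the poster's schema:

```cql
-- Placeholder names for illustration only.
-- Read at the weakest level: any single replica may answer.
CONSISTENCY ONE;
SELECT vanishing_column FROM my_keyspace.my_table WHERE key = 'some_key';

-- Read at the strongest level: all 3 replicas must agree before answering.
CONSISTENCY ALL;
SELECT vanishing_column FROM my_keyspace.my_table WHERE key = 'some_key';
```

If the CONSISTENCY ALL read consistently returns the column while ONE sometimes does not, at least one replica is out of sync, which would point toward a streaming or read-repair problem rather than a client-side bug. With RF=3 and quorum reads and writes, R + W = 2 + 2 > 3 = N, so quorum-to-quorum should never miss an acknowledged write; seeing it anyway suggests a bug surfaced by the 1.1.12 to 1.2.6 upgrade rather than a consistency-level mistake.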
Re: How to query secondary indexes
You're going to have a problem doing this in a single query, because you're asking cassandra to select a non-contiguous set of rows. Also, to my knowledge, you can only use inequality operators on clustering keys. The best solution I could come up with would be to define your table like so:

CREATE TABLE room_data (
    room_id uuid,
    in_room int,
    temp float,
    time timestamp,
    PRIMARY KEY (room_id, in_room, temp)
);

Then run 2 queries:

SELECT * FROM room_data WHERE in_room > 7;
SELECT * FROM room_data WHERE temp > 50.0;

And do an intersection on the results. I should add the disclaimer that I am relatively new to CQL, so there may be a better way to do this.

Blake

On Wed, Nov 28, 2012 at 10:02 AM, Oren Karmi oka...@gmail.com wrote:

Hi,

According to the documentation on indexes ( http://www.datastax.com/docs/1.1/ddl/indexes ), in order to use WHERE on a column which is not part of my key, I must define a secondary index on it. However, I can only use equality comparisons on it, and I wish to use other comparison methods, like greater than.

Let's say I have a room with people, and at every timestamp I measure the temperature of the room and the number of people. I use the timestamp as my key, and I want to select all timestamps where the temperature was over 50 degrees, but I can't seem to be able to do it with a regular query, even if I define that column as a secondary index:

SELECT * FROM MyTable WHERE temp > 50.4571;

My lame workaround is to define a secondary index on NumOfPeopleInRoom, and then for a specific value:

SELECT * FROM MyTable WHERE NumOfPeopleInRoom = 7 AND temp > 50.4571;

I'm pretty sure this is not the proper way to do this. How should I attack this? It feels like I'm missing a very basic concept. I'd appreciate it if your answers include the option of not changing my schema. Thanks!!!
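A note on the workaround query in the original question: on later Cassandra versions (1.2 and up), a range predicate on a non-key column can be combined with an equality predicate on a secondary-indexed column by appending ALLOW FILTERING. This is a sketch of that pattern using the question's own table and column names, and it comes with the usual caveat that ALLOW FILTERING can be very expensive, since the range condition is applied by scanning the rows matched by the index:

```cql
-- Requires a secondary index on NumOfPeopleInRoom; the temp range is
-- then filtered row-by-row, which does not scale to large result sets.
SELECT * FROM MyTable
 WHERE NumOfPeopleInRoom = 7 AND temp > 50.4571
 ALLOW FILTERING;
```

Whether this is acceptable depends entirely on how many rows the indexed predicate matches; for a query that must be fast at scale, restructuring the table around clustering keys (as in the reply above) is the safer route.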