Re: Barman equivalent for Cassandra?
There is a community-delivered tool named Medusa that may have what you're looking for as well - https://cassandra.tools/medusa

Jonathan Lacefield
e. jlacefi...@datastax.com
w. www.datastax.com

On Fri, Mar 12, 2021 at 8:07 AM David Tinker wrote:
> Hi Guys
>
> I need to backup my 3 node Cassandra cluster to a remote machine. Is there
> a tool like Barman (really nice streaming backup tool for Postgresql) for
> Cassandra? Or does everyone roll their own scripts using snapshots and so
> on?
>
> The data is on all 3 nodes using about 900G of space on each.
>
> It would be difficult for me to recover even a day of lost data. An hour
> might be ok.
>
> Thanks
> David
Re: Adding new node to cluster
Hello,

Please note that DataStax has updated the documentation for replacing a seed node. The new docs outline a simplified process to help avoid the confusion on this topic.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html

Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Feb 17, 2015 at 8:04 PM, Robert Coli rc...@eventbrite.com wrote:

On Tue, Feb 17, 2015 at 2:25 PM, sean_r_dur...@homedepot.com wrote:

SimpleSnitch is not rack aware. You would want to choose seed nodes and then not change them. Seed nodes apparently don’t bootstrap.

No one seems to know what a seed node actually *is*, but seed nodes can in fact bootstrap. They just have to temporarily forget to tell themselves that they are a seed node while bootstrapping, and then other nodes will still gossip to it as a seed once it comes up, even though it doesn't consider itself a seed.

https://issues.apache.org/jira/browse/CASSANDRA-5836?focusedCommentId=13727032&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13727032

Replacing a seed node is a very common operation, and this best practice is confusing/poorly documented. There are regular contacts to #cassandra/cassandra-user@ where people ask how to replace a seed node, and are confused by the answer. The workaround also means that, if you do not restart your node after bootstrapping it (and changing the conf file back to indicate to itself that it is a seed) the node runs until next restart without any understanding that it is a seed node.
Being a seed node appears to mean two things:

1) I have myself as an entry in my own seed list, so I know that I am a seed.
2) Other nodes have me in their seed list, so they consider me a seed.

The current code checks for 1) and refuses to bootstrap. The workaround is to remove the 1) state temporarily. But if it is unsafe to bootstrap a seed node because of either 1) or 2), the workaround is unsafe.

Can you explicate the special cases here? I sincerely would like to understand why the code tries to prevent a seed from bootstrapping when one can clearly, and apparently safely, bootstrap a seed.

Unfortunately, there has been no answer.

=Rob
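To make the two meanings of "seed" in the message above concrete, here is a minimal Python sketch of the check as it is described in this thread. It is a toy model and not Cassandra's actual code: the `Node` class, its methods, and the addresses are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a Cassandra node; not the real implementation."""
    address: str
    seeds: set = field(default_factory=set)  # this node's own seed list

    def considers_itself_seed(self) -> bool:
        # Meaning 1: the node finds its own address in its own seed list.
        return self.address in self.seeds

    def is_seed_for(self, other: "Node") -> bool:
        # Meaning 2: another node lists this node as a seed.
        return self.address in other.seeds

    def may_bootstrap(self) -> bool:
        # As described in the thread, only meaning 1 blocks bootstrapping.
        return not self.considers_itself_seed()


peer = Node("10.0.0.2", seeds={"10.0.0.1"})
replacement = Node("10.0.0.1", seeds={"10.0.0.1"})
blocked = replacement.may_bootstrap()  # it still sees itself as a seed

# Workaround from the thread: temporarily drop itself from its own seed list.
replacement.seeds = {"10.0.0.2"}
allowed = replacement.may_bootstrap()
still_seed_to_peer = replacement.is_seed_for(peer)

print(blocked, allowed, still_seed_to_peer)
```

In this model the workaround changes only what the node believes about itself; its peers keep treating it as a seed, which is exactly the inconsistency the message asks about.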
Re: Re: Dynamic Columns
Hello,

Peter highlighted the tradeoff between Thrift and CQL3 nicely in this case, i.e. requiring a different design approach for this solution. Collections do not sound like a good fit for your current challenge, but is there a different way to design/solve your challenge using CQL techniques?

It is recommended to leverage CQL for new projects, as this is the direction that Cassandra is heading and where the majority of development effort is being applied.

Sounds like you have a decision to make: leverage Thrift and the dynamic column approach to solve this problem, or rethink the design approach and leverage CQL. Please let the mailing list know the direction you choose.

Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Jan 20, 2015 at 9:46 PM, Peter Lin wool...@gmail.com wrote:

The thing is, CQL only handles some types of dynamic column use cases. There are plenty of examples on datastax.com that show how to do CQL-style dynamic columns. Based on what was described by Chetan, I don't feel CQL3 is a perfect fit for what he wants to do. To use CQL3, he'd have to change his approach.

In my temporal database, I use both Thrift and CQL. They complement each other very nicely. I don't understand why people have to put down Thrift or pretend it supports 100% of the use cases. Lots of people started using Cassandra pre-CQL and had no problems using Thrift. Yes, you have to understand more and the learning curve is steeper, but taking time to learn the internals of Cassandra is a good thing.

Using CQL3 lists or maps would force the query to load the entire collection, but that is by design.
To get the full power of the old style of dynamic columns, Thrift is a better fit. I hope CQL continues to improve so that it supports 100% of the existing use cases.

On Tue, Jan 20, 2015 at 8:50 PM, Xu Zhongxing xu_zhong_x...@163.com wrote:

I approximate dynamic columns by data_key and data_value columns. Is there a better way to get dynamic columns in CQL 3?

At 2015-01-21 09:41:02, Peter Lin wool...@gmail.com wrote:

I think that table example misses the point of Chetan's functional requirement. He actually needs dynamic columns.

On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing xu_zhong_x...@163.com wrote:

Maybe this is the closest thing to dynamic columns in CQL 3.

create table review (
    product_id bigint,
    created_at timestamp,
    data_key text,
    data_tvalue text,
    data_ivalue int,
    primary key ((product_id, created_at), data_key)
);

data_tvalue and data_ivalue are optional.

At 2015-01-21 04:44:07, chetan verma chetanverm...@gmail.com wrote:

Hi,

Adding to previous mail. For example: we have a column family named review (with some arbitrary data in maps).

CREATE TABLE review (
    product_id bigint,
    created_at timestamp,
    data_int map<text, int>,
    data_text map<text, text>,
    PRIMARY KEY (product_id, created_at)
);

Assume that I use these 2 maps to store arbitrary data (i.e. data_int and data_text for int and text values). When we look at the output in cassandra-cli, within a partition it appears as clustering_key:data_int:map_key for the column name, with the map value as the column value. Suppose I need to get this value: I couldn't do that with CQL3, but in Thrift it's possible. Any solution?

On Wed, Jan 21, 2015 at 1:06 AM, chetan verma chetanverm...@gmail.com wrote:

Hi,

Most of the time I will be querying on product_id and created_at, but for analytics I need to query on almost all columns. The multiple collections idea is good, but the only issue is that Cassandra reads a collection entirely. What if I need a slice of it, I mean the columns for certain keys, which is possible with Thrift? Please suggest.
On Wed, Jan 21, 2015 at 12:36 AM, Jonathan Lacefield jlacefi...@datastax.com wrote: Hello, There are probably lots of options to this challenge. The more details around your use case that you can provide, the easier it will be for this group to offer advice. A few follow-up questions: - How will you query this data? - Do your queries require filtering on specific columns other than product_id and created_at, i.e. the dynamic columns? Depending on the answers to these questions, you have several options, of which here are a few: - Cassandra efficiently stores sparse data, so you could create columns and not populate them, without much of a penalty - Could use a clustering column to store a columns type and another col (potentially clustering) to store the value - i.e. CREATE TABLE foo (col1 int, attname text, attvalue text, col4...n
Re: Dynamic Columns
Hello,

Have you looked at solving this challenge with clustering columns? Also, please describe the problem set details for more specific advice from this group.

Starting new projects on Thrift isn't the recommended approach.

Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Jan 20, 2015 at 1:24 PM, chetan verma chetanverm...@gmail.com wrote:

Hi,

I am starting a new project with Cassandra as the database. I have unstructured data, so I need dynamic columns. In CQL3 we can achieve this via collections, but there are some downsides:

1. Collections are meant to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is limited to 64,000.

There is also no support for getting a single column by map key, which is possible via cassandra-cli.

Please suggest whether I should use CQL3 or Thrift, and which driver is best.

--
Regards,
Chetan Verma
+91 99860 86634
Re: Dynamic Columns
Hello,

There are probably lots of options to this challenge. The more details around your use case that you can provide, the easier it will be for this group to offer advice. A few follow-up questions:

- How will you query this data?
- Do your queries require filtering on specific columns other than product_id and created_at, i.e. the dynamic columns?

Depending on the answers to these questions, you have several options, of which here are a few:

- Cassandra efficiently stores sparse data, so you could create columns and not populate them, without much of a penalty
- Could use a clustering column to store a column's type and another col (potentially clustering) to store the value
  - i.e. CREATE TABLE foo (col1 int, attname text, attvalue text, col4...n, PRIMARY KEY (col1, attname, attvalue));
  - where attname stores the name of the attribute/column and attvalue stores the value of that attribute
  - have seen users use this model and create a main attribute row within a partition that stores the values associated with col4...n
- Could store multiple collections
- Others probably have ideas as well

You may want to look in the archives for a similar discussion topic. Believe this item was asked a few months ago as well.

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Jan 20, 2015 at 1:40 PM, chetan verma chetanverm...@gmail.com wrote:

Hi,

I am creating a review system.
For instance, let's assume the following are the attributes of the system:

Review {
    id bigint,
    product_id bigint,
    created_at timestamp,
    summary text,
    description text,
    pros set<text>,
    cons set<text>,
    feature_rating map<text, int>
    etc
}

I created the partition key as product_id (so that all the reviews for a given product will reside on the same node) and the clustering key as created_at and id (desc) so that reviews will be sorted by time. I can have more columns, and I want to fulfil that requirement with dynamic columns, but there are limitations to it, as explained above. Could you please let me know the best way?

On Tue, Jan 20, 2015 at 11:59 PM, Jonathan Lacefield jlacefi...@datastax.com wrote:

Hello,

Have you looked at solving this challenge with clustering columns? Also, please describe the problem set details for more specific advice from this group.

Starting new projects on Thrift isn't the recommended approach.

Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Jan 20, 2015 at 1:24 PM, chetan verma chetanverm...@gmail.com wrote:

Hi,

I am starting a new project with Cassandra as the database. I have unstructured data, so I need dynamic columns. In CQL3 we can achieve this via collections, but there are some downsides:

1. Collections are meant to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is limited to 64,000.

There is also no support for getting a single column by map key, which is possible via cassandra-cli.

Please suggest whether I should use CQL3 or Thrift, and which driver is best.
--
Regards,
Chetan Verma
+91 99860 86634
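As a rough illustration of the attname/attvalue model suggested in this thread, the Python sketch below flattens an arbitrary attribute map into one row per attribute. Everything named here is an assumption made for the example: the `review_attributes` table, its key layout, and the `attribute_rows` helper do not come from the thread, and no driver is involved; the code only builds the statement text and the parameter tuples.

```python
from datetime import datetime, timezone

# Hypothetical table with one clustering row per dynamic attribute, e.g.
#   PRIMARY KEY ((product_id), created_at, attname)
INSERT_CQL = (
    "INSERT INTO review_attributes "
    "(product_id, created_at, attname, attvalue) VALUES (?, ?, ?, ?)"
)


def attribute_rows(product_id, created_at, attributes):
    """Turn {name: value} into one parameter tuple per attribute."""
    return [
        # Values are stored as text, as in the `foo` example in the thread.
        (product_id, created_at, name, str(value))
        for name, value in sorted(attributes.items())
    ]


created = datetime(2015, 1, 20, tzinfo=timezone.utc)
rows = attribute_rows(42, created, {"battery_life": 4, "color": "red"})
for params in rows:
    print(INSERT_CQL, params)
```

With the attribute name as a clustering column, reading a single attribute becomes an ordinary restriction on the primary key rather than a read of a whole collection, which is the limitation of maps raised earlier in the thread.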
Re: High read latency after data volume increased
There's likely 2 things occurring 1) the cfhistograms error is due to https://issues.apache.org/jira/browse/CASSANDRA-8028 Which is resolved in 2.1.3. Looks like voting is under way for 2.1.3. As rcoli mentioned, you are running the latest open source of C* which should be treated as beta until a few dot releases are published. 2) compaction running all the time doesn't mean that compaction is caught up. It's possible that the nodes are behind in compaction which will cause slow reads. C* read performance is typically associated with disk system performance, both to service reads from disk as well as to enable fast background processing, like compaction. You mentioned raided hdds. What type of raid is configured? How fast are your disks responding? You may want to check iostat to see how large your queues and awaits are. If the await is high, then you could be experiencing disk perf issues impacting reads. Hope this helps On Jan 9, 2015, at 9:29 AM, Roni Balthazar ronibaltha...@gmail.com wrote: Hi there, The compaction remains running with our workload. We are using SATA HDDs RAIDs. When trying to run cfhistograms on our user_data table, we are getting this message: nodetool: Unable to compute when histogram overflowed Please see what happens when running some queries on this cf: http://pastebin.com/jbAgDzVK Thanks, Roni Balthazar On Fri, Jan 9, 2015 at 12:03 PM, datastax jlacefi...@datastax.com wrote: Hello You may not be experiencing versioning issues. Do you know if compaction is keeping up with your workload? The behavior described in the subject is typically associated with compaction falling behind or having a suboptimal compaction strategy configured. What does the output of nodetool cfhistograms keyspace table look like for a table that is experiencing this issue? Also, what type of disks are you using on the nodes? 
Sent from my iPad On Jan 9, 2015, at 8:55 AM, Brian Tarbox briantar...@gmail.com wrote: C* seems to have more than its share of version x doesn't work, use version y type issues On Thu, Jan 8, 2015 at 2:23 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote: We are using C* 2.1.2 with 2 DCs. 30 nodes DC1 and 10 nodes DC2. https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ 2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ... =Rob -- http://about.me/BrianTarbox
Re: 100% CPU utilization, ParNew and never completing compactions
Hello,

What version of Cassandra are you running? If it's 2.0, we recently experienced something similar with 8447 [1], which 8485 [2] should hopefully resolve.

Please note that 8447 is not related to tombstones. Tombstone processing can put a lot of pressure on the heap as well. Why do you think you have a lot of tombstones in that one particular table?

[1] https://issues.apache.org/jira/browse/CASSANDRA-8447
[2] https://issues.apache.org/jira/browse/CASSANDRA-8485

Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com wrote:

I have a three node cluster that has been sitting at a load of 4 (for each node), 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an endless series of:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880

iostat shows virtually no I/O.
Compaction may enter into this, but I don't really know what to make of compaction stats since they never change:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 10
compaction type   keyspace   table              completed    total         unit    progress
Compaction        media      media_tracks_raw   271651482    563615497     bytes   48.20%
Compaction        media      media_tracks_raw   30308910     21676695677   bytes   0.14%
Compaction        media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
Active compaction remaining time : 0h22m24s

5 minutes later:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 9
compaction type   keyspace   table              completed    total         unit    progress
Compaction        media      media_tracks_raw   271651482    563615497     bytes   48.20%
Compaction        media      media_tracks_raw   30308910     21676695677   bytes   0.14%
Compaction        media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
Active compaction remaining time : 0h22m24s

Sure, the pending tasks went down by one, but the rest is identical. media_tracks_raw likely has a bunch of tombstones (can't figure out how to get stats on that).

Is this behavior something that indicates that I need more heap, or a larger new generation? Should I be manually running compaction on tables with lots of tombstones? Any suggestions or places to educate myself better on performance tuning would be appreciated.

arne
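For reading GCInspector lines like the ones quoted above, a small parser can help turn them into numbers. The Python sketch below is only an illustration: the regular expression is written against the exact line format shown in this message, and the function and field names are made up here.

```python
import re

# Pattern written against the GCInspector lines quoted in this thread.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<pause_ms>\d+) ms for "
    r"(?P<collections>\d+) collections, (?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_line(line):
    """Extract the numeric fields from one GCInspector log line."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    stats = {
        key: int(value)
        for key, value in match.groupdict().items()
        if key != "collector"
    }
    stats["collector"] = match.group("collector")
    stats["heap_used_pct"] = 100.0 * stats["used"] / stats["max"]
    stats["ms_per_collection"] = stats["pause_ms"] / stats["collections"]
    return stats


sample = (
    "DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java "
    "(line 118) GC for ParNew: 166 ms for 10 collections, "
    "4400928736 used; max is 8000634880"
)
stats = parse_gc_line(sample)
print(stats)
```

For the sample line this works out to about 55% of the heap in use and 16.6 ms per collection. Since the three quoted lines are logged roughly one second apart, ParNew alone is taking somewhere around 15% of wall-clock time.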
Re: Better option to load data to cassandra
Here's another post which is pretty comprehensive for this topic.

http://informationsurvival.blogspot.com/2014/02/cassandra-cql3-integration.html

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Thu, Nov 13, 2014 at 3:16 AM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Nov 12, 2014 at 5:19 PM, cass savy casss...@gmail.com wrote:

Sstableloader works well for large tables if you want to move data from Cassandra to Cassandra. This works if both C* are on the same version. Sstable2json and json2sstable is another alternative.

This post is getting a bit long in the tooth, but is still pretty relevant:

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob
http://twitter.com/rcolidba
Re: Load balancing in C* Cluster
Hello,

Most drivers will handle the load balancing for you and provide policies for configuring your desired approach for load balancing, i.e. load balance around the entire ring or localize around a specific DC. Your clients will leverage the driver for connections so that the client machines do not simply select one node for data and coordination. Check out the DataStax driver documentation on load balancing for more information [1]. Other drivers, like Astyanax [2], provide similar capabilities as well.

[1] http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/introduction/introArchOverview_c.html
[2] https://github.com/Netflix/astyanax

Thanks,
Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Oct 28, 2014 at 6:38 AM, Syed, Basit B. (NSN - FI/Espoo) basit.b.s...@nsn.com wrote:

Hi,

I am learning C* and its usage these days. I have a very simple, possibly naive question about load balancing. I know that C* can automatically balance the load itself by using tokens. But what about connecting my cluster to a system? For example, suppose we have a client or a set of clients (e.g. 12 client machines) accessing a 3-node C* cluster. All three nodes are independent and talk with each other through gossip. This means that we have three IP addresses to connect to the cluster. What is the best strategy for clients to access these IP addresses? Should we connect four clients each to only one node? Or should all 12 clients see and connect to all three nodes? Which strategy is better? Are there any resources available on the web for this kind of issue?

Regards,
Basit
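To picture the "all clients see all nodes" option from the question above, here is a toy Python model of round-robin coordinator selection. It is not the API of any real driver; the `RoundRobinClient` class and the addresses are invented for this sketch, and real drivers implement the idea through configurable load balancing policies.

```python
from collections import Counter
from itertools import cycle


class RoundRobinClient:
    """Toy client that knows every node and rotates coordinators per request."""

    def __init__(self, nodes, start=0):
        ordered = nodes[start:] + nodes[:start]  # stagger the starting node
        self._next_node = cycle(ordered)

    def pick_coordinator(self):
        return next(self._next_node)


nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # made-up addresses
clients = [RoundRobinClient(nodes, start=i % len(nodes)) for i in range(12)]

hits = Counter()
for _ in range(100):  # each client sends 100 requests
    for client in clients:
        hits[client.pick_coordinator()] += 1

print(hits)
```

In this model every node ends up coordinating the same number of requests, and losing one node only removes one choice from the rotation. Pinning four clients to each node would give the same split only while all three nodes stay up.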
Re: Wide Rows - Data Model Design
Hello,

Yes, this is a wide row table design. The first col is your partition key. The remaining 2 cols are clustering cols. You will receive result sets ordered by client_name, record_data when running that query.

Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Fri, Sep 19, 2014 at 10:41 AM, Check Peck comptechge...@gmail.com wrote:

I am trying to use the wide rows concept in my data modelling design for Cassandra. We are using Cassandra 2.0.6.

CREATE TABLE test_data (
    test_id int,
    client_name text,
    record_data text,
    creation_date timestamp,
    last_modified_date timestamp,
    PRIMARY KEY (test_id, client_name, record_data)
)

So I came up with the above table design. Does my table fall under the category of wide rows in Cassandra or not? And is there any problem if I have three columns in my PRIMARY KEY? I guess the PARTITION KEY will be test_id, right? And what about the other two? In this table, we can have multiple record_data for the same client_name.

Query pattern will be:

select client_name, record_data from test_data where test_id = 1;
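The partition key / clustering column split described in the reply above can be pictured with a small in-memory model. The Python sketch below is only an analogy for how rows of `test_data` are grouped and ordered; the helper functions and sample values are made up, and it says nothing about Cassandra's actual storage format.

```python
from collections import defaultdict

# partition key (test_id) -> rows kept sorted by the clustering columns
partitions = defaultdict(list)


def insert(test_id, client_name, record_data, **other_columns):
    rows = partitions[test_id]
    key = (client_name, record_data)
    # Writing the same full primary key again overwrites the row (upsert).
    rows[:] = [row for row in rows if row[0] != key]
    rows.append((key, other_columns))
    rows.sort(key=lambda row: row[0])


def select_by_test_id(test_id):
    """Rough analogue of: SELECT client_name, record_data ... WHERE test_id = ?"""
    return [key for key, _ in partitions[test_id]]


insert(1, "zeta", "r1")
insert(1, "alpha", "r2")
insert(1, "alpha", "r1")
insert(2, "beta", "r9")
result = select_by_test_id(1)
print(result)
```

All rows for `test_id = 1` sit in one partition and come back sorted by `client_name` and then `record_data`, regardless of insertion order, which is why several `record_data` values per `client_name` fit this design.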
Re: horizontal query scaling issues follow on
Hello,

Here is the documentation for cfhistograms, which is in microseconds.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFhisto.html

Your question about setting timeouts is subjective, but you have set your timeout limits to 4 mins, which seems excessive. The default timeout values should be appropriate for a well sized and operating cluster. Increasing timeouts to achieve stability isn't a recommended practice.

Your VMs are undersized, and therefore, it is recommended that you reduce your workload or add nodes until stability is achieved.

The goal of your exercise is to prove out linear scalability, correct? Then it is recommended to find the load your small nodes/cluster can handle without increasing timeout values, i.e. the load at which your cluster can remain stable. Once you have found the sweet spot for load on your cluster, increase load by X% while increasing cluster size by X%. Do this for a few iterations so you can see that the processing capabilities of your cluster increase proportionally, and linearly, to the amount of load you are putting on your cluster. Note, with small VMs, you will not receive production-like performance from individual nodes.

Also, what type of storage do you have under the VMs? It's not recommended to leverage shared storage. Leveraging shared storage will, more than likely, not allow you to achieve linear scalability. This is because your hardware will not be scaling linearly fully through the stack.

Hope this helps

Jonathan

On Sun, Jul 20, 2014 at 9:12 PM, Diane Griffith dfgriff...@gmail.com wrote:

I am running tests again across different number of client threads and number of nodes but this time I tweaked some of the timeouts configured for the nodes in the cluster.
I was able to get better performance on the nodes at 10 client threads by upping 4 timeout values in cassandra.yaml to 24: - read_request_timeout_in_ms - range_request_timeout_in_ms - write_request_timeout_in_ms - request_timeout_in_ms I did this because of my interpretation of the cfhistograms output on one of the nodes. So 3 questions that come to mind: 1. Did I interpret the histogram information correctly in cassandra 2.0.6 nodetool output? That the 2 column read latency output is the offset or left column is the time in milliseconds and the right column is number of requests that fell into that bucket range. 2. Was it reasonable for me to boost those 4 timeouts and just those? 3. What are reasonable timeout values for smaller vm sizes (i.e. 8GB RAM, 4 CPUs)? If anyone has any insight it would be appreciated. Thanks, Diane On Fri, Jul 18, 2014 at 2:23 PM, Tyler Hobbs ty...@datastax.com wrote: On Fri, Jul 18, 2014 at 8:01 AM, Diane Griffith dfgriff...@gmail.com wrote: Partition Size (bytes) 1109 bytes: 1800 Cell Count per Partition 8 cells: 1800 meaning I can't glean anything about how it partitioned or if it broke a key across partitions from this right? Does it mean for 1800 (the number of unique keys) that each has 8 cells? Yes, your interpretation is correct. Each of your 1800 partitions has 8 cells (taking up 1109 bytes). -- Tyler Hobbs DataStax http://datastax.com/ -- Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14
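The "increase load by X% while increasing cluster size by X%" procedure from the reply above can be checked with a small calculation. The Python sketch below is a hedged illustration: the function name and the sample measurements are invented, and the numbers are not results from this thread.

```python
def scaling_report(runs):
    """runs: list of (node_count, sustained_ops_per_sec), smallest cluster first."""
    base_nodes, base_ops = runs[0]
    per_node_baseline = base_ops / base_nodes
    report = []
    for nodes, ops in runs:
        # 1.0 means each node sustains as much load as in the baseline run.
        efficiency = (ops / nodes) / per_node_baseline
        report.append((nodes, ops, round(efficiency, 2)))
    return report


# Made-up measurements, purely for illustration.
example_runs = [(3, 3000), (6, 5800), (12, 11000)]
for nodes, ops, efficiency in scaling_report(example_runs):
    print(f"{nodes} nodes: {ops} ops/s, efficiency {efficiency}")
```

An efficiency that stays close to 1.0 as nodes are added is what linear scalability would look like in such a test. The throughput for each run should be the highest load the cluster sustains with default timeouts, as the reply recommends.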
Re: keyspace with hundreds of columnfamilies
Hello There is overhead for memory with each col family. This type of configuration could cause heap issues. What is driving the requirement for so many Cfs? On Jul 2, 2014, at 4:14 AM, tommaso barbugli tbarbu...@gmail.com wrote: Hi, Are there any known issues, shortcomings about organising data in hundreds of column families? At this present I am running with 300 column families but I expect that to get to a couple of thousands. Is this something discouraged / unsupported (I am using Cassandra 2.0). Thanks Tommaso
Re: restarting node makes cpu load of the entire cluster to raise
Hello Alain,

I'm not sure of the root cause of this item. It may be helpful to use DEBUG and start the node to see what's happening, as well as watch compaction stats or tpstats to understand what is taxing your system.

The log file you provided shows a large ParNew while replaying commit log segments. Does your app insert very large rows or have individual columns that are large?

I quickly reviewed CHANGES.txt (https://github.com/apache/cassandra/blob/cassandra-1.2/CHANGES.txt) to see if anything jumps out as a culprit, but didn't spot anything.

Sorry I can't be of more help with this one. It may take some hands-on investigation, or maybe someone else in the community has experienced this issue and can provide feedback.

Thanks,
Jonathan

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/cassandrasummit14

On Wed, Jun 18, 2014 at 3:07 PM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Jun 18, 2014 at 5:36 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

We stop the node using:

nodetool disablegossip
nodetool disablethrift
nodetool disablebinary
sleep 10
nodetool drain
sleep 30
service cassandra stop

The stuff before nodetool drain here is redundant and doesn't actually do what you are expecting it to do.

https://issues.apache.org/jira/browse/CASSANDRA-4162

=Rob
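Following the point above that the steps before `nodetool drain` are redundant, a stop script can be reduced to two commands. The Python sketch below is an assumption-laden illustration: both commands are copied from the thread, while the function names and the injectable `run` parameter are made up so that the sequence can be dry-run without touching a node.

```python
import subprocess

# Commands taken from the thread; the service command depends on your init system.
STOP_SEQUENCE = (
    ("nodetool", "drain"),
    ("service", "cassandra", "stop"),
)


def run_command(cmd):
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


def stop_node(run=run_command):
    """Drain the node, then stop the service; returns the commands issued."""
    issued = []
    for cmd in STOP_SEQUENCE:
        run(cmd)
        issued.append(cmd)
    return issued


# Dry run: print the commands instead of executing them.
dry_run = stop_node(run=lambda cmd: print("would run:", " ".join(cmd)))
```

Calling `stop_node()` with no argument would execute the commands for real. The original script's `sleep` calls are left out here; whether a pause after the drain is still wanted is a judgment call that the thread does not settle.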
Re: restarting node makes cpu load of the entire cluster to raise
Hello,

Have you checked the log file to see what's happening during startup? What caused the rolling restart? Did you perform an upgrade or change a config?

On Jun 18, 2014, at 5:40 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

Hi guys,

Using 1.2.11, when I try to do a rolling restart of the cluster, any node I restart makes the whole cluster's CPU load increase, reaching a red state in OpsCenter (load from 3-4 to 20+). This happens once the node is back online. The restarted node uses 100% CPU for 5 - 10 min and sometimes drops mutations.

I have tried to throttle handoff to 256 (instead of 1024), yet it doesn't seem to help that much. Disks are not the bottleneck. ParNew GC increases a bit, but nothing problematic I think.

Basically, what could be happening on node restart? What is taking that much CPU on every machine? There is no steal or iowait. What can I try to tune?
Re: Configuring all nodes as seeds
Hello,

What Artur is alluding to is that seed nodes do not bootstrap. Replacing seed nodes requires a slightly different approach for node replacement compared to non-seed nodes. See here for more details:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html

Take this into consideration, coupled with the fact that nodes will require replacing along the way, when determining the right number of seeds to use per cluster.

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/cassandrasummit14

On Wed, Jun 18, 2014 at 4:59 AM, Artur Kronenberg artur.kronenb...@openmarket.com wrote:

Hi,

pretty sure we started out like that and had not seen any problems doing that. On a side note, that config may become inconsistent anyway after adding new nodes, because I think you'll need a restart of all your nodes if you add new seeds to the yaml file. (Though that's just an assumption.)

On 18/06/14 09:09, Peer, Oded wrote:

My intended Cassandra cluster will have 15 nodes per DC, with 2 DCs. I am considering using all the nodes as seed nodes. It looks like having all the nodes as seeds should actually reduce the Gossip overhead (see “Gossiper implementation” in http://wiki.apache.org/cassandra/ArchitectureGossip).

Is there any reason not to do this?
Re: restarting node makes cpu load of the entire cluster to raise
There are several long ParNew pauses that were recorded during startup. The young gen size looks large too, if I am reading that line correctly. Did you happen to override the default settings for MAX_HEAP_SIZE and/or HEAP_NEWSIZE in cassandra-env.sh? A large young gen size, set via the env.sh file, could be causing longer than typical pauses, which could make your node appear to be unresponsive and have high CPU (CPU for the ParNew GC event).

Check out this one:

INFO 11:42:51,939 GC for ParNew: 2148 ms for 2 collections, 1256307568 used; max is 8422162432

That is a 2 second GC pause. That's very high for ParNew. We typically want a lot of tiny ParNew events as opposed to large, and less frequent, ParNew events.

One other thing that was noticed was that the node had a lot of log segment replays during startup. You could avoid these, or minimize them, by performing a flush or drain before stopping and starting Cassandra. This will flush memtables and clear your log segments.

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/cassandrasummit14

On Wed, Jun 18, 2014 at 8:05 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

A simple restart of a node with no changes gives this result.

Logs output: https://gist.github.com/arodrime/db9ab152071d1ad39f26

Here are some screenshots:

- htop from a node immediately after restarting
- opscenter ring view (shows CPU load on all nodes)
- opscenter dashboard shows the impact of a restart on latency (can affect writes or reads, it depends, reaction seems to be quite random)

2014-06-18 13:35 GMT+02:00 Jonathan Lacefield jlacefi...@datastax.com:

Hello,

Have you checked the log file to see what's happening during startup? What caused the rolling restart? Did you perform an upgrade or change a config?
On Jun 18, 2014, at 5:40 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi guys Using 1.2.11, when I try a rolling restart of the cluster, any node I restart makes the CPU load of the whole cluster increase, reaching a red state in opscenter (load from 3-4 to 20+). This happens once the node is back online. The restarted node uses 100% CPU for 5 - 10 min and sometimes drops mutations. I have tried to throttle handoff to 256 (instead of 1024), yet it doesn't seem to help that much. Disks are not the bottleneck. ParNew GC increases a bit, but nothing problematic I think. Basically, what could be happening on node restart? What is taking that much CPU on every machine? There is no steal or iowait. What can I try to tune?
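For anyone who wants to scan a system.log for pauses like the one quoted in this thread, here is a minimal sketch in Python. The regular expression is built only from the single log line shown above, so other Cassandra versions may format the line differently. The function names and the 200 ms threshold are assumptions made for illustration, not Cassandra settings.

```python
import re

# Matches the GC line format quoted in this thread, for example:
# "INFO 11:42:51,939 GC for ParNew: 2148 ms for 2 collections, 1256307568 used; max is 8422162432"
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_line(line):
    """Return the numbers from one GC log line, or None if the line does not match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    total_ms = int(match.group("ms"))
    count = int(match.group("count"))
    return {
        "collector": match.group("collector"),
        "total_ms": total_ms,
        "collections": count,
        # Average pause per collection; guard against a zero count.
        "avg_ms": total_ms / count if count else 0.0,
        "heap_used_fraction": int(match.group("used")) / int(match.group("max")),
    }


def is_long_pause(parsed, threshold_ms=200):
    """Flag a line whose average pause exceeds threshold_ms (an assumed cut-off)."""
    return parsed["avg_ms"] > threshold_ms


sample = ("INFO 11:42:51,939 GC for ParNew: 2148 ms for 2 collections, "
          "1256307568 used; max is 8422162432")
result = parse_gc_line(sample)
print(result, is_long_pause(result))
```

Averaging per collection matters here because the logged 2148 ms covers two collections, so each pause was roughly one second.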
Re: Cannot query secondary index
Hello, What you are attempting to do reminds me of the old sliding window partitioning trick in RDBMS systems. You're right, there is no system-provided tool that allows you to perform a similar operation. You could always leverage option 3, and then create a service that helps manage the effort of the manual delete. However, you would still have to insert into this separate table for each index item. The occasional delete may be infrequent enough for you to do what you were actually trying to do in the first place: use a secondary index and query the table with the ALLOW FILTERING clause. My recommendation would be to: 1) leverage TTLs 2) see what type of load your original plan would put on your system, and if it's acceptable and/or you have a downtime window in which to execute this costly operation, go that route when TTLs aren't keeping up with your load. You have such a special use case for this functionality that a small, infrequent performance hit is preferable to the complexity of implementing options 1 and 3. Thanks, Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Tue, Jun 10, 2014 at 1:39 PM, Redmumba redmu...@gmail.com wrote: Honestly, this has been by far my single biggest obstacle with Cassandra for time-based data--cleaning up the old data when the deletion criteria (i.e., date) isn't the primary key. I've asked about a few different approaches, but I haven't really seen any feasible options that can be implemented easily. I've seen the following: 1. Use date-based tables, then drop old tables, a la audit_table_20140610, audit_table_20140609, etc. But then I run into the issue of having to query every table--I would have to execute queries against every day to get the data, and then merge the data myself. Unless there's something in the binary driver I'm missing, it doesn't sound like this would be practical. 2. 
Use a TTL. But then I have to basically decide on a value that works for everything and, if it ever turns out I overestimated, I'm basically SOL, because my cluster will be out of space. 3. Maintain a separate index of days to keys, and use this index as the reference for which keys to delete. But then this requires maintaining another index and a relatively manual delete. I can't help but feel that I am just way over-engineering this, or that I'm missing something basic in my data model. Except for the last approach, I can't help but feel that I'm overlooking something obvious. Andrew Of course, Jonathan, I'll do my best! It's an auditing table that, right now, uses a primary key consisting of a composite partition key of the object ID and the region, followed by the date and the process ID. Each event in our system will create anywhere from 1-20 rows, for example, and multiple parts of the system might be working on the same object ID. So the CF is constantly being appended to, but reads are rare. CREATE TABLE audit ( id bigint, region ascii, date timestamp, pid int, PRIMARY KEY ((id, region), date, pid) ); Data is queried on a specific object ID and region. Optionally, users can restrict their query to a specific date range, which the above data model provides. However, we generate quite a bit of data, and we want a convenient way to get rid of the oldest data. Since our system scales with the time of year, we might get 50GB a day during peak, and 5GB of data off peak. We could pick the safest number--let's say, 30 days--and set the TTL using that. The problem there is that, for about 90% of the year, we'll be using a very small percentage of our available space. What I'd like to be able to do is drop old tables as needed--i.e., let's say when we hit 80% load across the cluster (or some such metric that takes the cluster-wide load into account), I want to drop the oldest day's records until we're under 80%. 
That way, we're always using the maximum amount of space we can, without having to worry about getting to the point where we run out of space cluster-wide. My thoughts are--we could always make the date part of the primary key, but then we'd either a) have to query the entire range of dates, or b) we'd have to force a small date range when querying. What are the penalties? Do you have any other suggestions? On Mon, Jun 9, 2014 at 5:15 PM, Jonathan Lacefield jlacefi...@datastax.com wrote: Hello, Will you please describe the use case and what you are trying to model. What are some questions/queries that you would like to serve via Cassandra. This will help the community help you a little better. Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Mon, Jun 9
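The space argument in the message above can be made concrete with a little arithmetic. This is only a sketch using the figures quoted in the thread (50GB/day at peak, 5GB/day off peak, a 30-day TTL). It assumes raw data volume only, so replication factor, compression and compaction overhead are all ignored, and it treats the peak footprint as the capacity budget. The names are invented for illustration.

```python
# Figures quoted in the thread.
PEAK_GB_PER_DAY = 50
OFF_PEAK_GB_PER_DAY = 5
TTL_DAYS = 30


def retained_gb(gb_per_day, ttl_days):
    """Steady-state volume kept on disk when every row expires after ttl_days."""
    return gb_per_day * ttl_days


# Capacity has to be sized for the peak footprint (assumption: this is the budget).
peak_footprint = retained_gb(PEAK_GB_PER_DAY, TTL_DAYS)
off_peak_footprint = retained_gb(OFF_PEAK_GB_PER_DAY, TTL_DAYS)

# Share of the peak-sized budget actually used during the off-peak season.
off_peak_utilization = off_peak_footprint / peak_footprint

# Days of off-peak data that would fit in the same budget if retention were
# driven by space used rather than by a fixed TTL.
off_peak_days_that_fit = peak_footprint / OFF_PEAK_GB_PER_DAY

print(peak_footprint, off_peak_footprint, off_peak_utilization, off_peak_days_that_fit)
```

Under these assumptions a fixed 30-day TTL leaves about 90% of the peak-sized budget idle off peak, which is the gap the space-based cleanup idea is trying to close.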
Re: Cannot query secondary index
Hello, You are receiving this error because you are not passing in the Partition Key as part of your query. Cassandra is telling you it doesn't know which node holds the data, and you haven't explicitly told it to search across all your nodes for the data. The ALLOW FILTERING clause bypasses the need to pass in a partition key in your query. http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Big picture, for data modeling in Cassandra, it's advisable to model your data based on the query access patterns and to duplicate data into tables that represent your queries. In this case, creating a table with a Partition Key of date could benefit you. Heavy use of ALLOW FILTERING could cause performance issues within your cluster. Also, please be aware that Secondary Indexes are much different in Cassandra-land compared to indexes in RDBMS-land. They should be used only when necessary, i.e. for an explicit use case. Typically, modeling your data so you can avoid Secondary Indexes will ensure a well-performing system and queries. Here's a good intro to Cassandra data modeling: https://www.youtube.com/watch?v=HdJlsOZVGwM Hope this helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Mon, Jun 9, 2014 at 5:18 PM, Redmumba redmu...@gmail.com wrote: I have a table with a timestamp column on it; however, when I try to query based on it, it fails saying that I must use ALLOW FILTERING--which, to me, means it's not using the secondary index. Table definition is (snipping out irrelevant parts)... CREATE TABLE audit ( id bigint, date timestamp, ... PRIMARY KEY (id, date) ); CREATE INDEX date_idx ON audit (date); There are other fields, but they are not relevant to this example. The date is part of the primary key, and I have a secondary index on it. 
When I run a SELECT against it, I get an error: cqlsh SELECT * FROM asinauditing.asinaudit WHERE date '2014-05-01'; Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING cqlsh SELECT * FROM asinauditing.asinaudit WHERE date '2014-05-01' ALLOW FILTERING; Request did not complete within rpc_timeout. How can I force it to use the index? I've seen rebuild_index tasks running, but can I verify the health of the index?
Re: Cannot query secondary index
Hello, Will you please describe the use case and what you are trying to model. What are some questions/queries that you would like to serve via Cassandra. This will help the community help you a little better. Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Mon, Jun 9, 2014 at 7:51 PM, Redmumba redmu...@gmail.com wrote: I've been trying to work around using date-based tables because I'd like to avoid the overhead. It seems, however, that this is just not going to work. So here's a question--for these date-based tables (i.e., a table per day/week/month/whatever), how are they queried? If I keep 60 days worth of auditing data, for example, I'd need to query all 60 tables--can I do that smoothly? Or do I have to have 60 different select statements? Is there a way for me to run the same query against all the tables? On Mon, Jun 9, 2014 at 3:42 PM, Redmumba redmu...@gmail.com wrote: Ah, so the secondary indices are really secondary against the primary key. That makes sense. I'm beginning to see why the whole date-based table approach is the only one I've been able to find... thanks for the quick responses, guys! On Mon, Jun 9, 2014 at 2:45 PM, Michal Michalski michal.michal...@boxever.com wrote: Secondary indexes internally are just CFs that map the indexed value to a row key which that value belongs to, so you can only query these indexes using =, not , = etc. However, your query does not require index *IF* you provide a row key - you can use or like you did for the date column, as long as you refer to a single row. However, if you don't provide it, it's not going to work. M. 
Kind regards, Michał Michalski, michal.michal...@boxever.com On 9 June 2014 21:18, Redmumba redmu...@gmail.com wrote: I have a table with a timestamp column on it; however, when I try to query based on it, it fails saying that I must use ALLOW FILTERING--which to me, means its not using the secondary index. Table definition is (snipping out irrelevant parts)... CREATE TABLE audit ( id bigint, date timestamp, ... PRIMARY KEY (id, date) ); CREATE INDEX date_idx ON audit (date); There are other fields, but they are not relevant to this example. The date is part of the primary key, and I have a secondary index on it. When I run a SELECT against it, I get an error: cqlsh SELECT * FROM asinauditing.asinaudit WHERE date '2014-05-01'; Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING cqlsh SELECT * FROM asinauditing.asinaudit WHERE date '2014-05-01' ALLOW FILTERING; Request did not complete within rpc_timeout. How can I force it to use the index? I've seen rebuild_index tasks running, but can I verify the health of the index?
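Michal's description of a secondary index as a map from the indexed value to the row keys that hold it can be illustrated with a toy model. This is a sketch in Python, not how Cassandra actually stores or distributes its indexes. The class name, method names and sample data are all made up for the example. Dates are kept as ISO strings so that plain string comparison orders them correctly.

```python
from collections import defaultdict


class ToyValueIndex:
    """Toy stand-in for an index: indexed value -> set of row keys (unordered)."""

    def __init__(self):
        self._rows_by_value = defaultdict(set)

    def add(self, row_key, value):
        self._rows_by_value[value].add(row_key)

    def equals(self, value):
        # Equality is a single lookup in the map.
        return set(self._rows_by_value.get(value, ()))

    def greater_than(self, value):
        # The map has no ordering, so a range predicate has to visit every
        # distinct indexed value. Returns (matching row keys, values scanned).
        matches = set()
        scanned = 0
        for indexed_value, row_keys in self._rows_by_value.items():
            scanned += 1
            if indexed_value > value:
                matches |= row_keys
        return matches, scanned


index = ToyValueIndex()
index.add(1, "2014-04-30")
index.add(2, "2014-05-01")
index.add(3, "2014-05-02")
index.add(4, "2014-05-01")

print(index.equals("2014-05-01"))
print(index.greater_than("2014-05-01"))
```

In this model the equality lookup touches one entry, while the range predicate touches all of them, which mirrors why the index in the thread only helps with = and why the range query had to fall back to filtering.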
Re: Query first 1 columns for each partitioning keys in CQL?
Hello, Have you looked at using the CLUSTERING ORDER BY and LIMIT features of CQL3? These may help you achieve your goals. http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Fri, May 16, 2014 at 12:23 AM, Matope Ono matope@gmail.com wrote: Hi, I'm modeling some queries in CQL3. I'd like to query first 1 columns for each partitioning keys in CQL3. For example: create table posts( author ascii, created_at timeuuid, entry text, primary key(author,created_at) ); insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by john'); insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by john'); insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by mike'); insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by mike'); And I want results like below. mike,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by mike john,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by john I think that this is what SELECT FIRST statements did in CQL2. The only way I came across in CQL3 is retrieve whole records and drop manually, but it's obviously not efficient. Could you please tell me more straightforward way in CQL3?
Re: cassandra snapshots
What version of Cassandra are you using? This could be from snapshot repairs in newer versions of Cassandra. CASSANDRA-5950 https://issues.apache.org/jira/browse/CASSANDRA-5950 Also, check out the snapshot settings, other than incremental, in the .yaml file. There are several snapshot configurations which could have been set for your cluster. http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Mon, May 5, 2014 at 3:48 PM, Batranut Bogdan batra...@yahoo.com wrote: Hello all I have a big column family and I see that Cassandra is taking snapshots for it. I do not have incremental enabled. What are the triggers that start the process of taking a snapshot? Is it automatic? Thanks
Re: row caching for frequently updated column
Hello, IIRC, writing a new value to a row will invalidate the row cache entry for that row. Row cache is only populated after a read operation. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_configuring_caches_c.html?scroll=concept_ds_n35_nnr_ck Cassandra provides the ability to preheat key and page cache, but I don't believe this is possible for row cache. Hope that helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Mon, Apr 28, 2014 at 10:27 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I am wondering if there is any negative impact on Cassandra write operations if I turn on row caching for a table that has mostly 'static columns' but a few frequently written columns (like timestamp). The application will frequently write to a few columns, and the application will also frequently query the entire row. How does Cassandra handle an update to a column of a cached row? Does it update both the memtable value and the cached row's column (which is a memory update, so it is very fast)? Or, in order to update the cached row, does the entire row need to be read back from the sstable? thanks
Re: How safe is nodetool move in 1.2 ?
Assuming you have enough nodes not undergoing move to meet your CL requirements, then yes, your cluster will still accept reads and writes. However, it's always good to test this before doing it in production to ensure your cluster and app will function as designed. Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Wed, Apr 16, 2014 at 7:57 AM, Oleg Dulin oleg.du...@gmail.com wrote: I need to rebalance my cluster. I am sure this question has been asked before -- will 1.2 continue to serve reads and writes correctly while move is in progress ? Need this for my sanity. -- Regards, Oleg Dulin http://www.olegdulin.com
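The "enough nodes to meet your CL requirements" condition above comes down to counting replicas. Below is a small sketch in Python. It assumes, as the reply does, that a replica undergoing a move is simply treated as unavailable, and it only covers a handful of consistency levels. The function names are invented for illustration and are not part of any driver API.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replicas that must respond for the given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        # A quorum is a majority of the replicas.
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def can_satisfy(consistency_level, replication_factor, unavailable_replicas):
    """True if enough replicas remain after removing the unavailable ones."""
    available = replication_factor - unavailable_replicas
    return available >= replicas_required(consistency_level, replication_factor)


# With RF=3, QUORUM tolerates one unavailable replica but not two.
print(can_satisfy("QUORUM", 3, unavailable_replicas=1))
print(can_satisfy("QUORUM", 3, unavailable_replicas=2))
```

This is per token range: the check has to hold for the replica set of every range the application reads or writes, which is one more reason to test the move outside production first.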
Re: List and Cancel running queries
No. This is not possible today. On Apr 11, 2014, at 1:19 AM, Richard Jennings richardjenni...@gmail.com wrote: Is it possible to list all running queries on a Cassandra cluster? Is it possible to cancel a running query on a Cassandra cluster? Regards
Re: Point in Time Recovery
Hello, Have you tried the procedure documented here: http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configLogArchive_t.html Thanks, Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Thu, Apr 10, 2014 at 1:19 AM, Dennis Schwan dennis.sch...@1und1.de wrote: Hey there, do you know any description how to perform a point-in-time recovery using the archived commitlogs? We have already tried several things but it just did not work. We have a 20 Node Cluster (10 in each DC). Thanks in Advance, Dennis -- Dennis Schwan Oracle DBA Mail Core 11 Internet AG | Brauerstraße 48 | 76135 Karlsruhe | Germany Phone: +49 721 91374-8738 E-Mail: dennis.sch...@1und1.de | Web: www.1und1.de
Re: Apache cassandra not joining cluster ring
Hello, The nodetool status that you mentioned, was that executed on the 4th node itself? Also, what does netstat display? Are the correct ports listening on that node? As for opscenter, what version of opscenter are you using? Are you able to manually start the agents on the nodes themselves? On Apr 9, 2014, at 6:57 AM, Joyabrata Das joy.luv.challen...@gmail.com wrote: Hello All, Kindly help with the issues below, I'm really stuck here. Thanks, Joy On 8 April 2014 21:55, Joyabrata Das joy.luv.challen...@gmail.com wrote: Hello, I have a four node Apache Cassandra community 1.2 cluster in a single datacenter with a seed. All configurations are similar in the cassandra.yaml file. I am facing the following issues, please help. 1] The fourth node isn't listed in the nodetool ring or status output, and system.log shows that only this node isn't communicating via the gossip protocol with the other nodes. However, both the jmx and telnet ports are enabled, with the proper listen/seed address configured. 2] Though Opscenter is able to recognize all four nodes, the agents are not getting installed from opscenter. However, the same JVM version is installed and JAVA_HOME is set on all four nodes. I further observed that the problematic node runs 64-bit Ubuntu while the other nodes run 32-bit Ubuntu; can that be the reason? Thanks, Joy
Re: Per-keyspace partitioners?
Hello, Partitioner is per cluster. We have seen users create separate clusters for items like this, but that's an edge case. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Wed, Apr 9, 2014 at 11:57 AM, Clint Kelly clint.ke...@gmail.com wrote: Hi everyone, Is there a way to change the partitioner on a per-table or per-keyspace basis? We have some tables for which we'd like to enable ordered scans of rows, so we'd like to use the ByteOrdered partitioner for those, but use Murmur3 for everything else in our cluster. Is this possible? Or does the partitioner have to be the same for the entire cluster? Best regards, Clint
Re: Auto-Bootstrap not Auto-Bootstrapping?
Hello, Not sure I follow the auto bootstrap question, but seeds are only used on startup. Also, what do you mean by converting the node to a seed node? You could simply add the 4th node's IP address to the seed list of the other nodes in the .yaml file. Hope that helps. Jonathan On Apr 7, 2014, at 9:35 PM, Greg Bone gbon...@gmail.com wrote: If seed nodes do not auto bootstrap, what is the procedure for replacing a node in a three node cluster, with all of them identified as seed nodes? Here's what I am thinking: 1) Add a 4th node to the cluster which is not a seed node 2) Decommission one of the seed nodes when data has finished streaming to the new node 3) Convert the newly added 4th node to a seed node by updating the cassandra.yaml file. Keith Wright kwright at nanigans.com writes:
Re: Question about how compaction and partition keys interact
Hello, Compaction strategy, leveled vs. size-tiered, will impact the amount of compaction that occurs, i.e. compaction time, more than the choice between the two data model options. Check out this blog for more information on the types of compaction strategy - http://www.datastax.com/dev/blog/when-to-use-leveled-compaction My recommendation would be to choose the right compaction strategy based on the article listed above and the right model based on your query access pattern, as opposed to trying to figure out which model would be more advantageous for compaction. If your system can handle the disk I/O (for example, it has SSDs), then leveraging Leveled compaction + wide rows should give you the best read performance. Your queries can be satisfied with a lot fewer disk seeks because of the wide row access pattern and the advantages of Leveled compaction. As for your specific question about which model is going to be better for compaction, I don't know the answer. I would think that the merge sorting of more, smaller records could be somewhat better for compaction, but I have no way to quantify that for you. However, the wide row scenario sounds like it will provide a significant advantage to your query access times. Hope that helps and if not, maybe someone else can provide the answer to your specific question regarding the impacts of your model on compaction. Thanks, Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Wed, Mar 26, 2014 at 2:54 PM, Donald Smith donald.sm...@audiencescience.com wrote: My underlying question is about the effects of the partitioning key on compaction. Specifically, would having date as part of the partitioning key make compaction easier (because compaction wouldn't have to merge wide rows over multiple days)? According to the person on irc, it wouldn't make much difference. We care mostly about read times. 
If read times were *all* we cared about, we'd use a CQL primary key of *((customer_id,type),date)*, especially since it lets us efficiently iterate over all dates for a given customer and type. I also care about compaction time, and if the other primary key form decreased compaction time, I might go for it. We have terabytes of data. I don't think we ever have to query all types for a given customer or date. That is, we are always given a specific customer and type, plus usually but not always a date. Thanks, Don *From:* Jonathan Lacefield [mailto:jlacefi...@datastax.com] *Sent:* Wednesday, March 26, 2014 11:20 AM *To:* user@cassandra.apache.org *Subject:* Re: Question about how compaction and partition keys interact Don, What is the underlying question? Are you trying to figure out what's going to be faster for reads or are you really concerned about storage? The typical recommendation is to model tables based on query access, to enable the fastest read performance. In your example, will your app's queries look for 1) customer interactions by type by day, with the ability to - sort by day within a type - grab ranges of dates for a type quickly - or pull all dates (and cell data) for a type or 2) customer interactions by date by type, with the ability to - sort by type within a date - grab ranges of types for a date quickly - or pull all types data for a date We also typically recommend that partitions stay within ~100k columns or ~100MB per partition. With your first scenario, wide row, you wouldn't hit that column count for ~273 years :) What's interesting in your modeling scenario is that, with the current options, you don't have the ability to easily pull all dates for a customer without specifying the type, specific dates, or using ALLOW FILTERING. Did you ever consider partitioning simply on customer and using date and type as clustering keys? Hope that helps. 
Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Wed, Mar 26, 2014 at 1:22 PM, Donald Smith donald.sm...@audiencescience.com wrote: In CQL we need to decide between using *((customer_id,type),date) *as the CQL primary key for a reporting table, versus *((customer_id,date),type)*. We store reports for every day. If we use *(customer_id,type)* as the partition key (physical key), then we have a WIDE ROW where each date's data is stored in a different column. Over time, as new reports are added for different dates, the row will get wider and wider, and I thought that might cause more work
Re: Question about how compaction and partition keys interact
Don, What is the underlying question? Are you trying to figure out what's going to be faster for reads or are you really concerned about storage? The typical recommendation is to model tables based on query access, to enable the fastest read performance. In your example, will your app's queries look for 1) customer interactions by type by day, with the ability to - sort by day within a type - grab ranges of dates for a type quickly - or pull all dates (and cell data) for a type or 2) customer interactions by date by type, with the ability to - sort by type within a date - grab ranges of types for a date quickly - or pull all types data for a date We also typically recommend that partitions stay within ~100k columns or ~100MB per partition. With your first scenario, wide row, you wouldn't hit that column count for ~273 years :) What's interesting in your modeling scenario is that, with the current options, you don't have the ability to easily pull all dates for a customer without specifying the type, specific dates, or using ALLOW FILTERING. Did you ever consider partitioning simply on customer and using date and type as clustering keys? Hope that helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Wed, Mar 26, 2014 at 1:22 PM, Donald Smith donald.sm...@audiencescience.com wrote: In CQL we need to decide between using *((customer_id,type),date)* as the CQL primary key for a reporting table, versus *((customer_id,date),type)*. We store reports for every day. If we use *(customer_id,type)* as the partition key (physical key), then we have a WIDE ROW where each date's data is stored in a different column. Over time, as new reports are added for different dates, the row will get wider and wider, and I thought that might cause more work for compaction. 
So, would a partition key of *(customer_id,date)* yield better compaction behavior? Again, if we use *(customer_id,type)* as the partition key, then over time, as new columns are added to that row for different dates, I'd think that compaction would have to merge new data for a given physical row from multiple sstables. That would make compaction expensive. But if we use *(customer_id,date)* as the partition key, then new data will be added to *new physical rows*, and so compaction would have less work to do. My question is really about how compaction interacts with partition keys. Someone on the Cassandra irc channel, http://webchat.freenode.net/?channels=#cassandra, said that when partition keys overlap between sstables, there's only slightly more work to do than when they don't, for merging sstables in compaction. So he thought the first form, *((customer_id,type),date)*, would be better. One advantage of the first form, *((customer_id,type),date)*, is that we can get all report data for all dates for a given customer and type in a single wide row -- and we do have an (uncommon) use case for such reports. If we used a primary key of *((customer_id,type,date))*, then the rows would be un-wide; that wouldn't take advantage of clustering columns and (like the second form) wouldn't support the (uncommon) use case mentioned in the previous paragraph. Thanks, Don *Donald A. Smith* | Senior Software Engineer P: 425.201.3900 x 3866 C: (206) 819-5965 F: (646) 443-2333 dona...@audiencescience.com
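The "~273 years" figure in the reply above follows from simple division. Here is a minimal sketch in Python. It assumes the wide row gains exactly one column per day, which is an assumption about the reporting table rather than something stated in the thread. The limit is the ~100k columns guideline quoted above, and the function name is made up for illustration.

```python
# Guideline quoted in the thread: keep partitions within ~100k columns.
COLUMN_GUIDELINE = 100_000
DAYS_PER_YEAR = 365


def years_until_guideline(columns_per_day, column_limit=COLUMN_GUIDELINE):
    """Years before a partition growing at columns_per_day reaches column_limit."""
    return column_limit / (columns_per_day * DAYS_PER_YEAR)


# One column per day, as in the daily-report case (assumed).
print(round(years_until_guideline(1), 1))

# If each day added 20 columns instead, the guideline would be reached far sooner.
print(round(years_until_guideline(20), 1))
```

The second call shows how sensitive the estimate is to the number of cells written per day, so it is worth plugging in the real figure for the table before relying on it.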
Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram
Sorry to hear about the frustration. How often are you deleting data, and what TTL are you setting on columns? Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Tue, Mar 25, 2014 at 4:22 PM, Oleg Dulin oleg.du...@gmail.com wrote: Sigh, so I am back to where I started from... I did lower gc_grace... jmap -histo:live shows the heap is stuffed with DeletedColumn and ExpiringColumn. This is extremely frustrating. On 2014-03-11 19:24:50 +, Oleg Dulin said: Good news is that since I lowered the gc_grace period it collected over 100Gigs of tombstones and seems much happier now. Oleg On 2014-03-10 13:33:43 +, Jonathan Lacefield said: Hello, You have several options: 1) going forward, lower gc_grace_seconds http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configStorage_r.html?pagename=docsversion=1.2file=configuration/storage_configuration#gc-grace-seconds - this is very use case specific. Default is 10 days. Some users will put this at 0 for specific use cases. 2) you could also lower the tombstone compaction threshold and interval to get tombstone compaction to fire more often on your tables/cfs: https://datastax.jira.com/wiki/pages/viewpage.action?pageId=54493436 3) to clean out old tombstones you could always run a manual compaction, though these aren't typically recommended: http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsNodetool_r.html For 1 and 2, be sure your disks can keep up with compaction to ensure tombstone, or other, compaction fires regularly enough to clean out old tombstones. Also, you probably want to ensure you are using Leveled Compaction: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction. Again, this assumes your disk system can handle the increased I/O from Leveled Compaction. 
Also, you may be running into this with the older version of Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-6541 Hope this helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 On Mon, Mar 10, 2014 at 6:41 AM, Oleg Dulin oleg.du...@gmail.com wrote: I get that :) What I'd like to know is how to fix that :) On 2014-03-09 20:24:54 +, Takenori Sato said: You have millions of org.apache.cassandra.db.DeletedColumn instances in the snapshot. This means you have lots of column tombstones, and, I guess, they are read into memory by slice queries. On Sun, Mar 9, 2014 at 10:55 PM, Oleg Dulin oleg.du...@gmail.com wrote: I am trying to understand why one of my nodes keeps doing full GCs. I have Xmx set to 8gigs, memtable total size is 2 gigs. Consider the top entries from jmap -histo:live @ http://pastebin.com/UaatHfpJ -- Regards, Oleg Dulin http://www.olegdulin.com -- Regards, Oleg Dulin http://www.olegdulin.com -- Regards, Oleg Dulin http://www.olegdulin.com
Re: No output.log is ever generated
Hello,

Here are a few questions to help guide your troubleshooting efforts:

- Do you have a Cassandra system log file?
- Did you use a packaged or binary installation?
- What user are you using to start Cassandra?
- Does the user have permissions to the log directory?

Hope that helps.

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax

On Mon, Mar 24, 2014 at 8:26 AM, user 01 user...@gmail.com wrote:
> Hints please, anyone?
>
> On Mon, Mar 24, 2014 at 2:13 AM, user 01 user...@gmail.com wrote:
>> No output.log is ever generated by my Cassandra installation (DSC20 with C* 2.0.6 on Ubuntu 12.04). Do I need to configure anything to enable logs to output.log?
Re: Relation between Atomic Batches and Consistency Level
Okay, your question is clear to me now. My understanding, after talking this through with some of the engineers here, is that we have 2 levels of success with batches:

1) Did the batch make it to the batchlog table? [yes or no]
   - yes = success
   - no = not success
2) Did each statement in the batch succeed? [yes or no]
   - yes = success
   - no = not success - the case you are interested in.

If 1 and 2 are both successful:
   - you will receive a success message

If 1 is successful but 2 is not successful (your case):
   - you will receive a message stating the batch succeeded but not all replicas are live yet
   - in this case, the batch will be retried by Cassandra. This is the target scenario for atomic batches (to take the burden off of the client app to monitor, maintain, and retry batches)
   - I am going to test this to see what actually happens inside the batch; I was shooting for last night but didn't get to it
   - you could test this scenario with a trace to see what occurs (i.e. if statement 1 fails, is statement 2 tried?)

If 1 is not successful:
   - the batch fails, because it couldn't make it to the batchlog table for execution

Hope this helps. I believe this is the best I can do for you at the moment.

Thanks,
Jonathan Lacefield
Solutions Architect, DataStax

On Mon, Mar 17, 2014 at 4:05 PM, Drew Kutcharian d...@venarc.com wrote:
> I have read that blog post, which actually was the source of the initial confusion ;)
>
> If I write normally (no batch) at QUORUM, then a hinted write wouldn't count as a valid write, so the write wouldn't succeed, which means I would have to retry. That's a pretty well-defined outcome. Now if I write a logged batch at QUORUM, then by definition a hinted write shouldn't be considered a valid response, no?
- Drew On Mar 17, 2014, at 11:23 AM, Jonathan Lacefield jlacefi...@datastax.com wrote: Hello, Have you seen this blog post, it's old but still relevant. I think it will answer your questions. http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2. I think the answer lies in how Cassandra defines a batch In the context of a Cassandra batch operation, atomic means that if any of the batchsucceeds, all of it will. My understanding is that in your scenario if either statement succeeded, you batch would succeed. So #1 would get hinted and #2 would be applied, assuming no other failure events occur, like the coordinator fails, the client fails, etc. Hope that helps. Thanks, Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Mon, Mar 17, 2014 at 1:38 PM, Drew Kutcharian d...@venarc.com wrote: Hi Jonathan, I'm still a bit unclear on this. Say I have two CQL3 tables: - user (replication of 3) - user_email_index (replication of 3) Now I create a new logged batch at quorum consistency level and put two inserts in there: #1 Insert into the user table with partition key of a timeuuid of the user #2 Insert into the user_email_index with partition key of user's email address As you can see, there is a chance that these two insert statements will be executed on two different nodes because they are keyed by different partition keys. So based on the docs for Logged Batches, a batch will be applied eventually in an all or nothing fashion. So my question is, what happens if insert #1 fails (say replicas are unavailable), would insert #2 get applied? Would the whole thing be rejected and return an error to the client? PS. I'm aware of the isolation guarantees and that's not an issue. All I need to make sure is that if the first the statement failed, the whole batch needs to fail. 
Thanks, Drew On Mar 17, 2014, at 5:33 AM, Jonathan Lacefield jlacefi...@datastax.com wrote: Hello, Consistency is declared at the statement level, i.e. batch level when writing, but enforced at each batch row level. My understanding is that each batch (and all of it's contents) will be controlled through a specific CL declaration. So batch A could use a CL of QUORUM while batch B could use a CL of ONE. The detail that may help sort this out for you is that batch statements do not provide isolation guarantees: www.datastax.com/documentation/cql/3.0/cql/cql_reference/batch_r.html. This means that you write the batch as a batch but the reads are per row. If you are reading records contained in the batch, you will read results of partially updated batches. Taking this into account for your second question, you should expect that your read CL will preform as it would for any individual row mutation. Hope this helps. Jonathan Jonathan Lacefield Solutions
Re: How to extract information from commit log?
Hello,

Is this a one-time investigative item, or are you looking to set something up to do this continuously? I don't recommend trying to read the commit log. You can always use the WRITETIME function in CQL, or look within SSTables via the sstable2json utility, to see write times for particular versions of partitions.

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax

On Tue, Mar 18, 2014 at 2:25 PM, Han,Meng meng...@ufl.edu wrote:
> Hi Cassandra hackers!
>
> I have a question regarding extracting useful information from the commit log. Since it's a binary log, how should I extract information such as timestamps and values from it? Does anyone know of a binary log reader that I can use directly to read the commit log? If there is no such reader, could someone give me some advice on how I could write one?
>
> In particular, I want to know the order in which write operations happen at each replica (Cassandra server node), along with their timestamps. Does anyone know of other ways to get this information without instrumenting the Cassandra code?
>
> Any help is appreciated!
>
> Cheers,
> Meng
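A small sketch of how the WRITETIME suggestion could be used to order writes: WRITETIME values are conventionally microseconds since the Unix epoch, so they can be converted to readable times and sorted client-side. The helper names and the sample rows are made up for illustration, and the sketch assumes the values have already been fetched with something like SELECT WRITETIME(col) in CQL.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def writetime_to_datetime(writetime_micros):
    """Convert a WRITETIME value (microseconds since the epoch) to a UTC datetime."""
    return EPOCH + timedelta(microseconds=writetime_micros)


def order_by_writetime(rows):
    """Sort (key, writetime_micros) pairs from oldest write to newest.

    `rows` is a hypothetical list of values already read from a replica.
    """
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    sample = [("b", 1395167100000500), ("a", 1395167100000000)]
    for key, micros in order_by_writetime(sample):
        print(key, writetime_to_datetime(micros).isoformat())
```

Note that this only shows the timestamp attached to the surviving version of each cell, not every write a replica ever received.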
Re: Relation between Atomic Batches and Consistency Level
Hello,

Consistency is declared at the statement level, i.e. the batch level when writing, but enforced at each batch row level. My understanding is that each batch (and all of its contents) will be controlled through a specific CL declaration. So batch A could use a CL of QUORUM while batch B could use a CL of ONE.

The detail that may help sort this out for you is that batch statements do not provide isolation guarantees: www.datastax.com/documentation/cql/3.0/cql/cql_reference/batch_r.html. This means that you write the batch as a batch, but the reads are per row. If you are reading records contained in the batch, you may read results of partially updated batches. Taking this into account for your second question, you should expect that your read CL will perform as it would for any individual row mutation.

Hope this helps.

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax

On Sat, Mar 15, 2014 at 12:23 PM, Drew Kutcharian d...@venarc.com wrote:
> Hi Guys,
>
> How do Atomic Batches and Consistency Level relate to each other? More specifically:
>
> - Is consistency level set/applicable per statement in the batch or the batch as a whole?
> - Say if I write a Logged Batch at QUORUM and read it back at QUORUM, what can I expect at normal, single node replica failure or double node replica failure scenarios?
>
> Thanks,
> Drew
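For the single and double replica failure part of the question, the quorum arithmetic can be sketched as below. This is only the counting rule (quorum = floor(RF / 2) + 1); it says nothing about batchlog behaviour. The replication factor of 3 used in the example is an assumption, and the function names are illustrative.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def quorum_reachable(replication_factor, replicas_down):
    """True if enough replicas are still up to satisfy QUORUM."""
    return replication_factor - replicas_down >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # assumed replication factor
    for down in (0, 1, 2):
        print(f"RF={rf}, replicas down={down}: QUORUM reachable = {quorum_reachable(rf, down)}")
```

With RF 3, QUORUM needs 2 replicas, so one replica down is tolerated and two are not. Because a QUORUM write and a QUORUM read together touch more than RF replicas, the two sets always overlap in at least one replica.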
Re: Multi-site Active-Active replication - Preparing Sites - Cluster Name and Snitch
Hello,

Please see my comments below:

1) Use GossipingPropertyFileSnitch: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architectureSnitchGossipPF_c.html - much easier to manage
2) All nodes in the same cluster must have the same cluster name: http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html
3) Run repair at the very end if you would like; rebuild should take care of this for you. No need to do it when you are going from Simple (with 1 DC) to Network (with 1 DC). Not sure you need to do step 2, actually.
4) Yes, all keyspaces should be updated as a part of this process.

Hope that helps.

Jonathan Lacefield
Solutions Architect, DataStax

On Sun, Mar 16, 2014 at 10:39 PM, Matthew Allen matthew.j.al...@gmail.com wrote:
> Hi all,
>
> New to this list, so apologies in advance if I inadvertently break some of the guidelines.
>
> We currently have 2 geographically separate Cassandra/Application clusters (running in active/warm-standby mode) that I am looking to enable replication between, so that we can have an active/active configuration.
>
> I've got the process working in our Labs, using http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_add_dc_to_cluster_t.html as a guide, but still have many questions (to verify that what I have done is correct), so I'm trying to break down my questions into various emails.
>
> Our Setup
> ---
> - Our replication factor is currently set to 5 in both sites (NSW and VIC). Each site has 9 nodes.
> - We use a read/write quorum of ONE
> - We have autoNodeDiscovery set to off in our app (in anticipation of multi-site replication), so that it only points to its local Cassandra cluster
> - The 2 sites have a 16-20ms latency
>
> The Plan
> ---
> 1.
Update and restart each node in active Cluster (NSW) 1 at a time to get it to use NetworkTopologySnitch in preparation of addition of standby cluster. - update cassandra-topologies.yaml file with settings as below so NSW Cluster is aware of NSW only - update cassandra.yaml to use PropertyFileSnitch - restart node # Cassandra Node IP=Data Center:Rack xxx.yy.zzz.144=DC_NSW:rack1 xxx.yy.zzz.145=DC_NSW:rack1 xxx.yy.zzz.146=DC_NSW:rack1 xxx.yy.zzz.147=DC_NSW:rack1 xxx.yy.zzz.148=DC_NSW:rack1 ... and so forth for 9 nodes 2. Update App Keyspace to use NetworkTopologySnitch with {'DC_NSW':5} 3. Stop and blow away the standby cluster (VIC) and start afresh, - assign new tokens NSW+100 - set auto_bootstrap: false - update seeds to point to mixture of VIC and NSW nodes. - update cassandra-topologies.yaml file with below so VIC Cluster is aware of VIC and NSW. - Leave cassandra cluster down # Cassandra Node IP=Data Center:Rack xxx.yy.zzz.144=DC_NSW:rack1 xxx.yy.zzz.145=DC_NSW:rack1 xxx.yy.zzz.146=DC_NSW:rack1 xxx.yy.zzz.147=DC_NSW:rack1 xxx.yy.zzz.148=DC_NSW:rack1 ... and so forth for 9 nodes aaa.bb.ccc.144=DC_VIC:rack1 aaa.bb.ccc.145=DC_VIC:rack1 aaa.bb.ccc.146=DC_VIC:rack1 aaa.bb.ccc.147=DC_VIC:rack1 aaa.bb.ccc.148=DC_VIC:rack1 ... and so forth for 9 nodes 4. Update each node in active Cluster (NSW) 1 at a time. - update cassandra-topologies.yaml file with settings as below so NSW Cluster is aware of VIC and NSW. # Cassandra Node IP=Data Center:Rack xxx.yy.zzz.144=DC_NSW:rack1 xxx.yy.zzz.145=DC_NSW:rack1 xxx.yy.zzz.146=DC_NSW:rack1 xxx.yy.zzz.147=DC_NSW:rack1 xxx.yy.zzz.148=DC_NSW:rack1 ... and so forth for 9 nodes aaa.bb.ccc.144=DC_VIC:rack1 aaa.bb.ccc.145=DC_VIC:rack1 aaa.bb.ccc.146=DC_VIC:rack1 aaa.bb.ccc.147=DC_VIC:rack1 aaa.bb.ccc.148=DC_VIC:rack1 ... and so forth for 9 nodes 5. Update App Keyspace to use NetworkTopologySnitch with {'DC_NSW':5,'DC_VIC':5}. 6. Start standby cluster (VIC). - run a nodetool rebuild on each node. 
Some questions --- - Does the Cluster Name on both clusters need to be the same ? - Do I need to run a repair as part of Step 2 (after changing from Simple to NetworkTopologyStrategy) ? - Does the system keyspace snitch need to be updated to use NetworkTopologyStrategy as well ? As currently in the Lab it display as follows (please see 0.00% ownership below), or is this normal ? - Can the different sites run different minor versions ? 1.2.9 - 1.2.15, with a view to upgrading the other site to 1.2.15 ? System Datacenter: DC_NSW == AddressRackStatus State Load OwnsToken 0 xxx.yy.zzz.65 rack1 Up Normal 433.42 KB 50.00% -9223372036854775808 xxx.yy.zzz.66 rack1 Up Normal 459.3 KB 50.00% 0 Datacenter
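The "assign new tokens NSW+100" step in the plan above can be sketched as follows. This assumes the Murmur3Partitioner (token range -2^63 to 2^63 - 1), which is consistent with the tokens shown in the ring output for two nodes, and single-token nodes; the function name and the offset parameter are illustrative, not part of any Cassandra tool.

```python
def evenly_spaced_tokens(node_count, offset=0):
    """Evenly spaced initial tokens across the Murmur3 ring.

    `offset` shifts every token by a small amount so that a second data
    center does not reuse the exact tokens of the first one.
    """
    ring_min = -(2 ** 63)
    ring_size = 2 ** 64
    return [ring_min + (i * ring_size) // node_count + offset
            for i in range(node_count)]


if __name__ == "__main__":
    nsw = evenly_spaced_tokens(9)              # first data center
    vic = evenly_spaced_tokens(9, offset=100)  # second data center, shifted by 100
    for n, v in zip(nsw, vic):
        print(n, v)
```

Each resulting value would go into initial_token in cassandra.yaml on the corresponding node.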
Re: Relation between Atomic Batches and Consistency Level
Hello, Have you seen this blog post, it's old but still relevant. I think it will answer your questions. http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2. I think the answer lies in how Cassandra defines a batch In the context of a Cassandra batch operation, atomic means that if any of the batchsucceeds, all of it will. My understanding is that in your scenario if either statement succeeded, you batch would succeed. So #1 would get hinted and #2 would be applied, assuming no other failure events occur, like the coordinator fails, the client fails, etc. Hope that helps. Thanks, Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Mon, Mar 17, 2014 at 1:38 PM, Drew Kutcharian d...@venarc.com wrote: Hi Jonathan, I'm still a bit unclear on this. Say I have two CQL3 tables: - user (replication of 3) - user_email_index (replication of 3) Now I create a new logged batch at quorum consistency level and put two inserts in there: #1 Insert into the user table with partition key of a timeuuid of the user #2 Insert into the user_email_index with partition key of user's email address As you can see, there is a chance that these two insert statements will be executed on two different nodes because they are keyed by different partition keys. So based on the docs for Logged Batches, a batch will be applied eventually in an all or nothing fashion. So my question is, what happens if insert #1 fails (say replicas are unavailable), would insert #2 get applied? Would the whole thing be rejected and return an error to the client? PS. I'm aware of the isolation guarantees and that's not an issue. All I need to make sure is that if the first the statement failed, the whole batch needs to fail. 
Thanks, Drew On Mar 17, 2014, at 5:33 AM, Jonathan Lacefield jlacefi...@datastax.com wrote: Hello, Consistency is declared at the statement level, i.e. batch level when writing, but enforced at each batch row level. My understanding is that each batch (and all of it's contents) will be controlled through a specific CL declaration. So batch A could use a CL of QUORUM while batch B could use a CL of ONE. The detail that may help sort this out for you is that batch statements do not provide isolation guarantees: www.datastax.com/documentation/cql/3.0/cql/cql_reference/batch_r.html. This means that you write the batch as a batch but the reads are per row. If you are reading records contained in the batch, you will read results of partially updated batches. Taking this into account for your second question, you should expect that your read CL will preform as it would for any individual row mutation. Hope this helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Sat, Mar 15, 2014 at 12:23 PM, Drew Kutcharian d...@venarc.com wrote: Hi Guys, How do Atomic Batches and Consistency Level relate to each other? More specifically: - Is consistency level set/applicable per statement in the batch or the batch as a whole? - Say if I write a Logged Batch at QUORUM and read it back at QUORUM, what can I expect at normal, single node replica failure or double node replica failure scenarios? Thanks, Drew
Re: DSE Hadoop support for provisioning hardware
Hello,

I'm not sure this question is appropriate for the open source C* users group. If you would like, please email me directly to discuss DataStax-specific items.

Thanks,
Jonathan
jlacefi...@datastax.com

Jonathan Lacefield
Solutions Architect, DataStax

On Tue, Mar 11, 2014 at 11:27 AM, Ariel Weisberg ar...@weisberg.ws wrote:
> Hi,
>
> I am doing a presentation at Big Data Boston about how people are bridging the gap between OLTP and ingest side databases and their analytic storage and queries. One class of systems I am talking about are things like HBase and DSE that let you run map reduce against your OLTP dataset.
>
> I remember reading at some point that DSE allows you to provision dedicated hardware for map reduce, but the docs didn't seem to fully explain how that works. I looked at http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/ana/anaStrt.html
>
> My question is what kind of provisioning can I do? Can I provision dedicated hardware for just the filesystem, or can I also provision replicas that are dedicated to the file system and also serving reads for map reduce jobs? What kind of support is there for keeping OLTP reads from hitting the Hadoop storage nodes, and how does this relate to doing quorum reads and writes?
>
> Thanks,
> Ariel
Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram
Hello,

You have several options:

1) Going forward, lower gc_grace_seconds: http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configStorage_r.html?pagename=docsversion=1.2file=configuration/storage_configuration#gc-grace-seconds - this is very use case specific. Default is 10 days. Some users will put this at 0 for specific use cases.
2) You could also lower the tombstone compaction threshold and interval to get tombstone compaction to fire more often on your tables/cfs: https://datastax.jira.com/wiki/pages/viewpage.action?pageId=54493436
3) To clean out old tombstones you could always run a manual compaction, though these aren't typically recommended: http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsNodetool_r.html

For 1 and 2, be sure your disks can keep up with compaction to ensure tombstone, or other, compaction fires regularly enough to clean out old tombstones. Also, you probably want to ensure you are using Leveled Compaction: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction. Again, this assumes your disk system can handle the increased IO from Leveled Compaction.

Also, you may be running into this with the older version of Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-6541

Hope this helps.

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax

On Mon, Mar 10, 2014 at 6:41 AM, Oleg Dulin oleg.du...@gmail.com wrote:
> I get that :) What I'd like to know is how to fix that :)
>
> On 2014-03-09 20:24:54 +, Takenori Sato said:
>> You have millions of org.apache.cassandra.db.DeletedColumn instances on the snapshot. This means you have lots of column tombstones, and I guess, which are read into memory by slice query.
>>
>> On Sun, Mar 9, 2014 at 10:55 PM, Oleg Dulin oleg.du...@gmail.com wrote:
>>> I am trying to understand why one of my nodes keeps full GC.
I have Xmx set to 8gigs, memtable total size is 2 gigs. Consider the top entries from jmap -histo:live @ http://pastebin.com/UaatHfpJ -- Regards, Oleg Dulin http://www.olegdulin.com -- Regards, Oleg Dulin http://www.olegdulin.com
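To make option 1 above a little more concrete, here is a rough sketch of the timing rule behind gc_grace_seconds: a tombstone only becomes eligible for removal once gc_grace_seconds have passed since the delete, and it is actually dropped only when a later compaction processes the SSTable holding it. The function name is illustrative and the check is deliberately simplified (it ignores whether older data shadowed by the tombstone lives in other SSTables).

```python
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # the 10 day default, i.e. 864000 seconds


def tombstone_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True if a tombstone written at `deleted_at` may be dropped by a compaction at `now`.

    Both times are seconds since the Unix epoch.
    """
    return now >= deleted_at + gc_grace_seconds


if __name__ == "__main__":
    deleted_at = 1394400000           # arbitrary example delete time
    one_day_later = deleted_at + 86400
    print(tombstone_purgeable(deleted_at, one_day_later))                         # default grace period
    print(tombstone_purgeable(deleted_at, one_day_later, gc_grace_seconds=3600))  # lowered to one hour
```

Lowering the value shortens the window in which a node that missed the delete can still be repaired, which is why the setting is so use case specific.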
Re: Backup/Restore in Cassandra
Hello,

Full snapshot forces a flush, yes. Incremental hard-links to SSTables, yes.

This question really depends on how your cluster was lost.

Node Loss: You would be able to restore a node based on restoring backups + commit log, or just by using repair.

Cluster Loss (all nodes down scenario with recoverable machines/disks): You would be able to restore to the point where you captured your last incremental backup. If you had a commit log, then those operations would be replayed during bootstrapping. You would also have to restore SSTables that were written to disk but not captured in an incremental backup.

Cluster Loss (all nodes down scenario with unrecoverable machines/disks): You would only be able to restore to the last incremental backup point. This assumes you save backups off cluster.

The commit log's goal is to provide durability in case of node failure prior to a flush operation. The commit log will be replayed during bootstrapping a node and will repopulate memtables. There is also a commit log archive and restore feature: http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLogArchive_t.html. I have not personally used this feature, so I cannot comment on its performance/stability.

Does this help?
BTW: Here's the 1.2 documentation for backup and restore - http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_backup_restore_c.html Here's the 2.0 documentation for backup and restore - http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html Thanks, Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Thu, Mar 6, 2014 at 9:14 AM, java8964 java8...@hotmail.com wrote: Hi, Currently I am looking how the bacup/restore be done in Cassandra, based the document from DataStax: http://www.datastax.com/docs/1.1/backup_restore Here is one way to do it: 1) Do a full snapshot every week 2) Enable incremental backup every day So with last snapshot + the incremental backups after that snapshot, you can restore the cluster to the stage before it is lost. Here are my understanding how Cassandra will flush from Memtable to SSTable files in snapshot or incremental backups: 1) Full snapshot will force Cassandra flush all memtables to SSTable files, but incremental backup won't. 2) Incremental backup just hard-link all the SSTables after the last snapshot. Here is my question: If the my above understanding is correct, let's say there is a data change happened after the last snapshot, then recorded into commit log, and stored in memtable, but never flush to the SSTables yet, at this time, we lost our cluster. Will that change be lost just based on last snapshot plus incremental backups? Or besides the last snapshot plus incremental backups, we also need all commit log for a restore? Thanks Yong
Re: replication_factor: ?
Hello,

The rule of thumb depends on your use case, particularly your consistency requirements. A typical configuration is to use RF 3. Here's documentation on consistency levels: http://www.datastax.com/documentation/cassandra/1.2/cassandra/dml/dml_config_consistency_c.html

If you had a 3 node cluster with an RF of 2, then each row would be stored on 2 of the 3 nodes, i.e. you would have 2 copies of your data in the cluster.

Hope that helps.

Jonathan Lacefield
Solutions Architect, DataStax

On Fri, Mar 7, 2014 at 10:26 AM, Daniel Curry daniel.cu...@arrayent.com wrote:
> I would like to know what the rule of thumb is for the replication_factor number. I think the answer depends on how many nodes one has? I.e. three nodes would mean the number 3. What would happen if I put the number 2 for a three node cluster?
>
> We are using both 3.2.4 and 3.1.3 (that will be upgraded to 3.2.4).
>
> Thank you.
>
> --
> Daniel Curry
> Sr. Linux System Administrator, Network Operations
> PGP : AD5A 96DC 7556 A020 B8E7 0E4D 5D5E 9BA5 C83E 8C92
> Arrayent, Inc.
> 2317 Broadway Street, Suite 20
> Redwood City, CA 94063
> dan...@arrayent.com
> 650-260-4520
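A toy sketch of what RF 2 on a three node cluster means for data placement. It assumes a SimpleStrategy-style walk around the ring (the node owning the token plus the next RF - 1 nodes) and evenly balanced tokens; node indexes and function names are illustrative only.

```python
def replicas_for(primary_index, node_count, rf):
    """Node indexes holding a row whose token is owned by node `primary_index`."""
    copies = min(rf, node_count)  # cannot place more copies than there are nodes
    return [(primary_index + i) % node_count for i in range(copies)]


def share_per_node(node_count, rf):
    """Fraction of the full data set each node stores, assuming balanced tokens."""
    return min(rf, node_count) / node_count


if __name__ == "__main__":
    for primary in range(3):
        print(f"row owned by node {primary} -> stored on nodes {replicas_for(primary, 3, 2)}")
    print(f"each node holds about {share_per_node(3, 2):.0%} of the data")
```

So with RF 2 on three nodes there are two copies of every row, and each node carries roughly two thirds of the data; with RF 3 every node would hold all of it.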
Re: read one -- internal behavior
Yikes, my apologies. B is not the answer.

On Mar 7, 2014, at 8:24 PM, Russell Hatch rha...@datastax.com wrote:
> If you are using cqlsh, you can get a look at what's happening behind the scenes by enabling tracing with 'tracing on;' before executing a query. In this scenario you'll see 'Sending message to [ip address]' for each of the replicas.
>
> On Fri, Mar 7, 2014 at 5:44 PM, Jonathan Lacefield jlacefi...@datastax.com wrote:
>> B is the answer
>>
>> On Mar 7, 2014, at 7:35 PM, James Lyons james.ly...@gmail.com wrote:
>>> I'm wondering about the following scenario.
>>>
>>> Consider a cluster of nodes with replication of, say, 3. When performing a read at read ONE consistency, let's say my client isn't smart enough to route the request to the Cassandra node housing the data at first. The contacted node acts as a coordinator and forwards the request to:
>>>
>>> A) a node that houses the data, and waits for a reply, possibly times out and re-issues to another in a failure or slow host scenario, or
>>> B) all (3) the nodes that house the data, and returns after any one of them replies.
>>>
>>> I'm hoping for B... anyone know for sure?
Re: Cassandra nodetool status result after restoring snapshot
Hello,

That's a large variation between the old and new cluster. Are you sure you pulled over all the SSTables for your keyspaces? Also, did you run a repair after the data move? Do you have a lot of tombstone data in the old cluster that was removed during the migration process? Are you using OpsCenter? A quick comparison of cfstats between the clusters may help you analyze your situation and pinpoint whether you are missing any data for a particular keyspace, etc.

Thanks,
Jonathan

Jonathan Lacefield
Solutions Architect, DataStax

On Wed, Feb 26, 2014 at 6:07 AM, Ranking Lekarzy rankinglekarzy@gmail.com wrote:
> Hi
>
> I have two separate clusters, each consisting of 4 nodes. One cluster is running on 1.2.12 and the other one on 2.0.5. I loaded data from the first cluster (1.2.12) to the second one (2.0.5) by copying snapshots between corresponding nodes. I removed commitlogs, started the second cluster and ran nodetool upgradesstables.
>
> After this I expected that nodetool status would give me the same results in the Load column on both clusters. Unfortunately it is completely different:
>
> - old cluster: [728.02 GB, 558.24 GB, 787.08 GB, 555.1 GB]
> - new cluster: [14.63 GB, 35.98 GB, 18 GB, 38.39 GB]
>
> When I briefly check the data on the new cluster it looks fine, but I'm worried about this difference. Do you have any idea what it means?
>
> Thanks,
> Michu
Re: Cassandra Version History
Hello, Check out the full version list here: https://issues.apache.org/jira/browse/CASSANDRA?selectedTab=com.atlassian.jira.plugin.system.project:versions-panelsubset=-1 Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Mon, Feb 24, 2014 at 2:14 PM, Timmy Turner timm.t...@gmail.com wrote: Hi, is there a history/list showing which major (as in x.y) versions of Cassandra were released on which date? Or is the list on Wikipedia complete? Did 2.0 come after 1.2? Thanks!
Re: Worse perf after Row Caching version 1.2.5:
Hello, Please paste the output of cfhistograms for these tables. Also, what does your environment look like, number of nodes, disk drive configs, memory, C* version, etc. Thanks, Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Tue, Feb 11, 2014 at 10:26 AM, PARASHAR, BHASKARJYA JAY bp1...@att.comwrote: Hi, I have two tables and I enabled row caching for both of them using CQL. These two CF's are very small with one about 300 rows and other 2000 rows. The rows themselves are small. Cassandra heap: 8gb. a. alter table TABLE_X with caching = 'rows_only'; b. alter table TABLE_Y with caching = 'rows_only'; I also changed row_cache_size_in_mb: 1024 in the Cassandra.yaml file. After extensive testing, it seems the performance of Table_X degraded from 600ms to 750ms and Table_Y gained about 10 ms (from 188ms to 177 ms). More Info Table X is always queried with Select * from Table_X; Cfstats in Table_X shows Read Latency: NaN ms. I assumed that since we select all the rows, the entire table would be cached. Table_Y has a secondary index and is queried on that index. Would appreciate any input why the performance is worse and how to enable row caching for these two tables. Thanks Jay
Re: Help me on Cassandra Data Modelling
Hello,

The trick with this data model is to get to a partition-based and/or cluster-based access pattern so C* returns results quickly. In C* you want to model your tables based on your query access patterns, and remember that writes are cheap and fast in C*.

So, try something like the following:

- 1 table with a partition key = Tag String
- Tag String = a tag or set of tags
- Cluster based on tag combination (probably desc order)
- This will allow you to select any combination that includes the tag or set of tags
- This will duplicate data, as you will store 1 tag combination in every tag partition, i.e. if a tag combination has 2 parts, then you will have 2 rows

Hope this helps.

Jonathan Lacefield
Solutions Architect, DataStax

On Mon, Jan 27, 2014 at 7:24 AM, Naresh Yadav nyadav@gmail.com wrote:
> Hi all,
>
> Urgently need help on modelling this use case on Cassandra.
>
> I have a concept of tags and tagcombinations. For example, U.S.A and Pen are two tags, AND if they come together in some definition then a tagcombination (U.S.A-Pen) is registered for that.
>
> *tags* (U.S.A, Pen, Pencil, India, Shampoo)
> *tagcombinations* (U.S.A-Pen, India-pencil, U.S.A-Pencil, India-Pen, India-Pen-Shampoo)
>
> - millions of tags
> - billions of tagcombinations
> - one tagcombination generally has 2-8 tags
> - every day we get lakhs of new tagcombinations to write
>
> Query needed to support: in how many tagcombination ids does one tag or a set of tags appear?
>
> If I query for Pen,India then it should return two tagcombinations (India-Pen, India-Pen-Shampoo). The query will be fired by the application in realtime.
>
> I am new to Cassandra and need to deliver fast, so please give your inputs.
>
> Thanks,
> Naresh
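The duplication idea in the reply above can be sketched with an in-memory stand-in for the table: every tag combination is written once into the partition of each tag it contains, and a query reads one partition and filters on the remaining tags. This is only a model of the access pattern, not driver code; the class and method names are made up, and filtering on the extra tags client-side is an assumption about how a multi-tag query would be served.

```python
from collections import defaultdict


class TagIndex:
    """In-memory stand-in for a table partitioned by tag."""

    def __init__(self):
        # partition key (tag) -> set of tag combinations stored in that partition
        self.partitions = defaultdict(set)

    def add_combination(self, combination):
        """Write the combination into the partition of every tag it contains."""
        combo = tuple(combination)
        for tag in combo:
            self.partitions[tag].add(combo)

    def combinations_with(self, *tags):
        """Read the first tag's partition, then keep rows containing all other tags."""
        first, rest = tags[0], tags[1:]
        candidates = self.partitions.get(first, set())
        return {combo for combo in candidates if all(tag in combo for tag in rest)}

    def row_count(self):
        """Total rows stored, showing the cost of the duplication."""
        return sum(len(rows) for rows in self.partitions.values())


if __name__ == "__main__":
    index = TagIndex()
    for combo in [("U.S.A", "Pen"), ("India", "Pencil"), ("U.S.A", "Pencil"),
                  ("India", "Pen"), ("India", "Pen", "Shampoo")]:
        index.add_combination(combo)
    print(sorted(index.combinations_with("Pen", "India")))
    print(index.row_count())
```

The five example combinations become eleven rows, which is the write amplification being traded for single-partition reads.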
Re: GC eden filled instantly (any size). Dropping messages.
Hello,

A couple of items:

1) Your reads are going to be very slow with that many SSTables being accessed in the worst case. Personally, I have never seen 72 SSTables being accessed before, if I'm reading that correctly.
2) 16GB of heap is very large and may actually increase the duration of GC events, though the GC events should occur less frequently.

Looks like you actually have several issues:

1) GC issue - please enable GC logging, using the GC log flags included in the cassandra-env.sh file. Please share the GC log file once collected.
2) Compaction/tombstone/data model issues - this one is tougher to solve via IRC, as data modeling requires a more in-depth review of the problem statement/goal. But what does your data model look like? Also, how are you doing reads, writes, and deletes? Based on the results of cfhistograms, I would expect your GC issues to be the result of the data model or access patterns.
3) What does tpstats look like?

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/what-we-offer/products-services/training/virtual-training

On Mon, Jan 27, 2014 at 7:07 AM, Dimetrio dimet...@flysoft.ru wrote:

None of the advice helped me reduce the GC load.

I tried these:
MAX_HEAP_SIZE from default (8GB) to 16G, with HEAP_NEWSIZE from 400M to 9600M
key cache on/off
compacting memory size and other limits
15 c3.4xlarge nodes (adding 5 nodes to a 10-node cluster didn't help)
and many others

Reads ~5000 ops/s
Writes ~5000 ops/s
max batch is 50
heavy reads and heavy writes (and heavy deletes)

Sometimes I get messages like:
Read 1001 live and 2691
Read 12 live and 2796

sudo jstat -gcutil -h15 `sudo cat /var/run/cassandra/cassandra.pid` 250ms 0
  S0     S1     E      O      P     YGC   YGCT    FGC  FGCT     GCT
 18.93   0.00   4.52  75.36  59.77   225  30.119   18  28.361  58.480
  0.00  13.12   3.78  81.09  59.77   226  30.193   18  28.617  58.810
  0.00  13.12  39.50  81.09  59.78   226  30.193   18  28.617  58.810
  0.00  13.12  80.70  81.09  59.78   226  30.193   18  28.617  58.810
 17.21   9.13   0.66  87.38  59.78   228  30.235   18  28.617  58.852
  0.00  10.96  29.43  87.89  59.78   228  30.328   18  28.617  58.945
  0.00  10.96  62.67  87.89  59.78   228  30.328   18  28.617  58.945
  0.00  10.96  96.62  87.89  59.78   228  30.328   18  28.617  58.945
  0.00  10.69  10.29  94.56  59.78   230  30.462   18  28.617  59.078
  0.00  10.69  38.08  94.56  59.78   230  30.462   18  28.617  59.078
  0.00  10.69  71.70  94.56  59.78   230  30.462   18  28.617  59.078
 15.91   6.24   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
  S0     S1     E      O      P     YGC   YGCT    FGC  FGCT     GCT
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123
 15.91   8.02   0.03  99.96  59.78   232  30.506   18  28.617  59.123

$ nodetool cfhistograms Social home_timeline
Social/home_timeline histograms
Offset  SSTables  Write Latency (micros)  Read Latency (micros)  Partition Size (bytes)  Cell Count
1 10458 0 0 0 26330
2 72428 0 0 0 0
3 33949011 0 0 42398
4 661819 156 0 0 0
5 67186 893 0 0 0
6 33284 3064 0 0 15907
7 41287 10542 0
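
One rough way to read `jstat -gcutil` samples like those quoted above is to measure how quickly the old generation fills per young collection. The following is only a sketch, assuming clean whitespace-separated rows in the column order S0 S1 E O P YGC YGCT FGC FGCT GCT; the helper names `parse_rows` and `summarize` are hypothetical, and the sample rows are a subset of the figures from the quoted output.

```python
from typing import Dict, List

# Column order of `jstat -gcutil` as shown in the quoted output (assumed).
COLUMNS = ["S0", "S1", "E", "O", "P", "YGC", "YGCT", "FGC", "FGCT", "GCT"]


def parse_rows(text: str) -> List[Dict[str, float]]:
    """Parse whitespace-separated jstat rows, skipping blanks and repeated headers."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "S0":
            continue
        rows.append(dict(zip(COLUMNS, map(float, fields))))
    return rows


def summarize(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Compare the first and last sample: young GC count/time and old-gen growth."""
    first, last = rows[0], rows[-1]
    young_gcs = int(last["YGC"] - first["YGC"])
    old_growth = last["O"] - first["O"]
    return {
        "young_gcs": young_gcs,
        "young_gc_seconds": round(last["YGCT"] - first["YGCT"], 3),
        "old_gen_growth_pct": round(old_growth, 2),
        # Percentage points of old gen consumed per young collection.
        "old_growth_per_young_gc": round(old_growth / young_gcs, 2) if young_gcs else 0.0,
    }


SAMPLE = """
S0 S1 E O P YGC YGCT FGC FGCT GCT
18.93 0.00 4.52 75.36 59.77 225 30.119 18 28.361 58.480
0.00 13.12 3.78 81.09 59.77 226 30.193 18 28.617 58.810
17.21 9.13 0.66 87.38 59.78 228 30.235 18 28.617 58.852
0.00 10.69 10.29 94.56 59.78 230 30.462 18 28.617 59.078
15.91 6.24 0.03 99.96 59.78 232 30.506 18 28.617 59.123
"""

if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLE)))
```

A steadily rising old-generation percentage across young collections suggests objects are being promoted rather than dying young, which is consistent with the request for GC logs and a look at the data model, though it is not a diagnosis on its own.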
Re: Tracking word frequencies
Hi David,

How do you know that you are incurring a seek for each row? Are you querying for one specific word at a time, or do the queries span multiple words? In other words, what is the query pattern? Also, what is your goal for read latency? Most customers can achieve microsecond reads for partition-key-based queries with Cassandra. This can be done through tuning, data modeling, and/or scaling.

Please post the cfhistograms output for this table, and provide some details on the specific queries you are running.

Thanks,
Jonathan

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/what-we-offer/products-services/training/virtual-training

On Fri, Jan 17, 2014 at 1:41 AM, David Tinker david.tin...@gmail.com wrote:

I have an app that stores lots of bits of text in Cassandra. One of the things I need to do is keep a global word frequency table. Something like this:

CREATE TABLE IF NOT EXISTS word_count (
    word text,
    count value,
    PRIMARY KEY (word)
);

This is slow to read, as the rows (hundreds of thousands of them) each need a seek. Is there a better way to model this in Cassandra? I could periodically snapshot the rows into a fat row in another table, I suppose. Or should I use Redis or something instead? I would prefer to keep it all in Cassandra if possible.
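
The "fat row" idea raised in the quoted question could be sketched as a bucketed snapshot: words are grouped into a fixed number of partitions so that a full scan touches a few wide partitions instead of one partition per word. This is only an illustration under stated assumptions; the table `word_count_snapshot`, the `bucket` column, `NUM_BUCKETS`, and both helper functions are hypothetical and do not come from the thread.

```python
import zlib
from collections import defaultdict
from typing import Dict

NUM_BUCKETS = 16  # hypothetical; pick so each partition stays a manageable size

# Hypothetical snapshot table: one wide partition per bucket,
# with the word as the clustering column inside it.
SNAPSHOT_DDL = """
CREATE TABLE IF NOT EXISTS word_count_snapshot (
    bucket int,
    word text,
    count bigint,
    PRIMARY KEY (bucket, word)
);
"""


def bucket_for(word: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Return a stable bucket id for a word (crc32 is deterministic across runs)."""
    return zlib.crc32(word.encode("utf-8")) % num_buckets


def group_into_buckets(
    counts: Dict[str, int], num_buckets: int = NUM_BUCKETS
) -> Dict[int, Dict[str, int]]:
    """Group per-word counts by bucket, ready to be written one partition at a time."""
    buckets: Dict[int, Dict[str, int]] = defaultdict(dict)
    for word, count in counts.items():
        buckets[bucket_for(word, num_buckets)][word] = count
    return dict(buckets)


if __name__ == "__main__":
    snapshot = group_into_buckets({"cassandra": 42, "redis": 7, "seek": 3})
    for bucket, words in sorted(snapshot.items()):
        print(bucket, words)
```

Whether this helps depends on the query pattern asked about in the reply: it favors reading the whole table in `NUM_BUCKETS` partition reads, while a single-word lookup must first compute the bucket and then read by `(bucket, word)`.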