Read performance
Hi, I am writing an application that will periodically read large amounts of data from Cassandra, and I am seeing odd performance. My column family is a classic time-series one, with series ID and day as the partition key and a timestamp as the clustering key; the value is a double. The query I run gets all the values for a given time series for a given day (so about 86,400 points):

SELECT UtcDate, Value
FROM Metric_OneSec
WHERE MetricId = 12215ece-6544-4fcf-a15d-4f9e9ce1567e
  AND Day = '2015-05-05 00:00:00+'
LIMIT 86400;

This takes about 450 ms to run, and when I trace the query I see that it takes about 110 ms to read the data from disk and 224 ms to send the data from the responsible node to the coordinator (full trace in attachment). I did a quick estimate of the requested data size (correct me if I'm wrong):

86400 * (column name + column value + timestamp + TTL) = 86400 * (8 + 8 + 8 + 8?) ≈ 2.6 MB

Let's say about 3 MB with miscellaneous overhead, so these timings seem pretty slow to me for a modern SSD and a 1 Gb/s NIC. Do those timings seem normal? Am I missing something?
Thank you, Kévin

activity                                                                  | timestamp    | source | source_elapsed
--------------------------------------------------------------------------+--------------+--------+---------------
execute_cql3_query                                                        | 09:25:45,027 | node01 |   0
Message received from /node01                                             | 09:25:45,021 | node02 |  10
Executing single-partition query on Metric_OneSec                         | 09:25:45,021 | node02 | 156
Acquiring sstable references                                              | 09:25:45,021 | node02 | 164
Merging memtable tombstones                                               | 09:25:45,021 | node02 | 179
Bloom filter allows skipping sstable 5153                                 | 09:25:45,021 | node02 | 198
Bloom filter allows skipping sstable 5152                                 | 09:25:45,021 | node02 | 205
Bloom filter allows skipping sstable 5151                                 | 09:25:45,021 | node02 | 211
Bloom filter allows skipping sstable 5146                                 | 09:25:45,021 | node02 | 217
Key cache hit for sstable 5125                                            | 09:25:45,021 | node02 | 228
Seeking to partition beginning in data file                               | 09:25:45,021 | node02 | 231
Bloom filter allows skipping sstable 5040                                 | 09:25:45,022 | node02 | 470
Bloom filter allows skipping sstable 4955                                 | 09:25:45,022 | node02 | 479
Bloom filter allows skipping sstable 4614                                 | 09:25:45,022 | node02 | 485
Skipped 0/8 non-slice-intersecting sstables, included 0 due to tombstones | 09:25:45,022 | node02 | 491
Merging data from memtables and 1 sstables                                | 09:25:45,022 | node02 | 495
Parsing SELECT Value FROM Metric_OneSec WHERE MetricId = 12215ece-6544-4fcf-a15d-4f9e9ce1567e AND Day = '2015-05-05 00:00:00+' LIMIT 86400; | 09:25:45,027 | node01 | 23
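The back-of-envelope estimate above can be checked with a short script (a sketch; the 8-byte-per-field sizes are the assumption stated in the question, not measured on-disk sizes):

```python
# Rough payload estimate for one day of one-second samples, using the
# per-field sizes assumed in the question (8 bytes each for column name,
# value, timestamp, and TTL). Real on-disk/wire sizes will differ.
POINTS_PER_DAY = 24 * 60 * 60          # 86400 one-second samples
BYTES_PER_CELL = 8 + 8 + 8 + 8         # name + value + timestamp + TTL

payload_bytes = POINTS_PER_DAY * BYTES_PER_CELL
payload_mb = payload_bytes / (1024 * 1024)
print(f"{payload_bytes} bytes ≈ {payload_mb:.1f} MB")  # 2764800 bytes ≈ 2.6 MB

# At the observed 450 ms for ~3 MB (with overhead), effective throughput
# is well below what an SSD or a 1 Gb/s NIC can sustain:
effective_mb_s = 3 / 0.450
print(f"effective throughput ≈ {effective_mb_s:.1f} MB/s")
```

So the question is fair: the effective throughput is an order of magnitude below line rate, which points at per-cell CPU cost rather than raw I/O.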
Slow bulk loading
Hi, I'm streaming a big sstable using sstableloader, but it's very slow (3 MB/s):

Summary statistics:
  Connections per host:         1
  Total files transferred:      1
  Total bytes transferred:      10357947484
  Total duration (ms):          3280229
  Average transfer rate (MB/s): 3
  Peak transfer rate (MB/s):    3

I'm on a single-node configuration with an empty keyspace and table, on good hardware (8 cores at 2.8 GHz, 32 GB RAM) dedicated to Cassandra, so there is plenty of resource for the process. I'm uploading from another server. The sstable is 9 GB and has 4 partitions, but a lot of rows per partition (about 100 million); the clustering key is an INT and there are 4 other regular columns, so approximately 500 million cells in the column family. While uploading I notice one core of the Cassandra node is at full CPU (all other cores are idling), so I assume I'm CPU-bound on the node side. But why? What is the node doing? Why does it take so long?
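The reported average is internally consistent with the totals; a quick sanity check using the figures from the summary above:

```python
# Verify the sstableloader summary: average rate = total bytes / duration.
total_bytes = 10357947484   # from "Total bytes transferred"
duration_ms = 3280229       # from "Total duration (ms)"

mb = total_bytes / (1024 * 1024)
seconds = duration_ms / 1000
rate_mb_s = mb / seconds
print(f"{rate_mb_s:.2f} MB/s")  # ≈ 3 MB/s, matching the reported average
```

That rules out a reporting glitch: the transfer really did run at about 3 MB/s for nearly an hour.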
RE: Inserting null values
I’ve added an option to prevent tombstone creation when using PreparedStatements to trunk; see CASSANDRA-7304. The problem is having tombstones in regular columns. When you perform a read request (range query or by PK):
- Cassandra iterates over all the cells (all of them, not only the cells specified in the query) in the relevant rows while counting tombstone cells (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java#L199)
- creates a ColumnFamily object instance with the rows
- filters the selected columns from the internal CF (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java#L653)
- returns the result
If you have many unnecessary tombstones, you read many unnecessary cells.

From: Eric Stevens [mailto:migh...@gmail.com] Sent: Wednesday, May 06, 2015 4:37 PM To: user@cassandra.apache.org Subject: Re: Inserting null values

I agree that inserting null is not as good as not inserting that column at all when you have confidence that you are not shadowing any underlying data. But pragmatically speaking, it really doesn't sound like a small number of incidental nulls/tombstones (< 20% of columns, otherwise CASSANDRA-3442 takes over) is going to have any performance impact, either in your query patterns or in compaction, in any practical sense. If INSERT of null values is problematic for small portions of your data, then it stands to reason that an INSERT option containing an instruction to prevent tombstone creation would be an important performance optimization (and would also address the fact that non-null collections generate tombstones on INSERT as well): INSERT INTO ... USING no_tombstones;

"There's thresholds (log messages, etc.) which operate on tombstone counts over a certain number, but not on column counts over the same number." -- tombstone_warn_threshold and tombstone_failure_threshold only apply to clustering scans, right? I.e., tombstones don't count against those thresholds if they are not part of the clustering key column being considered for the non-EQ relation? The documentation certainly implies so:

tombstone_warn_threshold (Default: 1000) The maximum number of tombstones a query can scan before warning. http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold
tombstone_failure_threshold (Default: 100000) The maximum number of tombstones a query can scan before aborting. http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens migh...@gmail.com wrote: In the end, inserting a tombstone into a non-clustered column shouldn't be appreciably worse (if it is at all) than inserting a value instead. Or am I missing something here? There's thresholds (log messages, etc.) which operate on tombstone counts over a certain number, but not on column counts over the same number. Given that tombstones are often smaller than data columns, sorta hard to understand conceptually? =Rob
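Short of the proposed USING no_tombstones option, the workaround available today is to build the INSERT dynamically so that null-valued columns are simply omitted. A minimal sketch (hypothetical helper, not a driver API; table and column names are made up for illustration):

```python
def insert_without_nulls(table, row):
    """Build a CQL INSERT that omits None-valued columns, so no
    tombstones are written for them. The trade-off, as noted above,
    is that omitted columns will not shadow any pre-existing values."""
    present = {k: v for k, v in row.items() if v is not None}
    cols = ", ".join(present)
    placeholders = ", ".join("%s" for _ in present)
    cql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    return cql, list(present.values())

cql, params = insert_without_nulls(
    "users", {"id": 1, "name": "eric", "nickname": None})
print(cql)     # INSERT INTO users (id, name) VALUES (%s, %s)
print(params)  # [1, 'eric']
```

The cost is one prepared statement per distinct column combination, which is why a server-side option is still attractive.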
Re: Hive support on Cassandra
You may find Spark to be useful. You can do SQL, but also use Python, Scala or Java. I wrote a post last week on getting started with DataFrames in Spark, which you can register as tables and query using Hive-compatible SQL: http://rustyrazorblade.com/2015/05/on-the-bleeding-edge-pyspark-dataframes-and-cassandra/ On Thu, May 7, 2015 at 10:07 AM Ajay ajay.ga...@gmail.com wrote: Thanks everyone. Basically we are looking at Hive because it supports advanced queries (CQL is limited to the data model). Does Stratio support something similar to Hive? Thanks Ajay On Thu, May 7, 2015 at 10:33 PM, Andres de la Peña adelap...@stratio.com wrote: You may also find https://github.com/Stratio/crossdata interesting. This project provides batch and streaming capabilities for Cassandra and other databases through a SQL-like language. Disclaimer: I am an employee of Stratio 2015-05-07 17:29 GMT+02:00 l...@airstreamcomm.net: You might also look at Apache Drill, which has support (I think alpha) for ANSI SQL queries against Cassandra if that would suit your needs. On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Does Apache Cassandra (not DSE) support Hive integration? I found a couple of open source efforts, but nothing is available currently. Thanks Ajay -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // @stratiobd https://twitter.com/StratioBD
Re: Can a Cassandra node accept writes while being repaired
Yes, Cassandra nodes accept writes during Repair. Also Repair triggers compactions to remove any tombstones. On Thu, May 7, 2015 at 9:31 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] raziuddin.kh...@nih.gov wrote: I was not able to find a conclusive answer to this question on the internet so I am asking this question here. Is a Cassandra node able to accept insert or delete operations while the node is being repaired? Thanks -Razi -- Arun Senior Hadoop/Cassandra Engineer Cloudwick Champion of Big Data (Cloudera) http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html 2014 Data Impact Award Winner (Cloudera) http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html
Re: Can a Cassandra node accept writes while being repaired
Thanks for the answers. From: arun sirimalla arunsi...@gmail.com Date: Thursday, May 7, 2015 at 2:00 PM To: user@cassandra.apache.org Cc: Razi Khaja raziuddin.kh...@nih.gov Subject: Re: Can a Cassandra node accept writes while being repaired Yes, Cassandra nodes accept writes during Repair. Also Repair triggers compactions to remove any tombstones. On Thu, May 7, 2015 at 9:31 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] raziuddin.kh...@nih.gov wrote: I was not able to find a conclusive answer to this question on the internet so I am asking this question here. Is a Cassandra node able to accept insert or delete operations while the node is being repaired? Thanks -Razi -- Arun Senior Hadoop/Cassandra Engineer Cloudwick Champion of Big Data (Cloudera) http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html 2014 Data Impact Award Winner (Cloudera) http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html
Re: Offline Compaction and Token Splitting
On Thu, May 7, 2015 at 12:07 PM, Jeff Ferland j...@tubularlabs.com wrote: Does anybody have any thoughts in regards to other things that might exist and fulfill this (particularly offline collective compaction), have a desire for such tools, or have any useful information for me before I attempt to build such beasts? Were I doing this, I'd:
1) probably just run an embedded Cassandra cluster-of-one node and use that to compact
2) look at the code of the offline scrub and/or sstablesplit tools
=Rob
Offline Compaction and Token Splitting
I have an idea in mind for backups with Cassandra: dump each column family to a directory and use an offline process to compact it all into one sstable (or a set capped at a max sstable size). For restoration, the idea is a streaming read of an sstable set, with output filtered by whether the data falls within a token range. The result is that I can store a single copy of data that is effectively already repaired, and can read back just the range that covers the node I wish to restore. My first look at this was somewhat frustrated by the sstable code in current versions having a strong reliance on the system keyspace. Does anybody have any thoughts on other things that might already exist and fulfill this (particularly offline collective compaction), have a desire for such tools, or have any useful information for me before I attempt to build such beasts? -Jeff
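The restore-side filter described above boils down to a token-range membership test. A sketch of that check (assuming Murmur3-style signed 64-bit tokens and Cassandra's (start, end] range convention, including the wrap-around case; the function name is illustrative):

```python
def token_in_range(token, start, end):
    """True if `token` falls in the (start, end] range owned by a node,
    handling the ring wrap-around case where start > end. Tokens are
    Murmur3-style signed 64-bit values."""
    if start < end:
        return start < token <= end
    # Wrap-around range, e.g. (2**63 - 10, -2**63 + 10]
    return token > start or token <= end

# A restore stream would keep only rows whose partition token passes:
assert token_in_range(500, 0, 1000)
assert not token_in_range(1500, 0, 1000)
assert token_in_range(2**63 - 5, 2**63 - 10, -(2**63) + 10)  # wraps
```

With that predicate, a restore tool can scan the combined sstable once per target node and emit only the rows that node owns.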
Re: Slow bulk loading
It sounds as though you could be having trouble with garbage collection. Check your Cassandra system logs and search for GC. If you see frequent garbage collections taking more than a second or two to complete, you're going to need to do some configuration tweaking. On 05/07/2015 04:44 AM, Pierre Devops wrote: Hi, I'm streaming a big sstable using sstableloader, but it's very slow (3 MB/s): Summary statistics: Connections per host: 1 Total files transferred: 1 Total bytes transferred: 10357947484 Total duration (ms): 3280229 Average transfer rate (MB/s): 3 Peak transfer rate (MB/s): 3 I'm on a single-node configuration with an empty keyspace and table, on good hardware (8 cores at 2.8 GHz, 32 GB RAM) dedicated to Cassandra, so there is plenty of resource for the process. I'm uploading from another server. The sstable is 9 GB and has 4 partitions, but a lot of rows per partition (about 100 million); the clustering key is an INT and there are 4 other regular columns, so approximately 500 million cells in the column family. While uploading I notice one core of the Cassandra node is at full CPU (all other cores are idling), so I assume I'm CPU-bound on the node side. But why? What is the node doing? Why does it take so long? -- Mike Neir Liquid Web, Inc. Infrastructure Administrator
Java 8
Hi Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7 is EOL? Currently Java 7 is also recommended for 2.1. Are there any reasons not to recommend Java 8 for 2.1? Thanks, Stefan
Re: Java 8
First link was broken (sorry), here is the correct link: http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREJNAabout_c.html 2015-05-07 8:49 GMT-03:00 Paulo Motta pauloricard...@gmail.com: The official recommendation is to run with Java 7 (http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html), mostly to play it safe I guess; however, you can probably already run C* with Java 8, since it has been stable for a while. We've been running with Java 8 for several months now without any noticeable problem. Regarding source compatibility, the official plan is to compile with Java 8 starting from version 3.0. You may find more information in this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8168 2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de: Hi Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7 is EOL? Currently Java 7 is also recommended for 2.1. Are there any reasons not to recommend Java 8 for 2.1? Thanks, Stefan
Re: Java 8
DSE 4.6.5 supports Java 8 (http://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/RNdse46.html?scroll=RNdse46__rel465), and DSE 4.6.5 is Cassandra 2.0.14 under the hood. I would go with 8. On 7 May 2015 at 04:51, Paulo Motta pauloricard...@gmail.com wrote: First link was broken (sorry), here is the correct link: http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREJNAabout_c.html 2015-05-07 8:49 GMT-03:00 Paulo Motta pauloricard...@gmail.com: The official recommendation is to run with Java 7 (http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html), mostly to play it safe I guess; however, you can probably already run C* with Java 8, since it has been stable for a while. We've been running with Java 8 for several months now without any noticeable problem. Regarding source compatibility, the official plan is to compile with Java 8 starting from version 3.0. You may find more information in this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8168 2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de: Hi Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7 is EOL? Currently Java 7 is also recommended for 2.1. Are there any reasons not to recommend Java 8 for 2.1? Thanks, Stefan -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Hive support on Cassandra
Hi Ajay, I just Googled your question and ended up here: http://stackoverflow.com/q/11850186/260805 The only solution seems to be DataStax Enterprise. Cheers, Jens On Wed, May 6, 2015 at 7:57 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Does Apache Cassandra (not DSE) support Hive integration? I found a couple of open source efforts, but nothing is available currently. Thanks Ajay -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919 Twitter https://twitter.com/tink
Re: Hive support on Cassandra
You might also look at Apache Drill, which has support (I think alpha) for ANSI SQL queries against Cassandra if that would suit your needs. On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Does Apache Cassandra (not DSE) support Hive Integration? I found couple of open source efforts but nothing is available currently. Thanks Ajay
Re: Can a Cassandra node accept writes while being repaired
Sorry if this is a double post. My message may not have posted since I sent the email before receiving the WELCOME message. From: Khaja, Razi Khaja raziuddin.kh...@nih.gov Date: Thursday, May 7, 2015 at 12:31 PM To: user@cassandra.apache.org Cc: Razi Khaja raziuddin.kh...@nih.gov Subject: Can a Cassandra node accept writes while being repaired I was not able to find a conclusive answer to this question on the internet so I am asking this question here. Is a Cassandra node able to accept insert or delete operations while the node is being repaired? Thanks -Razi
Can a Cassandra node accept writes while being repaired
I was not able to find a conclusive answer to this question on the internet so I am asking this question here. Is a Cassandra node able to accept insert or delete operations while the node is being repaired? Thanks -Razi
Re: Can a Cassandra node accept writes while being repaired
Yes On Thu, May 7, 2015 at 9:53 AM -0700, Khaja, Raziuddin (NIH/NLM/NCBI) [C] raziuddin.kh...@nih.gov wrote: I was not able to find a conclusive answer to this question on the internet so I am asking this question here. Is a Cassandra node able to accept insert or delete operations while the node is being repaired? Thanks -Razi
Re: Hive support on Cassandra
You may also find https://github.com/Stratio/crossdata interesting. This project provides batch and streaming capabilities for Cassandra and other databases through a SQL-like language. Disclaimer: I am an employee of Stratio 2015-05-07 17:29 GMT+02:00 l...@airstreamcomm.net: You might also look at Apache Drill, which has support (I think alpha) for ANSI SQL queries against Cassandra if that would suit your needs. On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Does Apache Cassandra (not DSE) support Hive integration? I found a couple of open source efforts, but nothing is available currently. Thanks Ajay -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // @stratiobd https://twitter.com/StratioBD
Re: Hive support on Cassandra
Thanks everyone. Basically we are looking at Hive because it supports advanced queries (CQL is limited to the data model). Does Stratio support something similar to Hive? Thanks Ajay On Thu, May 7, 2015 at 10:33 PM, Andres de la Peña adelap...@stratio.com wrote: You may also find https://github.com/Stratio/crossdata interesting. This project provides batch and streaming capabilities for Cassandra and other databases through a SQL-like language. Disclaimer: I am an employee of Stratio 2015-05-07 17:29 GMT+02:00 l...@airstreamcomm.net: You might also look at Apache Drill, which has support (I think alpha) for ANSI SQL queries against Cassandra if that would suit your needs. On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Does Apache Cassandra (not DSE) support Hive integration? I found a couple of open source efforts, but nothing is available currently. Thanks Ajay -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // @stratiobd https://twitter.com/StratioBD
Re: Slow bulk loading
When I upload I notice one core of the cassandra node is full CPU (all other cores are idleing), Take a look at the interrupt distribution (cat /proc/interrupts). You'll probably see disk and network interrupts mostly/all bound to CPU0. If that is the case, this article has an excellent description of the underlying issue as well as some work-arounds: http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux -- - Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
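A quick way to confirm the IRQ-pinning hypothesis is to tally /proc/interrupts per CPU. A sketch that parses that file's format (run here on a captured two-CPU sample rather than the live file; column layout is the standard one but can vary by kernel):

```python
def per_cpu_totals(interrupts_text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text.
    The first line is the CPU header; each following line looks like
    'IRQ: count0 count1 ... [type] [device]'."""
    lines = interrupts_text.strip().splitlines()
    ncpu = len(lines[0].split())       # header lists one label per CPU
    totals = [0] * ncpu
    for line in lines[1:]:
        fields = line.split()[1:]      # drop the 'IRQ:' label
        for i in range(ncpu):
            if i < len(fields) and fields[i].isdigit():
                totals[i] += int(fields[i])
    return totals

sample = """\
           CPU0       CPU1
  0:     123456          0   IO-APIC-edge      timer
 19:      99999          3   IO-APIC-fasteoi   eth0
"""
print(per_cpu_totals(sample))  # [223455, 3] -> nearly everything on CPU0
```

On a live box you would feed it open("/proc/interrupts").read(); a heavily skewed result like the sample's is the signature of the problem Nate describes.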
Re: Java 8
The official recommendation is to run with Java 7 (http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html), mostly to play it safe I guess; however, you can probably already run C* with Java 8, since it has been stable for a while. We've been running with Java 8 for several months now without any noticeable problem. Regarding source compatibility, the official plan is to compile with Java 8 starting from version 3.0. You may find more information in this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8168 2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de: Hi Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7 is EOL? Currently Java 7 is also recommended for 2.1. Are there any reasons not to recommend Java 8 for 2.1? Thanks, Stefan