Read performance

2015-05-07 Thread Alprema
Hi,

I am writing an application that will periodically read large amounts of data
from Cassandra and I am seeing oddly slow performance.

My column family is a classic time series one, with series ID and Day as
partition key and a timestamp as clustering key, the value being a double.

The query I run gets all the values for a given time series for a given day
(so about 86400 points):

SELECT UtcDate, Value
FROM Metric_OneSec
WHERE MetricId = 12215ece-6544-4fcf-a15d-4f9e9ce1567e
  AND Day = '2015-05-05 00:00:00+'
LIMIT 86400;


This takes about 450ms to run and when I trace the query I see that it
takes about 110ms to read the data from disk and 224ms to send the data
from the responsible node to the coordinator (full trace in attachment).

I did a quick estimation of the requested data (correct me if I'm wrong):
86400 * (column name + column value + timestamp + ttl)
= 86400 * (8 + 8 + 8 + 8?)
= 2.6 MB

Let's say about 3 MB with misc. overhead, so these timings seem pretty slow
to me for a modern SSD and a 1 Gb/s NIC.
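For what it's worth, the arithmetic checks out. A quick sketch (the 8-byte field sizes are the email's own assumption; actual serialized sizes on disk and on the wire differ):

```python
# Quick arithmetic check of the estimate above, assuming (as the email
# does) 8 bytes each for column name, value, timestamp and TTL.
points = 86400
bytes_per_cell = 8 + 8 + 8 + 8            # name + value + timestamp + ttl
payload = points * bytes_per_cell         # = 2,764,800 bytes

print(f"payload ~ {payload / 2**20:.1f} MiB")     # ~ 2.6 MiB

# Implied end-to-end rate for the observed 450 ms query:
rate = payload / 2**20 / 0.450
print(f"effective rate ~ {rate:.1f} MiB/s")       # ~ 5.9 MiB/s
```

Even granting generous overhead, roughly 6 MiB/s end-to-end is far below what an SSD and a gigabit NIC can sustain, which is the poster's point.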

Do those timings seem normal? Am I missing something?

Thank you,

Kévin

 activity                                                                  | timestamp    | source | source_elapsed
---------------------------------------------------------------------------+--------------+--------+----------------
 execute_cql3_query                                                        | 09:25:45,027 | node01 |              0
 Message received from /node01                                             | 09:25:45,021 | node02 |             10
 Executing single-partition query on Metric_OneSec                         | 09:25:45,021 | node02 |            156
 Acquiring sstable references                                              | 09:25:45,021 | node02 |            164
 Merging memtable tombstones                                               | 09:25:45,021 | node02 |            179
 Bloom filter allows skipping sstable 5153                                 | 09:25:45,021 | node02 |            198
 Bloom filter allows skipping sstable 5152                                 | 09:25:45,021 | node02 |            205
 Bloom filter allows skipping sstable 5151                                 | 09:25:45,021 | node02 |            211
 Bloom filter allows skipping sstable 5146                                 | 09:25:45,021 | node02 |            217
 Key cache hit for sstable 5125                                            | 09:25:45,021 | node02 |            228
 Seeking to partition beginning in data file                               | 09:25:45,021 | node02 |            231
 Bloom filter allows skipping sstable 5040                                 | 09:25:45,022 | node02 |            470
 Bloom filter allows skipping sstable 4955                                 | 09:25:45,022 | node02 |            479
 Bloom filter allows skipping sstable 4614                                 | 09:25:45,022 | node02 |            485
 Skipped 0/8 non-slice-intersecting sstables, included 0 due to tombstones | 09:25:45,022 | node02 |            491
 Merging data from memtables and 1 sstables                                | 09:25:45,022 | node02 |            495
 Parsing SELECT Value FROM Metric_OneSec WHERE MetricId = 12215ece-6544-4fcf-a15d-4f9e9ce1567e AND Day = '2015-05-05 00:00:00+' LIMIT 86400; | 09:25:45,027 | node01 | 23

Slow bulk loading

2015-05-07 Thread Pierre Devops
Hi,

I'm streaming a big sstable using the sstableloader bulk loader, but it's
very slow (3 MB/s):

Summary statistics:
   Connections per host: : 1
   Total files transferred:  : 1
   Total bytes transferred:  : 10357947484
   Total duration (ms):  : 3280229
   Average transfer rate (MB/s): : 3
   Peak transfer rate (MB/s):: 3

I'm on a single node configuration, empty keyspace and table, with good
hardware (8 cores at 2.8 GHz, 32 GB RAM) dedicated to Cassandra, so there is
plenty of resource for the process. I'm uploading from another server.

The sstable is 9 GB in size and has 4 partitions, but a lot of rows per
partition (around 100 million). The clustering key is an INT and there are 4
other regular columns, so approximately 500 million cells per column family.

When I upload I notice one core of the Cassandra node is at full CPU (all
other cores are idling), so I assume I'm CPU-bound on the node side. But why?
What is the node doing? Why does it take so long?
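The reported average is at least internally consistent with the totals. A quick cross-check (pure arithmetic on the summary above):

```python
# Cross-check the sstableloader summary above: total bytes over total
# duration should match the reported ~3 MB/s average.
total_bytes = 10_357_947_484
duration_s = 3_280_229 / 1000        # "Total duration (ms)" in seconds

rate_mib_s = total_bytes / 2**20 / duration_s
print(f"average rate ~ {rate_mib_s:.2f} MiB/s")   # ~ 3.01 MiB/s

size_gib = total_bytes / 2**30
print(f"sstable size ~ {size_gib:.1f} GiB")       # ~ 9.6 GiB
```

So the tool is not mis-reporting; the stream really did sustain only about 3 MiB/s for the whole ~55 minutes.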


RE: Inserting null values

2015-05-07 Thread Peer, Oded
I’ve added an option to trunk that prevents tombstone creation when using
PreparedStatements; see CASSANDRA-7304.

The problem is having tombstones in regular columns.
When you perform a read request (range query or by PK):
- Cassandra iterates over all the cells (all, not only the cells specified in 
the query) in the relevant rows while counting tombstone cells 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java#L199)
- creates a ColumnFamily object instance with the rows
- filters the selected columns from the internal CF 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java#L653)
- returns the result

If you have many unnecessary tombstones you read many unnecessary cells.
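To make that cost concrete, here is a toy model of the read path described above (illustrative Python, not Cassandra's actual code): every cell in the row is visited and tombstones are counted before the selected columns are filtered out.

```python
# Toy model: the node iterates over *all* cells in the row, counting
# tombstones, and only afterwards filters down to the selected columns.
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    value: object          # None marks a tombstone in this sketch

def read_row(cells, selected):
    tombstones_scanned = 0
    live = []
    for cell in cells:                 # every cell is visited...
        if cell.value is None:
            tombstones_scanned += 1    # ...tombstones are counted...
        else:
            live.append(cell)
    result = [c for c in live if c.name in selected]  # ...then filtered
    return result, tombstones_scanned

row = [Cell("a", 1), Cell("b", None), Cell("c", 3), Cell("d", None)]
values, scanned = read_row(row, selected={"a"})
print(values, scanned)   # [Cell(name='a', value=1)] 2
```

The work done is proportional to the total cell count, including tombstones, not to the number of columns the query actually asked for.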



From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Wednesday, May 06, 2015 4:37 PM
To: user@cassandra.apache.org
Subject: Re: Inserting null values

I agree that inserting null is not as good as not inserting that column at all 
when you have confidence that you are not shadowing any underlying data. But 
pragmatically speaking, it really doesn't sound like a small number of 
incidental nulls/tombstones (< 20% of columns, otherwise CASSANDRA-3442 takes 
over) is going to have any performance impact, either in your query patterns or 
in compaction, in any practical sense.

If INSERT of null values is problematic for small portions of your data, then 
it stands to reason that an INSERT option containing an instruction to prevent 
tombstone creation would be an important performance optimization (it would 
also address the fact that non-null collections generate tombstones on INSERT 
as well).  INSERT INTO ... USING no_tombstones;


 There's thresholds (log messages, etc.) which operate on tombstone counts 
 over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to 
clustering scans right?  I.E. tombstones don't count against those thresholds 
if they are not part of the clustering key column being considered for the 
non-EQ relation?  The documentation certainly implies so:

tombstone_warn_threshold
(http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold)
(Default: 1000) The maximum number of tombstones a query can scan before 
warning.
tombstone_failure_threshold
(http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold)
(Default: 100000) The maximum number of tombstones a query can scan before 
aborting.

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli 
rc...@eventbrite.commailto:rc...@eventbrite.com wrote:
On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens 
migh...@gmail.commailto:migh...@gmail.com wrote:
In the end, inserting a tombstone into a non-clustered column shouldn't be 
appreciably worse (if it is at all) than inserting a value instead.  Or am I 
missing something here?

There's thresholds (log messages, etc.) which operate on tombstone counts over 
a certain number, but not on column counts over the same number.

Given that tombstones are often smaller than data columns, sorta hard to 
understand conceptually?

=Rob




Re: Hive support on Cassandra

2015-05-07 Thread Jonathan Haddad
You may find Spark to be useful.  You can do SQL, but also use Python,
Scala or Java.

I wrote a post last week on getting started with DataFrames & Spark, which
you can register as tables & query using Hive-compatible SQL:
http://rustyrazorblade.com/2015/05/on-the-bleeding-edge-pyspark-dataframes-and-cassandra/

On Thu, May 7, 2015 at 10:07 AM Ajay ajay.ga...@gmail.com wrote:

 Thanks everyone.

 Basically we are looking at Hive because it supports advanced queries (CQL
 is limited by the data model).

 Does Stratio support something similar to Hive?

 Thanks
 Ajay


 On Thu, May 7, 2015 at 10:33 PM, Andres de la Peña adelap...@stratio.com
 wrote:

 You may also find https://github.com/Stratio/crossdata interesting. This
 project provides batch and streaming capabilities for Cassandra and other
 databases through a SQL-like language.

 Disclaimer: I am an employee of Stratio

 2015-05-07 17:29 GMT+02:00 l...@airstreamcomm.net:

 You might also look at Apache Drill, which has support (I think alpha)
 for ANSI SQL queries against Cassandra if that would suit your needs.


  On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote:
 
  Hi,
 
  Does Apache Cassandra (not DSE) support Hive Integration?
 
  I found couple of open source efforts but nothing is available
 currently.
 
  Thanks
  Ajay





 --

 Andrés de la Peña


 http://www.stratio.com/
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón, Madrid
 Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*





Re: Can a Cassandra node accept writes while being repaired

2015-05-07 Thread arun sirimalla
Yes, Cassandra nodes accept writes during Repair. Also Repair triggers
compactions to remove any tombstones.

On Thu, May 7, 2015 at 9:31 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] 
raziuddin.kh...@nih.gov wrote:

 I was not able to find a conclusive answer to this question on the
 internet so I am asking this question here.
 Is a Cassandra node able to accept insert or delete operations while the
 node is being repaired?
 Thanks
 -Razi




-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick

Champion of Big Data (Cloudera)
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html

2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: Can a Cassandra node accept writes while being repaired

2015-05-07 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
Thanks for the answers.

From: arun sirimalla arunsi...@gmail.commailto:arunsi...@gmail.com
Date: Thursday, May 7, 2015 at 2:00 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Cc: Razi Khaja raziuddin.kh...@nih.govmailto:raziuddin.kh...@nih.gov
Subject: Re: Can a Cassandra node accept writes while being repaired

Yes, Cassandra nodes accept writes during Repair. Also Repair triggers 
compactions to remove any tombstones.

On Thu, May 7, 2015 at 9:31 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] 
raziuddin.kh...@nih.govmailto:raziuddin.kh...@nih.gov wrote:
I was not able to find a conclusive answer to this question on the internet so 
I am asking this question here.
Is a Cassandra node able to accept insert or delete operations while the node 
is being repaired?
Thanks
-Razi



--
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick

Champion of Big Data (Cloudera)
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html

2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html



Re: Offline Compaction and Token Splitting

2015-05-07 Thread Robert Coli
On Thu, May 7, 2015 at 12:07 PM, Jeff Ferland j...@tubularlabs.com wrote:

 Does anybody have any thoughts in regards to other things that might exist
 and fulfill this (particularly offline collective compaction), have a
 desire for such tools, or have any useful information for me before I
 attempt to build such beasts?


Were I doing this, I'd:

1) probably just run an embedded Cassandra cluster-of-one node and use that
to compact
2) look at the code of the offline scrub and/or sstablesplit tools

=Rob


Offline Compaction and Token Splitting

2015-05-07 Thread Jeff Ferland
I have an idea in mind for backups with Cassandra: dump each column family to 
a directory and use an offline process to compact it all into one sstable (or 
a set capped at a maximum sstable size). I also have an idea for restoration: 
stream-read an sstable set and emit output based on whether the data fits 
within a token range. The result of this is that I can store a single copy of 
data that is effectively already repaired, and can read from the specific 
range that covers a node that I wish to restore. My first look at this was 
somewhat frustrated by the fact that the sstable code in current versions 
relies heavily on the system keyspace.

Does anybody have any thoughts in regards to other things that might exist and 
fulfill this (particularly offline collective compaction), have a desire for 
such tools, or have any useful information for me before I attempt to build 
such beasts?

-Jeff

Re: Slow bulk loading

2015-05-07 Thread Mike Neir
It sounds as though you could be having troubles with Garbage Collection. Check 
your cassandra system logs and search for GC. If you see frequent garbage 
collections taking more than a second or two to complete, you're going to need 
to do some configuration tweaking.
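A rough sketch of that log check (illustrative; the exact GCInspector line format varies by Cassandra version, and the sample lines below are invented for the example):

```python
import re

# Scan system.log-style text for long GC pauses, per the advice above.
SAMPLE_LOG = """\
INFO [ScheduledTasks:1] 2015-05-07 10:15:01,000 GCInspector.java GC for ParNew: 180 ms for 1 collections
INFO [ScheduledTasks:1] 2015-05-07 10:15:34,123 GCInspector.java GC for ConcurrentMarkSweep: 2345 ms for 1 collections
"""

def long_gc_pauses(log_text, threshold_ms=1000):
    """Return (collector, pause_ms) for pauses at or above the threshold."""
    pauses = []
    for line in log_text.splitlines():
        m = re.search(r"GC for (\w+): (\d+) ms", line)
        if m and int(m.group(2)) >= threshold_ms:
            pauses.append((m.group(1), int(m.group(2))))
    return pauses

print(long_gc_pauses(SAMPLE_LOG))   # [('ConcurrentMarkSweep', 2345)]
```

On a real node you would feed it the contents of Cassandra's system.log; frequent multi-second pauses are the signal that tuning is needed.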


On 05/07/2015 04:44 AM, Pierre Devops wrote:

Hi,

I'm streaming a big sstable using the sstableloader bulk loader, but it's very
slow (3 MB/s):

Summary statistics:
Connections per host: : 1
Total files transferred:  : 1
Total bytes transferred:  : 10357947484
Total duration (ms):  : 3280229
Average transfer rate (MB/s): : 3
Peak transfer rate (MB/s):: 3

I'm on a single node configuration, empty keyspace and table, with good hardware
(8 cores at 2.8 GHz, 32 GB RAM) dedicated to Cassandra, so there is plenty of
resource for the process. I'm uploading from another server.

The sstable is 9 GB in size and has 4 partitions, but a lot of rows per
partition (around 100 million). The clustering key is an INT and there are 4
other regular columns, so approximately 500 million cells per column family.

When I upload I notice one core of the Cassandra node is at full CPU (all other
cores are idling), so I assume I'm CPU-bound on the node side. But why? What is
the node doing? Why does it take so long?



--



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator



Java 8

2015-05-07 Thread Stefan Podkowinski
Hi

Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7 is EOL?
Currently Java 7 is also recommended for 2.1. Are there any reasons not to 
recommend Java 8 for 2.1?

Thanks,
Stefan


Re: Java 8

2015-05-07 Thread Paulo Motta
First link was broken (sorry), here is the correct link:
http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREJNAabout_c.html

2015-05-07 8:49 GMT-03:00 Paulo Motta pauloricard...@gmail.com:

 The official recommendation is to run with Java 7 (
 http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html),
 mostly to play it safe I guess; however, you can probably already run C*
 with Java 8, since it has been stable for a while. We've been running with
 Java 8 for several months now without any noticeable problem.

 Regarding source compatibility, the official plan is to compile with Java 8
 starting from version 3.0. You may find more information on this ticket:
 https://issues.apache.org/jira/browse/CASSANDRA-8168

 2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de
 :

  Hi



 Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7
 is EOL?

 Currently Java 7 is also recommended for 2.1. Are there any reasons not
 to recommend Java 8 for 2.1?



 Thanks,

 Stefan





Re: Java 8

2015-05-07 Thread Ben Bromhead
DSE 4.6.5 supports Java 8 (
http://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/RNdse46.html?scroll=RNdse46__rel465)
and DSE 4.6.5 is Cassandra 2.0.14 under the hood.

I would go with 8

On 7 May 2015 at 04:51, Paulo Motta pauloricard...@gmail.com wrote:

 First link was broken (sorry), here is the correct link:
 http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREJNAabout_c.html

 2015-05-07 8:49 GMT-03:00 Paulo Motta pauloricard...@gmail.com:

 The official recommendation is to run with Java 7 (
 http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html),
 mostly to play it safe I guess; however, you can probably already run C*
 with Java 8, since it has been stable for a while. We've been running with
 Java 8 for several months now without any noticeable problem.

 Regarding source compatibility, the official plan is to compile with Java 8
 starting from version 3.0. You may find more information on this ticket:
 https://issues.apache.org/jira/browse/CASSANDRA-8168

 2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de
 :

  Hi



 Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7
 is EOL?

 Currently Java 7 is also recommended for 2.1. Are there any reasons not
 to recommend Java 8 for 2.1?



 Thanks,

 Stefan






-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
http://twitter.com/instaclustr | (650) 284 9692


Re: Hive support on Cassandra

2015-05-07 Thread Jens Rantil
Hi Ajay,

I just Googled your question and ended up here:
http://stackoverflow.com/q/11850186/260805 The only solution seems to be
DataStax Enterprise.

Cheers,
Jens

On Wed, May 6, 2015 at 7:57 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 Does Apache Cassandra (not DSE) support Hive Integration?

 I found couple of open source efforts but nothing is available currently.

 Thanks
 Ajay




-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Hive support on Cassandra

2015-05-07 Thread list
You might also look at Apache Drill, which has support (I think alpha) for ANSI 
SQL queries against Cassandra if that would suit your needs.


 On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote:
 
 Hi,
 
 Does Apache Cassandra (not DSE) support Hive Integration?
 
 I found couple of open source efforts but nothing is available currently. 
 
 Thanks
 Ajay




Re: Can a Cassandra node accept writes while being repaired

2015-05-07 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
Sorry if this is a double post. My message may not have posted, since I sent 
the email before receiving the WELCOME message.

From: Khaja, Razi Khaja 
raziuddin.kh...@nih.govmailto:raziuddin.kh...@nih.gov
Date: Thursday, May 7, 2015 at 12:31 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Cc: Razi Khaja raziuddin.kh...@nih.govmailto:raziuddin.kh...@nih.gov
Subject: Can a Cassandra node accept writes while being repaired

I was not able to find a conclusive answer to this question on the internet so 
I am asking this question here.
Is a Cassandra node able to accept insert or delete operations while the node 
is being repaired?
Thanks
-Razi


Can a Cassandra node accept writes while being repaired

2015-05-07 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
I was not able to find a conclusive answer to this question on the internet so 
I am asking this question here.
Is a Cassandra node able to accept insert or delete operations while the node 
is being repaired?
Thanks
-Razi


Re: Can a Cassandra node accept writes while being repaired

2015-05-07 Thread Russell Bradberry
Yes

On Thu, May 7, 2015 at 9:53 AM -0700, Khaja, Raziuddin (NIH/NLM/NCBI) [C] 
raziuddin.kh...@nih.gov wrote:

I was not able to find a conclusive answer to this question on the internet so 
I am asking this question here.
Is a Cassandra node able to accept insert or delete operations while the node 
is being repaired?
Thanks
-Razi

Re: Hive support on Cassandra

2015-05-07 Thread Andres de la Peña
You may also find https://github.com/Stratio/crossdata interesting. This
project provides batch and streaming capabilities for Cassandra and other
databases through a SQL-like language.

Disclaimer: I am an employee of Stratio

2015-05-07 17:29 GMT+02:00 l...@airstreamcomm.net:

 You might also look at Apache Drill, which has support (I think alpha) for
 ANSI SQL queries against Cassandra if that would suit your needs.


  On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote:
 
  Hi,
 
  Does Apache Cassandra (not DSE) support Hive Integration?
 
  I found couple of open source efforts but nothing is available currently.
 
  Thanks
  Ajay





-- 

Andrés de la Peña


http://www.stratio.com/
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*


Re: Hive support on Cassandra

2015-05-07 Thread Ajay
Thanks everyone.

Basically we are looking at Hive because it supports advanced queries (CQL
is limited by the data model).

Does Stratio support something similar to Hive?

Thanks
Ajay


On Thu, May 7, 2015 at 10:33 PM, Andres de la Peña adelap...@stratio.com
wrote:

 You may also find https://github.com/Stratio/crossdata interesting. This
 project provides batch and streaming capabilities for Cassandra and other
 databases through a SQL-like language.

 Disclaimer: I am an employee of Stratio

 2015-05-07 17:29 GMT+02:00 l...@airstreamcomm.net:

 You might also look at Apache Drill, which has support (I think alpha)
 for ANSI SQL queries against Cassandra if that would suit your needs.


  On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote:
 
  Hi,
 
  Does Apache Cassandra (not DSE) support Hive Integration?
 
  I found couple of open source efforts but nothing is available
 currently.
 
  Thanks
  Ajay





 --

 Andrés de la Peña


 http://www.stratio.com/
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón, Madrid
 Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*



Re: Slow bulk loading

2015-05-07 Thread Nate McCall



 When I upload I notice one core of the cassandra node is full CPU (all
 other cores are idleing),


Take a look at the interrupt distribution (cat /proc/interrupts). You'll
probably see disk and network interrupts mostly/all bound to CPU0. If that
is the case, this article has an excellent description of the underlying
issue as well as some work-arounds:
http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
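As a sketch of what to look for, here is a small parser for `/proc/interrupts`-style output (run on a canned, invented sample so it is self-contained; on a live node you would feed it `open("/proc/interrupts").read()`):

```python
# Sum interrupt counts per CPU from /proc/interrupts-style text.
# If nearly everything lands on CPU0, the smp_affinity work-arounds
# in the article above apply.
SAMPLE = """\
           CPU0       CPU1
  45:   1200000          0   PCI-MSI-edge   eth0
  46:    900000          3   PCI-MSI-edge   ahci
"""

def per_cpu_totals(text):
    lines = text.splitlines()
    ncpu = len(lines[0].split())          # header row: one column per CPU
    totals = [0] * ncpu
    for line in lines[1:]:
        fields = line.split()
        for i, count in enumerate(fields[1:1 + ncpu]):
            if count.isdigit():
                totals[i] += int(count)
    return totals

print(per_cpu_totals(SAMPLE))   # [2100000, 3] -> nearly everything on CPU0
```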



-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder  Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Java 8

2015-05-07 Thread Paulo Motta
The official recommendation is to run with Java 7 (
http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html),
mostly to play it safe I guess; however, you can probably already run C*
with Java 8, since it has been stable for a while. We've been running with
Java 8 for several months now without any noticeable problem.

Regarding source compatibility, the official plan is to compile with Java 8
starting from version 3.0. You may find more information on this ticket:
https://issues.apache.org/jira/browse/CASSANDRA-8168

2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de:

  Hi



 Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7
 is EOL?

 Currently Java 7 is also recommended for 2.1. Are there any reasons not to
 recommend Java 8 for 2.1?



 Thanks,

 Stefan