Re: Cassandra, vnodes, and spark

2014-09-16 Thread DuyHai Doan
Look into the source code of the Spark connector. CassandraRDD tries to find all token ranges (even when using vnodes) for each node (endpoint) and creates RDD partitions to match this distribution of token ranges. Thus data locality is guaranteed. On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe
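
The idea described above (collect the token ranges each endpoint owns, then build partitions that stay on that endpoint) can be illustrated with a small sketch. This is not the connector's actual code: the function names, the tuple layout, and the ranges_per_partition parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def group_ranges_by_endpoint(token_ranges):
    """Group (start_token, end_token, endpoint) tuples by endpoint.

    The tuple layout is a hypothetical stand-in for whatever the
    cluster metadata actually returns.
    """
    by_endpoint = defaultdict(list)
    for start, end, endpoint in token_ranges:
        by_endpoint[endpoint].append((start, end))
    return dict(by_endpoint)


def make_partitions(token_ranges, ranges_per_partition=2):
    """Build partitions whose token ranges all live on one endpoint.

    ranges_per_partition is an assumed tuning knob, not a real
    connector setting.
    """
    partitions = []
    grouped = group_ranges_by_endpoint(token_ranges)
    for endpoint in sorted(grouped):
        ranges = grouped[endpoint]
        # Chunk the endpoint's ranges so no partition spans two nodes.
        for i in range(0, len(ranges), ranges_per_partition):
            partitions.append({
                "preferred_location": endpoint,
                "ranges": ranges[i:i + ranges_per_partition],
            })
    return partitions
```

With vnodes each endpoint owns many small ranges, so batching several ranges into one partition keeps the partition count manageable while every partition still has a single preferred location.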

Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Hi. As far as I can see, massive data processing tools (map/reduce) for C* data include connectors: - Calliope http://tuplejump.github.io/calliope/ - DataStax Spark Cassandra connector https://github.com/datastax/spark-cassandra-connector - Stratio Deep https://github.com/Stratio/stratio-deep - other

Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread DuyHai Doan
If you access the C* sstables directly from those frameworks, you will: 1) miss live data which is in memory and not yet flushed to disk 2) skip the Dynamo layer of C* responsible for data consistency On 16 Sept. 2014 10:58, platon.tema platon.t...@yandex.ru wrote: Hi. As I see massive

Document of WRITETIME function needs update

2014-09-16 Thread ziju feng
Hi, I found that the WRITETIME function on a counter column returns the date/time in milliseconds instead of microseconds, which is not mentioned in the documentation http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use_writetime.html. It would be helpful to clarify the difference in the documentation.
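
To show why the unit matters, here is a minimal sketch that converts a WRITETIME value to a UTC datetime. The function name and the unit argument are hypothetical, and the assumption that counter columns report milliseconds comes only from the report above, not from verified documentation.

```python
from datetime import datetime, timezone


def writetime_to_datetime(value, unit="us"):
    """Convert a WRITETIME integer to an aware UTC datetime.

    unit="us" treats the value as microseconds since the epoch;
    unit="ms" treats it as milliseconds (the behaviour reported
    above for counter columns; this is an assumption).
    """
    divisor = {"us": 1_000_000, "ms": 1_000}[unit]
    seconds, remainder = divmod(value, divisor)
    # Scale the sub-second remainder to microseconds.
    micros = remainder * (1_000_000 // divisor)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)
```

Reading a millisecond value as microseconds shrinks it by a factor of 1000, which places the timestamp within weeks of the 1970 epoch, so a wrong unit is usually easy to spot.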

Re: Cassandra, vnodes, and spark

2014-09-16 Thread George Stergiou
Ran into this performance report https://github.com/datastax/spark-cassandra-connector/issues/200 Does the Spark connector in its current state issue one CQL query per vnode, or one task per vnode? Regards. On Tue, Sep 16, 2014 at 2:05 AM, DuyHai Doan doanduy...@gmail.com wrote: Look into the source code

Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Thanks. But 1) can be overcome with the C* API for the commitlog and memtables, or with mixed access (direct IO + traditional connectors, or pure CQL if the data model allows; we experimented with it). 2) is more complex for a universal solution. In our case C* is used without replication (RF=1) because of huge

RE: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread moshe.kranc
You will also have to read/resolve multiple row instances (if you update records) and tombstones (if you delete records) yourself. From: platon.tema [mailto:platon.t...@yandex.ru] Sent: Tuesday, September 16, 2014 1:51 PM To: user@cassandra.apache.org Subject: Re: Direct IO with Spark and Hadoop

Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Yes, updates and deletes are trouble. At the moment, for updates, we refresh the result data by querying C* (Java driver) before reporting to the user. For deletes, we can skip them during scanning, by TTL for example (not tested yet). On 09/16/2014 04:53 PM, moshe.kr...@barclays.com wrote: You

Re: hs_err_pid3013.log, out of memory?

2014-09-16 Thread Chris Lohfink
How much memory does your system have? How much memory is the system using before starting Cassandra (use the free command)? What heap settings does it try to use? Chris On Sep 15, 2014, at 8:16 PM, Yatong Zhang bluefl...@gmail.com wrote: It's during the startup. I tried to upgrade cassandra

Re: Trying to understand cassandra gc logs

2014-09-16 Thread Chris Lohfink
Check out: https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs The young gen collection is a stop-the-world event that pauses application threads, and a couple of parts of CMS are as well. I would recommend disabling the #JVM_OPTS=$JVM_OPTS -XX:PrintFLSStatistics=1 line in your

Consistency Level for Atomic Batches

2014-09-16 Thread Viswanathan Ramachandran
Is the consistency level honored for batch statements? If I have 100 insert/update statements in my batch and use LOCAL_QUORUM consistency, will control return from the coordinator only after a local quorum update has been done for all 100 statements? Or is it different? Thanks Vish

Re: Consistency Level for Atomic Batches

2014-09-16 Thread Viswanathan Ramachandran
A follow-up on the earlier question. I meant to ask earlier whether control returns to the client after the batch log is written on the coordinator, irrespective of the consistency level specified. Also: will the coordinator attempt all statements one after the other, or in parallel? Thanks On Tue, Sep 16, 2014

Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread Kevin Burton
Say I want to do a rolling restart of Cassandra… I can’t just restart all of them because they need some time to gossip and for that gossip to get to all nodes. What is the best strategy for this? It would be something like: /etc/init.d/cassandra restart wait-for-cassandra.sh … or something
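
A wait step like the hypothetical wait-for-cassandra.sh above could be sketched as a simple port poll. This is only a rough check under stated assumptions: the function name, timeout, and interval are invented here, and an open client port (9042 is the default native transport port) does not prove that gossip has settled across the cluster.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once the port accepts a connection, or False if
    the timeout (in seconds) expires first. All defaults are
    illustrative assumptions, not recommended values.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the node is accepting clients.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # Not listening yet; retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a rolling restart, one would restart a node, call something like wait_for_port("node1", 9042), and only then move on to the next node; a stricter script might also check the ring state before continuing.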

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread Duncan Sands
Hi Kevin, if you are using the latest version of opscenter, then even the community (= free) edition can do a rolling restart of your cluster. It's pretty convenient. Ciao, Duncan. On 16/09/14 19:44, Kevin Burton wrote: Say I want to do a rolling restart of Cassandra… I can’t just restart

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread James Briggs
FYI: OpsCenter has a default of sleep 60 seconds after each node restart, and an option of drain before stopping. I haven't noticed if they do anything special with seeds. (At least one seed needs to be running before you restart other nodes.) I wondered the same thing as Kevin and came to

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 12:21 PM, James Briggs james.bri...@yahoo.com wrote: I haven't noticed if they do anything special with seeds. (At least one seed needs to be running before you restart other nodes.) If the nodes have all seen each other before (the cluster has coalesced once) then

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread James Briggs
Hi Robert. I just did a test (shut down all nodes, start one non-seed node). You're correct that an old non-seed node can start by itself. So startup scripts don't have to be intelligent, but apps need to wait until there are enough nodes up to serve the whole keyspace: cqlsh:my_keyspace

backport of CASSANDRA-6916

2014-09-16 Thread Paulo Ricardo Motta Gomes
Hello, Has anyone backported incremental replacement of compacted SSTables (CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies introduced in 2.1? I haven't checked the ticket details yet, but am asking just in case anyone has interesting info to share. Cheers, -- *Paulo Motta* Chaordic |

Re: backport of CASSANDRA-6916

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: Has anyone backported incremental replacement of compacted SSTables (CASSANDRA-6916) to 2.0? Is it doable or there are many dependencies introduced in 2.1? Haven't checked the ticket detail

Re: backport of CASSANDRA-6916

2014-09-16 Thread Paulo Ricardo Motta Gomes
own purposes but wouldn't mind making it public so people could patch it themselves if they want to.. (if nobody has already done so) :) On Tue, Sep 16, 2014 at 8:13 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes

Re: backport of CASSANDRA-6916

2014-09-16 Thread James Briggs
Paulo: Out of curiosity, why not just upgrade to 2.1 if you want the new features? You know you want to! :) Thanks, James Briggs -- Cassandra/MySQL DBA. Available in San Jose area or remote. From: Robert Coli rc...@eventbrite.com To:

Re: backport of CASSANDRA-6916

2014-09-16 Thread Paulo Ricardo Motta Gomes
Because I want this specific feature, and not all 2.1 features, even though this is probably one of the most significant changes in 2.1. Upgrading would be nice, but I want to wait a little longer before fully jumping into 2.1 :) We're having sudden peaks in read latency some time after a massive

Re: backport of CASSANDRA-6916

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 4:38 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: We're having sudden peaks in read latency some time after a massive batch write, which is most likely caused by the cold page cache of newly compacted sstables, which will hopefully be solved by

Re: backport of CASSANDRA-6916

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 4:50 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Sep 16, 2014 at 4:38 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: We're having sudden peaks in read latency some time after a massive batch write, which is most likely caused by cold

no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
Hi - We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances were using EBS for storage (I know it is not recommended). We replaced the EBS storage with SSDs. However, we didn't see any change in read latency. A query that took 10 seconds when data was stored on EBS still

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com wrote: Does anyone have insight as to why we don't see any performance impact on the reads going from EBS to SSD? What does it say when you enable tracing on this CQL query? 10 seconds is a really long time to access

Announce: top for Cassandra - cass_top

2014-09-16 Thread James Briggs
I wrote cass_top, a poor man's version of OpsCenter, in bash (no dependencies.) http://www.jebriggs.com/blog/2014/09/top-utility-for-cassandra-clusters-cass_top/ Actually, if it had node or cluster restart, it would do most of what the OpsCenter free version does. :) The features of

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread James Briggs
To expand on what Robert said, Cassandra is a log-structured database: - writes are append operations, so both correctly configured disk volumes and SSDs are fast at that - reads could be helped by SSDs if the data is not in cache (i.e. on disk) - but compaction is definitely helped by SSDs with large

Re: hs_err_pid3013.log, out of memory?

2014-09-16 Thread J. Ryan Earl
Are you using JNA? Did you adjust your memlock limit? On Tue, Sep 16, 2014 at 9:46 AM, Chris Lohfink clohf...@blackbirdit.com wrote: How much memory does your system have? How much memory is the system using before starting Cassandra (use the free command)? What heap settings does it try to

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Alex Kamil
Mohammed, to add to the previous answers: EBS is network attached. With or without SSD, you access your disk over the network, constrained by network bandwidth and latency. If you really need to improve IO performance, try switching to ephemeral storage (also called instance storage) which is

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Tony Anecito
If you cached your tables or the database you may not see any difference at all. Regards, -Tony On Tuesday, September 16, 2014 6:36 PM, Mohammed Guller moham...@glassbeam.com wrote: Hi - We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances were using EBS for

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Ben Bromhead
When comparing EBS and local SSD in terms of latency, you are using ms as your unit of measurement. If your query runs for 10s you will not notice anything. What is a few ms less over the life of a 10 second query? To reiterate what Rob said: the query is probably slow because of your use case / data model, not
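
The arithmetic behind that point can be made explicit with a tiny sketch. The 5 ms saving below is a purely hypothetical figure chosen for illustration, not a measurement from this thread; only the 10 second query time comes from the original post.

```python
def relative_saving(total_ms, saved_ms):
    """Return the fraction of total query time that a saving represents."""
    return saved_ms / total_ms


# Hypothetical: storage shaves 5 ms off a 10 s (10,000 ms) query.
fraction = relative_saving(10_000, 5)
print(f"{fraction:.2%} of the query time")
```

A saving of that size is far below what a user would notice on a 10 second query, which is why the data model is the more likely place to look.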

RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
Rob, The 10 seconds latency that I gave earlier is from CQL tracing. Almost 5 seconds out of that was taken up by the “merge memtable and sstables” step. The remaining 5 seconds are from “read live and tombstoned cells.” I too first thought that maybe disk is not the bottleneck and Cassandra is

Re: C 2.1

2014-09-16 Thread Jack Krupansky
DSE/Solr is tightly integrated, so there is no “external” system to manage – insert data in CQL and within a few seconds it is available for query from Solr running in the same JVM as Cassandra. DSE/Solr indexes the data on each Cassandra node, and uses Cassandra’s cluster management for