Look into the source code of the Spark connector. CassandraRDD tries to find
all token ranges (even when using vnodes) for each node (endpoint) and
creates RDD partitions to match this distribution of token ranges. Thus data
locality is guaranteed.
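To illustrate the idea (this is not the connector's actual code), here is a
minimal Python sketch that groups token ranges by their owning endpoint and
builds partitions that each carry a single preferred location. The ring data,
the function names and the ranges_per_partition parameter are all hypothetical.

```python
from collections import defaultdict


def group_ranges_by_endpoint(token_ranges):
    """Map each endpoint to the list of (start, end) token ranges it owns."""
    by_endpoint = defaultdict(list)
    for start, end, endpoint in token_ranges:
        by_endpoint[endpoint].append((start, end))
    return dict(by_endpoint)


def make_partitions(token_ranges, ranges_per_partition=2):
    """Build partitions whose ranges all belong to one endpoint.

    Each partition records that endpoint as its preferred location, so a
    scheduler could place the task on the node that holds the data.
    """
    partitions = []
    grouped = group_ranges_by_endpoint(token_ranges)
    for endpoint, ranges in sorted(grouped.items()):
        for i in range(0, len(ranges), ranges_per_partition):
            partitions.append({
                "preferred_location": endpoint,
                "ranges": ranges[i:i + ranges_per_partition],
            })
    return partitions


# Hypothetical ring: (start_token, end_token, owning_endpoint).
ring = [
    (0, 100, "10.0.0.1"),
    (100, 200, "10.0.0.2"),
    (200, 300, "10.0.0.1"),
    (300, 400, "10.0.0.2"),
    (400, 500, "10.0.0.1"),
]

for partition in make_partitions(ring):
    print(partition["preferred_location"], partition["ranges"])
```

With vnodes each endpoint owns many small ranges, so packing several ranges
of the same endpoint into one partition keeps the partition count manageable.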
On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe
Hi.
As far as I can see, massive data processing tools (map/reduce) for C* data
include these connectors:
- Calliope http://tuplejump.github.io/calliope/
- Datastax spark cassandra connector
https://github.com/datastax/spark-cassandra-connector
- Stratio Deep https://github.com/Stratio/stratio-deep
- others
If you access the C* sstables directly from those frameworks, you will:
1) miss live data which are in memory and not dumped yet to disk
2) skip the Dynamo layer of C* responsible for data consistency
On 16 Sep 2014 at 10:58, platon.tema platon.t...@yandex.ru wrote:
Hi.
As I see massive
Hi,
I found that the WRITETIME function on a counter column returns the date/time
in milliseconds instead of microseconds, which is not mentioned in the
documentation:
http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use_writetime.html.
It would be helpful to clarify the difference in the documentation.
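For anyone converting these values on the client side, here is a small Python
sketch of why the unit matters. It assumes, as reported above, that regular
columns give microseconds since the epoch and counter columns give
milliseconds. The helper name and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def writetime_to_datetime(value, unit="us"):
    """Convert a WRITETIME value to a UTC datetime.

    unit is "us" for microseconds or "ms" for milliseconds since the epoch.
    Integer arithmetic is used so no precision is lost in a float division.
    """
    per_second = {"us": 1_000_000, "ms": 1_000}[unit]
    seconds, remainder = divmod(value, per_second)
    micros = remainder * (1_000_000 // per_second)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=micros)


# Hypothetical values for the same instant, in the two units.
regular_column = 1410825600000000   # microseconds
counter_column = 1410825600000      # milliseconds

print(writetime_to_datetime(regular_column, unit="us"))
print(writetime_to_datetime(counter_column, unit="ms"))
```

Reading a millisecond value as microseconds would place the write in
January 1970, which is usually how the mismatch gets noticed.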
I ran into this performance report:
https://github.com/datastax/spark-cassandra-connector/issues/200
Does the Spark connector in its current state issue one CQL query per vnode,
or one task per vnode?
Regards.
On Tue, Sep 16, 2014 at 2:05 AM, DuyHai Doan doanduy...@gmail.com wrote:
Look into the source code
Thanks.
But 1) can be overcome with the C* API for the commitlog and memtables, or
with mixed access (direct IO plus traditional connectors, or pure CQL if the
data model allows; we experimented with it).
2) is more complex for a universal solution. In our case C* is used without
replication (RF=1) because of huge
You will also have to read/resolve multiple row instances (if you update
records) and tombstones (if you delete records) yourself.
From: platon.tema [mailto:platon.t...@yandex.ru]
Sent: Tuesday, September 16, 2014 1:51 PM
To: user@cassandra.apache.org
Subject: Re: Direct IO with Spark and Hadoop
Yes, updates and deletes are trouble. At the moment, to pick up updates we
refresh the result data by querying C* (Java driver) before reporting to
the user. For deletes, we can skip them during scanning, by TTL for
example (not tested yet).
On 09/16/2014 04:53 PM, moshe.kr...@barclays.com wrote:
You
How much memory does your system have? How much memory is the system using
before Cassandra starts (use the free command)? What heap settings does it
try to use?
Chris
On Sep 15, 2014, at 8:16 PM, Yatong Zhang bluefl...@gmail.com wrote:
It's during the startup. I tried to upgrade cassandra
Check out:
https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs
The young gen collection is a stop-the-world event that pauses application
threads, and a couple of parts of CMS can pause them as well. I would
recommend disabling the
#JVM_OPTS=$JVM_OPTS -XX:PrintFLSStatistics=1
line in your
Is consistency level honored for batch statements?
If I have 100 insert/update statements in my batch and use LOCAL_QUORUM
consistency, will control return from the coordinator only after a local
quorum update has been done for all 100 statements?
Or is it different?
Thanks
Vish
A follow-up on the earlier question.
I meant to ask whether control returns to the client as soon as the batch
log is written on the coordinator, irrespective of the consistency level
specified.
Also: will the coordinator attempt all statements one after the other, or
in parallel?
Thanks
On Tue, Sep 16, 2014
Say I want to do a rolling restart of Cassandra…
I can’t just restart all of them because they need some time to gossip and
for that gossip to get to all nodes.
What is the best strategy for this?
It would be something like:
/etc/init.d/cassandra restart && wait-for-cassandra.sh
… or something
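One way to script this is sketched below in Python, under several
assumptions: the nodes are reachable over ssh, nodetool is on the remote
PATH, and an open native transport port (9042 by default) is taken as a
rough sign that the node is back. The function names and host names are
hypothetical.

```python
import socket
import subprocess
import time


def wait_for_port(host, port=9042, timeout=300.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if the port is still unreachable after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def rolling_restart(hosts, run=subprocess.check_call, wait=wait_for_port):
    """Restart the nodes one at a time, stopping at the first failure."""
    for host in hosts:
        # Flush memtables and stop accepting writes before the restart.
        run(["ssh", host, "nodetool drain"])
        run(["ssh", host, "/etc/init.d/cassandra restart"])
        if not wait(host):
            raise RuntimeError(f"{host} did not come back; stopping here")


# Example call (hypothetical host names), left commented out on purpose:
# rolling_restart(["cass1.example.com", "cass2.example.com"])
```

An open port is only a proxy for readiness. If gossip needs more time to
settle, a fixed sleep after each node or a check of nodetool status output
could be added to the wait step.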
Hi Kevin, if you are using the latest version of OpsCenter, then even the
community (= free) edition can do a rolling restart of your cluster. It's
pretty convenient.
Ciao, Duncan.
On 16/09/14 19:44, Kevin Burton wrote:
Say I want to do a rolling restart of Cassandra…
I can’t just restart
FYI: OpsCenter has a default of sleep 60 seconds after each node restart,
and an option of drain before stopping.
I haven't noticed if they do anything special with seeds.
(At least one seed needs to be running before you restart other nodes.)
I wondered the same thing as Kevin and came to
On Tue, Sep 16, 2014 at 12:21 PM, James Briggs james.bri...@yahoo.com
wrote:
I haven't noticed if they do anything special with seeds.
(At least one seed needs to be running before you restart other nodes.)
If the nodes have all seen each other before (the cluster has coalesced
once) then
Hi Robert.
I just did a test (shutdown all nodes, start one non-seed node.)
You're correct that an old non-seed node can start by itself.
So startup scripts don't have to be intelligent, but apps need to wait
until there are enough nodes up to serve the whole keyspace:
cqlsh:my_keyspace
Hello,
Has anyone backported incremental replacement of compacted SSTables
(CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies
introduced in 2.1?
I haven't checked the ticket details yet, but am asking just in case anyone
has interesting info to share.
Cheers,
--
*Paulo Motta*
Chaordic |
On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes
paulo.mo...@chaordicsystems.com wrote:
Has anyone backported incremental replacement of compacted SSTables
(CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies
introduced in 2.1?
Haven't checked the ticket detail
own purposes but wouldn't mind making it public so people could patch it
themselves if they want to (if nobody has already done so) :)
On Tue, Sep 16, 2014 at 8:13 PM, Robert Coli rc...@eventbrite.com wrote:
On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes
Paulo:
Out of curiosity, why not just upgrade to 2.1 if you want the new features?
You know you want to! :)
Thanks, James Briggs
--
Cassandra/MySQL DBA. Available in San Jose area or remote.
From: Robert Coli rc...@eventbrite.com
To:
Because I want this specific feature, and not all 2.1 features, even though
this is probably one of the most significant changes in 2.1. Upgrading
would be nice, but I want to wait a little longer before fully jumping into
2.1 :)
We're having sudden peaks on read latency some time after a massive
On Tue, Sep 16, 2014 at 4:38 PM, Paulo Ricardo Motta Gomes
paulo.mo...@chaordicsystems.com wrote:
We're having sudden peaks on read latency some time after a massive batch
write, which is most likely caused by the cold page cache of newly compacted
sstables, which will hopefully be solved by
On Tue, Sep 16, 2014 at 4:50 PM, Robert Coli rc...@eventbrite.com wrote:
On Tue, Sep 16, 2014 at 4:38 PM, Paulo Ricardo Motta Gomes
paulo.mo...@chaordicsystems.com wrote:
We're having sudden peaks on read latency some time after a massive batch
write, which is most likely caused by cold
Hi -
We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances
were using EBS for storage (I know it is not recommended). We replaced the EBS
storage with SSDs. However, we didn't see any change in read latency. A query
that took 10 seconds when data was stored on EBS still
On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com
wrote:
Does anyone have insight as to why we don't see any performance impact on
the reads going from EBS to SSD?
What does it say when you enable tracing on this CQL query?
10 seconds is a really long time to access
I wrote cass_top, a poor man's version of OpsCenter, in bash (no dependencies.)
http://www.jebriggs.com/blog/2014/09/top-utility-for-cassandra-clusters-cass_top/
Actually, if it had node or cluster restart, it would do most of what the
OpsCenter free version does. :)
The features of
To expand on what Robert said, Cassandra is a log-structured database:
- writes are append operations, so both correctly configured disk volumes and
SSD are fast at that
- reads could be helped by SSD if they're not in cache (i.e. on disk)
- but compaction is definitely helped by SSD with large
Are you using JNA? Did you adjust your memlock limit?
On Tue, Sep 16, 2014 at 9:46 AM, Chris Lohfink clohf...@blackbirdit.com
wrote:
How much memory does your system have? How much memory is the system using
before Cassandra starts (use the free command)? What heap settings does it
try to
Mohammed, to add to the previous answers: EBS is network-attached. With or
without SSD, you access your disk over the network, constrained by network
bandwidth and latency. If you really need to improve IO performance, try
switching to ephemeral storage (also called instance storage), which is
If you cached your tables or the database you may not see any difference at all.
Regards,
-Tony
On Tuesday, September 16, 2014 6:36 PM, Mohammed Guller
moham...@glassbeam.com wrote:
Hi -
We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances
were using EBS for
When comparing EBS with local SSD in terms of latency, the unit of
measurement is milliseconds.
If your query runs for 10 s, you will not notice the difference. What are a
few milliseconds less over the life of a 10-second query?
To reiterate what Rob said: the query is probably slow because of your use
case / data model, not
Rob,
The 10-second latency that I gave earlier is from CQL tracing. Almost 5
seconds out of that was taken up by the “merge memtable and sstables” step. The
remaining 5 seconds are from “read live and tombstoned cells.”
I too first thought that maybe disk is not the bottleneck and Cassandra is
DSE/Solr is tightly integrated, so there is no “external” system to manage –
insert data in CQL and within a few seconds it is available for query from Solr
running in the same JVM as Cassandra. DSE/Solr indexes the data on each
Cassandra node, and uses Cassandra’s cluster management for