Hi Malay,
You can do the following. The cassandra-stress tool is located at tools/bin/cassandra-stress; it performs inserts and reads against a keyspace to measure performance.
cassandra-stress [options] [-o <operation>]
-o (--operation): INSERT, READ, etc. (default INSERT)
-t (--threads)
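For example, a run against a local node might look roughly like this (a sketch for the pre-2.1 stress tool; exact flags vary by version, and the host, key count, and thread count are placeholder values):

```shell
# insert 100,000 rows with 50 threads against a local node
tools/bin/cassandra-stress -d 127.0.0.1 -o INSERT -n 100000 -t 50
# then read the same keys back with the same settings
tools/bin/cassandra-stress -d 127.0.0.1 -o READ -n 100000 -t 50
```

Running the read pass immediately after the insert pass gives you comparable numbers for both operations on the same data.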
Will try... Thank you
Rahul Neelakantan
On Sep 10, 2014, at 12:01 AM, Rahul Menon ra...@apigee.com wrote:
I use jmxterm (http://wiki.cyclopsgroup.org/jmxterm/). Attach it to your C*
process, then use the org.apache.cassandra.db:type=HintedHandoffManager bean
and run deleteHintsForEndpoint.
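An interactive session might look roughly like this (a sketch: 7199 is Cassandra's default JMX port, the jar name depends on the jmxterm version you download, and the endpoint IP is a placeholder):

```shell
$ java -jar jmxterm-uber.jar
$> open localhost:7199
$> bean org.apache.cassandra.db:type=HintedHandoffManager
$> run deleteHintsForEndpoint 10.0.0.5
```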
Hi all,
Is multi-datacenter replication available in the community edition?
If so, can someone share their experience? How stable is it, and where can I
read about its best practices?
Thanks
Oleg.
Hi,
I am trying to evaluate different options for Spark + Cassandra, and I have a couple
of questions. My aim is to use Cassandra + Spark without Hadoop:
1) Is it possible to use only Cassandra as the input/output for
PySpark?
2) In case I use Spark (Java, Scala), is it possible to use only
Is there any significant difference in disk and memory usage between an
upsert and a delete + insert?
e.g. 2 vs. 2A + 2B.
PK ((key), version, c1)
1. INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X1')
...
2. INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X2')
Vs.
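The message is truncated here, but from the rest of the thread the 2A + 2B variant is presumably a delete followed by an insert of the same row. A hedged reconstruction in CQL, reusing the table and values from statement 2 above:

```sql
-- 2A. remove the existing row first
DELETE FROM A WHERE key = 1 AND version = 1 AND c1 = 4711;
-- 2B. then write the new value
INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X2');
```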
Hi Oleg,
Yes, cross-DC replication has been available for a long time already, so
it can be assumed to be stable.
As discussed in this thread, the Apache Cassandra documentation is often outdated or
nonexistent; the alternative is the DataStax documentation.
Hi Oleg,
If you want to use Cassandra + Spark without Hadoop, perhaps Stratio Deep
is your best choice (https://github.com/Stratio/stratio-deep). It's an
open-source Spark + Cassandra connector that doesn't make any use of
Hadoop or any Hadoop components.
My understanding is that an update is the same as an insert, so I would
think delete + insert is a bad idea. A delete + insert would also put two entries
in the commit log.
On Sep 10, 2014 9:49 AM, Michal Budzyn michalbud...@gmail.com wrote:
Is there any serious difference in the used disk and memory
Hello Oleg
Question 2: yes. The official spark cassandra connector can be found here:
https://github.com/datastax/spark-cassandra-connector
There are docs in the doc/ folder. You can read and write directly from/to
Cassandra without EVER using HDFS. You still need a resource manager like
Apache
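Reading and writing Cassandra from Spark with the connector looks roughly like this (a sketch based on the connector's documented Scala API; the keyspace, table, column names, and host are placeholder values):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraDemo extends App {
  val conf = new SparkConf()
    .setAppName("cassandra-demo")
    .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder node

  val sc = new SparkContext(conf)

  // Read a table as an RDD, aggregate, and write the result back -- no HDFS involved
  val rows = sc.cassandraTable("my_ks", "events")
  rows.map(row => (row.getString("user"), 1))
      .reduceByKey(_ + _)
      .saveToCassandra("my_ks", "event_counts", SomeColumns("user", "count"))
}
```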
Multi-dc is available in every version of Cassandra.
On Wed, Sep 10, 2014 at 9:21 AM, Oleg Ruchovets oruchov...@gmail.com wrote:
Thank you very much for the links.
Just to be sure: is this capability available in the COMMUNITY EDITION?
Thanks
Oleg.
On Wed, Sep 10, 2014 at 11:49 PM, Alain
One insert would be much better, e.g. for performance and network latency.
I wanted to know if there is a significant difference (apart from the
additional commit log entry) in the storage used between these two use cases.
IMHO, delete then insert will take two times more disk space than a
single insert, but after compaction the difference will disappear.
This was true in versions prior to 2.0, and it should still work this
way. But maybe someone will correct me if I'm wrong.
Cheers,
Olek
2014-09-10 18:30 GMT+02:00
Great stuff Paco.
Thanks for sharing.
Couple of questions:
Does it require an additional installation to be HA, like Apache Mesos?
Do you support PySpark?
How stable / production-ready is it?
Thanks
Oleg.
On Thu, Sep 11, 2014 at 12:01 AM, Francisco Madrid-Salvador
pmad...@stratio.com wrote:
Thanks for the info.
Can you please share where I can read about the Mesos integration for HA and
about standalone-mode execution?
Thanks
Oleg.
On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan doanduy...@gmail.com wrote:
Hello Oleg
Question 2: yes. The official spark cassandra connector can be found
Would the factor before compaction always be 2?
On Wed, Sep 10, 2014 at 6:38 PM, olek.stas...@gmail.com
olek.stas...@gmail.com wrote:
IMHO, delete then insert will take two times more disk space then
single insert. But after compaction the difference will disappear.
This was true in version
Consider that I have configured 1 MB of key cache (assume it can hold 13,000
keys).
Then I wrote some records into a column family (say 2).
Then I read them for the first time (all keys sequentially, in the same order used to write them),
and the keys started being stored in the key cache.
When the read reached
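The message is cut off, but the scenario it sets up is the classic weakness of sequential scans over an LRU-style cache: when more distinct keys are read in order than the cache can hold, entries are evicted just before the scan wraps around to them again. A minimal illustration (a toy LRU, not Cassandra's actual key-cache implementation; the 13,000 capacity is the thread's figure, and the 20,000 key count is an assumed larger working set):

```python
from collections import OrderedDict

class LruCache:
    """Tiny LRU cache for illustration (not Cassandra's real key cache)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[key] = None           # cache the key on a miss
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False

# 13,000-entry cache; 20,000 distinct keys scanned sequentially, twice.
cache = LruCache(capacity=13_000)
for _ in range(2):
    for key in range(20_000):
        cache.get(key)

# Every access misses: by the time the scan wraps around, the early keys
# have already been evicted, so a strict LRU cache never helps here.
print(cache.hits, cache.misses)  # -> 0 40000
```

With random access instead of an in-order scan, some reads would hit; strict LRU is simply a poor fit for repeated sequential scans larger than the cache.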
As far as I know, the Datastax connector uses thrift to connect Spark with
Cassandra although thrift is already deprecated, could someone confirm this
point?
-- the Scala connector is using the latest Java driver, so no, there is no
Thrift there.
For the Java version, I'm not sure, have not
Source code check for the Java version:
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/spark/connector/RDDJavaFunctions.java#L26
It's using the RDDFunctions from scala code so yes, it's Java driver again.
On Wed, Sep
Interesting things, actually:
We have Hadoop in our ecosystem. It has a single point of failure, and I
am not sure about inter-datacenter replication.
The plan is to use Cassandra: no single point of failure, and there is
datacenter replication.
For aggregation/transformation, Spark. BUT Storm
Stupid question: do you really need both Storm and Spark? Can't you
implement the Storm jobs in Spark? It will be operationally simpler to
have fewer moving parts. I'm not saying that Storm is not the right fit; it
may be totally suitable for some usages.
But if you want to avoid the SPOF thing
Hello, I have a node that has been in MutationStage for the last 5 hours.
Actually the node is *down*.
The pending tasks go from 776 to 110 and then to 964.
Is there some way to finish this stage?
The last heavy write workload was 5 days ago.
Pool Name                    Active   Pending
On 10.09.14 02:09, Robert Coli wrote:
On Tue, Sep 9, 2014 at 2:36 PM, Eugene Voytitsky viy@gmail.com wrote:
As I understand it, atomic batches for counters can't work correctly
(atomically) prior to 2.1 because of the counters implementation.
[Link:
Good to know. Thanks, DuyHai! I'll take a look (but most probably tomorrow
;-))
Paco
2014-09-10 20:15 GMT+02:00 DuyHai Doan doanduy...@gmail.com:
Source code check for the Java version:
On Wed, Sep 10, 2014 at 11:38 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Actually the node is *down*.
The node can't be that down if it's printing tpstats...
https://issues.apache.org/jira/browse/CASSANDRA-4162
?
=Rob
On Wed, Sep 10, 2014 at 12:03 PM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Yes, tpstats is printing. OpsCenter shows the node as down.
Have you recently restarted it or anything?
If not, try doing so?
=Rob
I think so.
This is how I see it:
At the very beginning you have a line like this in the data file:
{key: [col_name, col_value, date_of_last_change]} // something similar,
I don't remember exactly
After a delete you're adding the line:
{key: [col_name, last_col_value, date_of_delete, 'd']} // this 'd'
indicates that the field
I have a datacenter with a single node, and I want to start using vnodes. I
have followed the instructions (
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html),
and set up a new node in a new datacenter (auto_bootstrap=false, seed=node
in old dc,
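For reference, enabling vnodes on the new node is done in cassandra.yaml. A sketch matching the settings mentioned above (256 is the commonly cited default token count, an assumption here):

```yaml
# cassandra.yaml on the new node in the new datacenter
num_tokens: 256        # enables vnodes
auto_bootstrap: false  # as in the procedure above
```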
A delete inserts a tombstone, which is likely smaller than the original record
(though it still (currently) has the overhead of the full key/column name).
The data for the insert after a delete would be identical to the data if you
just inserted/updated.
There is no real benefit I can think of for doing the
You're right, there is no data in a tombstone, only a column name, so
there is only a small disk-space overhead after a delete. But I must
agree with the post above: it's pointless to delete prior to inserting.
Moreover, it needs one more operation to compute the resulting row.
cheers,
Olek
2014-09-10 22:18
agreed
On Sep 10, 2014, at 3:27 PM, olek.stas...@gmail.com wrote:
You're right, there is no data in tombstone, only a column name. So
there is only small overhead of disk size after delete. But i must
agree with post above, it's pointless in deleting prior to inserting.
Moreover, it needs
On Wed, Sep 10, 2014 at 12:16 PM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Yes, I restarted the node because the write latency was 2500 ms, when
it is usually 5 ms.
And did that help?
=Rob
With the official release of 2.1, I highly recommend using the new stress
tool bundled with it - it is improved in many ways over the tool in 2.0,
and is compatible with older clusters.
It supports the same simple mode of operation as the old stress, with
a better command-line interface and more
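For instance, the new tool's command style looks roughly like this (a sketch; the exact options are described by `cassandra-stress help`, and the row count and thread count are placeholder values):

```shell
# 2.1-style stress: write one million rows, then read them back
tools/bin/cassandra-stress write n=1000000 -rate threads=50
tools/bin/cassandra-stress read n=1000000 -rate threads=50
```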
Could you post the results of jstack on the process somewhere?
On Thu, Sep 11, 2014 at 7:07 AM, Robert Coli rc...@eventbrite.com wrote:
On Wed, Sep 10, 2014 at 1:53 PM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
No, it is still running in MutationStage.
If you're sure that it
Typo. I am talking about spark only.
Thanks
Oleg.
On Thursday, September 11, 2014, DuyHai Doan doanduy...@gmail.com wrote:
Stupid question: do you really need both Storm Spark ? Can't you
implement the Storm jobs in Spark ? It will be operationally simpler to
have less moving parts. I'm not