Re: Performance testing in Cassandra

2014-09-10 Thread Umang Shah
Hi Malay, you can do the following. The cassandra-stress tool is at /tools/bin/cassandra-stress; it performs inserts and reads against a keyspace to measure performance. Usage: cassandra-stress [options] [-o [operation name]] -o (--operation name): INSERT, READ, etc. (default INSERT) -t (--threads)
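A minimal sketch of how the invocation quoted above could be assembled from a script. It uses only the two flags named in the message (-o and -t); the helper name build_stress_command, the binary name on PATH, and the thread count of 50 are illustrative assumptions, and the newer stress tool shipped with 2.1 uses a different command line.

```python
import shlex


def build_stress_command(operation="INSERT", threads=None, binary="cassandra-stress"):
    """Build an argv list for the legacy stress tool using only -o and -t.

    `binary` is a hypothetical default; point it at your own install path.
    """
    argv = [binary, "-o", operation]
    if threads is not None:
        # -t (--threads) is the thread-count flag named in the message above.
        argv += ["-t", str(threads)]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_stress_command("READ", threads=50)))
```

The list form can be passed to subprocess.run as-is, which avoids shell quoting problems.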

Re: Questions about cleaning up/purging Hinted Handoffs

2014-09-10 Thread Rahul Neelakantan
Will try... Thank you Rahul Neelakantan On Sep 10, 2014, at 12:01 AM, Rahul Menon ra...@apigee.com wrote: I use jmxterm. http://wiki.cyclopsgroup.org/jmxterm/ Attach it to your C* process, then use the org.apache.cassandra.db:HintedHandoffManager bean and run deleteHintsForEndpoint

multi datacenter replication

2014-09-10 Thread Oleg Ruchovets
Hi All. Is multi-datacenter replication available in the community edition? If so, can someone share how stable it is and where I can read about best practices for it? Thanks Oleg.

cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Hi, I am trying to evaluate different options for Spark + Cassandra and I have a couple of questions. My aim is to use Cassandra + Spark without Hadoop: 1) Is it possible to use only Cassandra as the input/output for PySpark? 2) In case I use Spark (Java, Scala), is it possible to use only

Storage: upsert vs. delete + insert

2014-09-10 Thread Michal Budzyn
Is there any serious difference in the disk and memory storage used between upsert and delete + insert? E.g. 2 vs 2A + 2B. PK ((key), version, c1) 1. INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X1') ... 2. INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X2') Vs.

Re: multi datacenter replication

2014-09-10 Thread Alain RODRIGUEZ
Hi Oleg, Yes, cross-DC replication has been available for a long time already, so it is assumed to be stable. As discussed in this thread, the Cassandra documentation is often outdated or nonexistent; the alternative is the DataStax documentation.

cassandra + spark / pyspark

2014-09-10 Thread Francisco Madrid-Salvador
Hi Oleg, If you want to use Cassandra + Spark without Hadoop, perhaps Stratio Deep is your best choice (https://github.com/Stratio/stratio-deep). It's an open-source Spark + Cassandra connector that doesn't make any use of Hadoop or any Hadoop component.

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread Shane Hansen
My understanding is that an update is the same as an insert, so I would think delete+insert is a bad idea. Also, delete+insert would put 2 entries in the commit log. On Sep 10, 2014 9:49 AM, Michal Budzyn michalbud...@gmail.com wrote: Is there any serious difference in the used disk and memory

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
Hello Oleg, Question 2: yes. The official Spark Cassandra connector can be found here: https://github.com/datastax/spark-cassandra-connector There are docs in the doc/ folder. You can read and write directly from/to Cassandra without EVER using HDFS. You still need a resource manager like Apache

Re: multi datacenter replication

2014-09-10 Thread Jonathan Haddad
Multi-DC is available in every version of Cassandra. On Wed, Sep 10, 2014 at 9:21 AM, Oleg Ruchovets oruchov...@gmail.com wrote: Thank you very much for the links. Just to be sure: is this capability available in the COMMUNITY EDITION? Thanks Oleg. On Wed, Sep 10, 2014 at 11:49 PM, Alain

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread Michal Budzyn
One insert would be much better, e.g. for performance and network latency. I wanted to know if there is a significant difference (apart from the additional commit log entry) in the storage used between these 2 use cases.

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread olek.stas...@gmail.com
IMHO, delete then insert will take twice as much disk space as a single insert, but after compaction the difference will disappear. This was true in versions prior to 2.0, and it should still work this way. But maybe someone will correct me if I'm wrong. Cheers, Olek 2014-09-10 18:30 GMT+02:00

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Great stuff Paco. Thanks for sharing. A couple of questions: Does it require an additional installation, such as Apache Mesos, to be HA? Do you support PySpark? How stable / ready for production is it? Thanks Oleg. On Thu, Sep 11, 2014 at 12:01 AM, Francisco Madrid-Salvador pmad...@stratio.com wrote:

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Thanks for the info. Can you please share where I can read about Mesos integration for HA and standalone mode execution? Thanks Oleg. On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan doanduy...@gmail.com wrote: Hello Oleg Question 2: yes. The official spark cassandra connector can be found

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread Michal Budzyn
Would the factor before compaction always be 2? On Wed, Sep 10, 2014 at 6:38 PM, olek.stas...@gmail.com olek.stas...@gmail.com wrote: IMHO, delete then insert will take twice as much disk space as a single insert, but after compaction the difference will disappear. This was true in version

Cassandra -What is really happens once Key-cache get filled

2014-09-10 Thread Job Thomas
Suppose I have configured 1 MB of key cache (assume it can hold 13000 keys). Then I wrote some records to a column family (say 2). Then I read them for the first time (all keys sequentially, in the same order used to write), and keys started to be stored in the key cache. When the read reached

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
As far as I know, the DataStax connector uses Thrift to connect Spark with Cassandra although Thrift is already deprecated, could someone confirm this point? -- the Scala connector is using the latest Java driver, so no, there is no Thrift there. For the Java version, I'm not sure, have not

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
Source code check for the Java version: https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/spark/connector/RDDJavaFunctions.java#L26 It's using the RDDFunctions from scala code so yes, it's Java driver again. On Wed, Sep

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Interesting things, actually: We have Hadoop in our ecosystem. It has a single point of failure and I am not sure about inter-datacenter replication. The plan is to use Cassandra: no single point of failure, and there is datacenter replication. For aggregation/transformation, we would use SPARK. BUT storm

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
Stupid question: do you really need both Storm and Spark? Can't you implement the Storm jobs in Spark? It will be operationally simpler to have fewer moving parts. I'm not saying that Storm is not the right fit, it may be totally suitable for some usages. But if you want to avoid the SPOF thing

Mutation Stage does not finish

2014-09-10 Thread Eduardo Cusa
Hello, I have a node that has been in MutationStage for the last 5 hours. Actually the node is *down*. The pending tasks go from 776 to 110 and then to 964. Is there some way to finish this stage? The last heavy write workload was 5 days ago. Pool Name Active Pending

Re: Atomic batch of counters in Cassandra 2.1

2014-09-10 Thread Eugene Voytitsky
On 10.09.14 02:09, Robert Coli wrote: On Tue, Sep 9, 2014 at 2:36 PM, Eugene Voytitsky viy@gmail.com wrote: As I understand it, atomic batches for counters can't work correctly (atomically) prior to 2.1 because of the counter implementation. [Link:

Re: cassandra + spark / pyspark

2014-09-10 Thread Paco Madrid
Good to know. Thanks, DuyHai! I'll take a look (but most probably tomorrow ;-)) Paco 2014-09-10 20:15 GMT+02:00 DuyHai Doan doanduy...@gmail.com: Source code check for the Java version:

Re: Mutation Stage does not finish

2014-09-10 Thread Robert Coli
On Wed, Sep 10, 2014 at 11:38 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Actually the node is *down*. The node can't be that down if it's printing tpstats... https://issues.apache.org/jira/browse/CASSANDRA-4162 ? =Rob

Re: Mutation Stage does not finish

2014-09-10 Thread Robert Coli
On Wed, Sep 10, 2014 at 12:03 PM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Yes, tpstats is printing. OpsCenter shows the node as down. Have you recently restarted it or anything? If not, try doing so? =Rob

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread olek.stas...@gmail.com
I think so. This is how I see it: at the very beginning you have a line like this in the data file: {key: [col_name, col_value, date_of_last_change]} //something similar, I don't remember now. After the delete you're adding a line: {key:[col_name, last_col_value, date_of_delete, 'd']} //this d indicates that field

Node being rebuilt receives read requests

2014-09-10 Thread Tom van den Berge
I have a datacenter with a single node, and I want to start using vnodes. I have followed the instructions ( http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html), and set up a new node in a new datacenter (auto_bootstrap=false, seed=node in old dc,

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread graham sanderson
A delete inserts a tombstone, which is likely smaller than the original record (though it still (currently) has the overhead of the full key/column name). The data for the insert after a delete would be identical to the data if you had just inserted/updated. No real benefit I can think of for doing the

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread olek.stas...@gmail.com
You're right, there is no data in a tombstone, only a column name, so there is only a small disk-size overhead after a delete. But I must agree with the post above: deleting prior to inserting is pointless. Moreover, it takes one more operation to compute the resulting row. Cheers, Olek 2014-09-10 22:18
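The storage argument in this thread can be sketched with a toy model. This is an illustration only, not Cassandra's actual on-disk format: the Cell class, the size estimate, and the compact function are assumptions made for the example, and real tombstone purging also depends on gc_grace_seconds, which the sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    key: tuple             # full primary key, e.g. (key, version, c1)
    value: Optional[str]   # None marks a tombstone (a name but no data)
    timestamp: int         # write time; the newest cell for a key wins

    def size(self) -> int:
        # Toy estimate: key overhead plus payload (zero payload for a tombstone).
        payload = len(self.value) if self.value is not None else 0
        return len(repr(self.key)) + payload


def compact(cells):
    """Keep only the newest cell per key, then drop keys whose newest cell is a tombstone."""
    newest = {}
    for cell in cells:
        current = newest.get(cell.key)
        if current is None or cell.timestamp > current.timestamp:
            newest[cell.key] = cell
    return [c for c in newest.values() if c.value is not None]


def total_size(cells):
    return sum(c.size() for c in cells)


pk = (1, 1, 4711)
# Case 2: plain upsert, two writes to the same primary key.
upsert = [Cell(pk, "X1", 1), Cell(pk, "X2", 2)]
# Case 2A + 2B: delete (tombstone) followed by insert.
delete_insert = [Cell(pk, "X1", 1), Cell(pk, None, 2), Cell(pk, "X2", 3)]

if __name__ == "__main__":
    print("before compaction:", total_size(upsert), "vs", total_size(delete_insert))
    print("after compaction: ", total_size(compact(upsert)), "vs", total_size(compact(delete_insert)))
```

In this model the delete + insert path carries one extra tombstone-sized entry until compaction, after which both paths hold the same single cell, so the pre-compaction factor is not exactly 2.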

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread graham sanderson
Agreed. On Sep 10, 2014, at 3:27 PM, olek.stas...@gmail.com wrote: You're right, there is no data in a tombstone, only a column name, so there is only a small disk-size overhead after a delete. But I must agree with the post above: deleting prior to inserting is pointless. Moreover, it needs

Re: Mutation Stage does not finish

2014-09-10 Thread Robert Coli
On Wed, Sep 10, 2014 at 12:16 PM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Yes, I restarted the node because the write latency was 2500 ms, when it is usually 5 ms. And did that help? =Rob

Re: Performance testing in Cassandra

2014-09-10 Thread Benedict Elliott Smith
With the official release of 2.1, I highly recommend using the new stress tool bundled with it. It is improved in many ways over the tool in 2.0 and is compatible with older clusters. It supports the same simple mode of operation as the old stress tool, with a better command line interface and more

Re: Mutation Stage does not finish

2014-09-10 Thread Benedict Elliott Smith
Could you post the results of jstack on the process somewhere? On Thu, Sep 11, 2014 at 7:07 AM, Robert Coli rc...@eventbrite.com wrote: On Wed, Sep 10, 2014 at 1:53 PM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: No, it is still running the Mutation Stage. If you're sure that it

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Typo. I am talking about Spark only. Thanks Oleg. On Thursday, September 11, 2014, DuyHai Doan doanduy...@gmail.com wrote: Stupid question: do you really need both Storm and Spark? Can't you implement the Storm jobs in Spark? It will be operationally simpler to have fewer moving parts. I'm not