mirror between github and apache

2014-12-04 Thread Christian Andersson
Hi, I have a question regarding the mirror between git://git.apache.org/cassandra.git and GitHub. How are the repositories synchronized? Manually, or by some kind of automatic job? //chibbe

[Import csv to Cassandra] Taking too much time

2014-12-04 Thread 严超
Hi everyone, I'm importing a CSV file into Cassandra, and it always fails with the error "Request did not complete within rpc_timeout", after which I have to resume my cqlsh COPY command. The CSV file is 2.2 GB, and the import is taking a long time. How can I speed up the CSV import? Is there

Re: Wide rows best practices and GC impact

2014-12-04 Thread Jabbar Azam
Hello, I saw this yesterday but didn't want to reply because I didn't know what the cause was. Basically, I was using wide rows with Cassandra 1.x and was inserting data constantly. After about 18 hours the JVM would crash with a dump file. For some reason I removed the compaction throttling

2.0.10 upgrade to 2.1.2 gives Unable to gossip with any seeds

2014-12-04 Thread sinonim
Hi all, We have a Cassandra cluster with nodes at version 2.0.10, all in a single EC2 region. We want to perform a rolling upgrade to version 2.1.2, but the new node fails with the following exception: java.lang.RuntimeException: Unable to gossip with any seeds at

Re: 2.0.10 upgrade to 2.1.2 gives Unable to gossip with any seeds

2014-12-04 Thread Neha
Check whether you have rpc_server_type set to hsha. Change it to sync and try again. On Dec 4, 2014, at 3:55 PM, sinonim sino...@gmail.com wrote: Hi all, We have the case of a cassandra cluster with nodes version 2.0.10, all in a single EC2 region. We want to perform a rolling upgrade to

Re: [Import csv to Cassandra] Taking too much time

2014-12-04 Thread Akshay Ballarpure
Hello Chao Yan, Importing CSV data with the COPY command in Cassandra is always painful for large files (say, 1 GB). The CQL tool was not designed for such heavy operations; try using SSTableLoader for the import instead. Best Regards Akshay From: 严超 yanchao...@gmail.com To:

Re: [Import csv to Cassandra] Taking too much time

2014-12-04 Thread 严超
Thank you very much for your advice. Can you give me more advice on using SSTableLoader to import CSV? What is the best practice for importing CSV into Cassandra with SSTableLoader? Best Regards! Chao Yan

Re: [Import csv to Cassandra] Taking too much time

2014-12-04 Thread Yuki Morishita
Here's a blog post about writing SSTables from CSV and using SSTableLoader to load them. http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated On Thu, Dec 4, 2014 at 5:57 AM, 严超 yanchao...@gmail.com wrote: Thank you very much for your advice. Can you give me more advice for

Replacing a dead node by deleting it and auto_bootstrap'ing a new node (Cassandra 2.0)

2014-12-04 Thread Omri Bahumi
Hi, I was wondering, how would auto_bootstrap behave in this scenario: 1. I had a cluster with 3 nodes (RF=2) 2. One node died, I deleted it with nodetool removenode (+ force) 3. A new node launched with auto_bootstrap: true The question is: will the right vnodes go to the new node as if it was

Repair taking many snapshots per minute

2014-12-04 Thread Robert Wille
This is a follow-up to my previous post “Cassandra taking snapshots automatically?”. I’ve renamed the thread to better describe the new information I’ve discovered. I have a four node, RF=3, 2.0.11 cluster that was producing snapshots at a prodigious rate. I let the cluster sit idle overnight

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Tyler Hobbs
On Wed, Dec 3, 2014 at 11:02 PM, Dong Dai daidon...@gmail.com wrote: 1) unless I am using TokenAwarePolicy, the async insert cannot be sent to the right coordinator either. Yes. Of course, TokenAwarePolicy can wrap any other policy. 2) the TokenAwarePolicy actually is doing the job that

Re: mirror between github and apache

2014-12-04 Thread Tyler Hobbs
The Apache git repo is the main repo. The GitHub repo is periodically synced (I believe every few hours). On Thu, Dec 4, 2014 at 2:39 AM, Christian Andersson chi...@gmail.com wrote: Hi, Have a question regarding the mirror of git://git.apache.org/cassandra.git and github. How are the

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Dong Dai
On Dec 4, 2014, at 11:37 AM, Tyler Hobbs ty...@datastax.com wrote: On Wed, Dec 3, 2014 at 11:02 PM, Dong Dai daidon...@gmail.com wrote: 1) unless I am using TokenAwarePolicy, the async insert cannot be sent to the right coordinator either. Yes. Of

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Tyler Hobbs
On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai daidon...@gmail.com wrote: Since we already do on the client side what coordinators do, why don’t we go one step further: break the UNLOGGED batch statements into several small batch statements, each of which contains the statements with the same partition
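
The client-side splitting proposed in the quoted message could look roughly like the sketch below. It is not taken from the thread: the `(partition_key, statement)` pair representation and the name `group_by_partition` are hypothetical, and a real client would derive the key from the driver's bound statement and then send each group as its own UNLOGGED batch.

```python
from collections import defaultdict


def group_by_partition(statements):
    """Group (partition_key, statement) pairs by partition key.

    Each resulting list holds only statements that target the same
    partition, so it could be sent as one small batch.
    The pair format is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for partition_key, statement in statements:
        groups[partition_key].append(statement)
    return dict(groups)


# Hypothetical pending inserts, keyed by the partition they target.
pending = [
    ("user1", "insert event a"),
    ("user2", "insert event b"),
    ("user1", "insert event c"),
]

for key, batch in group_by_partition(pending).items():
    print(key, batch)
```

Whether this actually reduces coordinator load is the open question raised in the thread; the sketch only shows the grouping step.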

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Shane Hansen
I'd be really interested to know what sort of performance or load improvements you see by doing client side partitioning. Please post back some results if you've tried that strategy. On Thu, Dec 4, 2014 at 11:46 AM, Tyler Hobbs ty...@datastax.com wrote: On Thu, Dec 4, 2014 at 11:50 AM, Dong

full gc too often

2014-12-04 Thread Philo Yang
Hi all, I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have trouble with full GC: sometimes one or two nodes run a full GC more than once per minute, each taking over 10 seconds; the node then becomes unreachable and the cluster's latency increases. I grep the

Re: full gc too often

2014-12-04 Thread Tim Heckman
On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote: Hi all, I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have trouble with full GC: sometimes one or two nodes run a full GC more than once per minute, each taking over 10 seconds; the node then becomes unreachable and

Re: full gc too often

2014-12-04 Thread Philo Yang
I have two kinds of machines: 16G RAM, with the default heap size setting, about 4G; and 64G RAM, with the default heap size setting, about 8G. Both kinds of nodes have the same number of vnodes, and both have the GC issue, although the 16G nodes are more likely to hit it. Thanks, Philo
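
The two default heap sizes mentioned above are consistent with the sizing rule used by the stock cassandra-env.sh of that era, as far as I recall it: take the larger of min(RAM/2, 1 GB) and min(RAM/4, 8 GB). The sketch below is an illustration under that assumption, not a copy of the script; the function name `default_max_heap_mb` is made up for this example.

```python
def default_max_heap_mb(system_memory_mb):
    """Approximate the default max heap size in MB.

    Assumed rule: max(min(RAM / 2, 1024 MB), min(RAM / 4, 8192 MB)).
    """
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


# The two machine types described in the message.
for ram_gb in (16, 64):
    heap_mb = default_max_heap_mb(ram_gb * 1024)
    print(ram_gb, "GB RAM ->", heap_mb, "MB heap")
```

Under this rule a 16G machine gets a 4G heap and a 64G machine is capped at 8G, matching the figures reported above.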

Re: Replacing a dead node by deleting it and auto_bootstrap'ing a new node (Cassandra 2.0)

2014-12-04 Thread Jaydeep Chovatia
As far as I know, if you have NOT explicitly specified -Dcassandra.replace_address=old_node_ipaddress, then new (random) tokens will be assigned to the bootstrapping node instead of the dead node's tokens. -jaydeep On Thu, Dec 4, 2014 at 6:50 AM, Omri Bahumi om...@everything.me wrote: Hi, I

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Dong Dai
On Dec 4, 2014, at 1:46 PM, Tyler Hobbs ty...@datastax.com wrote: On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai daidon...@gmail.com wrote: Since we already do on the client side what coordinators do, why don’t we go one step further: break the UNLOGGED batch statements

Re: full gc too often

2014-12-04 Thread Jonathan Haddad
I recommend reading through https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how the JVM GC works and what you can do to tune it. Also good is Blake Eggleston's writeup which can be found here: http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html