Re: cassandra data to hadoop.

2011-12-23 Thread ravikumar visweswara
Jeremy, we use the Cloudera distribution for our Hadoop cluster, and it may not be possible to migrate to Brisk quickly because of Flume/Hue dependencies. Did you successfully pull the data from an independent Cassandra cluster and dump it into a completely disconnected Hadoop cluster? It will be really helpful i

Re: Best way to determine how a Cassandra cluster is doing

2011-12-23 Thread Pierre-Luc Brunet
I was under the impression that OpsCenter was only compatible with the DataStax version of Cassandra. I'll give that a shot :) Thank you. On 2011-12-23, at 6:49 PM, Jeremy Hanna wrote: > One way to get a good bird's eye view of the cluster would be to install > DataStax OpsCenter - the commu

Re: Best way to determine how a Cassandra cluster is doing

2011-12-23 Thread Jeremy Hanna
One way to get a good bird's eye view of the cluster would be to install DataStax OpsCenter - the community edition is free. You can do a lot of checks from a web interface that are based on the JMX hooks that are in Cassandra. We use it and it's helped us a lot. Hope it helps for what you're

Best way to determine how a Cassandra cluster is doing

2011-12-23 Thread Pierre-Luc Brunet
I just imported a lot of data into a 9-node Cassandra cluster, and before I create a new ColumnFamily with even more data, I'd like to be able to determine how full my cluster currently is (in terms of memory usage). I'm not too sure what I need to look at. I don't want to import another 20-30GB of
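As a rough way to reason about the question above, per-node load (the "Load" column of `nodetool ring`) plus the size of the planned import can be turned into a disk-utilization estimate. The sketch below is illustrative only: the node loads, disk size, and replication factor are hypothetical values, and it assumes the new data spreads evenly across nodes.

```python
def utilization_after_import(node_loads_gb, disk_gb_per_node, import_gb, rf=3):
    """Fraction of disk used per node after importing `import_gb` of raw
    data, assuming even distribution and `rf` copies of every row."""
    extra_per_node = import_gb * rf / len(node_loads_gb)
    return [(load + extra_per_node) / disk_gb_per_node
            for load in node_loads_gb]

# Hypothetical 9-node cluster: current "Load" values and 200 GB disks.
util = utilization_after_import(
    node_loads_gb=[40, 42, 38, 41, 39, 40, 43, 37, 40],
    disk_gb_per_node=200, import_gb=30, rf=3)
print(max(util))  # 0.265 -- worst node ~27% full
```

Since compaction can temporarily need on the order of the data size again, a common rule of thumb from this era was to keep disks under roughly 50% full.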

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
We currently have Cassandra nodes co-located with Hadoop nodes and do a lot of data analytics with it. We've looked at Brisk - Brisk is still open source and available, but DataStax is putting its resources into a closed version of Brisk as part of DataStax Enterprise. We'll likely be moving to that

Re: cassandra data to hadoop.

2011-12-23 Thread Praveen Sadhu
Have you tried Brisk? On Dec 23, 2011, at 9:30 AM, "Jeremy Hanna" wrote: > We do this all the time. Take a look at > http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use > mapreduce or pig to get data out of cassandra. If it's going to a separate > hadoop cluster,

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
We do this all the time. Take a look at http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use MapReduce or Pig to get data out of Cassandra. If it's going to a separate Hadoop cluster, I don't think you'd need to co-locate task trackers or data nodes on your Cassandra

Re: Read TPS in 0.8.6

2011-12-23 Thread Radim Kolar
Cassandra read performance depends on your disk cache (free memory at the node not used by Cassandra) and disk IOPS performance. In the ideal case (no need to merge SSTables), Cassandra needs 2 IOPS per data read if the Cassandra key/row caches are not used. A standard hard drive has about 150 IOPS. If you ha
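Radim's numbers give a quick back-of-envelope ceiling for cache-miss reads. The cluster shape below (3 nodes, 4 drives each) is an assumption chosen to match the EC2 setup described in the original "Read TPS" post, not something stated by Radim:

```python
# Worst-case (every read misses key/row cache and OS page cache):
IOPS_PER_DRIVE = 150   # typical 7200 rpm hard drive
IOPS_PER_READ = 2      # ideal case per Radim: index seek + data seek
nodes = 3              # assumed: 3-node cluster
drives_per_node = 4    # assumed: 4 data drives per node

reads_per_sec = nodes * drives_per_node * IOPS_PER_DRIVE / IOPS_PER_READ
print(reads_per_sec)  # 900.0 -- cache-miss read ceiling for this cluster
```

Observed read TPS far below such a ceiling usually points at cache misses combined with client or configuration issues rather than raw disk limits.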

Re: cassandra data to hadoop.

2011-12-23 Thread Brian O'Neill
I'm not sure this is much help, but we actually run Hadoop jobs to load and extract data to and from HDFS. You can use ColumnFamilyInputFormat to iterate over the data in Cassandra and output it to a file. That doesn't solve the continuous problem, but should give you a batch mechanism to refresh th
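The batch-export pattern Brian describes boils down to iterating over rows and writing flat records for the other cluster. A minimal language-neutral sketch of that shape (the toy `rows` data and TSV layout are assumptions; in the real job, ColumnFamilyInputFormat would hand each mapper the rows and the output would land in HDFS):

```python
import csv
import io

def export_rows(row_iter, out_stream):
    """Dump (key, column, value) triples as TSV -- a stand-in for a
    Hadoop job that reads Cassandra rows and writes flat files."""
    writer = csv.writer(out_stream, delimiter="\t")
    count = 0
    for key, columns in row_iter:
        for name, value in columns.items():
            writer.writerow([key, name, value])
            count += 1
    return count

# Toy rows standing in for what the input format would supply.
rows = [("user1", {"name": "alice", "age": "30"}),
        ("user2", {"name": "bob"})]
buf = io.StringIO()
print(export_rows(rows, buf))  # 3 -- columns written
```

Re-running the job on a schedule gives the periodic refresh; it does not cover the continuous-streaming case, as the post notes.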

cassandra data to hadoop.

2011-12-23 Thread ravikumar visweswara
Hello All, I have a situation where I need to dump Cassandra data to a Hadoop cluster for further analytics. A lot of other relevant data which is not present in Cassandra is already available in HDFS for analysis. Both are independent clusters right now. Is there a suggested way to get the data periodically or co

Re: Cassandra stress test and max vs. average read/write latency.

2011-12-23 Thread Peter Fales
Peter, Thanks for your response. I'm looking into some of the ideas in your other recent mail, but I had another followup question on this one... Is there any way to control the CPU load when using the "stress" benchmark? I have some control over that with our home-grown benchmark, but I thought

RE: Garbage collection freezes cassandra node

2011-12-23 Thread Rene Kochen
Thanks for your quick response! I am currently running the performance tests with extended gc logging. I will post the gc logging if clients time out at the same moment that the full garbage collect runs. Thanks Rene -Original Message- From: sc...@scode.org [mailto:sc...@scode.org] On
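For reference, the extended GC logging Rene mentions is enabled with standard HotSpot flags, typically appended in `cassandra-env.sh` (the log path here is illustrative):

```shell
# Standard HotSpot GC-logging flags (JDK 6/7 era); log path is illustrative.
JVM_OPTS="$JVM_OPTS -verbose:gc"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```

`PrintGCApplicationStoppedTime` is the useful one for correlating client timeouts with stop-the-world pauses.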

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-23 Thread aaron morton
No problems. IMHO you should develop a sizable bruise banging your head against a wall using standard CFs and the RandomPartitioner before using something else. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/12/2011, at 6:29 AM, Bryce A

Re: Routine nodetool repair

2011-12-23 Thread aaron morton
Next time I will finish my morning coffee first :) A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/12/2011, at 5:08 AM, Peter Schuller wrote: >> One other thing to consider is are you creating a few very large rows ? You >> can check the min,

Re: Counters and Top 10

2011-12-23 Thread aaron morton
Counters only update the value of a column; they cannot be used as column names. So you cannot have a dynamically updating top-ten list using counters alone. You have a couple of options. First, use something like Redis if that fits your use case. Redis could either be the database of record for the
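The distinction Aaron draws (counters change values, not names) is why a top-ten needs a separate sorted structure. A minimal in-memory sketch of the pattern, with a plain `Counter` standing in for the counter store (in Redis this would be `ZINCRBY` plus `ZREVRANGE` on a sorted set):

```python
import heapq
from collections import Counter

counts = Counter()  # stand-in for Cassandra counters / a Redis sorted set

def incr(item, by=1):
    """Increment an item's count -- the only operation counters support."""
    counts[item] += by

def top_n(n=10):
    """Recompute the current top-n from the counts; counters alone
    cannot maintain this ordering, so it must be derived."""
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

for page, hits in [("a", 5), ("b", 9), ("c", 2), ("b", 1)]:
    incr(page, hits)
print(top_n(2))  # [('b', 10), ('a', 5)]
```

Recomputing on read works at small scale; at larger scale, the recompute is what a Redis sorted set (or a periodic batch job) does for you incrementally.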

Read TPS in 0.8.6

2011-12-23 Thread Jeesoo Shin
Hi, I'm doing a stress test with tools/stress (the Java version). I used 3 EC2 XLarge instances with 4-disk RAID instance storage for the cluster. I get write TPS of min 4000 & up to 1 but I only get 50 for read TPS. Is this right? What am I doing wrong? These are the options: java -jar stress.jar --nodes=ip1,ip2,ip3 --consist