Re: Map/Reduce over Cassandra

2010-08-18 Thread Drew Dahlke
Hey Bill,

A few months ago we ran an experiment with 5 Hadoop nodes pulling from
4 Cassandra nodes. The job read 1 column family with 8 small columns
and just dumped the raw data to HDFS. It was cycling through around 17K
map tasks per second. The machines weren't being taxed very hard, so I'm
sure there's some concurrency tuning we could have done to speed that
up. Unfortunately we don't have that same data on HDFS yet, so I can't
give a direct comparison.
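When comparing throughput figures from clusters of different sizes, one simple option is to normalize the aggregate rate by node count. The Java sketch below does that using only the figures quoted here (roughly 17K per second, 5 Hadoop nodes, 4 Cassandra nodes). The class and method names are hypothetical, and the even split assumes the load is balanced across nodes, which this thread does not confirm.

```java
// Hypothetical helper (not from the thread): normalizes an aggregate
// throughput figure by cluster size so runs on differently sized
// clusters can be compared. Assumes load is spread evenly across nodes.
public class PerNodeRate {

    // Divides an aggregate per-second rate evenly across the given nodes.
    static double perNode(double aggregatePerSec, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        return aggregatePerSec / nodes;
    }

    public static void main(String[] args) {
        double aggregate = 17_000; // approximate rate reported above
        System.out.printf("per Hadoop node:    %.0f/sec%n", perNode(aggregate, 5));
        System.out.printf("per Cassandra node: %.0f/sec%n", perNode(aggregate, 4));
    }
}
```

With these inputs the sketch prints about 3400/sec per Hadoop node and 4250/sec per Cassandra node.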

Hope that helps. I'm curious what others have seen as well.

On Tue, Aug 17, 2010 at 6:59 PM, Bill Hastings bllhasti...@gmail.com wrote:
 Hi All
 How performant is M/R on Cassandra when compared to running it on HDFS?
 Anyone have any numbers they can share? Specifically, how much data the
 M/R job was run against, what the throughput was, etc. Any information
 would be very helpful.

 --
 Cheers
 Bill



Map/Reduce over Cassandra

2010-08-17 Thread Bill Hastings
Hi All

How performant is M/R on Cassandra when compared to running it on HDFS?
Anyone have any numbers they can share? Specifically, how much data the
M/R job was run against, what the throughput was, etc. Any information
would be very helpful.

-- 
Cheers
Bill