map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
better than a tool written for HDFS and adapted. I hear people saying Map/Reduce on Cassandra/HBase is usually 30% slower than M/R in HDFS. Does it really make sense? Should we expect a result like this? Final question: Do you think writing a new M/R tool like the one described would be reinventing the wheel

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
on the maximum capacity of a single host, but my guess is that a map / reduce tool written specifically for Cassandra, from the beginning, could perform much better than a tool written for HDFS and adapted. I hear people saying Map/Reduce on Cassandra/HBase is usually 30% slower than M/R in HDFS. Does

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
depend on the maximum capacity of a single host, but my guess is that a map / reduce tool written specifically for Cassandra, from the beginning, could perform much better than a tool written for HDFS and adapted. I hear people saying Map/Reduce on Cassandra/HBase is usually 30% slower than

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
, but my guess is that a map / reduce tool written specifically for Cassandra, from the beginning, could perform much better than a tool written for HDFS and adapted. I hear people saying Map/Reduce on Cassandra/HBase is usually 30% slower than M/R in HDFS. Does it really make sense? Should

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
written for HDFS and adapted. I hear people saying Map/Reduce on Cassandra/HBase is usually 30% slower than M/R in HDFS. Does it really make sense? Should we expect a result like this? Final question: Do you think writing a new M/R tool like the one described would be reinventing

Re: map reduce for Cassandra

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 10:54 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: My understanding (someone please correct me if I am wrong) is that when you insert N items in a Cassandra CF, you are executing N binary searches to insert the item already indexed by a key. When you read

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi Robert, First of all, thanks for answering. 2014-07-21 20:18 GMT-03:00 Robert Coli rc...@eventbrite.com: You're wrong, unless you're talking about insertion into a memtable, which you probably aren't and which probably doesn't actually work that way enough to be meaningful. On disk,

Re: map reduce for Cassandra

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Although several sstables (disk fragments) may have the same row key, inside a single sstable row keys and column keys are indexed, right? Otherwise, doing a GET in Cassandra would take some time. From

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi, But if you are only relying on memtables to sort writes, that seems like a pretty heavyweight reason to use Cassandra? Actually, it's not a reason to use Cassandra. I already use Cassandra and I need to map reduce data from it. I am trying to see a reason to use the conventional M/R

Re: Pig / Map Reduce on Cassandra

2013-03-18 Thread cscetbon.ext
, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those who use Hadoop. I just want to use pig and hive

Re: Pig / Map Reduce on Cassandra

2013-03-14 Thread aaron morton
@cassandra.apache.org Date: Thursday, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those who use Hadoop. I just want to use pig and hive on cassandra. I know that pig

Re: Pig / Map Reduce on Cassandra

2013-03-13 Thread cscetbon.ext
@cassandra.apache.org Date: Thursday, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those who use

Re: Pig / Map Reduce on Cassandra

2013-03-13 Thread cscetbon.ext
@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those who use Hadoop. I just want to use pig and hive on cassandra. I know that pig samples are provided and work now with cassandra natively (they are part of the core). However

Re: Pig / Map Reduce on Cassandra

2013-03-13 Thread cscetbon.ext
AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those who use Hadoop. I just want to use pig and hive on cassandra. I know

Re: Pig / Map Reduce on Cassandra

2013-03-11 Thread cscetbon.ext
@orange.com Reply-To: user@cassandra.apache.org Date: Thursday, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand

Re: Pig / Map Reduce on Cassandra

2013-03-11 Thread aaron morton
someone else will have to answer that question. From: cscetbon@orange.com Reply-To: user@cassandra.apache.org Date: Thursday, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can

Re: Pig / Map Reduce on Cassandra

2013-01-18 Thread aaron morton
: Thursday, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those who use Hadoop. I just want to use pig and hive on cassandra. I know that pig samples are provided

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread cscetbon.ext
What do you mean? It's not needed by Pig or Hive to access Cassandra data. Regards On Jan 16, 2013, at 11:14 PM, Brandon Williams dri...@gmail.com wrote: You won't get CFS, but it's not a hard requirement, either.

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread James Schappet
-- http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-src.tar.gz --Jimmy From: cscetbon@orange.com Reply-To: user@cassandra.apache.org Date: Thursday, January 17, 2013 6:35 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread cscetbon.ext
@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra What do you mean? It's not needed by Pig or Hive to access Cassandra data. Regards On Jan 16, 2013, at 11:14 PM, Brandon Williams dri...@gmail.com wrote: You won't get CFS, but it's not a hard requirement, either

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread James Schappet
someone else will have to answer that question. From: cscetbon@orange.com Reply-To: user@cassandra.apache.org Date: Thursday, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread cscetbon.ext
user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those who use Hadoop. I just want to use pig and hive on cassandra. I know that pig samples are provided and work now with cassandra natively

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread James Lyons
to answer that question. From: cscetbon@orange.com Reply-To: user@cassandra.apache.org Date: Thursday, January 17, 2013 8:58 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Pig / Map Reduce on Cassandra Jimmy, I understand that CFS can replace HDFS for those

Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
Hi, I know that the DataStax Enterprise package provides Brisk, but is there a community version? Is it easy to interface Hadoop with Cassandra as the storage, or do we absolutely have to use Brisk for that? I know CassandraFS is natively available in Cassandra 1.2, the version I use, so is there

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread James Schappet
Here are a few examples I have worked on, reading from xml.gz files then writing to Cassandra. https://github.com/jschappet/medline You will also need: https://github.com/jschappet/medline-base These examples are Hadoop Jobs using Cassandra as the Data Store. This one is a good place to

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
I don't want to write to Cassandra as it replicates data from another datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read data from it. I would like to use the same configuration as http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster but I want to know

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread James Schappet
Try this one then. It reads from Cassandra, then writes back to Cassandra, but you could change the write to wherever you would like. getConf().set(IN_COLUMN_NAME, columnName); Job job = new Job(getConf(), ProcessRawXml);

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread Michael Kjellman
Brisk is pretty much stagnant. I think someone forked it to work with 1.0 but not sure how that is going. You'll need to pay for DSE to get CFS (which is essentially Brisk) if you want to use any modern version of C*. Best, Michael On 1/16/13 11:17 AM, cscetbon@orange.com

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
Here is the point. You're right this github repository has not been updated for a year and a half. I thought brisk was just a bundle of some technologies and that it was possible to install the same components and make them work together without using this bundle :( On Jan 16, 2013, at 8:22

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread Brandon Williams
On Wed, Jan 16, 2013 at 2:37 PM, cscetbon@orange.com wrote: Here is the point. You're right this github repository has not been updated for a year and a half. I thought brisk was just a bundle of some technologies and that it was possible to install the same components and make them work

Map Reduce and Cassandra with Trigger patch

2012-11-26 Thread Felipe Schmidt
I'm having some problems while running a Map Reduce program using Cassandra as input. I have already written some MapRed programs using Cassandra 1.0.9, but now I'm trying with an old version with a patch that supports triggers. (this one: https://issues.apache.org/jira/browse/CASSANDRA-1311) When I

Re: Map/Reduce over Cassandra

2010-08-18 Thread Drew Dahlke
Hey Bill, A few months ago we did an experiment with 5 hadoop nodes pulling from 4 cass nodes. It was pulling down 1 column family with 8 small columns just dumping the raw data to hdfs. It was cycling through around 17K map tasks per sec. The machines weren't being taxed too hard, so I'm sure

Map/Reduce over Cassandra

2010-08-17 Thread Bill Hastings
Hi All, How performant is M/R on Cassandra when compared to running it on HDFS? Does anyone have any numbers they can share? Specifically, how much data the M/R job was run against, what the throughput was, etc. Any information would be very helpful. -- Cheers Bill