Thanks again, Enis. I'll have a look at Pig and Hive.

Regards
Arijit
Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com

-----Original Message-----
From: Enis Soztutar [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 24, 2008 4:53 PM
To: [email protected]
Subject: Re: Questions about Hadoop

Arijit Mukherjee wrote:
> Thanks Enis.
>
> By workflow, I meant something like a chain of MapReduce jobs - the
> first one will extract a certain amount of data from the original set
> and do some computation resulting in a smaller summary, which will then
> be the input to a further MR job, and so on... somewhat similar to a
> workflow as in the SOA world.

Yes, you can always chain jobs together to form a final summary.
o.a.h.mapred.jobcontrol.JobControl might be interesting for you.

> Is it possible to use statistical analysis tools such as R (or say
> PL/R) within MapReduce on Hadoop? As far as I've heard, Greenplum is
> working on a custom MapReduce engine over their Greenplum database
> which will also support PL/R procedures.

Using R on Hadoop might involve some level of custom coding. If you are
looking for an ad-hoc tool for data mining, then check Pig and Hive.

Enis

> Arijit
>
> -----Original Message-----
> From: Enis Soztutar [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 24, 2008 2:57 PM
> To: [email protected]
> Subject: Re: Questions about Hadoop
>
> Hi,
>
> Arijit Mukherjee wrote:
>> Hi
>>
>> We've been thinking of using Hadoop for a decision making system which
>> will analyze telecom-related data from various sources to take certain
>> decisions. The data can be huge, of the order of terabytes, and can be
>> stored as CSV files, which I understand will fit into Hadoop as Tom
>> White mentions in the Rough Cut Guide that Hadoop is well suited for
>> records. The question I want to ask is whether it is possible to
>> perform statistical analysis on the data using Hadoop and MapReduce.
>> If anyone has done such a thing, we'd be very interested to know about
>> it. Is it also possible to create a workflow-like functionality with
>> MapReduce?
>
> Hadoop can handle TB data sizes, and statistical data analysis is one
> of the things that fits the mapreduce computation model very well. You
> can check what people are doing with Hadoop at
> http://wiki.apache.org/hadoop/PoweredBy.
> I think the best way to see if your requirements can be met by
> Hadoop/mapreduce is to read the MapReduce paper by Dean et al. Also you
> might be interested in checking out Mahout, which is a subproject of
> Lucene. They are doing ML on top of Hadoop.
>
> Hadoop is mostly suitable for batch jobs, however these jobs can be
> chained together to form a workflow. I will try to be more helpful if
> you could expand on what you mean by workflow.
>
> Enis Soztutar
>
>> Regards
>> Arijit
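
To make the "chain of MapReduce jobs" idea from this thread a bit more concrete, here is a minimal sketch in plain Python. It does not use Hadoop or JobControl at all; it only simulates the map, group-by-key and reduce steps in memory. The helper names (run_job, parse_mapper, partial_sums_reducer, identity_mapper, stats_reducer) and the sample "subscriber,call duration" CSV lines are all made up for illustration. The first job reduces raw records to a small per-key summary (count, sum, sum of squares), and the second job takes that summary as its input and computes the mean and standard deviation.

```python
from collections import defaultdict
from math import sqrt


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: raw CSV lines -> per-subscriber (count, sum, sum of squares).
def parse_mapper(line):
    subscriber, duration = line.split(",")
    x = float(duration)
    yield subscriber, (1, x, x * x)


def partial_sums_reducer(key, values):
    n = sum(v[0] for v in values)
    s = sum(v[1] for v in values)
    ss = sum(v[2] for v in values)
    yield key, (n, s, ss)


# Job 2: per-subscriber summary -> (mean, population standard deviation).
def identity_mapper(record):
    # Each input record is already a (key, value) pair from job 1.
    yield record


def stats_reducer(key, values):
    n = sum(v[0] for v in values)
    s = sum(v[1] for v in values)
    ss = sum(v[2] for v in values)
    mean = s / n
    variance = max(ss / n - mean * mean, 0.0)  # guard against rounding below zero
    yield key, (mean, sqrt(variance))


# Hypothetical sample input: "subscriber,call duration in minutes".
csv_lines = ["alice,10", "alice,20", "bob,5", "alice,30", "bob,15"]

# The chain: the output of job 1 is the input of job 2.
summary = run_job(csv_lines, parse_mapper, partial_sums_reducer)
stats = run_job(summary, identity_mapper, stats_reducer)

for subscriber, (mean, stddev) in stats:
    print(subscriber, mean, stddev)
```

On a real cluster each run_job call would be a separate Hadoop job reading from and writing to HDFS, with something like JobControl expressing the dependency between them; the sketch only shows the data flow between the two stages.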
