RE: Questions about Hadoop

Arijit Mukherjee Wed, 24 Sep 2008 04:29:41 -0700

Thanks again, Enis. I'll have a look at Pig and Hive.

Regards
Arijit


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com


-----Original Message-----
From: Enis Soztutar [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 24, 2008 4:53 PM
To: [email protected]
Subject: Re: Questions about Hadoop




Arijit Mukherjee wrote:
> Thanks, Enis.
>
> By workflow, I meant something like a chain of MapReduce
> jobs: the first one will extract a certain amount of data from the
> original set and do some computation resulting in a smaller summary,
> which will then be the input to a further MR job, and so on, somewhat
> similar to a workflow in the SOA world.
>
Yes, you can always chain jobs together to form a final summary.
org.apache.hadoop.mapred.jobcontrol.JobControl might be interesting for you.
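
To make the chaining idea concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API or JobControl; run_job is a hypothetical in-memory stand-in for a MapReduce pass, and the CSV layout (region, subscriber, seconds) and all function names are assumptions made up for illustration. The point is only that the output of the first job becomes the input of the second.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory MapReduce pass: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort keys so the output order is deterministic.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Job 1: total call seconds per (region, subscriber) from CSV-like lines.
def map_calls(line):
    region, subscriber, seconds = line.split(",")
    yield (region, subscriber), int(seconds)


def reduce_calls(key, values):
    yield key, sum(values)


# Job 2: consumes job 1's summary and picks the heaviest user per region.
def map_region(record):
    (region, subscriber), total = record
    yield region, (total, subscriber)


def reduce_region(region, values):
    total, subscriber = max(values)
    yield region, subscriber, total


def run_chain(lines):
    """Chain the two jobs: the smaller summary feeds the next job."""
    summary = run_job(lines, map_calls, reduce_calls)
    return run_job(summary, map_region, reduce_region)


# Hypothetical sample data, not taken from any real data set.
SAMPLE = [
    "east,alice,120",
    "east,bob,300",
    "east,alice,200",
    "west,carol,50",
]
```

Calling run_chain(SAMPLE) runs both passes in order. On a real cluster each pass would be a separate job, with the second reading the first job's output directory.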
> Is it possible to use statistical analysis tools such as R (or say 
> PL/R) within MapReduce on Hadoop? As far as I've heard, Greenplum is 
> working on a custom MapReduce engine over their Greenplum database 
> which will also support PL/R procedures.
>   
Using R on Hadoop might require some custom coding. If you are
looking for an ad-hoc tool for data mining, then check out Pig and Hive.
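
As a rough idea of what such custom coding could look like for a simple statistic, the sketch below computes a mean and population variance in a map/reduce style using plain Python. It is not R and not the Hadoop API; the function names are hypothetical and the input splits are assumed to be non-empty lists of numbers. Each split is summarised independently, and the summaries are merged by addition, which is what lets the work be spread across machines.

```python
from functools import reduce


def map_partial(values):
    """Map step: summarise one input split as (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(a, b):
    """Reduce step: partial summaries add component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def finalize(partial):
    """Turn the merged summary into mean and population variance."""
    n, total, total_sq = partial
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


def mean_and_variance(splits):
    """Assumes at least one split and at least one value overall."""
    return finalize(reduce(combine, map(map_partial, splits)))
```

Note that the sum-of-squares formula can lose precision when the values are large and close together, so a real job may want a more numerically stable way of merging the partial summaries.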

Enis
> Arijit
>
>
> -----Original Message-----
> From: Enis Soztutar [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, September 24, 2008 2:57 PM
> To: [email protected]
> Subject: Re: Questions about Hadoop
>
>
> Hi,
>
> Arijit Mukherjee wrote:
>   
>> Hi
>>
>> We've been thinking of using Hadoop for a decision making system which
>> will analyze telecom-related data from various sources to take certain
>> decisions. The data can be huge, of the order of terabytes, and can be
>> stored as CSV files, which I understand will fit into Hadoop as Tom
>> White mentions in the Rough Cut Guide that Hadoop is well suited for
>> records. The question I want to ask is whether it is possible to
>> perform statistical analysis on the data using Hadoop and MapReduce.
>> If anyone has done such a thing, we'd be very interested to know about
>> it. Is it also possible to create a workflow like functionality with
>> MapReduce?
>>
> Hadoop can handle TB data sizes, and statistical data analysis is one of
> the things that fit the MapReduce computation model very well. You can
> check what people are doing with Hadoop at
> http://wiki.apache.org/hadoop/PoweredBy.
> I think the best way to see if your requirements can be met by
> Hadoop/MapReduce is to read the MapReduce paper by Dean et al. Also, you
> might be interested in checking out Mahout, which is a subproject of
> Lucene. They are doing ML on top of Hadoop.
>
> Hadoop is mostly suitable for batch jobs; however, these jobs can be
> chained together to form a workflow. I will try to be more helpful if
> you could elaborate on what you mean by workflow.
>
> Enis Soztutar
>
>   
>> Regards
>> Arijit
