Hi,

Arijit Mukherjee wrote:
Hi

We've been thinking of using Hadoop for a decision making system which
will analyze telecom-related data from various sources to take certain
decisions. The data can be huge, of the order of terabytes, and can be
stored as CSV files, which I understand will fit into Hadoop as Tom
White mentions in the Rough Cut Guide that Hadoop is well suited for
records. The question I want to ask is whether it is possible to perform
statistical analysis on the data using Hadoop and MapReduce. If anyone
has done such a thing, we'd be very interested to know about it. Is it
also possible to create workflow-like functionality with MapReduce?
Hadoop can handle terabyte-scale data, and statistical data analysis is one
of the things that fits the MapReduce computation model very well. You can
see what people are doing with Hadoop at
http://wiki.apache.org/hadoop/PoweredBy. I think the best way to judge
whether Hadoop/MapReduce can meet your requirements is to read the MapReduce
paper by Dean et al. You might also be interested in Mahout, a subproject of
Lucene, which is doing machine learning on top of Hadoop.
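
To give a feel for why statistics fit the model, here is a minimal sketch in
plain Python that only imitates the map, group-by-key, and reduce steps. It
does not use the Hadoop API. The CSV columns (region, duration_sec) and the
function names are made up for illustration. The idea is that each record is
mapped to partial sums (count, sum, sum of squares), which can be merged in
any order, so the work can be split across many machines.

```python
import csv
import io
from collections import defaultdict


def map_record(row):
    """Map step: emit (key, partial statistics) for one CSV record."""
    duration = float(row["duration_sec"])  # hypothetical column name
    # (count, sum, sum of squares) can be merged in any order.
    yield row["region"], (1, duration, duration * duration)


def reduce_stats(key, partials):
    """Reduce step: merge the partial statistics for one key."""
    count = total = total_sq = 0.0
    for c, s, sq in partials:
        count += c
        total += s
        total_sq += sq
    mean = total / count
    variance = total_sq / count - mean * mean  # population variance
    return key, {"count": int(count), "mean": mean, "variance": variance}


def run_job(csv_text):
    """Imitate the framework: map every record, group by key, then reduce."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key, value in map_record(row):
            groups[key].append(value)
    return dict(reduce_stats(k, v) for k, v in groups.items())


# Tiny made-up sample standing in for terabytes of CSV records.
SAMPLE = """region,duration_sec
east,10
east,20
west,5
west,15
west,25
"""

stats = run_job(SAMPLE)
for region, summary in sorted(stats.items()):
    print(region, summary)
```

The sum-of-squares formula is the simplest one to show, but it can lose
precision on very large or nearly constant data, so a real job might prefer a
more numerically stable way of merging variances.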

Hadoop is mostly suited to batch jobs; however, these jobs can be chained together to form a workflow. I can be more helpful if you elaborate on what you mean by workflow.
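
As a rough illustration of chaining, the sketch below feeds the output of one
job into the next. It is plain Python that imitates the map and reduce phases
in memory, not Hadoop code, and the record layout (subscriber, call duration)
and all function names are assumptions made up for the example. On a real
cluster each stage would be a separate batch job reading the previous job's
output files.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Imitate one batch job: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: count the calls made by each subscriber.
def map_calls(record):
    subscriber, _duration = record
    yield subscriber, 1


# Shared reducer: add up the values for a key.
def reduce_sum(key, values):
    yield key, sum(values)


# Job 2: consume job 1's output and count how many subscribers
# made each number of calls (a small histogram).
def map_count_to_bucket(record):
    _subscriber, n_calls = record
    yield n_calls, 1


# Made-up input records: (subscriber, call duration in seconds).
calls = [("alice", 30), ("bob", 12), ("alice", 45), ("carol", 7), ("bob", 60)]

calls_per_subscriber = run_mapreduce(calls, map_calls, reduce_sum)
histogram = run_mapreduce(calls_per_subscriber, map_count_to_bucket, reduce_sum)

print(calls_per_subscriber)
print(histogram)
```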

Enis Soztutar

Regards
Arijit

Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com


