I wrote a blog post about Hadoop's implementation a couple of months back, here:
http://horicky.blogspot.com/2008/11/hadoop-mapreduce-implementation.html

Note that Hadoop is not about reducing latency.  It is about increasing 
throughput (not throughput per resource) by adding more machines when your 
problem is "data parallel".

Time-wise:
If it takes T seconds to process B amount of data on one machine, then by 
using Hadoop with N machines you can process it within cT/N seconds, where 
the constant c > 1 accounts for the overhead.
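
For instance, a back-of-the-envelope sketch of that formula in Python (the 
sample values for T, N and c below are made up purely for illustration):

    # Rough scaling estimate: wall-clock time on N machines is c * T / N.
    def hadoop_time(T, N, c=1.5):
        """T: single-machine time in seconds; N: machine count;
        c: overhead constant (> 1) for job setup, shuffle, stragglers, etc."""
        return c * T / N

    # Example: a 10-hour single-machine job on 20 machines with 50% overhead.
    print(hadoop_time(T=36000, N=20, c=1.5))  # -> 2700.0 seconds (45 minutes)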

Space-wise:
If it takes M amount of memory during the processing on one machine, then by 
using Hadoop with N machines you need about M/N + c memory per machine, where 
the constant c accounts for the fixed per-machine overhead.
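
The same kind of sketch for the memory side (again, the numbers are made-up 
illustrations; c here stands for a fixed per-machine overhead, e.g. for the 
Hadoop daemons themselves):

    # Estimated memory per machine: M / N plus a fixed per-machine overhead c.
    def hadoop_memory_per_machine(M, N, c=512):
        """M: total working set in MB; N: machine count; c: overhead in MB."""
        return M / N + c

    # Example: a 100 GB working set spread across 20 machines.
    print(hadoop_memory_per_machine(M=100 * 1024, N=20, c=512))  # -> 5632.0 MB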

Bandwidth-wise:
You definitely need more bandwidth because a distributed file system is used.  
It also depends on your read/write ratio and on the replication factor.  
... Need more time to think of the formula...
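
Not a full formula, but a first-order sketch of the write side (this assumes 
HDFS's default pipelined replication, where each block is copied to R 
datanodes and the first replica is usually local to the writer):

    # First-order estimate of network traffic generated by HDFS writes.
    def hdfs_write_traffic(bytes_written, R=3):
        """R: replication factor (HDFS default is 3). With the first replica
        written locally, roughly (R - 1) copies cross the network."""
        return (R - 1) * bytes_written

    # Example: writing 10 GB with the default replication factor of 3.
    print(hdfs_write_traffic(10 * 2**30))  # -> ~20 GB over the network

Reads would add roughly one network transfer per non-local read, which is 
where the read/write ratio comes in.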

Rgds,
Ricky

-----Original Message-----
From: Hadooper [mailto:[email protected]] 
Sent: Tuesday, March 31, 2009 3:35 PM
To: [email protected]
Subject: Re: Please help!

Thanks, Jim.
I am very familiar with Google's original publication.

On Tue, Mar 31, 2009 at 4:31 PM, Jim Twensky <[email protected]> wrote:

> See the original Map Reduce paper by Google at
> http://labs.google.com/papers/mapreduce.html and please don't spam the
> list.
>
> -jim
>
> On Tue, Mar 31, 2009 at 6:15 PM, Hadooper <[email protected]> wrote:
>
> > Dear developers,
> >
> > Is there any detailed example of how Hadoop processes input?
> > Article
> > http://hadoop.apache.org/core/docs/r0.19.1/mapred_tutorial.html gives
> > a good idea, but I want to see input data being passed from class to
> > class, and how each class manipulates data. The purpose is to analyze the
> > time and space complexity of Hadoop as a generalized computational
> > model/algorithm. I tried to search the web and could not find more
> > detail.
> > Any pointer/hint?
> > Thanks a million.
> >
> > --
> > Cheers! Hadoop core
> >
>



-- 
Cheers! Hadoop core
