I wrote a blog post about Hadoop's implementation a couple of months back, here: http://horicky.blogspot.com/2008/11/hadoop-mapreduce-implementation.html
Note that Hadoop is not about reducing latency. It is about increasing throughput (not throughput per resource) by adding more machines when your problem is "data parallel".

Time-wise: if it takes T seconds to process B amount of data on one machine, then with Hadoop on N machines you can process it within cT/N seconds, where the constant c > 1 accounts for the overhead.

Space-wise: if the processing takes M amount of memory on one machine, then with Hadoop on N machines each machine needs about M/N + c, where c here is a fixed per-machine overhead.

Bandwidth-wise: you definitely need more bandwidth because a distributed file system is used; how much also depends on your read/write ratio and the replication factor.

... Need more time to think of the formula... (A small worked example of these numbers is sketched at the end of this mail.)

Rgds,
Ricky

-----Original Message-----
From: Hadooper [mailto:[email protected]]
Sent: Tuesday, March 31, 2009 3:35 PM
To: [email protected]
Subject: Re: Please help!

Thanks, Jim. I am very familiar with Google's original publication.

On Tue, Mar 31, 2009 at 4:31 PM, Jim Twensky <[email protected]> wrote:
> See the original Map Reduce paper by Google at
> http://labs.google.com/papers/mapreduce.html and please don't spam the
> list.
>
> -jim
>
> On Tue, Mar 31, 2009 at 6:15 PM, Hadooper <[email protected]> wrote:
>
> > Dear developers,
> >
> > Is there any detailed example of how Hadoop processes input? The article
> > http://hadoop.apache.org/core/docs/r0.19.1/mapred_tutorial.html gives
> > a good idea, but I want to see input data being passed from class to
> > class, and how each class manipulates data. The purpose is to analyze
> > the time and space complexity of Hadoop as a generalized computational
> > model/algorithm. I tried to search the web and could not find more
> > detail. Any pointer/hint?
> > Thanks a million.
> >
> > --
> > Cheers! Hadoop core
>
--
Cheers! Hadoop core
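
For concreteness, here is a small back-of-envelope sketch of the model above. All input values (T, M, N, c and the per-machine memory overhead) are made-up assumptions for illustration, not measurements:

// Back-of-envelope sketch of the cT/N and M/N + c model above.
// All inputs are assumed example values, not measurements.
public class HadoopScalingSketch {
    public static void main(String[] args) {
        double T    = 3600.0; // assumed: seconds to process the data on one machine
        double M    = 64.0;   // assumed: GB of memory needed on one machine
        int    N    = 10;     // assumed: number of machines in the cluster
        double c    = 1.3;    // assumed: overhead constant, c > 1 (scheduling, shuffle, HDFS I/O)
        double cMem = 2.0;    // assumed: fixed per-machine memory overhead in GB (the "+ c" term)

        // Time-wise: roughly c*T/N seconds with N machines.
        double timeOnCluster = c * T / N;

        // Space-wise: roughly M/N plus a fixed per-machine overhead.
        double memPerMachine = M / N + cMem;

        System.out.printf("Estimated wall-clock time: %.0f s (vs %.0f s on one machine)%n",
                timeOnCluster, T);
        System.out.printf("Estimated memory per machine: %.1f GB (vs %.0f GB on one machine)%n",
                memPerMachine, M);
    }
}

With these assumed numbers the model predicts about 468 s on the cluster instead of 3600 s on one machine: a speedup, but less than the ideal N-fold one because c > 1.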
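On the quoted question about seeing input data passed from class to class: a minimal sketch against the old org.apache.hadoop.mapred API (the one the r0.19.1 tutorial uses) looks roughly like this. The class and variable names are illustrative; the comments trace where each (key, value) pair comes from and goes to.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// The InputFormat splits the input files; its RecordReader turns each split
// into (key, value) pairs -- for TextInputFormat, (byte offset, line of text) --
// and the framework calls map() once per pair.
public class TraceMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Each pair emitted here goes to the OutputCollector, which buffers it,
        // runs it through the Partitioner to pick a reduce task, sorts it by key,
        // and ships it across the network in the shuffle before reduce() sees
        // all values grouped under one key.
        StringTokenizer tok = new StringTokenizer(line.toString());
        while (tok.hasMoreTokens()) {
            word.set(tok.nextToken());
            output.collect(word, ONE);
        }
    }
}

Stepping through a job like this in a debugger (RecordReader -> map() -> OutputCollector -> shuffle -> reduce()) is probably the quickest way to watch each class manipulate the data.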
