Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/MapReduce

New page:

[http://weblogs.java.net/blog/tomwhite/archive/2005/09/mapreduce.html#more "Excerpt from TomWhite's blog: MapReduce"]

MapReduce is the brainchild of Google and is very well documented by Jeffrey Dean and Sanjay Ghemawat in their paper [http://labs.google.com/papers/mapreduce.html "MapReduce: Simplified Data Processing on Large Clusters"]. In essence, it allows massive data sets to be processed in a distributed fashion by breaking the processing into many small computations of two types: a map operation that transforms the input into an intermediate representation, and a reduce operation that recombines that intermediate representation into the final output. This processing model is ideal for the operations a search engine indexer like Nutch or Google needs to perform, such as computing inlinks for URLs or building inverted indexes, and it will [http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/mapred.pdf "transform Nutch"] into a scalable, distributed search engine.
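To make the two phases concrete, here is a minimal, self-contained Java sketch of the inlink computation mentioned above. It is not the Nutch or Hadoop MapReduce API; the class and method names are purely illustrative, and the "shuffle" that groups intermediate pairs by key is simulated with an in-memory map rather than a distributed sort.

{{{
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of the map/reduce model: inverting a link graph
 *  so that each URL ends up with the list of pages that link to it. */
public class InlinkExample {

    /** Map phase: for every (page, outlink) pair, emit (outlink, page). */
    static List<Map.Entry<String, String>> map(String page, List<String> outlinks) {
        List<Map.Entry<String, String>> intermediate = new ArrayList<>();
        for (String target : outlinks) {
            intermediate.add(Map.entry(target, page));
        }
        return intermediate;
    }

    /** Reduce phase: all intermediate values for one key (a URL) are
     *  combined into that URL's final inlink list. */
    static Map.Entry<String, List<String>> reduce(String url, List<String> sources) {
        return Map.entry(url, sources);
    }

    public static void main(String[] args) {
        // Toy input: each page and the URLs it links to.
        Map<String, List<String>> pages = Map.of(
            "a.html", List.of("b.html", "c.html"),
            "b.html", List.of("c.html"));

        // Run the map phase, then group intermediate pairs by key
        // (this grouping stands in for the distributed shuffle/sort).
        Map<String, List<String>> grouped = new HashMap<>();
        for (Map.Entry<String, List<String>> page : pages.entrySet()) {
            for (Map.Entry<String, String> pair : map(page.getKey(), page.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Run the reduce phase on each key's grouped values.
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            Map.Entry<String, List<String>> result = reduce(entry.getKey(), entry.getValue());
            System.out.println(result.getKey() + " <- " + result.getValue());
        }
    }
}
}}}

In a real MapReduce run the map calls would execute in parallel across many machines, the framework would sort and group the intermediate (key, value) pairs, and the reduce calls would likewise be distributed; the sketch only shows how the two functions fit together.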
