Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/MapReduce

New page:

[http://weblogs.java.net/blog/tomwhite/archive/2005/09/mapreduce.html#more "Excerpt from TomWhite's blog: MapReduce"]
MapReduce is the brainchild of Google and is very well documented by Jeffrey 
Dean and Sanjay Ghemawat in their paper 
[http://labs.google.com/papers/mapreduce.html "MapReduce: Simplified Data 
Processing on Large Clusters"]. In essence, it allows massive data sets to be 
processed in a distributed fashion by breaking the processing into many small 
computations of two types: a map operation that transforms the input into an 
intermediate representation, and a reduce function that recombines the 
intermediate representation into the final output. This processing model is 
ideal for the operations a search engine indexer like Nutch or Google needs to 
perform - like computing inlinks for URLs, or building inverted indexes - and 
it will [http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/mapred.pdf "transform Nutch"]
into a scalable, distributed search engine.
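
As a rough illustration of the map and reduce steps described above, here is a minimal single-process sketch in Python that computes inlinks for URLs. It is not Nutch code and not the implementation from the Google paper: the function names (map_links, reduce_inlinks, run_mapreduce) and the sample URLs are made up for this example, and a real system would distribute the map, grouping, and reduce phases across many machines.

```python
from collections import defaultdict

def map_links(page_url, outlinks):
    """Map step: emit one (target, source) pair per outgoing link."""
    for target in outlinks:
        yield target, page_url

def reduce_inlinks(target, sources):
    """Reduce step: merge all sources seen for one target URL."""
    return target, sorted(set(sources))

def run_mapreduce(pages):
    """Run map, group intermediate pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for page_url, outlinks in pages.items():
        for key, value in map_links(page_url, outlinks):
            grouped[key].append(value)  # intermediate representation
    return dict(reduce_inlinks(key, values) for key, values in grouped.items())

# Hypothetical input: each page URL mapped to the URLs it links to.
PAGES = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}

INLINKS = run_mapreduce(PAGES)
```

Here INLINKS maps each URL to the pages that link to it. Because each map call looks at a single page and each reduce call at a single target URL, both phases can in principle be split into many small independent computations.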
