Re: mapred

Doug Cutting Mon, 15 Aug 2005 10:13:22 -0700

webmaster wrote:

I need some help with how to use mapred, what are the commands to use with it?


The mapred work is in progress and is not yet ready for production use.

In the mapred branch, most of the Nutch backend crawling and indexing commands have been rewritten in terms of MapReduce. The new versions are in the org.apache.nutch.crawl package. The old versions are still present in other packages and will be removed later.

Many of the commands are the same, but the file structures have changed (e.g., webdb split into crawldb and linkdb), some commands are gone (e.g., updatesegs) and some new commands have been added (e.g., invertlinks). Notably still missing is a mapred-based dedup. All inputs and outputs are directories of files, never single files. So, for example, the crawl command takes a directory of files containing root urls rather than a single file containing root urls.
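
To make the "directory of files" point concrete, here is a minimal sketch of preparing such an input directory using only the standard Java library. The directory name "urls", the file name "seeds.txt", and the one-URL-per-line layout are assumptions chosen for illustration, not something the mapred branch is confirmed to require. No Nutch classes are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SeedDir {
    // Writes the given root URLs, one per line (an assumed layout), into a
    // file inside seedDir, creating the directory first if it does not exist.
    // Returns the path of the file that was written.
    static Path writeSeeds(Path seedDir, String fileName, List<String> urls)
            throws IOException {
        Files.createDirectories(seedDir);
        Path seedFile = seedDir.resolve(fileName);
        return Files.write(seedFile, urls, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "urls" and "seeds.txt" are hypothetical names for this sketch.
        Path dir = Files.createTempDirectory("nutch-demo").resolve("urls");
        Path file = writeSeeds(dir, "seeds.txt",
                List.of("http://example.com/", "http://example.org/"));
        System.out.println("Seed directory: " + dir);
        System.out.println("Seed file: " + file.getFileName());
    }
}
```

The directory itself, not the file inside it, would then be what gets handed to the crawl command. The exact arguments are not given here, so check crawl/Crawl.java for them.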

Things are still in some flux at this point and the documentation has not yet been updated. The best place to see a typical sequence of commands is the source code for crawl/Crawl.java.


Doug
