webmaster wrote:
> I need some help with how to use mapred, what are the commands to use with it?

The mapred work is in progress and is not yet ready for production use.
In the mapred branch most of the Nutch backend crawling and indexing
commands have been rewritten in terms of MapReduce. The new versions
are in the org.apache.nutch.crawl package. The old versions are still
present in other packages and will be removed later.
Many of the commands are the same, but the file structures have changed
(e.g., webdb split into crawldb and linkdb), some commands are gone
(e.g., updatesegs) and some new commands have been added (e.g.,
invertlinks). Notably still missing is a mapred-based dedup. All
inputs and outputs are directories of files, never single files. So,
for example, the crawl command takes a directory of files containing
root urls rather than a single file containing root urls.
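
To illustrate the directory-of-files convention, here is a minimal sketch in Python (not part of Nutch) that writes a list of root urls into a directory of plain-text files. The function name write_seed_dir, the seeds-NNNNN.txt file names, and the one-url-per-line layout are assumptions made for the example, not something the mapred branch prescribes.

```python
from pathlib import Path


def write_seed_dir(urls, seed_dir, urls_per_file=1000):
    """Write root urls into a directory of text files, one url per line.

    Hypothetical helper: the file naming and the chunk size are
    illustrative choices, not a Nutch requirement.
    """
    seed_dir = Path(seed_dir)
    # The input is a directory, so create it (and any parents) if needed.
    seed_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    # Split the url list into chunks and write each chunk to its own file.
    for start in range(0, len(urls), urls_per_file):
        chunk = urls[start:start + urls_per_file]
        path = seed_dir / f"seeds-{start // urls_per_file:05d}.txt"
        path.write_text("".join(url + "\n" for url in chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

Calling write_seed_dir(roots, "urls") would leave a directory named urls holding one or more seed files, and that directory (rather than any single file in it) is what would be handed to the crawl command.
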
Things are still in some flux at this point and the documentation has
not yet been updated. The best place to see a typical sequence of
commands is the source code for crawl/Crawl.java.
Doug