Is the org.apache.nutch.crawl package a part of the nightly builds?

-J

----- Original Message -----
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, August 15, 2005 1:13 PM
Subject: Re: mapred
> webmaster wrote:
> > I need some help with how to use mapred, what are the commands to use with it?
>
> The mapred work is in progress and is not yet ready for production use.
>
> In the mapred branch most of the Nutch backend crawling and indexing
> commands have been rewritten in terms of MapReduce. The new versions
> are in the org.apache.nutch.crawl package. The old versions are still
> present in other packages and will be removed later.
>
> Many of the commands are the same, but the file structures have changed
> (e.g., webdb split into crawldb and linkdb), some commands are gone
> (e.g., updatesegs) and some new commands have been added (e.g.,
> invertlinks). Notably still missing is a mapred-based dedup. All
> inputs and outputs are directories of files, never single files. So,
> for example, the crawl command takes a directory of files containing
> root urls rather than a single file containing root urls.
>
> Things are still in some flux at this point and the documentation has
> not yet been updated. The best place to see a typical sequence of
> commands is looking at the source code for crawl/Crawl.java.
>
> Doug
