Re: mapred

Jay Pound Mon, 15 Aug 2005 10:55:53 -0700

Is the org.apache.nutch.crawl package part of the nightly builds?
-J
----- Original Message ----- 
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, August 15, 2005 1:13 PM
Subject: Re: mapred



> webmaster wrote:
> > I need some help with how to use mapred, what are the commands to use
> > with it?
>
> The mapred work is in progress and is not yet ready for production use.
>
> In the mapred branch most of the Nutch backend crawling and indexing
> commands have been rewritten in terms of MapReduce.  The new versions
> are in the org.apache.nutch.crawl package.  The old versions are still
> present in other packages and will be removed later.
>
> Many of the commands are the same, but the file structures have changed
> (e.g., webdb split into crawldb and linkdb), some commands are gone
> (e.g., updatesegs) and some new commands have been added (e.g.,
> invertlinks).  Notably still missing is a mapred-based dedup.  All
> inputs and outputs are directories of files, never single files.  So,
> for example, the crawl command takes a directory of files containing
> root urls rather than a single file containing root urls.
>
> Things are still in some flux at this point and the documentation has
> not yet been updated.  The best place to see a typical sequence of
> commands is looking at the source code for crawl/Crawl.java.
>
> Doug
>
>
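
For anyone preparing input for the mapred-based crawl command, here is a minimal sketch of the "directory of files containing root urls" layout described above. It is not taken from the Nutch sources: the helper names (write_seed_dir, read_seed_dir), the seeds-NNN.txt file names, and the one-URL-per-line format are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def write_seed_dir(seed_dir, urls, per_file=2):
    """Spread root URLs across several text files inside seed_dir.

    Hypothetical helper: the file naming and the one-URL-per-line
    format are assumptions, not something the Nutch docs specify here.
    """
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(urls), per_file):
        path = seed_dir / f"seeds-{start // per_file:03d}.txt"
        path.write_text("\n".join(urls[start:start + per_file]) + "\n")
        paths.append(path)
    return paths


def read_seed_dir(seed_dir):
    """Read every file in seed_dir (sorted by name) and collect the URLs."""
    urls = []
    for path in sorted(Path(seed_dir).iterdir()):
        if path.is_file():
            for line in path.read_text().splitlines():
                if line.strip():
                    urls.append(line.strip())
    return urls


if __name__ == "__main__":
    # Build a throwaway seed directory and list what ended up in it.
    with tempfile.TemporaryDirectory() as tmp:
        seeds = ["http://example.com/", "http://example.org/", "http://example.net/"]
        for p in write_seed_dir(tmp, seeds):
            print(p.name)
        print(read_seed_dir(tmp))
```

The point is only that the input is a directory: a tool reading it treats every file inside as part of one logical input, so the root URLs can be split across as many files as is convenient.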
