That is definitely possible, but may not be very desirable.

Take a look at the Bixo project for a full-scale crawler.  There is a lot of
subtlety in fetching URLs, due to the varying quality of different sites and
the interaction with crawl throttling imposed by robots.txt.

http://bixo.101tec.com/
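
To give a feel for just the politeness piece, here is a deliberately naive
sketch of the per-host bookkeeping a fetcher needs. The class name, the fixed
one-second delay, and the crude Disallow check are all made up for
illustration; Bixo does far more than this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Toy illustration of per-host "politeness" bookkeeping. Nothing here
// resembles Bixo's actual implementation.
public class PoliteFetcher {

    // Hypothetical fixed delay; real crawlers honor Crawl-delay and
    // adapt to how each server responds.
    private static final long DELAY_MS = 1000;

    private final Map<String, Long> lastFetch = new HashMap<String, Long>();

    // Crude robots.txt check: refuses only on a blanket "Disallow: /".
    // A real parser must handle User-agent groups, path prefixes,
    // wildcards, and Crawl-delay.
    boolean allowedByRobots(URL url) {
        BufferedReader in = null;
        try {
            URL robots = new URL(url.getProtocol(), url.getHost(),
                                 "/robots.txt");
            in = new BufferedReader(new InputStreamReader(robots.openStream()));
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().equalsIgnoreCase("Disallow: /")) {
                    return false;
                }
            }
        } catch (Exception e) {
            // No robots.txt (or unreachable): commonly treated as allowed.
        } finally {
            try { if (in != null) in.close(); } catch (IOException ignored) {}
        }
        return true;
    }

    // Blocks until this host may be hit again, then records the fetch time.
    synchronized void waitForHost(String host) throws InterruptedException {
        Long last = lastFetch.get(host);
        if (last != null) {
            long wait = last + DELAY_MS - System.currentTimeMillis();
            if (wait > 0) {
                Thread.sleep(wait);
            }
        }
        lastFetch.put(host, System.currentTimeMillis());
    }
}

Even this toy version needs per-host state that survives across fetches,
which sits awkwardly in mappers that Hadoop is free to kill, retry, or run
speculatively.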

On Thu, Dec 9, 2010 at 11:27 PM, edward choi <[email protected]> wrote:

> So my design is:
> Map phase ==> crawl news articles, process text, write the result to
> a file
>        |
>        |     pass (term, term_frequency) pairs to the Reducer
>        |
>        v
> Reduce phase ==> merge the (term, term_frequency) pairs and create a
> dictionary
>
> Is this at all possible? Or is it inherently impossible due to the
> structure of Hadoop?
>
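
Structurally, yes: the design quoted above maps straight onto Hadoop's
Mapper and Reducer. A minimal sketch, assuming one article URL per input
line (the class names and the crude tokenizer are mine, purely for
illustration):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CrawlTermCount {

    // Map phase: fetch the article at the URL on each input line,
    // tokenize it, and emit (term, in-document count).
    public static class FetchMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text urlLine, Context context)
                throws IOException, InterruptedException {
            Map<String, Long> counts = new HashMap<String, Long>();
            BufferedReader in = null;
            try {
                in = new BufferedReader(new InputStreamReader(
                        new URL(urlLine.toString().trim()).openStream()));
                String line;
                while ((line = in.readLine()) != null) {
                    // Naive tokenization; real text processing (HTML
                    // stripping, stemming, etc.) would go here.
                    for (String term : line.toLowerCase().split("\\W+")) {
                        if (term.length() == 0) continue;
                        Long prev = counts.get(term);
                        counts.put(term, prev == null ? 1L : prev + 1L);
                    }
                }
            } catch (IOException e) {
                return; // Unreachable URL: skip it (a real crawler retries).
            } finally {
                if (in != null) in.close();
            }
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                context.write(new Text(e.getKey()),
                              new LongWritable(e.getValue()));
            }
        }
    }

    // Reduce phase: merge the per-document counts into one dictionary
    // entry (term, total frequency) per term.
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text term, Iterable<LongWritable> counts,
                Context context) throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) {
                total += c.get();
            }
            context.write(term, new LongWritable(total));
        }
    }
}

The fetch inside map() is exactly where all the politeness and robustness
problems mentioned above show up.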
