Thanks for the tip. I guess it's a slightly different project from Nutch. My understanding is that while Nutch tries to implement a whole web search package, Bixo focuses on the crawling part. I should look into both projects more deeply. Thanks again!
Ed

From mp2893's iPhone

On Dec 11, 2010, at 1:15 AM, Ted Dunning <[email protected]> wrote:

> That is definitely possible, but may not be very desirable.
>
> Take a look at the Bixo project for a full-scale crawler. There is a lot of
> subtlety in the fetching of URLs due to the varying quality of different
> sites and the interaction with crawl choking due to robots.txt
> considerations.
>
> http://bixo.101tec.com/
>
> On Thu, Dec 9, 2010 at 11:27 PM, edward choi <[email protected]> wrote:
>
>> So my design is:
>>
>> Map phase ==> crawl news articles, process text, write the result to a
>> file
>>     ||
>>     ||  pass (term, term_frequency) pairs to the Reducer
>>     ||
>>     \/
>> Reduce phase ==> merge the (term, term_frequency) pairs and create a
>> dictionary
>>
>> Is this at all possible? Or is it inherently impossible due to the
>> structure of Hadoop?
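For anyone following the thread: the two-phase design in the quoted message is essentially a word-count job where the mapper also does the fetching. Below is a minimal plain-Java sketch of just the counting and merging logic (no crawling, and deliberately outside the Hadoop API so it runs standalone); all class and method names here are made up for illustration, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed two-phase design: the "map" step turns one
// fetched article into (term, frequency) pairs, and the "reduce" step
// merges the pairs from all mappers into a single dictionary -- the same
// merge Hadoop's shuffle + reducer would perform per term key.
public class TermDictionarySketch {

    // Map phase: tokenize one article and emit (term, frequency) pairs.
    static Map<String, Integer> mapArticle(String articleText) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : articleText.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: merge the per-article counts into one dictionary.
    static Map<String, Integer> reduce(List<Map<String, Integer>> mapOutputs) {
        Map<String, Integer> dictionary = new TreeMap<>();
        for (Map<String, Integer> partial : mapOutputs) {
            partial.forEach((term, freq) ->
                dictionary.merge(term, freq, Integer::sum));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> mapOutputs = new ArrayList<>();
        mapOutputs.add(mapArticle("Hadoop crawls the web"));
        mapOutputs.add(mapArticle("the web is big"));
        System.out.println(reduce(mapOutputs));
        // {big=1, crawls=1, hadoop=1, is=1, the=2, web=2}
    }
}
```

In a real Hadoop job, `mapArticle` would live in a `Mapper` that emits each (term, count) pair, and `reduce` would be the per-key summing in a `Reducer`; the structure of Hadoop supports this fine, though as Ted notes, doing the fetching inside the map tasks is where the trouble starts.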
