Backport FetcherJob should run more reduce tasks than default
-------------------------------------------------------------

                 Key: NUTCH-1033
                 URL: https://issues.apache.org/jira/browse/NUTCH-1033
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.3, 1.4
            Reporter: Markus Jelsma
             Fix For: 1.4
         Attachments: NUTCH-1033-1.4-1.patch

Andrzej wrote: "FetcherJob now performs fetching in the reduce phase. This means
that in a typical Hadoop setup there will be many fewer reduce tasks than map 
tasks, and consequently the max. total throughput of Fetcher will be 
proportionally reduced. I propose that FetcherJob should set the number of 
reduce tasks to the number of map tasks. This way the fetching will be more 
granular."

This issue covers the backport of NUTCH-884 to Nutch 1.4-dev.
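
The proposal quoted above amounts to copying the map task count into the
reduce task count before the job is submitted. The following is only a rough
sketch of that idea, not the contents of the attached patch. JobSettings is a
hypothetical stand-in written for this example so that it runs without Hadoop
on the classpath; in Hadoop's old mapred API the corresponding calls are
JobConf.getNumMapTasks() and JobConf.setNumReduceTasks(int). The task counts
40 and 4 are made-up illustrative values.

```java
// Sketch only: JobSettings is a hypothetical stand-in for a Hadoop job
// configuration. It is not Nutch or Hadoop code.
final class ReduceTaskSketch {

    /** Minimal stand-in that holds just the two task counts. */
    static final class JobSettings {
        private final int numMapTasks;
        private int numReduceTasks;

        JobSettings(int numMapTasks, int numReduceTasks) {
            this.numMapTasks = numMapTasks;
            this.numReduceTasks = numReduceTasks;
        }

        int getNumMapTasks() {
            return numMapTasks;
        }

        int getNumReduceTasks() {
            return numReduceTasks;
        }

        void setNumReduceTasks(int n) {
            if (n < 0) {
                throw new IllegalArgumentException("negative task count: " + n);
            }
            this.numReduceTasks = n;
        }
    }

    /**
     * Because fetching happens in the reduce phase, the number of reduce
     * tasks bounds how many fetch tasks can exist. Matching it to the
     * number of map tasks makes the fetching as granular as the map side.
     */
    static void matchReducersToMappers(JobSettings job) {
        job.setNumReduceTasks(job.getNumMapTasks());
    }

    public static void main(String[] args) {
        // Illustrative values: many map tasks, few reduce tasks.
        JobSettings job = new JobSettings(40, 4);
        System.out.println("reduce tasks before: " + job.getNumReduceTasks());
        matchReducersToMappers(job);
        System.out.println("reduce tasks after:  " + job.getNumReduceTasks());
    }
}
```

Whether the real change should override a reduce task count that the user has
set explicitly is a design choice this sketch does not address.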

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
