[
https://issues.apache.org/jira/browse/NUTCH-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614115#comment-14614115
]
ASF GitHub Bot commented on NUTCH-1836:
---------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/nutch/pull/45
> Timeouts in protocol-httpclient when crawling same host with >2 threads
> NUTCH-1613 is not a complete solution
> -------------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-1836
> URL: https://issues.apache.org/jira/browse/NUTCH-1836
> Project: Nutch
> Issue Type: Improvement
> Components: protocol
> Affects Versions: 1.9
> Reporter: Adrian Newby
> Priority: Minor
>
> NUTCH-1613 provided a fix for the hardcoded limitation of 2 threads for
> protocol-httpclient. However, just extending the hardwired 10 max threads
> and allocating them all to a single host only provides a partial solution.
> It is still possible to exhaust the thread pool and observe timeouts
> depending on the settings of:
> - fetcher.threads.per.host (nutch-site.xml)
> - mapred.tasktracker.map.tasks.maximum (mapred-site.xml)
> It would perhaps be more robust to derive the httpclient connection pool
> limits from these two configuration parameters, as below:
> {code}
> params.setMaxTotalConnections(maxThreadsTotal);
> // Add the following lines ...
> // --------------------------------------------------------------------------------
> // Modification to increase the number of available connections for
> // multi-threaded crawls.
> // --------------------------------------------------------------------------------
> connectionManager.setMaxConnectionsPerHost(
>     conf.getInt("fetcher.threads.per.host", 10));
> connectionManager.setMaxTotalConnections(
>     conf.getInt("mapred.tasktracker.map.tasks.maximum", 5)
>         * conf.getInt("fetcher.threads.per.host", 10));
> LOG.debug("setMaxConnectionsPerHost: "
>     + connectionManager.getMaxConnectionsPerHost());
> LOG.debug("setMaxTotalConnections : "
>     + connectionManager.getMaxTotalConnections());
> // --------------------------------------------------------------------------------
> {code}
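For illustration only, the arithmetic proposed in the quoted snippet can be sketched as a standalone class. This is a minimal sketch under stated assumptions: ConnectionPoolSizing and its methods are hypothetical names, not part of Nutch, and java.util.Properties stands in for the Hadoop Configuration object so that the example runs without extra dependencies. The fallback values (10 threads per host, 5 map tasks) are the ones used in the snippet.

```java
import java.util.Properties;

// Hypothetical helper, not part of Nutch: derives connection limits from the
// two settings named in the issue. Properties stands in for Hadoop's
// Configuration (an assumption made to keep the sketch self-contained).
final class ConnectionPoolSizing {
    static final String THREADS_PER_HOST = "fetcher.threads.per.host";
    static final String MAP_TASKS_MAX = "mapred.tasktracker.map.tasks.maximum";

    private ConnectionPoolSizing() {}

    // Reads an int property, falling back to the default when the key is absent.
    static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Per-host limit: one connection per fetcher thread allowed on a host.
    static int maxConnectionsPerHost(Properties conf) {
        return getInt(conf, THREADS_PER_HOST, 10);
    }

    // Total limit: per-host limit multiplied by the maximum number of map tasks.
    static int maxTotalConnections(Properties conf) {
        return getInt(conf, MAP_TASKS_MAX, 5) * maxConnectionsPerHost(conf);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(THREADS_PER_HOST, "4");
        conf.setProperty(MAP_TASKS_MAX, "3");
        System.out.println("maxConnectionsPerHost = " + maxConnectionsPerHost(conf));
        System.out.println("maxTotalConnections   = " + maxTotalConnections(conf));
    }
}
```

In a real patch the two computed values would presumably be passed to the connection manager setters shown in the snippet; the sketch only isolates how the limits follow from the configuration.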
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)