[ https://issues.apache.org/jira/browse/NUTCH-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614115#comment-14614115 ]
ASF GitHub Bot commented on NUTCH-1836:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/45

> Timeouts in protocol-httpclient when crawling same host with >2 threads
> NUTCH-1613 is not a complete solution
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1836
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1836
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.9
>            Reporter: Adrian Newby
>            Priority: Minor
>
> NUTCH-1613 provided a fix for the hardcoded limitation of 2 threads for
> protocol-httpclient. However, just extending the hardwired 10 max threads
> and allocating them all to a single host only provides a partial solution.
> It is still possible to exhaust the thread pool and observe timeouts
> depending on the settings of:
> - fetcher.threads.per.host (nutch-site.xml)
> - mapred.tasktracker.map.tasks.maximum (mapred-site.xml)
> It would perhaps be more robust to set the httpclient thread pool as a
> derivative of these two configuration parameters as below:
> {code}
>     params.setMaxTotalConnections(maxThreadsTotal);
>     // Add the following lines ...
>     // --------------------------------------------------------------------------------
>     // Modification to increase the number of available connections for
>     // multi-threaded crawls.
>     // --------------------------------------------------------------------------------
>     connectionManager.setMaxConnectionsPerHost(conf.getInt("fetcher.threads.per.host", 10));
>     connectionManager.setMaxTotalConnections(conf.getInt("mapred.tasktracker.map.tasks.maximum", 5) * conf.getInt("fetcher.threads.per.host", 10));
>     LOG.debug("setMaxConnectionsPerHost: " + connectionManager.getMaxConnectionsPerHost());
>     LOG.debug("setMaxTotalConnections : " + connectionManager.getMaxTotalConnections());
>     // --------------------------------------------------------------------------------
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
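
For readers who want to check the arithmetic of the sizing rule proposed in the issue description, here is a minimal standalone sketch. It is not the Nutch patch itself: it assumes java.util.Properties as a stand-in for the Hadoop Configuration object, and the class name PoolSizing and its method names are hypothetical. Only the two property names and their fallback values (10 and 5) are taken from the issue.

```java
import java.util.Properties;

/** Hypothetical helper illustrating the pool-sizing rule from the issue. */
public class PoolSizing {

    // Stand-in for Configuration.getInt: returns the fallback when the key
    // is missing or its value is not a valid integer.
    static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Per-host limit: one connection per fetcher thread allowed on a host.
    static int maxConnectionsPerHost(Properties conf) {
        return getInt(conf, "fetcher.threads.per.host", 10);
    }

    // Total limit: concurrent map tasks multiplied by the per-host limit.
    static int maxTotalConnections(Properties conf) {
        int mapTasks = getInt(conf, "mapred.tasktracker.map.tasks.maximum", 5);
        return mapTasks * maxConnectionsPerHost(conf);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("fetcher.threads.per.host", "20");
        conf.setProperty("mapred.tasktracker.map.tasks.maximum", "4");
        System.out.println("per host: " + maxConnectionsPerHost(conf));
        System.out.println("total   : " + maxTotalConnections(conf));
    }
}
```

With neither property set, the sketch yields 10 connections per host and 50 in total, which matches the fallback values used in the snippet from the issue.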