[ 
https://issues.apache.org/jira/browse/NUTCH-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614115#comment-14614115
 ] 

ASF GitHub Bot commented on NUTCH-1836:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/45


> Timeouts in protocol-httpclient when crawling same host with >2 threads 
> NUTCH-1613 is not a complete solution
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1836
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1836
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.9
>            Reporter: Adrian Newby
>            Priority: Minor
>
> NUTCH-1613 provided a fix for the hardcoded limitation of 2 threads for 
> protocol-httpclient.  However, just extending the hardwired 10 max threads 
> and allocating them all to a single host only provides a partial solution.  
> It is still possible to exhaust the thread pool and observe timeouts 
> depending on the settings of:
>  - fetcher.threads.per.host (nutch-site.xml)
>  - mapred.tasktracker.map.tasks.maximum (mapred-site.xml)
> It would perhaps be more robust to size the httpclient thread pool as a 
> function of these two configuration parameters, as below:
> {code}
>     params.setMaxTotalConnections(maxThreadsTotal);
>     // Add the following lines ...
>     // ----------------------------------------------------------------------
>     // Modification to increase the number of available connections for
>     // multi-threaded crawls.
>     // ----------------------------------------------------------------------
>     connectionManager.setMaxConnectionsPerHost(
>         conf.getInt("fetcher.threads.per.host", 10));
>     connectionManager.setMaxTotalConnections(
>         conf.getInt("mapred.tasktracker.map.tasks.maximum", 5)
>             * conf.getInt("fetcher.threads.per.host", 10));
>     LOG.debug("setMaxConnectionsPerHost: "
>         + connectionManager.getMaxConnectionsPerHost());
>     LOG.debug("setMaxTotalConnections  : "
>         + connectionManager.getMaxTotalConnections());
>     // ----------------------------------------------------------------------
> {code}
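
To make the arithmetic in the proposal above concrete, here is a minimal
standalone sketch of how the two limits would be derived. It is not the Nutch
code: the class ConnectionLimits and its methods are hypothetical, and a plain
Map stands in for the Hadoop Configuration object so the example runs without
any dependencies. Only the two property names and their fallback values (10
and 5) come from the snippet in the issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating the sizing rule proposed in the issue. */
public class ConnectionLimits {
    // Property names as quoted in the issue description.
    public static final String THREADS_PER_HOST = "fetcher.threads.per.host";
    public static final String MAP_TASKS_MAX = "mapred.tasktracker.map.tasks.maximum";

    /** Assumed stand-in for conf.getInt(name, defaultValue): default when unset. */
    public static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Per-host connection limit: one connection per fetcher thread on a host. */
    public static int maxPerHost(Map<String, String> conf) {
        return getInt(conf, THREADS_PER_HOST, 10);
    }

    /** Total connection limit: map tasks per node times threads per host. */
    public static int maxTotal(Map<String, String> conf) {
        return getInt(conf, MAP_TASKS_MAX, 5) * maxPerHost(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(THREADS_PER_HOST, "20");
        conf.put(MAP_TASKS_MAX, "4");
        System.out.println("per host: " + maxPerHost(conf)); // 20
        System.out.println("total:    " + maxTotal(conf));   // 4 * 20 = 80
    }
}
```

With neither property set, the fallbacks from the snippet give 10 connections
per host and 5 * 10 = 50 in total.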



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
