[ https://issues.apache.org/jira/browse/NUTCH-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614063#comment-14614063 ]

ASF GitHub Bot commented on NUTCH-1836:
---------------------------------------

GitHub user PeterCiuffetti opened a pull request:

    https://github.com/apache/nutch/pull/45

    Nutch 2059 - Unit test failures for protocol-http and protocol-httpclient

    This also incorporates the suggestion in NUTCH-1836, except that the
    suggested computation now reads its parameters under the current
    parameter name.
    
    Note that while this eliminates some exceptions that were logged during
    protocol-httpclient testing, it's not certain whether this will make any
    material difference to the Jenkins unit test failures.  The tests are
    passing in my sandbox with or without these changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/PeterCiuffetti/nutch NUTCH-2059

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/45.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #45
    
----
commit 2ac9a4bf251d0b7b5b9d14bb7596b790d67bd785
Author: PeterCiuffetti <[email protected]>
Date:   2015-07-04T20:46:00Z

    Eliminating java.lang.IllegalStateException: STREAM in unit tests for
    protocol-httpclient.  Removing unnecessary white space sent to jsp output

commit 38ef6308268a1895a434c8bc6c311a964cf71bfc
Author: PeterCiuffetti <[email protected]>
Date:   2015-07-04T21:33:12Z

    Change max thread computations as suggested by NUTCH-1836; code formatting

----


> Timeouts in protocol-httpclient when crawling same host with >2 threads NUTCH-1613 is not a complete solution
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1836
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1836
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.9
>            Reporter: Adrian Newby
>            Priority: Minor
>
> NUTCH-1613 provided a fix for the hardcoded limitation of 2 threads for 
> protocol-httpclient.  However, just extending the hardwired 10 max threads 
> and allocating them all to a single host only provides a partial solution.  
> It is still possible to exhaust the thread pool and observe timeouts 
> depending on the settings of:
>  - fetcher.threads.per.host (nutch-site.xml)
>  - mapred.tasktracker.map.tasks.maximum (mapred-site.xml)
> It would perhaps be more robust to derive the httpclient connection pool 
> limits from these two configuration parameters, as below:
> {code}
>     params.setMaxTotalConnections(maxThreadsTotal);
> // Add the following lines ...
>       // --------------------------------------------------------------------------------
>       // Modification to increase the number of available connections for
>       // multi-threaded crawls.
>       // --------------------------------------------------------------------------------
>       connectionManager.setMaxConnectionsPerHost(conf.getInt("fetcher.threads.per.host", 10));
>       connectionManager.setMaxTotalConnections(conf.getInt("mapred.tasktracker.map.tasks.maximum", 5) * conf.getInt("fetcher.threads.per.host", 10));
>       LOG.debug("setMaxConnectionsPerHost: " + connectionManager.getMaxConnectionsPerHost());
>       LOG.debug("setMaxTotalConnections  : " + connectionManager.getMaxTotalConnections());
>       // --------------------------------------------------------------------------------
> {code}
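
The arithmetic in the quoted suggestion can be tried in isolation. The following is only a minimal sketch, not the code from the pull request: the class ConnectionPoolSizing and its methods are hypothetical, a plain Map stands in for the Hadoop configuration object, and the property names and fallback values (10 and 5) are copied from the snippet in the issue rather than from the current parameter name mentioned in the pull request.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; it is not part of Nutch. */
final class ConnectionPoolSizing {

  // Property names as written in the issue text (assumption: the pull
  // request may read a differently named property).
  static final String THREADS_PER_HOST = "fetcher.threads.per.host";
  static final String MAP_TASKS_MAX = "mapred.tasktracker.map.tasks.maximum";

  private ConnectionPoolSizing() {
  }

  /** Reads an int property, falling back to a default when it is absent. */
  static int getInt(Map<String, String> conf, String key, int defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  /** Per-host limit: one connection per fetcher thread allowed on a host. */
  static int maxConnectionsPerHost(Map<String, String> conf) {
    return getInt(conf, THREADS_PER_HOST, 10);
  }

  /** Total limit: per-host limit multiplied by the maximum map tasks. */
  static int maxTotalConnections(Map<String, String> conf) {
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(getInt(conf, MAP_TASKS_MAX, 5),
        maxConnectionsPerHost(conf));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(THREADS_PER_HOST, "4");
    conf.put(MAP_TASKS_MAX, "3");
    System.out.println("per host: " + maxConnectionsPerHost(conf));
    System.out.println("total   : " + maxTotalConnections(conf));
  }
}
```

With neither property set, the sketch falls back to 10 connections per host and 50 in total, which matches the defaults passed to conf.getInt in the quoted snippet.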



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
