[
https://issues.apache.org/jira/browse/NUTCH-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614063#comment-14614063
]
ASF GitHub Bot commented on NUTCH-1836:
---------------------------------------
GitHub user PeterCiuffetti opened a pull request:
https://github.com/apache/nutch/pull/45
Nutch 2059 - Unit test failures for protocol-http and protocol-httpclient
This also incorporates the suggestion in NUTCH-1836, except that the
parameters used in the suggested computation were changed to the current
parameter names.
Note that while this eliminates some exceptions that were logged during
protocol-httpclient testing, it's not certain whether this will make any
material difference to the Jenkins unit test failures. The tests pass in
my sandbox with or without these changes.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/PeterCiuffetti/nutch NUTCH-2059
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nutch/pull/45.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #45
----
commit 2ac9a4bf251d0b7b5b9d14bb7596b790d67bd785
Author: PeterCiuffetti <[email protected]>
Date: 2015-07-04T20:46:00Z
Eliminating java.lang.IllegalStateException: STREAM in unit tests for
protocol-httpclient. Removing unnecessary white space sent to jsp output
commit 38ef6308268a1895a434c8bc6c311a964cf71bfc
Author: PeterCiuffetti <[email protected]>
Date: 2015-07-04T21:33:12Z
Change max thread computations as suggested by NUTCH-1836; code formatting
----
> Timeouts in protocol-httpclient when crawling same host with >2 threads
> NUTCH-1613 is not a complete solution
> -------------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-1836
> URL: https://issues.apache.org/jira/browse/NUTCH-1836
> Project: Nutch
> Issue Type: Improvement
> Components: protocol
> Affects Versions: 1.9
> Reporter: Adrian Newby
> Priority: Minor
>
> NUTCH-1613 provided a fix for the hardcoded limitation of 2 threads for
> protocol-httpclient. However, just extending the hardwired 10 max threads
> and allocating them all to a single host only provides a partial solution.
> It is still possible to exhaust the thread pool and observe timeouts
> depending on the settings of:
> - fetcher.threads.per.host (nutch-site.xml)
> - mapred.tasktracker.map.tasks.maximum (mapred-site.xml)
> It would perhaps be more robust to set the httpclient thread pool as a
> derivative of these two configuration parameters as below:
> {code}
> params.setMaxTotalConnections(maxThreadsTotal);
> // Add the following lines ...
> // --------------------------------------------------------------------------------
> // Modification to increase the number of available connections for
> // multi-threaded crawls.
> // --------------------------------------------------------------------------------
> connectionManager.setMaxConnectionsPerHost(
>     conf.getInt("fetcher.threads.per.host", 10));
> connectionManager.setMaxTotalConnections(
>     conf.getInt("mapred.tasktracker.map.tasks.maximum", 5)
>         * conf.getInt("fetcher.threads.per.host", 10));
> LOG.debug("setMaxConnectionsPerHost: "
>     + connectionManager.getMaxConnectionsPerHost());
> LOG.debug("setMaxTotalConnections : "
>     + connectionManager.getMaxTotalConnections());
> // --------------------------------------------------------------------------------
> {code}
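
The sizing rule in the quoted snippet can be tried out in isolation. The following is only a minimal sketch, not the code from the pull request: the class ConnectionPoolSizing and its getInt helper are hypothetical, a plain Map stands in for the Hadoop configuration object, and the property names and defaults (10 and 5) are copied from the issue text above rather than from the current Nutch parameter names.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration of the pool-sizing rule suggested in NUTCH-1836.
 * It does not touch HttpClient; it only computes the two limits.
 */
public class ConnectionPoolSizing {

    // Defaults copied from the snippet in the issue description.
    static final int DEFAULT_THREADS_PER_HOST = 10;
    static final int DEFAULT_MAP_TASKS = 5;

    /** Stand-in for a configuration lookup with a default (assumption). */
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Per-host limit: one connection per fetcher thread allowed on a host. */
    static int maxConnectionsPerHost(Map<String, String> conf) {
        return getInt(conf, "fetcher.threads.per.host", DEFAULT_THREADS_PER_HOST);
    }

    /** Total limit: concurrent map tasks times the per-host limit. */
    static int maxTotalConnections(Map<String, String> conf) {
        int mapTasks = getInt(conf, "mapred.tasktracker.map.tasks.maximum",
                DEFAULT_MAP_TASKS);
        return mapTasks * maxConnectionsPerHost(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.threads.per.host", "20");
        conf.put("mapred.tasktracker.map.tasks.maximum", "4");

        System.out.println("per host: " + maxConnectionsPerHost(conf));
        System.out.println("total:    " + maxTotalConnections(conf));
    }
}
```

With 20 threads per host and 4 map tasks, the sketch yields a per-host limit of 20 and a total of 80. With neither property set, it falls back to 10 and 50, the defaults used in the issue.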