[
https://issues.apache.org/jira/browse/NUTCH-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894119#comment-15894119
]
ASF GitHub Bot commented on NUTCH-2364:
---------------------------------------
GitHub user sebastian-nagel opened a pull request:
https://github.com/apache/nutch/pull/179
NUTCH-2364 http.agent.rotate: IllegalArgumentException / last element of
agent names ignored
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sebastian-nagel/nutch NUTCH-2364
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nutch/pull/179.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #179
----
commit e5e67028251e5cc1fdd10ed94103fadff0c41a4a
Author: Sebastian Nagel <[email protected]>
Date: 2017-03-03T10:33:19Z
NUTCH-2364 http.agent.rotate: IllegalArgumentException / last element of
agent names ignored
----
> http.agent.rotate: IllegalArgumentException / last element of agent names
> ignored
> ---------------------------------------------------------------------------------
>
> Key: NUTCH-2364
> URL: https://issues.apache.org/jira/browse/NUTCH-2364
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.10, 1.11, 1.12
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.13
>
>
> With http.agent.rotate == true and a one-element agent name list, the
> following exception is thrown:
> {noformat}
> % cat .../conf/agents.txt
> my-test-crawler/Nutch-1.13
> % .../bin/nutch parsechecker -Dhttp.agent.rotate=true http://nutch.apache.org/
> ...
> Fetch failed with protocol status: exception(16), lastModified=0:
> java.lang.IllegalArgumentException: bound must be positive
> % cat .../logs/hadoop.log
> ...
> 2017-03-03 11:17:19,750 ERROR http.Http - Failed to get protocol output
> java.lang.IllegalArgumentException: bound must be positive
> at java.util.concurrent.ThreadLocalRandom.nextInt(ThreadLocalRandom.java:352)
> at org.apache.nutch.protocol.http.api.HttpBase.getUserAgent(HttpBase.java:379)
> at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:180)
> ...
> {noformat}
> Caused by
> {code}
> userAgentNames.get(ThreadLocalRandom.current().nextInt(userAgentNames.size()-1));
> {code}
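> but nextInt(bound) is defined as: "Returns a pseudorandom int value between
> zero (inclusive) and the specified bound (exclusive)." With a one-element
> list the bound is 0, which triggers the exception; with a longer list the
> last agent name is never selected.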
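The off-by-one can be reproduced outside Nutch with a small standalone sketch. The class and method names below (AgentRotationSketch, pickBuggy, pickFixed) are hypothetical and not part of Nutch; pickBuggy copies the expression quoted in the report. pickFixed assumes the fix is simply to drop the "-1"; the actual patch is the one in the pull request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class AgentRotationSketch {

  // Hypothetical helper: same index expression as quoted in the report.
  // nextInt(bound) excludes the bound, so "size() - 1" skips the last
  // element and passes 0 (an illegal bound) for a one-element list.
  static String pickBuggy(List<String> names) {
    return names.get(ThreadLocalRandom.current().nextInt(names.size() - 1));
  }

  // Assumed fix: use size() as the exclusive bound so every index
  // 0 .. size()-1 can be drawn. Still requires a non-empty list.
  static String pickFixed(List<String> names) {
    return names.get(ThreadLocalRandom.current().nextInt(names.size()));
  }

  public static void main(String[] args) {
    List<String> one = Arrays.asList("my-test-crawler/Nutch-1.13");
    try {
      pickBuggy(one);
    } catch (IllegalArgumentException e) {
      // Same exception type as in the stack trace above.
      System.out.println("buggy: " + e);
    }
    System.out.println("fixed: " + pickFixed(one));
  }
}
```

With a two-element list, pickBuggy always returns the first entry, because nextInt(1) can only yield 0; that is the "last element of agent names ignored" half of the issue title.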