This is an automated email from the ASF dual-hosted git repository.
snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git
The following commit(s) were added to refs/heads/master by this push:
     new 4c74bce  NUTCH-2754 fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec. - initialize crawler-commons's SimpleRobotRulesParser with the longest possible internal maxDelay
     new b8d1e4f  Merge pull request #487 from commoncrawl/NUTCH-2754-max-crawl-delay
4c74bce is described below
commit 4c74bcece7f743a4ec008550f709c259317c5aa4
Author: Sebastian Nagel <[email protected]>
AuthorDate: Thu Dec 5 13:49:34 2019 +0100
NUTCH-2754 fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec.
- initialize crawler-commons's SimpleRobotRulesParser with the longest
possible internal maxDelay
---
src/java/org/apache/nutch/protocol/RobotRulesParser.java | 4 ++++
1 file changed, 4 insertions(+)
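
Note (illustration only, not part of the commit): the commit message implies that two limits are checked one after the other, first the internal maxDelay of the robots.txt parser and then Nutch's own fetcher.max.crawl.delay. If the parser's limit stays at 300 sec., a larger configured fetcher.max.crawl.delay never gets to decide. The self-contained Java sketch below models that ordering under stated assumptions. It is not code from Nutch or crawler-commons: the class CrawlDelayModel, the Decision enum, and decide() are hypothetical names, the 300 sec. default is taken from the commit subject, and the 900 sec. fetcher limit is an arbitrary example value.

```java
/**
 * Simplified model of two crawl-delay limits applied in sequence.
 * Hypothetical illustration only: not code from Nutch or crawler-commons.
 */
class CrawlDelayModel {

  /** Which limit, if any, rejects the Crawl-delay found in robots.txt. */
  enum Decision { PARSER_LIMIT_HIT, FETCHER_LIMIT_HIT, WAIT }

  /** Assumed parser-internal default of 300 sec. (from the commit subject), in ms. */
  static final long ASSUMED_PARSER_DEFAULT_MAX_MS = 300_000L;

  /**
   * The parser-level limit is checked first; the fetcher-level limit is
   * only consulted if the delay passes the parser-level check.
   */
  static Decision decide(long robotsDelayMs, long parserMaxMs, long fetcherMaxMs) {
    if (robotsDelayMs > parserMaxMs) {
      return Decision.PARSER_LIMIT_HIT;
    }
    if (robotsDelayMs > fetcherMaxMs) {
      return Decision.FETCHER_LIMIT_HIT;
    }
    return Decision.WAIT;
  }

  public static void main(String[] args) {
    long fetcherMaxMs = 900_000L;  // example fetcher.max.crawl.delay of 900 sec.
    long robotsDelayMs = 600_000L; // example Crawl-delay of 600 sec.

    // Before the change: the 300 sec. parser limit wins although 600 < 900.
    System.out.println(decide(robotsDelayMs, ASSUMED_PARSER_DEFAULT_MAX_MS, fetcherMaxMs));
    // After the change: the parser limit is Long.MAX_VALUE, so the fetcher limit decides.
    System.out.println(decide(robotsDelayMs, Long.MAX_VALUE, fetcherMaxMs));
    // A delay above the fetcher limit is still rejected, now by the fetcher limit.
    System.out.println(decide(1_200_000L, Long.MAX_VALUE, fetcherMaxMs));
  }
}
```

Running main prints PARSER_LIMIT_HIT, WAIT, and FETCHER_LIMIT_HIT, in that order. Setting the parser limit to Long.MAX_VALUE leaves the configured fetcher limit as the only effective bound in this model.
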
diff --git a/src/java/org/apache/nutch/protocol/RobotRulesParser.java b/src/java/org/apache/nutch/protocol/RobotRulesParser.java
index 0671a8f..159f34f 100644
--- a/src/java/org/apache/nutch/protocol/RobotRulesParser.java
+++ b/src/java/org/apache/nutch/protocol/RobotRulesParser.java
@@ -78,6 +78,10 @@ public abstract class RobotRulesParser implements Tool {
       RobotRulesMode.ALLOW_NONE);
   private static SimpleRobotRulesParser robotParser = new SimpleRobotRulesParser();
+  static {
+    robotParser.setMaxCrawlDelay(Long.MAX_VALUE);
+  }
+
   protected Configuration conf;
   protected String agentNames;