This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git


The following commit(s) were added to refs/heads/master by this push:
     new 4c74bce  NUTCH-2754 fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec. - initialize crawler-commons's SimpleRobotRulesParser with the longest possible internal maxDelay
     new b8d1e4f  Merge pull request #487 from commoncrawl/NUTCH-2754-max-crawl-delay
4c74bce is described below

commit 4c74bcece7f743a4ec008550f709c259317c5aa4
Author: Sebastian Nagel <[email protected]>
AuthorDate: Thu Dec 5 13:49:34 2019 +0100

    NUTCH-2754 fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec.
    - initialize crawler-commons's SimpleRobotRulesParser with the longest
      possible internal maxDelay
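The interaction the commit message describes can be modeled with a small, self-contained Java sketch. It does not call Nutch or crawler-commons: the class `CrawlDelayModel`, the method `decide` and the `Outcome` values are hypothetical. It assumes (a) that crawler-commons's parser measures crawl delays in milliseconds with a default cap of 300 s, as the commit title suggests, and reacts to a larger delay by denying the whole host, and (b) that Nutch's fetcher skips a host whose delay exceeds `fetcher.max.crawl.delay` seconds unless that property is -1.

```java
/**
 * Hypothetical model of the two crawl-delay checks involved in NUTCH-2754.
 * Assumption: it mirrors the behavior described in the commit title; it is
 * not the actual Nutch or crawler-commons code.
 */
public class CrawlDelayModel {

    /** Assumed pre-fix parser cap: 5 min. / 300 sec., in milliseconds. */
    static final long OLD_PARSER_MAX_MS = 300_000L;

    /** What happens to a host whose robots.txt sets a Crawl-delay. */
    enum Outcome { DISALLOWED_BY_PARSER, SKIPPED_BY_FETCHER, FETCHED }

    /**
     * @param crawlDelayMs       Crawl-delay from robots.txt, in milliseconds
     * @param parserMaxMs        cap configured on the robots.txt parser
     * @param fetcherMaxDelaySec fetcher.max.crawl.delay in seconds (-1 = no limit)
     */
    static Outcome decide(long crawlDelayMs, long parserMaxMs, long fetcherMaxDelaySec) {
        // Stage 1 (assumed parser behavior): a delay above the parser's own cap
        // yields "allow none" rules, so Nutch's setting is never consulted.
        if (crawlDelayMs > parserMaxMs) {
            return Outcome.DISALLOWED_BY_PARSER;
        }
        // Stage 2 (assumed fetcher behavior): Nutch's own limit; -1 always honors the delay.
        if (fetcherMaxDelaySec >= 0 && crawlDelayMs > fetcherMaxDelaySec * 1000L) {
            return Outcome.SKIPPED_BY_FETCHER;
        }
        return Outcome.FETCHED;
    }

    public static void main(String[] args) {
        long delayMs = 600_000L;  // robots.txt: "Crawl-delay: 600"
        long fetcherMax = 900L;   // fetcher.max.crawl.delay = 900 (seconds)
        // Before the fix the parser cap (300 s) wins; after it, Long.MAX_VALUE never triggers.
        System.out.println("before: " + decide(delayMs, OLD_PARSER_MAX_MS, fetcherMax));
        System.out.println("after:  " + decide(delayMs, Long.MAX_VALUE, fetcherMax));
    }
}
```

With `Crawl-delay: 600` and `fetcher.max.crawl.delay` set to 900, the model rejects the host before the change and fetches it after. Raising the parser's cap to `Long.MAX_VALUE` effectively disables the library-level check, so only the Nutch property decides whether a long crawl delay is acceptable.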
---
 src/java/org/apache/nutch/protocol/RobotRulesParser.java | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/java/org/apache/nutch/protocol/RobotRulesParser.java b/src/java/org/apache/nutch/protocol/RobotRulesParser.java
index 0671a8f..159f34f 100644
--- a/src/java/org/apache/nutch/protocol/RobotRulesParser.java
+++ b/src/java/org/apache/nutch/protocol/RobotRulesParser.java
@@ -78,6 +78,10 @@ public abstract class RobotRulesParser implements Tool {
       RobotRulesMode.ALLOW_NONE);
 
   private static SimpleRobotRulesParser robotParser = new SimpleRobotRulesParser();
+  static {
+    robotParser.setMaxCrawlDelay(Long.MAX_VALUE);
+  }
+
   protected Configuration conf;
   protected String agentNames;
 
