[ https://issues.apache.org/jira/browse/NUTCH-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641806#comment-16641806 ]
ASF GitHub Bot commented on NUTCH-2630:
---------------------------------------

sebastian-nagel opened a new pull request #387: NUTCH-2630 Fetcher to log skipped records by robots.txt
URL: https://github.com/apache/nutch/pull/387

Change required log level to INFO (default) for messages reporting skipped URLs because of robots.txt rules (disallow or crawl delay larger than fetcher.max.crawl.delay).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Fetcher to log skipped records by robots.txt
> --------------------------------------------
>
>                 Key: NUTCH-2630
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2630
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.15
>            Reporter: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.16
>
>
> To analyze problems it would be helpful if fetcher logs URLs which are
> disallowed in the robots.txt - see [discussion on user mailing
> list|https://lists.apache.org/thread.html/7fe5b02104ea866aba183d009a5fad59ad4e4daf8954593ef0123dd6@%3Cuser.nutch.apache.org%3E].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)