[ 
https://issues.apache.org/jira/browse/NUTCH-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641806#comment-16641806
 ] 

ASF GitHub Bot commented on NUTCH-2630:
---------------------------------------

sebastian-nagel opened a new pull request #387: NUTCH-2630 Fetcher to log 
skipped records by robots.txt
URL: https://github.com/apache/nutch/pull/387
 
 
   Change the required log level to INFO (the default) for messages that 
report URLs skipped because of robots.txt rules (a disallow rule, or a crawl 
delay larger than fetcher.max.crawl.delay).
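
   The two skip conditions named in the description can be sketched roughly as
follows. This is only an illustration, not the code from the pull request: the
class RobotsSkipSketch, its methods, and the message wording are hypothetical,
java.util.logging stands in for whatever logging framework the fetcher uses,
and treating a negative maximum as "never skip" is an assumption.

```java
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch: decide whether a URL is skipped because of robots.txt
// rules and report the reason at INFO level.
final class RobotsSkipSketch {
    private static final Logger LOG =
        Logger.getLogger(RobotsSkipSketch.class.getName());

    // Returns the reason for skipping, or empty if the URL may be fetched.
    // maxCrawlDelayMs plays the role of fetcher.max.crawl.delay (here in
    // milliseconds); a negative value is assumed to disable the delay check.
    static Optional<String> skipReason(boolean allowedByRobots,
                                       long crawlDelayMs,
                                       long maxCrawlDelayMs) {
        if (!allowedByRobots) {
            return Optional.of("robots.txt disallow");
        }
        if (maxCrawlDelayMs >= 0 && crawlDelayMs > maxCrawlDelayMs) {
            return Optional.of("crawl delay " + crawlDelayMs
                + " ms exceeds maximum " + maxCrawlDelayMs + " ms");
        }
        return Optional.empty();
    }

    // Logs skipped URLs at INFO so they show up with a default log setup.
    static boolean shouldFetch(String url, boolean allowedByRobots,
                               long crawlDelayMs, long maxCrawlDelayMs) {
        Optional<String> reason =
            skipReason(allowedByRobots, crawlDelayMs, maxCrawlDelayMs);
        reason.ifPresent(r -> LOG.info("Skipping " + url + ": " + r));
        return !reason.isPresent();
    }

    public static void main(String[] args) {
        shouldFetch("http://example.com/private", false, 0L, 30_000L);
        shouldFetch("http://example.com/slow", true, 60_000L, 30_000L);
        shouldFetch("http://example.com/ok", true, 1_000L, 30_000L);
    }
}
```

   Running main logs the first two example URLs as skipped and stays silent for
the third.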
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fetcher to log skipped records by robots.txt
> --------------------------------------------
>
>                 Key: NUTCH-2630
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2630
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.15
>            Reporter: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.16
>
>
> To analyze problems, it would be helpful if the fetcher logged URLs that are 
> disallowed in the robots.txt - see [discussion on user mailing 
> list|https://lists.apache.org/thread.html/7fe5b02104ea866aba183d009a5fad59ad4e4daf8954593ef0123dd6@%3Cuser.nutch.apache.org%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
