[ https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-446:
--------------------------------

    Attachment: crawl-delay.patch

> RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-446
>                 URL: https://issues.apache.org/jira/browse/NUTCH-446
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Doğacan Güney
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: crawl-delay.patch
>
>
> RobotRulesParser doesn't check addRules when reading the Crawl-delay
> value, so the Nutch bot picks up a Crawl-delay value that robots.txt
> declares for a different robot.
> To be clearer, consider this robots.txt file:
>
>     User-agent: foobot
>     Crawl-delay: 3600
>     User-agent: *
>     Disallow: /baz
>
> With this file, the Nutch bot will get 3600 as its crawl-delay value,
> no matter what the Nutch bot's name actually is.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
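
For illustration only, here is a minimal Java sketch of the behaviour the issue asks for. It is not the attached crawl-delay.patch and not Nutch's actual RobotRulesParser code: the class name CrawlDelaySketch, the method parseCrawlDelay, and the substring-based agent matching are all assumptions made for this example. Only the addRules flag name comes from the report. Precedence between a specific block and the wildcard block is deliberately simplified: the last applicable Crawl-delay wins.

```java
import java.util.Locale;

/** Hypothetical sketch; names and matching rules are assumptions, not Nutch code. */
final class CrawlDelaySketch {

    /**
     * Returns the Crawl-delay (in seconds) that applies to agentName,
     * or -1 if no applicable Crawl-delay line is found.
     */
    static long parseCrawlDelay(String robotsTxt, String agentName) {
        String agent = agentName.toLowerCase(Locale.ROOT);
        boolean addRules = false;     // true while the current block applies to our agent
        boolean inAgentLines = false; // true while reading consecutive User-agent lines
        long crawlDelay = -1;

        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Drop trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (!inAgentLines) {
                    // A new block starts: forget whether the previous one matched.
                    addRules = false;
                    inAgentLines = true;
                }
                String pattern = value.toLowerCase(Locale.ROOT);
                if (pattern.equals("*") || (!pattern.isEmpty() && agent.contains(pattern))) {
                    addRules = true;
                }
            } else {
                inAgentLines = false;
                // The point of the issue: honour Crawl-delay only when addRules is set.
                if (field.equals("crawl-delay") && addRules) {
                    try {
                        crawlDelay = Long.parseLong(value);
                    } catch (NumberFormatException e) {
                        // Ignore unparsable values in this sketch.
                    }
                }
            }
        }
        return crawlDelay;
    }
}
```

With the robots.txt from the report, this sketch yields no crawl delay for an agent named "nutch" and 3600 seconds for an agent named "foobot". Removing the `&& addRules` condition reproduces the reported bug, where every agent receives 3600.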