[ https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492850 ]
Sami Siren commented on NUTCH-446:
----------------------------------

+1

> RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-446
>                 URL: https://issues.apache.org/jira/browse/NUTCH-446
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Doğacan Güney
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: crawl-delay.patch, crawl-delay_test.patch
>
>
> RobotRulesParser doesn't check addRules when reading the Crawl-delay value, so the Nutch bot picks up another robot's Crawl-delay from robots.txt.
> To be more concrete, given a robots.txt file like this:
>
> User-agent: foobot
> Crawl-delay: 3600
>
> User-agent: *
> Disallow: /baz
>
> the Nutch bot will get 3600 as its crawl-delay value, no matter what the Nutch bot's name actually is.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
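To illustrate the fix the issue describes, here is a minimal, self-contained sketch (not Nutch's actual RobotRulesParser; class and method names are hypothetical) of a robots.txt scan that only honors a Crawl-delay line when it appears inside a record whose User-agent line matched the crawler, i.e. the addRules guard the report says is missing:

```java
import java.util.Locale;

// Hypothetical sketch: scope Crawl-delay to the matching User-agent record.
public class CrawlDelaySketch {

    // Returns the crawl delay (seconds) that applies to agentName,
    // or -1 if no matching record declares one. A delay in a record
    // naming the agent takes precedence over one in a "*" record.
    public static long crawlDelayFor(String robotsTxt, String agentName) {
        boolean addRules = false;    // inside a record that applies to us?
        boolean inWildcard = false;  // is that record the "*" record?
        long exactDelay = -1, wildcardDelay = -1;

        for (String raw : robotsTxt.split("\n")) {
            String line = raw.trim();
            int hash = line.indexOf('#');          // strip comments
            if (hash >= 0) line = line.substring(0, hash).trim();
            if (line.isEmpty()) continue;

            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (key.equals("user-agent")) {
                inWildcard = value.equals("*");
                addRules = inWildcard
                        || agentName.toLowerCase(Locale.ROOT)
                                    .contains(value.toLowerCase(Locale.ROOT));
            } else if (key.equals("crawl-delay") && addRules) {
                // The guard above is the point of the patch: without it,
                // foobot's 3600 would be applied to every crawler.
                try {
                    long d = Long.parseLong(value);
                    if (inWildcard) wildcardDelay = d; else exactDelay = d;
                } catch (NumberFormatException ignored) { }
            }
        }
        return exactDelay >= 0 ? exactDelay : wildcardDelay;
    }
}
```

With the robots.txt from the report, `crawlDelayFor(txt, "nutch")` returns -1 (the `*` record sets no delay), while `crawlDelayFor(txt, "foobot")` returns 3600, which is the behavior the patch is after.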