Hi: I am unable to get the attached patch via mail. It's better if you create a JIRA issue and attach the patch there.
Thank you. On 2/15/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
Hi,

There seem to be two small bugs in lib-http's RobotRulesParser.

The first is about reading crawl-delay. The code doesn't check addRules, so the Nutch bot will pick up the crawl-delay value intended for another robot in robots.txt. Let me try to be more clear:

    User-agent: foobot
    Crawl-delay: 3600

    User-agent: *
    Disallow:

With such a robots.txt file, the Nutch bot will get 3600 as its crawl-delay value, no matter what the Nutch bot's name actually is.

The second is about the main method. RobotRulesParser.main advertises its usage as "<robots-file> <url-file> <agent-name>+", but if you give it more than one agent name it refuses to run.

Trivial patch attached.

-- Doğacan Güney
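
For context, here is a minimal, self-contained sketch of the guarded crawl-delay parsing described above. The names (parseCrawlDelay, addRules) and the agent-matching logic are illustrative assumptions, not the actual Nutch RobotRulesParser internals:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StringReader;

    // Illustrative sketch only; not the real RobotRulesParser code.
    public class CrawlDelaySketch {

        // Returns the crawl-delay (seconds) that applies to our agent, or -1.
        static long parseCrawlDelay(String robotsTxt, String ourAgent)
                throws IOException {
            BufferedReader in = new BufferedReader(new StringReader(robotsTxt));
            boolean addRules = false;  // true only inside a block matching our agent
            long crawlDelay = -1;
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim().toLowerCase();
                if (line.startsWith("user-agent:")) {
                    String agent = line.substring("user-agent:".length()).trim();
                    addRules = agent.equals("*")
                            || ourAgent.toLowerCase().contains(agent);
                } else if (line.startsWith("crawl-delay:")) {
                    // The bug: setting crawlDelay unconditionally here picks up
                    // another robot's value. The fix is to honor addRules:
                    if (addRules) {
                        crawlDelay = Long.parseLong(
                            line.substring("crawl-delay:".length()).trim());
                    }
                }
            }
            return crawlDelay;
        }

        public static void main(String[] args) throws IOException {
            String robots = "User-agent: foobot\nCrawl-delay: 3600\n\n"
                          + "User-agent: *\nDisallow:\n";
            // With the addRules guard, "nutchbot" gets -1, not foobot's 3600.
            System.out.println(parseCrawlDelay(robots, "nutchbot"));
        }
    }

The second fix is presumably just a matter of relaxing the argument-count check in main, e.g. from requiring exactly three arguments to accepting args.length >= 3, so that several agent names can be passed as the usage string advertises.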
