Since my original message bounced, and the resend got no replies in 10 days, I'm posting it to the dev@ list...

----- Forwarded message from Martin Mačok <[EMAIL PROTECTED]> -----

Date: Thu, 10 Oct 2002 09:27:13 +0200
From: Martin Mačok <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: robots.txt URL matching (OK in 3.1.x, bad in 3.2.0b)

Hi,

I've (probably) found a bug (with a little help from wwwoffle author
"Andrew M. Bishop" <amb(at)gedanken.demon.co.uk>) in ht://Dig
3.2.0b4-072201 (from Mandrake package) in robots.txt URL matching.

When you disallow "/foo", htdig then rejects "/bar/foo" but according
to http://www.robotstxt.org/wc/norobots.html it should reject only
URLs _starting_ with (not just containing) disallowed string.

I found it with wwwoffle cache indexing scripts. htdig 3.1.x worked
well but after upgrading to 3.2.0b4-072201 it broke. The cached pages
are under "/search/index" directory and "/index" is disallowed. You
can see that 3.2.0b rejects "/search/index" in debug output:

-------------------
Robots.txt line: Disallow: /index
Found 'disallow' line: /index
Pattern: /control|/configuration|/refresh|/monitor|/index
[...]
pushing http://localhost:8080/search/start3.html
+href: http://localhost:8080/search/index/ (The WWWOFFLE searchable index of all cached web pages)
Rejected: forbidden by server robots.txt!
-------------------

I'm sorry for not sending a patch, I'm offline now and don't have the
sources on my hdd (and dialup is expensive here through the day) but
I think that it should be trivial to fix.

Thanks a lot and have a nice day

-- 
Martin Mačok                 http://underground.cz/
[EMAIL PROTECTED]            http://Xtrmntr.org/ORBman/
Reclaim your rights! - http://www.digitalspeech.org/

----- End forwarded message -----

-- 
Martin Mačok

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
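
For illustration, the difference between the reported behaviour and the expected one can be sketched in Python. This is only a sketch, not ht://Dig's actual code: the function names are hypothetical, and the unanchored regular expression is an assumption based on the "Pattern:" line in the debug output. The rule list is copied from that same line.

```python
import re
from urllib.parse import urlsplit

# Disallow values copied from the "Pattern:" line in the debug output.
DISALLOWED = ["/control", "/configuration", "/refresh", "/monitor", "/index"]


def url_path(url):
    """Return the path component of a URL, defaulting to "/"."""
    return urlsplit(url).path or "/"


def rejected_by_substring(url, disallowed=DISALLOWED):
    """Unanchored match (assumed 3.2.0b behaviour): a rule may match
    anywhere inside the path."""
    pattern = re.compile("|".join(re.escape(rule) for rule in disallowed))
    return pattern.search(url_path(url)) is not None


def rejected_by_prefix(url, disallowed=DISALLOWED):
    """Prefix match (behaviour described at robotstxt.org): a rule matches
    only at the start of the path. Empty rules are skipped."""
    path = url_path(url)
    return any(path.startswith(rule) for rule in disallowed if rule)


if __name__ == "__main__":
    for url in ("http://localhost:8080/search/index/",
                "http://localhost:8080/index/"):
        print(url, rejected_by_substring(url), rejected_by_prefix(url))
```

With these rules, "/search/index/" is rejected by the substring check but allowed by the prefix check, while "/index/" is rejected by both. If the matching really is done with a regular expression, anchoring it at the start of the path would presumably have the same effect as the prefix check.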