Since the original message bounced, and the resend got no replies in 10 days,
I'm posting it to the dev@ list...

----- Forwarded message from Martin Mačok <[EMAIL PROTECTED]> -----

Date: Thu, 10 Oct 2002 09:27:13 +0200
From: Martin Mačok <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: robots.txt URL matching (OK in 3.1.x, bad in 3.2.0b)

Hi,
I've (probably) found a bug (with a little help from wwwoffle author
"Andrew M. Bishop" <amb(at)gedanken.demon.co.uk>) in ht://Dig
3.2.0b4-072201 (from Mandrake package) in robots.txt URL matching.

When you disallow "/foo", htdig also rejects "/bar/foo", but according
to http://www.robotstxt.org/wc/norobots.html it should reject only
URLs _starting_ with (not just containing) the disallowed string.
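
To make the expected behaviour concrete, here is a minimal sketch of prefix
matching in Python. It is only an illustration and not htdig code; the
function name is_disallowed and the rules list are made up for this example.

```python
from urllib.parse import urlsplit


def is_disallowed(url, disallow_rules):
    """Return True if the URL path starts with any Disallow value."""
    # Hypothetical helper: only the path part of the URL is compared.
    path = urlsplit(url).path or "/"
    # An empty Disallow value means "allow everything", so skip it.
    return any(path.startswith(rule) for rule in disallow_rules if rule)


rules = ["/foo"]
print(is_disallowed("http://localhost:8080/foo/page.html", rules))  # True
print(is_disallowed("http://localhost:8080/bar/foo", rules))        # False
```

With prefix matching, "/bar/foo" is allowed because its path does not start
with "/foo", even though it contains that string.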

I found it with the wwwoffle cache indexing scripts. htdig 3.1.x worked
well, but it broke after upgrading to 3.2.0b4-072201. The cached pages
are under the "/search/index" directory and "/index" is disallowed. You
can see that 3.2.0b rejects "/search/index" in the debug output:

-------------------
Robots.txt line: Disallow: /index
Found 'disallow' line: /index
Pattern: /control|/configuration|/refresh|/monitor|/index
[...]
   pushing http://localhost:8080/search/start3.html
+href: http://localhost:8080/search/index/ (The WWWOFFLE searchable index of all cached web pages)

   Rejected: forbidden by server robots.txt!
-------------------
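
The "Pattern:" line looks like a regular-expression alternation. I have not
read the htdig sources, so the following is only a guess at the cause,
sketched in Python: if such a pattern is searched for anywhere in the path,
"/search/index/" matches; anchoring it to the start of the path does not.
The function names here are made up for the illustration.

```python
import re

# Disallow values taken from the "Pattern:" line in the debug output.
DISALLOWED = ["/control", "/configuration", "/refresh", "/monitor", "/index"]
alternation = "|".join(re.escape(p) for p in DISALLOWED)

# Assumed buggy behaviour: the pattern may match anywhere in the path.
unanchored = re.compile(alternation)
# Assumed fix: the pattern must match at the start of the path.
anchored = re.compile("^(?:" + alternation + ")")


def rejected_unanchored(path):
    return unanchored.search(path) is not None


def rejected_anchored(path):
    return anchored.search(path) is not None


print(rejected_unanchored("/search/index/"))  # True
print(rejected_anchored("/search/index/"))    # False
print(rejected_anchored("/index/"))           # True
```

If the real cause is something like this, anchoring the combined pattern
(or comparing prefixes directly) should restore the 3.1.x behaviour.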

I'm sorry for not sending a patch; I'm offline now and don't have the
sources on my hard disk (and dialup is expensive here during the day), but
I think it should be trivial to fix.

Thanks a lot and have a nice day

-- 
         Martin Mačok                 http://underground.cz/
   [EMAIL PROTECTED]        http://Xtrmntr.org/ORBman/

      Reclaim your rights!  -  http://www.digitalspeech.org/

----- End forwarded message -----

-- 
Martin Mačok


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
