Hey, > > Here is an example: > > > > An admin has a public webservice running with folders containing > > sensitive informations. Enter these folders in his robots.txt and > > "protect" them from the indexing process of spiders. As he doesn't > > want the /admin/ gui to appear in the search results he also puts his > > /admin in the robots text and finaly makes a backup to the folder > > /backup.
If no one would know about a folder, why would one add it to robots.txt in the first place? But that's missing the point anyway - robots.txt is not a security mechanism. If someone uses robots.txt as the only and last line of defense he plainly doesn't understand what he's doing (especially that it's one of the first files both pentesters & attackers look at). If someone has an /admin/ site (which is a really easily guessable name, checked by every web directory scanner out there) he cannot rely on concealment*, but on proper user authentication using mechanisms designed for such purpose (e.g. requiring a password). (* for historical reasons there is a Polish IT term for such attempts - "deep hiding", there's even a wiki page on that - http://pl.wikipedia.org/wiki/G%C5%82%C4%99bokie_ukrycie) > I'm wondering if, in perhaps .htaccess, one could allow ONLY site > crawlers access to the robots.txt file. Then add robots.txt to > robots.txt...would this mitigate some of the risk? 1. It's still missing the point. 2. No, it wouldn't work in case of scanners that try to impersonate robots. -- gynvael.coldwind//vx _______________________________________________ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/