On Mon, Dec 10, 2012 at 3:21 PM, James Lay <[email protected]> wrote:
> On 2012-12-10 12:25, Hurgel Bumpf wrote:
> > Hi list,
> >
> > I tried to contact Google, but as they didn't answer my email, I am
> > forwarding this to FD.
> > This "security" feature is not clearly a Google vulnerability, but it
> > exposes website information that is not really intended to be
> > public.
> >
> > (Additionally, I have to say that I advocate robots.txt files without
> > sensitive content, and working security mechanisms.)
> >
> > Here is an example:
> >
> > An admin has a public web service running with folders containing
> > sensitive information. He enters these folders in his robots.txt to
> > "protect" them from the indexing process of spiders. As he doesn't
> > want the /admin/ GUI to appear in the search results, he also puts his
> > /admin in the robots.txt, and finally makes a backup to the folder
> > /backup.
> >
> > These folders aren't browsable, but they might contain
> > f(a)iles with easy-to-guess name structures, unencrypted
> > authentication (simple AUTH), you name it...
> >
> > Without a robots.txt nobody would know about the existence of these
> > folders, but as some folders might be linked somewhere, these folders
> > might appear in search results when not listed in the robots.txt.
> > The admin finds himself in a catch-22 situation where he seems to
> > prefer the robots.txt file.
> >
> > Long story short:
> >
> > Although Google accepts and respects the directives of the robots.txt
> > file, Google INDEXES these files.
> >
> > This is my concern.
> >
> > http://www.google.com/search?q=inurl:robots.txt+filetype%3Atxt+Disallow%3A+%2Fadmin
> >
> > http://www.google.com/search?q=inurl:robots.txt+filetype%3Atxt+Disallow%3A+%2Fbackup
> >
> > http://www.google.com/search?q=inurl:robots.txt+filetype%3Atxt+Disallow%3A+%2Fpassword
> >
> > These searches are less useful for targeted attacks than for
> > finding victims.
> >
> > http://www.google.com/search?q=inurl:robots.txt+filetype%3Atxt+%2FDisallow%3A+wp-admin
> >
> > http://www.google.com/search?q=inurl:robots.txt+filetype%3Atxt+%2FDisallow%3A+typo3
> > <Just be creative>
> >
> > This shouldn't be a discussion about bad practice but about the Google
> > feature itself.
> >
> > Indexing a file which is used to prevent indexing... isn't that just
> > paradoxical and hypocritical?
> >
> > Thanks,
> >
> > Conan the bavarian
>
> I'm wondering if, perhaps in .htaccess, one could allow ONLY site
> crawlers access to the robots.txt file. Then add robots.txt to
> robots.txt... would this mitigate some of the risk?
>
> James

You'd probably end up accidentally blocking bots, though, if you did it
via IP range, and you wouldn't be any safer if you did it by User-Agent
string, since any client can send whatever User-Agent it likes. I think
it'd be safer to just assume robots.txt is going to be looked at by
someone other than actual robots.

> _______________________________________________
> Full-Disclosure - We believe in it.
> Charter: http://lists.grok.org.uk/full-disclosure-charter.html
> Hosted and sponsored by Secunia - http://secunia.com/

--
====
Q. How many Prolog programmers does it take to change a lightbulb?
A. No.
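
To illustrate how little effort it takes for someone other than a robot to read a robots.txt, here is a minimal sketch in Python that pulls the Disallow paths out of a robots.txt body. The function name disallowed_paths and the sample content are hypothetical, made up for this illustration and modelled on the folders named in this thread; the parsing is deliberately simplified and is not a full implementation of the robots exclusion rules.

```python
def disallowed_paths(robots_txt):
    """Return the path of every Disallow rule in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # Field names are matched case-insensitively.
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow value blocks nothing
                paths.append(value)
    return paths


# Hypothetical sample, modelled on the folders named in this thread.
SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /backup/   # nightly dumps
Disallow:
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```

Nothing here depends on who is asking for the file, so the listed paths are only as private as the access controls on the folders themselves.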
