Is this the case even when there is an entry in robots.txt for robots.txt Philip Whitehouse
On 11 Dec 2012, at 12:22, Ulisses Montenegro <[email protected]> wrote: > If I understand the OP correctly, he is not stating that listing something in > robots.txt would make it inaccessible, but rather that Google indexes the > robots.txt files themselves, and makes the contexts of those available for > query. So, in a way, they make it easier for Google search results harvesters > to find sites which host files/directories of known applications, while > Google does not index those directories/files themselves because it follows > the robots.txt restrictions. In a nutshell: > > [Attacker] Google, show me sites that have public /wp-admin/ directories. > [Google] I don't know about that, I was not allowed to index those. > [Attacker] Ok, so show me the hosts that have robots.txt files which disallow > indexing /wp-admin/ directories, then... > [Google] Sure thing, here you go! > > Yes, the fact that those resources are out there in the open makes the effort > of hiding them from Google crawlers rather useless, but still Google should > not allow queries on the contents of robots.txt files, as it sort of beats > the purpose of disallowing stuff from being indexed... > > > On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson > <[email protected]> wrote: >> > /From/: Hurgel Bumpf <l0rd_lunatic () yahoo com> >> > /Date/: Mon, 10 Dec 2012 19:25:39 +0000 (GMT) >> > ------------------------------------------------------------------------ >> > Hi list, >> > >> > >> > i tried to contact google, but as they didn't answer my email, i do >> > forward this to FD. >> > This "security" feature is not cleary a google vulnerability, but exposes >> > websites informations that are not really >> > intended to be public. >> > >> > (Additionally i have to say that i advocate robots.txt files without >> > sensitive content and working security mechanisms.) >> > >> > Here is an example: >> > >> > An admin has a public webservice running with folders containing sensitive >> > informations. Enter these folders in his >> > robots.txt and "protect" them from the indexing process of spiders. As he >> > doesn't want the /admin/ gui to appear in the >> > search results he also puts his /admin in the robots text and finaly makes >> > a backup to the folder /backup. >> > >> > <snipped> >> > >> > This shouldn't be a discussion about bad practice but the google feature >> > itself. >> > >> > Indexing a file which is used to prevent indexing.. isn't that just >> > paradox and hypocrite? >> > >> > Thanks, >> > >> > >> > Conan the bavarian >> >> Your point eludes me - Google is indexing something which is publicly >> available. eg.:- curl http://somesite.tld/robots.txt >> So it seems the solution to the "question" your raise is, um, nonsensical. >> >> If you don't want something exposed on your web server *don't publish >> references to it*. >> >> The solution, which should be blindingly obvious, is don't create the >> problem in the first place. Password sensitive directories (htpasswd) - >> then they don't have to be excluded from search engines (because listing >> the inaccessible in robots.txt is redundant). You must of missed the >> first day of web school. >> >> Kind regards. >> >> >> _______________________________________________ >> Full-Disclosure - We believe in it. >> Charter: http://lists.grok.org.uk/full-disclosure-charter.html >> Hosted and sponsored by Secunia - http://secunia.com/ > > > > -- > “If debugging is the process of removing software bugs, then programming must > be the process of putting them in.” - Edsger Dijkstra > _______________________________________________ > Full-Disclosure - We believe in it. > Charter: http://lists.grok.org.uk/full-disclosure-charter.html > Hosted and sponsored by Secunia - http://secunia.com/
_______________________________________________ Full-Disclosure - We believe in it. Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/
