I wrote:

> [...]
> Regarding robots.txt, I've started
> https://gerrit.wikimedia.org/r/77916. Toolserver's
> robots.txt is:
> | User-agent: msnbot
> | Disallow: /
> | User-agent: *
> | Disallow: /~magnus/geo/geohack.php
> | Disallow: /~daniel/WikiSense
> | Disallow: /~geohack/
> | Disallow: /~enwp10/
> | Disallow: /~cbm/cgi-bin/
> (WikiSense is CatScan IIRC.) Excluding Geohack is probably
> a good idea. Do other tool authors have tools they do not
> want to be crawled by search engine bots?

There was (is) a *lot* of crawler traffic to various tools that are
linked to from every article in Wikipedia and which perform expensive
calculations when called. In an effort to cope with this, I resorted
to robots.txt being:

| User-agent: *
| Disallow: /

for the moment, i.e. disallowing *all* crawler access *anywhere*.
Reconsidering, this is probably a better default :-). So: if your
tool *needs* to be crawled by search engine bots, or if this causes
other problems for you, please speak up.

Also, we'd obviously want the central homepage and the list of tools
to be indexed, so this won't be the final revision of robots.txt.

Tim

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
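For anyone wanting to check how well-behaved crawlers would interpret a ruleset like the ones above, Python's standard-library `urllib.robotparser` can be used as a quick sanity check. This is just an illustrative sketch: the `Allow: /index.php` line models a hypothetical future revision that exempts a homepage (the path is made up, not the actual Tool Labs layout), while everything else stays disallowed as in the interim blanket rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical future robots.txt: allow only an assumed homepage
# path (/index.php -- illustrative, not the real layout) and
# disallow everything else, as in the interim blanket rule.
rules = """\
User-agent: *
Allow: /index.php
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The first matching rule wins: the homepage is fetchable,
# tool URLs are not.
print(parser.can_fetch("Googlebot", "/index.php"))            # True
print(parser.can_fetch("Googlebot", "/geohack/geohack.php"))  # False
print(parser.can_fetch("*", "/~enwp10/"))                     # False
```

Note that the stdlib parser supports `Allow`, but not wildcard patterns or the `$` end-anchor that some search engines honor, so rules relying on those need testing against the crawlers themselves.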
