I very much agree with these recommendations.
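For concreteness, the rules Nemo describes below might be sketched roughly like this (a minimal sketch only; the patterns and the crawl-delay value are illustrative assumptions, not the actual rules deployed on tools.wmflabs.org):

```
# Keep tool main pages discoverable, but stop crawlers from
# hammering dynamic pages (any URL containing a ? parameter).
User-agent: *
Disallow: /*?
# If that is not enough, also block deep paths with several slashes,
# e.g. anything more than one directory level below a tool:
# Disallow: /*/*/
# Slow down the remaining crawling. Note: Googlebot ignores
# Crawl-delay; its rate is set via Webmaster Tools instead.
Crawl-delay: 10

# Give the Internet Archive its own group so it is not slowed
# down or blocked along with the rest.
User-agent: ia_archiver
Disallow: /*?
```

Note that a crawler matching a specific `User-agent` group (like `ia_archiver` here) ignores the `*` group entirely, which is what makes the exemption work.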

> On Oct 25, 2014, at 1:36 AM, "Federico Leva (Nemo)" <[email protected]> 
> wrote:
> 
> As Nuria, Billinghurst and others said, the tools are expected to be 
> discoverable. It's easy enough not to throw away the baby with the 
> bathwater*.
> * Dynamic pages generally have some URL parameters, usually indicated by ?. 
> In the general robots.txt, disallow Googlebot and friends** from crawling those, 
> with appropriate wildcards, as per 
> https://support.google.com/webmasters/answer/6062596
> * If it's not enough, add URL patterns with several /
> * If it's not enough, increase the global crawl-delay (apparently not possible 
> per-folder) https://support.google.com/webmasters/answer/48620
> * If it's not enough, at the very least the main page for each tool should be 
> crawled, disallowing at most //tools.wmflabs.org/*/*
> 
> Nemo
> 
> (*) Even Toolserver managed, with way less resources.
> (**) But not ia_archiver if at all possible, please.
> 
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l

