Much agree with these recommendations.

> On Oct 25, 2014, at 1:36 AM, "Federico Leva (Nemo)" <[email protected]> wrote:
>
> As Nuria, Billinghurst and others said, the tools are expected to be
> discoverable. It's easy enough not to throw away the baby with the
> bathwater*.
>
> * Dynamic pages generally have some URL parameters, usually indicated by ?.
>   In the general robots.txt, disallow Googlebot and friends** from crawling
>   those, with appropriate wildcards, as per
>   https://support.google.com/webmasters/answer/6062596
> * If that's not enough, add URL patterns with several /
> * If that's not enough, reduce the global crawl-delay (apparently not
>   possible per-folder): https://support.google.com/webmasters/answer/48620
> * If that's not enough, at the very least the main page for each tool should
>   be crawled, disallowing at most //tools.wmflabs.org/*/*
>
> Nemo
>
> (*) Even Toolserver managed, with far fewer resources.
> (**) But not ia_archiver, if at all possible, please.
>
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
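For concreteness, the escalating steps Nemo lists might look roughly like the sketch below. This is only an illustration under assumptions: the paths, the crawl-delay value, and the choice of user agents are made up here, not taken from the actual tools.wmflabs.org configuration. Note also that Googlebot ignores Crawl-delay in robots.txt; its crawl rate is set in Google Webmaster Tools instead (per the second support link above).

    # Illustrative sketch only -- paths and values are assumptions
    User-agent: Googlebot
    # Step 1: block dynamic pages (URLs with ? query parameters)
    Disallow: /*?
    # Step 2: if needed, block deeper paths (several /) while leaving
    # each tool's main page crawlable
    Disallow: /*/*/

    # Step 3: a global crawl-delay for crawlers that honor it
    # (robots.txt has no per-folder crawl-delay)
    User-agent: *
    Crawl-delay: 10

    # Footnote **: leave the Internet Archive's crawler unrestricted
    User-agent: ia_archiver
    Disallow: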
