I hope I have now solved the overload problem that massive crawling caused for the wiki and, in consequence, the PyPI outage.
Following Laura's advice, I added a Crawl-delay to robots.txt. Several robots have picked that up: not just msnbot and slurp, but also e.g. MJ12bot.

For the others, I had to fine-tune my throttling code, after observing that the expensive URLs are those with a query string. Such a request now counts as 3 regular queries (I might have to bump this to 5), so you can only do one of them every 6s.

For load statistics, see

http://ximinez.python.org/munin/localdomain/localhost.localdomain-pypitime.html

I also added accounting of moin.fcgi run times, which shows that Moin produced 15% CPU load on average (PyPI 3%, Postgres 2%).

Regards,
Martin

_______________________________________________
Catalog-SIG mailing list
[email protected]
http://mail.python.org/mailman/listinfo/catalog-sig
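For readers unfamiliar with the directive: a Crawl-delay entry in robots.txt looks like the fragment below. This is only an illustration; the 6-second value is my assumption (chosen to match the throttle interval above), not a quote from the actual python.org robots.txt.

```
User-agent: *
Crawl-delay: 6
```

Note that Crawl-delay is a non-standard extension, which is why only some robots (msnbot, slurp, MJ12bot) honor it and the rest need server-side throttling.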
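The server-side throttling described above can be pictured as a cost-weighted token bucket: regular requests cost 1 token, query-string requests cost 3, and tokens refill at a rate that yields one expensive request per 6s. This is a minimal sketch under those assumptions; the class, names, and numbers are mine, not the actual PyPI/Moin throttling code.

```python
import time

class CostWeightedThrottle:
    """Token bucket that charges query-string URLs 3x a regular request.

    At rate=0.5 tokens/second, a cost-3 (query-string) request is
    admitted at most once every 6 seconds, matching the figure above.
    Illustrative sketch only -- not the real throttling implementation.
    """

    def __init__(self, rate=0.5, burst=3.0, query_cost=3.0):
        self.rate = rate            # tokens refilled per second
        self.burst = burst          # bucket capacity
        self.query_cost = query_cost
        self.tokens = burst         # start with a full bucket
        self.last = time.monotonic()

    def allow(self, url):
        # Refill tokens based on elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Query-string URLs are the expensive ones; charge them more.
        cost = self.query_cost if "?" in url else 1.0
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A client that alternates cheap and expensive URLs will see the expensive ones rejected until enough time has passed, while cheap page fetches recover much sooner.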
