On Tue, 22 May 2007 15:05:48 +0100 Duncan Coutts <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-05-22 at 14:40 +0100, Claus Reinke wrote:
> >
> > so the situation for mailing lists and online docs seems to have
> > improved, but there is still the wiki indexing/rogue bot issue,
> > and lots of fine tuning (together with watching the logs to spot
> > any issues arising out of relaxing those restrictions). perhaps
> > someone on this list would be willing to volunteer to look into
> > those robots/indexing issues on haskell.org?-)
>
> The main problem, and the reason for the original (temporary!) measure,
> was bots indexing all possible diffs between old versions of wiki
> pages, via URLs like:
>
> http://haskell.org/haskellwiki/?title=Quicksort&diff=9608&oldid=9607
>
> For pages with long histories this O(n^2) number of requests gets
> quite large, and the wiki engine does not seem well optimised for
> generating arbitrary diffs. So we ended up with bots holding open many
> HTTP server connections. They were not actually causing much server
> CPU load or generating much traffic, but once the number of nearly-hung
> connections reached the HTTP child process limit we were effectively
> in a DoS situation.
>
> So if we can ban bots from the page histories, or turn them off for the
> bot user agents, or something along those lines, then we might have a
> cure. Perhaps we just need to upgrade our MediaWiki software, or find
> out how other sites using this software deal with the same issue of
> bots reading page histories.

http://en.wikipedia.org/robots.txt

Wikipedia uses URLs starting with /w/ for "dynamic" pages (well, all
pages are dynamic in a sense, but you know what I mean, I hope), and
then puts /w/ in robots.txt.

--
Robin

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
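
To make Robin's suggestion concrete: Wikipedia's robots.txt (linked
above) keeps crawlers out of its dynamic /w/ URLs with a rule along the
lines of the first block below. If haskell.org's wiki really does serve
all its diff/history views through query-string URLs like the one
Duncan quoted, with plain article views at /haskellwiki/PageName, an
analogous rule might be the second block. This is only a sketch,
untested against the live site; the exact paths would need checking.

    # On en.wikipedia.org: readable article views live under /wiki/,
    # everything dynamic (diffs, histories, edit forms) under /w/
    User-agent: *
    Disallow: /w/

    # A hypothetical haskell.org robots.txt (assumption, not checked):
    # block the query-string URLs while leaving /haskellwiki/PageName
    # crawlable. Disallow values are literal prefix matches, so the
    # '?' here is an ordinary character, not a wildcard.
    User-agent: *
    Disallow: /haskellwiki/?

Whether all crawlers match that '?' prefix consistently is another
question; the more robust fix is the one Wikipedia chose, i.e. move all
script-generated URLs under a distinct path prefix and disallow just
that prefix.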