Hello,

I run a semi-busy wiki, and lately I have been having trouble with its CPU load, which jumps as high as 140 at noon (not 1.4, not 14, but ~140). Obviously this brings the site to a crawl. After investigating, I found the cause: multiple diff3 comparisons being run at the same time.

Explaining the cause needs a little background. The wiki I run deals with editing large text files, and it is common to see hundreds of KB of pure text on a single wiki page. Normally my servers can handle edit requests for these pages. However, search bots and crawlers (from both search engines and individual users) have been hitting my wiki pretty hard lately. Each of these bots tries to copy all the pages, and this includes the Revision History of each of these 100 KB wiki text pages. Since each page can have hundreds of edits, every large text page can spawn hundreds of Revision History diffs (from lighttpd/apache -> php5 -> diff3?).

I have done some testing on my servers, and I found that each diff3 comparison of a typical large text page raises the CPU load by about 3.

Right now I have implemented a few temporary restrictions:

1. Limit the number of connections per IP
2. Disallow all search bots
3. Increase the RAM limit in the PHP config file
4. Memcache wherever possible (not all servers have memcache)

I have some problems with 1. and 2. First of all, 1. doesn't really solve the load problem: the slowdown can still occur if multiple bots hit the site at the same time. 2. faces a similar problem. After I edited my robots.txt, I discovered that some clowns are ignoring it. Also, wildcards in robots.txt are a non-standard extension that, as far as I know, only Google and a few others honor, so I can't just use Disallow: *diff=* .

I don't want to break these large text pages up, because that makes it harder for scripts to compile the pieces back together directly from the database.

So I am turning my attention to system-level optimization. Does anyone have any experience with messing with diff3? For example:

- switching to, say, libxdiff?
- renicing the fcgi processes? (I use lighttpd)
- disabling Revision Comparison altogether for pages older than a certain age?
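
To make that last idea more concrete, here is a rough sketch in Python of the kind of gate I have in mind. It is not MediaWiki code and does not use any MediaWiki API: the class DiffGate, its methods, and both limits are made up for illustration. It refuses a diff when the revision is older than a cutoff, and otherwise only lets a fixed number of diffs run at once, refusing the rest instead of queueing them.

```python
import threading
from datetime import datetime, timedelta

# Hypothetical limits, chosen only for illustration.
MAX_CONCURRENT_DIFFS = 4                 # at ~3 load per large diff, roughly 12 load at most
MAX_REVISION_AGE = timedelta(days=90)    # older revisions get no diff at all


class DiffGate:
    """Decides whether an expensive revision diff may run right now."""

    def __init__(self, max_concurrent=MAX_CONCURRENT_DIFFS, max_age=MAX_REVISION_AGE):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._max_age = max_age

    def try_acquire(self, revision_time, now=None):
        """Return True and take a slot if the diff is allowed, else False."""
        now = now or datetime.now()
        if now - revision_time > self._max_age:
            return False  # revision too old: skip the diff entirely
        # Non-blocking: refuse instead of queueing when all slots are busy.
        return self._slots.acquire(blocking=False)

    def release(self):
        """Give the slot back once the diff has finished."""
        self._slots.release()
```

The caller would wrap the diff in try_acquire() / release() and answer refused requests with a cheap error page instead of running the comparison. With a cap of 4 and my measurement of about 3 load per diff, the diffs alone should add no more than about 12 to the load, no matter how many bots show up at once.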

Thanks for the help,
Tim

_______________________________________________
MediaWiki-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
