You can ask (polite) bots to throttle their request rates and simultaneous requests. I think you'd probably be quite interested in the crawl-delay directive:
http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive

This is respected by at least MSN and Yahoo. Unfortunately, it looks like Google may or may not respect it; they propose this alternative:
http://www.google.com/support/webmasters/bin/answer.py?answer=48620

Of course, if you're being scraped by a bot that doesn't respect this directive, or by a more malicious scraper, it won't help you at all.

-JohnF

> -----Original Message-----
> From: Wout Mertens [mailto:[email protected]]
> Sent: November 16, 2009 9:19 AM
> To: John Lauro
> Cc: [email protected]
> Subject: Re: Preventing bots from starving other users?
>
> On Nov 16, 2009, at 2:43 PM, John Lauro wrote:
>
> > Oopps, my bad... It's actually tc and not iptables. Google tc qdisc
> > for some info.
> >
> > You could allow your local ips go unrestricted, and throttle all other IPs
> > to 512kb/sec for example.
>
> Hmmm... The problem isn't the data rate, it's the work
> associated with incoming requests. As soon as a 500 byte
> request hits, the web server has to do a lot of work.
>
> > What software is the running on? I assume it's not running under apache or
> > there would be some ways to tune apache. As other have mentioned, telling
> > the crawlers to behave themselves or totally ignore the wiki with a robots
> > file is probably best.
>
> Well the web server is Apache, but surprisingly Apache
> doesn't allow for tuning this particular case. Suppose normal
> request traffic looks like (A are users)
>
> Time ->
>
> A A AA A A AAA A AA A
>
> With the bot this becomes
>
> ABBBBBBBBBB A BBBBA BBA BBBBBA AABBBBBB
>
> So you can see that normal users are just swamped out of
> "slots". The webserver can render about 9 pages at the same
> time without impact, but it takes a second or more to render.
> At first I set MaxClients to 9, which makes it so the web
> server doesn't swap to death, but if the bots have 8 requests
> queued up, and then another 8, and another 8, regular users
> have no chance of decent interactivity...
>
> This may be a corner case due to slow serving, because I'm
> having a hard time finding a way to throttle the bots. I
> suppose that normally you'd just add servers...
>
> Wout.
>
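
P.S. For reference, here is a rough sketch of what the crawl-delay directive mentioned at the top of this message could look like, checked with Python's standard urllib.robotparser module. The robots.txt contents and the delay values (10 and 5 seconds) are made up for illustration, and crawl_delay_for is just a hypothetical helper name. This only shows how a well-behaved crawler would read the file; whether a given bot honors it is up to the bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the delay values are illustrative only.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "msnbot"))        # entry for msnbot
    print(crawl_delay_for(ROBOTS_TXT, "SomeOtherBot"))  # falls back to "*"
```

With the file above, a crawler identifying as msnbot would be asked to wait 10 seconds between requests, and any other crawler 5 seconds.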

