There is a robots.txt setting that may be of some use.

User-agent: *
Crawl-delay: 0.5

This asks all bots to wait half a second between requests, i.e. to
fetch no more than two pages per second.
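For a crawler that honors the directive, the value is the number of
seconds to wait between requests, so the arithmetic is simply
1 / 0.5 = 2 requests per second. Below is a minimal Python sketch of
how a polite bot might read the value. It uses a simplified,
hand-rolled parser and hypothetical function names. Real crawlers
differ: some accept only whole seconds, and most prefer the most
specific User-agent group rather than the first match.

```python
from typing import Optional


def crawl_delay_for(robots_txt: str, agent: str = "*") -> Optional[float]:
    """Return the Crawl-delay in seconds that applies to `agent`, or None.

    Simplified sketch (hypothetical helper): a group starts at one or more
    consecutive User-agent lines; only exact (case-insensitive) names or "*"
    match, and the first matching group with a Crawl-delay wins.
    """
    matching = False        # does the current group apply to `agent`?
    reading_agents = False  # are we still in the group's User-agent lines?
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:  # a new group begins
                matching = False
            reading_agents = True
            if value == "*" or value.lower() == agent.lower():
                matching = True
        else:
            reading_agents = False
            if field == "crawl-delay" and matching:
                try:
                    return float(value)  # fractional values like 0.5 allowed
                except ValueError:
                    return None  # malformed value: treat as unset
    return None


def max_pages_per_second(delay_seconds: float) -> float:
    """A bot waiting `delay_seconds` between requests makes at most this many per second."""
    return 1.0 / delay_seconds


robots = """User-agent: *
Crawl-delay: 0.5
"""
delay = crawl_delay_for(robots, "SomeBot")
print(delay, max_pages_per_second(delay))  # 0.5 2.0
```

All of this depends on the bot choosing to run such code. Nothing on
the server enforces the delay.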

I'm pretty sure Google does not follow this particular directive, and
I know from sad experience that there are plenty of rogues out there
who will either pay lip service to the setting or ignore it outright.
Google Webmaster Tools has a setting that lets you ask Google nicely
to please consider throttling its crawl rate down some, IIRC ... but
what I have found is that if you have a lot of bot-popular pages, the
only way to truly solve the problem is to rethink what you are doing.

A client of mine ran a vehicle multiple listing service with tens of
thousands of units up for sale. Each unit generated three pages (a
quick view, a full view and a picture page), and the available
inventory changed quite a bit over the course of a day. The bots knew
this and crawled and re-crawled the site mercilessly despite all our
efforts to get them to tone it down. After increasing efficiency
everywhere we could think of, we kept throwing hardware at the
problem, until the next step was a big one: multiple CF Enterprise
licenses and a cluster.

We found another solution: generate static .html on the back end
whenever a page changes, instead of gratuitously serving .cfm pages to
effectively no purpose. The material only changed when the editor
changed it (very infrequently compared to the number of page views) or
when the third-party feed came in overnight.
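The following is a minimal Python sketch of that idea. The original
system was ColdFusion, and the templates, field names, file layout and
the `render_unit_pages` / `write_if_changed` helpers are all
hypothetical. Each time a unit is edited, its three static pages are
rendered to disk. A page is skipped when its HTML has not changed, and
each write goes through a temp file plus rename so the web server
never serves a half-written page.

```python
import os
import tempfile
from html import escape
from pathlib import Path

# Hypothetical templates for the three pages each unit generates.
TEMPLATES = {
    "quick": "<html><body><h1>{title}</h1><p>{price}</p></body></html>",
    "full": "<html><body><h1>{title}</h1><p>{price}</p><p>{description}</p></body></html>",
    "pictures": "<html><body><h1>{title}</h1>{images}</body></html>",
}


def write_if_changed(path: Path, html: str) -> bool:
    """Atomically replace `path` with `html` unless it already has that content."""
    if path.exists() and path.read_text(encoding="utf-8") == html:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600 files; let the web server read it
    os.replace(tmp, path)
    return True


def render_unit_pages(unit: dict, docroot: Path) -> int:
    """Render the quick view, full view and picture page for one unit.

    Returns how many files were actually (re)written.
    """
    fields = {
        "title": escape(str(unit["title"])),
        "price": escape(str(unit["price"])),
        "description": escape(str(unit.get("description", ""))),
        "images": "".join(
            f'<img src="{escape(src)}">' for src in unit.get("images", [])
        ),
    }
    written = 0
    for kind, template in TEMPLATES.items():
        page = docroot / "units" / f"{unit['id']}-{kind}.html"
        written += write_if_changed(page, template.format(**fields))
    return written


with tempfile.TemporaryDirectory() as root:
    unit = {"id": 42, "title": "2004 Honda Civic", "price": "$6,500", "images": ["a.jpg"]}
    print(render_unit_pages(unit, Path(root)))  # 3
    print(render_unit_pages(unit, Path(root)))  # 0 (nothing changed)
```

Skipping unchanged files also leaves their modification times alone.
Crawlers that send conditional requests can then get cheap "not
modified" answers from the web server.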

This approach increases the server's capacity to handle concurrent
traffic *immensely*, but it also poses multiple challenges.
Maintaining session state is not the least of these. Another is
volume: with mammoth daily CSV and XML feeds from third parties, we
had tens of thousands of pages to generate or update (our solution: a
second server, on a cheap VPS, dedicated to feed processing and page
creation).
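Regenerating every page each night is wasteful when most units did
not change. One possible way to cut the work is to compare the new
feed against the previous one and rebuild pages only for added or
changed units, deleting pages for removed ones. This is an assumption
on my part, not something described above. The Python sketch below
assumes a hypothetical CSV feed with a unique `stock_id` column and
hypothetical helper names.

```python
import csv
import hashlib
import io
from typing import Dict, List, Tuple


def fingerprint_feed(csv_text: str, key: str = "stock_id") -> Dict[str, str]:
    """Map each unit's key to a hash of its whole row."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Sort fields so column order in the feed does not affect the hash.
        canonical = "\x1f".join(f"{k}={row[k]}" for k in sorted(row))
        result[row[key]] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return result


def diff_feeds(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, removed) unit keys, each sorted."""
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    removed = sorted(k for k in old if k not in new)
    return added, changed, removed


yesterday = "stock_id,make,price\nA1,Ford,9000\nB2,Honda,6500\nC3,Toyota,7000\n"
today = "stock_id,make,price\nA1,Ford,8500\nB2,Honda,6500\nD4,Mazda,5000\n"
added, changed, removed = diff_feeds(fingerprint_feed(yesterday), fingerprint_feed(today))
print(added, changed, removed)  # ['D4'] ['A1'] ['C3']
```

Only the "added" and "changed" keys would then go to the page
generator, and the pages for "removed" keys would be deleted.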

It's definitely not for everyone. We got away with it, and for that
particular application it delivered better overall performance at a
low operating cost. A rare win/win.

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:351032