Sorry, meant to post the whole thing. :)
For the last few weeks I've been having some problems with House of Fusion.
The memory used by JRun.exe has been going through the roof and I didn't know
why. The code was tight and nothing had really changed on the site, so what was
up? The answer was Yahoo.
In the last 3 weeks Yahoo has ramped up its indexing of sites. For a site as
large as House of Fusion, this can take quite a bit of time. At times I've
logged 2-4 Yahoo bot hits per second.
So how was Yahoo the problem? Client variables. Not DB client variables, and
not even the dreaded registry client variables. Just simple cookie-based client
variables. It seems that when a client variable is set, CF also creates a
structure in memory for it. Because the bot does not accept cookies, each bot
hit is treated as its own session, so each hit generates a new memory structure
of about 1 KB. That's not a lot on its own, but with a few tens of thousands of
bot hits a day, it adds up.
I'm still waiting on word from Macromedia as to when a client memory structure
times out, but this seems to be the issue.
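A rough back-of-the-envelope figure shows how quickly this adds up. It assumes the ~1 KB per hit above, 30,000 bot hits a day as a stand-in for "a few tens of thousands", 1 MB = 1,024 KB, and that no structure times out within the day. The second line is the worst case: the peak rate of 4 hits per second sustained for a full day.

$$
\begin{aligned}
30{,}000\ \text{hits} \times 1\ \text{KB} &= 30{,}000\ \text{KB} \approx 29\ \text{MB per day} \\
4\ \tfrac{\text{hits}}{\text{s}} \times 86{,}400\ \text{s} \times 1\ \text{KB} &= 345{,}600\ \text{KB} \approx 337.5\ \text{MB per day}
\end{aligned}
$$

Since the timeout is still unknown, the amount actually held in memory is roughly this daily figure times the number of days a structure survives.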
So what's the solution? There are 4.
1. Increase your RAM. If you have the option, add as much memory as possible.
This is not a perfect solution, but it saves throwing time at the problem and
gives you a 'buffer' against problems of this sort.
2. Add a Crawl-delay setting to your robots.txt. Mine is set to 1 second, but
you can set yours to something higher.
3. Set a different CFAPPLICATION for the most common bots. I use a simple
regular expression to find keywords that only appear in bot user agents:
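<!--- Known crawlers, matched by a keyword in the user agent string --->
<CFIF REFindNoCase('Slurp|Googlebot|BecomeBot|msnbot|Mediapartners-Google|ZyBorg|RufusBot|EMonitor', cgi.http_user_agent)>
    <!--- Bot: no client variables and no client cookie, so no client structure is created --->
    <CFAPPLICATION name="FusionA" clientmanagement="no" sessionmanagement="no"
        setclientcookies="no" setdomaincookies="no" clientstorage="Cookie">
<CFELSE>
    <!--- Regular visitor: cookie-based client variables as before --->
    <CFAPPLICATION name="FusionA" clientmanagement="yes" sessionmanagement="no"
        setclientcookies="yes" setdomaincookies="no" clientstorage="Cookie">
</CFIF>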
This makes sure that no client structure is created for these bots.
4. Use the same regex to clean out the client structure after the bot finishes
the page: call StructClear(client) to remove the data in OnRequestEnd.cfm, in
the onRequestEnd method of Application.cfc, or in the template itself.
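For item 2, a minimal robots.txt along these lines should be enough. It is a sketch: Crawl-delay was never part of the original robots.txt standard, so only crawlers that choose to support it (Yahoo's Slurp and msnbot among them) will honor it, and Googlebot ignores it. The empty Disallow line keeps the whole site open to crawling.

```text
# robots.txt at the site root
User-agent: *
Crawl-delay: 1
Disallow:
```

Raising the Crawl-delay value slows the supporting bots down further.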
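For item 4, a minimal OnRequestEnd.cfm might look like the sketch below. It reuses the bot regex from item 3 and assumes the site uses Application.cfm with client management left on for bots, i.e. this cleanup is used instead of the CFAPPLICATION switch, since with client management off there is no client data to clear. With Application.cfc, the same CFIF would go inside its onRequestEnd method instead.

```cfml
<!--- OnRequestEnd.cfm (sketch): runs after each page when Application.cfm
      is in use. Assumes client management is still enabled for bots. --->
<cfif REFindNoCase("Slurp|Googlebot|BecomeBot|msnbot|Mediapartners-Google|ZyBorg|RufusBot|EMonitor", cgi.http_user_agent)>
    <!--- Known crawler: empty the client scope so this hit leaves no client data behind --->
    <cfset StructClear(client)>
</cfif>
```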
The bottom line is that while bots are great for indexing your content, they can
cause havoc on your system when a lot of memory is assigned to what is
essentially a 'dead session'.
http://www.blogoffusion.com/index.cfm/2005/11/28/pseudomemory-leak