Have you thought about using a web cache system like Coral? http://www.scs.cs.nyu.edu/coral/
If you set the no-cache HTTP header to true, it caches pages for 5 minutes
before requesting a new copy. With a bit of experimentation it should be
possible to track the user agent their bot uses and set the no-cache header
only for that. You'd probably have to do a bit of testing to make sure
searches hit your site directly rather than the Coral cache, but it might be
worth a try.

Spike

Jim Davis wrote:
> I've a site that, while not slow, is most likely not going to take the
> traffic it's facing. The site is www.firstnight.org - it will get
> absolutely pounded on New Year's Eve.
>
> Since this will be the warmest New Year's Eve in recent memory AND it's a
> weekend AND a holiday for nearly everybody, I fully expect this to be one
> of the busiest years ever.
>
> Unfortunately we're still a non-profit. We can't afford the kind of iron
> that this kind of traffic would require for only one or two days a year.
> Right now we're on a shared hosting plan at CrystalTech, which we nearly
> overran last year.
>
> I'm just going through and looking for savings here.
>
> 1) An average page runs anywhere from 80-300 ticks. I plan to address some
> of that by caching the navigation HTML (right now it's dynamically
> generated from a cached CFC).
>
> What do you think? Too high? Way too high? Way, way too high?
>
> Many of the pages are quite large and complex (for example the "All
> Events" list here:
> http://www.firstnight.org/Content/NewYears/Artists/Explore/Events.cfm?Type=All
> ) but it's exactly those pages that are the most popular.
>
> As an aside, you can see the current number of active sessions and the
> current page's tick count at the bottom of any page.
>
> 2) My session manager is worrying me. The site doesn't use CF's built-in
> session management. This allows me to capture user information at the end
> of a session, but means that I have to manually check and destroy sessions.
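[A rough sketch of the selective no-cache idea Spike describes, in CFML. The
user-agent string below is purely hypothetical - you'd need to dig through your
logs to find the string the crawler actually sends:]

```cfm
<!--- Send Cache-Control: no-cache only to a specific crawler, so everyone
      else gets the Coral-cached copy. "ExampleBot" is a made-up placeholder;
      replace it with the user agent found in your logs. --->
<cfif FindNoCase("ExampleBot", CGI.HTTP_USER_AGENT)>
    <cfheader name="Cache-Control" value="no-cache">
</cfif>
```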
> When a session ends it's saved in a database along with the pages viewed
> during the visit, information about the user agent, and several other
> things.
>
> This process requires several database calls (perhaps a minimum of 8, but a
> maximum determined by the number of pages visited) and averages in the
> range of 40-80 ticks per session cleaned.
>
> That would be fine, except I may be cleaning several thousand sessions at a
> shot on the 31st. The system is SQL Server and I've optimized it about as
> much as I know how (there are indexes on the major columns, I've cleaned
> out all unneeded data, etc.).
>
> Any thoughts on using multiple CFQUERY statements vs. one more complex SQL
> call? Right now, for example, I make a call to the DB to see if the session
> already exists; if it does I do an update, if not I do an insert.
>
> Could it actually be faster to do an IF statement in the SQL using only one
> CFQUERY tag? It seems to me that with "maintain connections" on this
> wouldn't make a difference... but I'm not sure (and want to use the time
> I've left wisely).
>
> I am using CFQUERYPARAM and caching what queries make sense.
>
> Some other thoughts:
>
> I've actually considered placing some of the site on another CrystalTech
> account (on a different server, of course) and using redirects to move the
> traffic off. Of course there's no way I could get the URLs to stay the same
> (outside of frames) and I wouldn't want it there all the time - just for a
> day or two.
>
> It would also royally screw up my log statistics.
>
> Any other ideas for kludgy, cheap load balancing?
>
> I can easily turn off the end-of-session handlers. I know that this process
> will take a while, but I'm not sure if it's really the performance hog that
> I fear. It is, after all, spending nearly all of its time waiting for the
> database - sitting effectively idle. So what if the cleanup process takes
> two minutes if the thread isn't dominating the CPU?
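[The exists-check/update/insert round trips Jim describes can be collapsed
into a single CFQUERY using a T-SQL IF EXISTS batch. A sketch, with made-up
table, column, and datasource names:]

```cfm
<!--- One database round trip instead of three. "Sessions", "SessionID",
      and the datasource are hypothetical names for illustration. --->
<cfquery name="saveSession" datasource="#request.dsn#">
    IF EXISTS (SELECT 1 FROM Sessions
               WHERE SessionID = <cfqueryparam value="#variables.sessionID#"
                                               cfsqltype="cf_sql_varchar">)
        UPDATE Sessions
           SET LastHit   = GETDATE(),
               PageCount = PageCount + 1
         WHERE SessionID = <cfqueryparam value="#variables.sessionID#"
                                         cfsqltype="cf_sql_varchar">
    ELSE
        INSERT INTO Sessions (SessionID, FirstHit, LastHit, PageCount)
        VALUES (<cfqueryparam value="#variables.sessionID#"
                              cfsqltype="cf_sql_varchar">,
                GETDATE(), GETDATE(), 1)
</cfquery>
```

[Even with "maintain connections" on, each CFQUERY still pays per-statement
overhead (parsing, network round trip, result handling), so batching like
this usually does help when you're cleaning thousands of sessions at a shot.]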
> (I will also decrease the number of clean-ups to one every 15 minutes or so
> in an attempt to clear out old sessions as quickly as possible.)
>
> Many (sometimes VERY many) of the sessions in memory are generated by bots.
> I'm considering creating a ROBOTS.TXT that would prevent bots from indexing
> the site for our busy time, but I fear that would inhibit them more than I
> want - if you tell robots to bugger off, do they come back?
>
> I'm open to any other ideas. My gut says that with the resources we have
> we'll just have to live with overloads unless they want to create a much
> simpler site (and they don't want to do that).
>
> Anybody got some heavy iron and bandwidth they're willing to donate for
> three days a year? ;^)
>
> Sorry for the babbling - I'm entering my normal end-of-year paranoid phase.
>
> Jim Davis

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Message: http://www.houseoffusion.com/lists.cfm/link=i:4:188890
Archives: http://www.houseoffusion.com/cf_lists/threads.cfm/4
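[For reference, the blanket version of the ROBOTS.TXT idea above is just two
lines. Well-behaved crawlers re-fetch robots.txt periodically, so they do
come back once the block is removed, though how quickly varies by crawler,
and a prolonged block can temporarily drop pages from an index:]

```
# robots.txt - ask all compliant crawlers to stay away for the busy period
User-agent: *
Disallow: /
```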

