some thoughts on the issue:

1) it may very well be a spider from a chinese search engine indexing your pages... i'll leave it up to you to decide whether you want a billion chinese people to be unable to find your site...

in case it's a search spider, you can try adding an appropriate robots.txt file and <meta> tags to prevent access to and indexing of the pages you do not want accessed/indexed. you can also add rel="nofollow" to any links you do not want robots to follow. of course, not all robots obey these rules - if this bot does not, you can consider banning it completely using the options you outlined.

another impact of robots accessing your site is that your app will still try to create session vars for their sessions, filling your server's memory with useless sessions. you can try to lessen this burden by setting a super-short session timeout for robots' sessions. iirc, ben nadel had a good blog post on how to do this over on http://www.bennadel.com/.
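A minimal sketch of the short-timeout idea, written in Python only to show the logic (it is not CFML and not the code from the blog post mentioned above). The user-agent pattern, both timeout values, and the name `session_timeout_for` are assumptions made for illustration; a real application would apply the returned value through whatever per-request session setting its platform offers.

```python
import re

# Hypothetical values: neither the pattern nor the timeouts come from this thread.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)
BOT_TIMEOUT_SECONDS = 2
DEFAULT_TIMEOUT_SECONDS = 20 * 60


def session_timeout_for(user_agent):
    """Return the session timeout, in seconds, to use for this request."""
    # A missing or empty User-Agent is treated as a robot here (an assumption).
    if not user_agent or BOT_PATTERN.search(user_agent):
        return BOT_TIMEOUT_SECONDS
    return DEFAULT_TIMEOUT_SECONDS
```

Matching on the User-Agent header only catches robots that identify themselves, so this reduces wasted session memory rather than blocking anything.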
2) >> However, this may not work as the offending behavior is probably generated by a bot or desktop app that doesn't store session variables.

if you use j2ee sessions then session vars are in-memory only and are stored in the server's memory, not on the client's computer... (a client that never sends back the session cookie will still get a new session on every request, though.)

3) >> This may put an extra small hit on the server, but over all not as much as an extra 2200 page views in an hour every couple of weeks.

i think you may be wrong here... 2200 page requests once every couple of weeks seems like less of a tax on your server than parsing your logs on every page request... but then it depends on how loaded your sites are...

just some quick thoughts that i hope may help you...

Azadi Saryev
Sabai-dee.com
http://www.sabai-dee.com/

On 16/10/2009 22:12, Michael Muller wrote:
> Hey all,
>
> Every once in a while I'll notice in my logs that someone comes to one of my
> sites and hits thousands of pages in a short span and then leaves. This
> annoys me for a few reasons:
>
> (a) It's an unnecessary tax on my server (and we all hate taxes)
>
> (b) It artificially inflates my page hits
>
> (c) What the hell are they doing? Scraping my pages and hosting them on some
> site? The current offending IP reverses to China.
>
>
> So, to avoid this, I'm considering the following:
>
> o Add a session variable that stores the last page view time down to the
> second. However, this may not work as the offending behavior is probably
> generated by a bot or desktop app that doesn't store session variables.
>
> o Review my databased logs for the current IP's last twenty page views. This
> may put an extra small hit on the server, but over all not as much as an
> extra 2200 page views in an hour every couple of weeks.
>
> If the requesting IP has requested more than twenty pages from the website in
> the current minute, I block the IP for a period of time, say, an hour or two.
>
> I have about a dozen sites running the same software, each with multiple
> thousands of pages (community sites, with 21,000 messages and
>
>
> I've posed this question a couple of times before on this list and it hasn't
> prompted any response. I will try again, hoping someone will either tell me
> I'm worrying too much, or that this is a smart idea.
>
> Thanks,
>
> Mik
>
>
>
> --------
> Michael Muller
> office (413) 863-6455
> cell (413) 320-5336
> skype: michaelBmuller
> http://MontagueWebWorks.com
>
> Information is not knowledge
> Knowledge is not wisdom
>
> Eschew Obfuscation

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:327270
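
The per-IP check proposed in the quoted message (more than twenty pages in a minute, then a block of an hour or two) could be sketched roughly as follows. This is Python used only to illustrate the logic, and it swaps in an in-memory sliding window for the database log lookup described above, so no query runs on each page request. The class name `IpThrottle`, the one-hour block length, and the sliding 60-second reading of "the current minute" are assumptions.

```python
import time
from collections import defaultdict, deque

MAX_HITS = 20            # pages allowed per window, as proposed in the message
WINDOW_SECONDS = 60      # "the current minute", read here as a sliding 60 s
BLOCK_SECONDS = 60 * 60  # "an hour or two"; one hour picked arbitrarily


class IpThrottle:
    """Hypothetical in-memory throttle; state is lost when the process restarts."""

    def __init__(self):
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip, now=None):
        """Record a request from ip and return True if it should be served."""
        now = time.time() if now is None else now
        if self._blocked_until.get(ip, 0) > now:
            return False
        self._blocked_until.pop(ip, None)  # forget an expired block, if any
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - WINDOW_SECONDS:
            hits.popleft()
        hits.append(now)
        if len(hits) > MAX_HITS:
            self._blocked_until[ip] = now + BLOCK_SECONDS
            hits.clear()
            return False
        return True
```

A request handler would call `allow()` with the client IP once per page view and return an error page when it gets `False`. Keying on IP alone also throttles legitimate users who share an address with the offender, which is a trade-off of this approach.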

