some thoughts on the issue:

1) it may very well be a spider from a chinese search engine indexing
your pages... i'll leave it up to you to decide if you want none of 1
billion chinese people to be able to find your site...
in case it's a search spider, you can try adding an appropriate robots.txt
file and <meta> tags to prevent access to and indexing of pages you do
not want accessed/indexed. you can also add rel="nofollow" to any links
you do not want followed by robots. of course, not all robots obey these
rules - if this bot does not then you can consider banning it completely
using the options you outlined.
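
to make the robots.txt part concrete, here is a rough sketch (python, just for illustration) of a rule set and how a well-behaved crawler would read it. the bot name ExampleSpider and the paths /members/ and /search.cfm are made up, so swap in the user-agent you actually see in your logs. it uses python's standard urllib.robotparser, and it only tells you what a rule-obeying robot would do:

```python
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt: the bot name and the paths are invented for illustration
ROBOTS_TXT = """\
User-agent: ExampleSpider
Disallow: /

User-agent: *
Disallow: /members/
Disallow: /search.cfm
"""


def allowed(user_agent: str, path: str) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

with these rules, allowed("ExampleSpider/1.0", "/index.cfm") is False, while any other rule-obeying bot may fetch /index.cfm but not /members/ or /search.cfm.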
another impact of robots accessing your site is that your app will still
try to create session vars for their sessions, filling your server
memory with useless sessions. you can try to lessen this burden on your
server by setting a super-short session timeout for robots' sessions. iirc,
ben nadel had a good blog post on how to do this over on
http://www.bennadel.com/.
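
the short-timeout idea boils down to something like the sketch below. this is not the code from that blog post, just my own minimal illustration in python: the marker strings, the two timeout values and the function name session_timeout_for are all assumptions, and user-agent sniffing will never catch every robot.

```python
from datetime import timedelta

# assumption: substrings that commonly appear in crawler user-agent strings
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")

NORMAL_TIMEOUT = timedelta(minutes=20)  # assumed timeout for real visitors
BOT_TIMEOUT = timedelta(seconds=5)      # assumed short timeout for robots


def session_timeout_for(user_agent: str) -> timedelta:
    """Pick a session timeout based on the request's User-Agent header."""
    ua = (user_agent or "").lower()
    # a missing user-agent is treated as a robot too (also an assumption)
    if not ua or any(marker in ua for marker in BOT_MARKERS):
        return BOT_TIMEOUT
    return NORMAL_TIMEOUT
```

you would call this once when the application starts up for a request and apply the result as that request's session timeout, so robot sessions get cleaned out of memory within seconds instead of lingering.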

2) >> However, this may not work as the offending behavior is probably
generated by a bot or desktop app that doesn't store session variables.
if you use j2ee sessions then session vars are in-memory only and are
stored in the server's memory, not on the client's computer...

3) >> This may put an extra small hit on the server, but over all not as
much as an extra 2200 page views in an hour every couple of weeks.
i think you may be wrong here... 2200 page requests once every couple of
weeks seems like less of a tax on your server than parsing your logs on
every page request... but then it depends on how loaded your sites are...
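
if you still want the "more than twenty pages in a minute, block for an hour" rule, one way to avoid touching the logs on every request is to keep the counters in memory. the sketch below (python, for illustration only) is my own swapped-in approach, not what you described: it uses a sliding 60-second window rather than the current calendar minute, the class and method names are made up, and the counters are lost on restart and grow with the number of distinct IPs.

```python
import time
from collections import defaultdict, deque

MAX_HITS = 20            # from the original post: more than twenty pages...
WINDOW_SECONDS = 60      # ...within a minute (sliding window here)
BLOCK_SECONDS = 60 * 60  # block for an hour


class RateLimiter:
    """In-memory per-IP request counter with a temporary block."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                # injectable so it can be tested
        self._hits = defaultdict(deque)    # ip -> timestamps of recent requests
        self._blocked_until = {}           # ip -> time at which the block ends

    def allow(self, ip: str) -> bool:
        now = self._clock()
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False               # still blocked
            del self._blocked_until[ip]    # block has expired
        hits = self._hits[ip]
        # forget requests that have fallen out of the window
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        hits.append(now)
        if len(hits) > MAX_HITS:
            self._blocked_until[ip] = now + BLOCK_SECONDS
            hits.clear()
            return False
        return True
```

each request costs a dictionary lookup and a few deque operations, which should be far cheaper than a log query per page view.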

just some quick thoughts that i hope may help you...

Azadi Saryev
Sabai-dee.com
http://www.sabai-dee.com/


On 16/10/2009 22:12, Michael Muller wrote:
> Hey all,
>
> Every once in a while I'll notice in my logs that someone comes to one of my 
> sites and hits thousands of pages in a short span and then leaves.  This 
> annoys me for a few reasons:
>
> (a) It's an unnecessary tax on my server (and we all hate taxes)
>
> (b) It artificially inflates my page hits
>
> (c) What the hell are they doing? Scraping my pages and hosting them on some 
> site? The current offending IP reverses to China.
>
>
> So, to avoid this, I'm considering the following:
>
> o Add a session variable that stores the last page view time down to the 
> second.  However, this may not work as the offending behavior is probably 
> generated by a bot or desktop app that doesn't store session variables.
>
> o Review my databased logs for the current IP's last twenty page views.  This 
> may put an extra small hit on the server, but over all not as much as an 
> extra 2200 page views in an hour every couple of weeks.
>
> If the requesting IP has requested more than twenty pages from the website in 
> the current minute, I block the IP for a period of time, say, an hour or two.
>
> I have about a dozen sites running the same software, each with multiple 
> thousands of pages (community sites, with 21,000 messages and 
>
>
> I've posed this question a couple of times before on this list and it hasn't 
> prompted any response.  I will try again, hoping someone will either tell me 
> I'm worrying too much, or that this is a smart idea.
>
> Thanks,
>
> Mik
>
>
>
> --------
> Michael Muller
> office (413) 863-6455
> cell (413) 320-5336
> skype: michaelBmuller
> http://MontagueWebWorks.com
>
> Information is not knowledge
> Knowledge is not wisdom
>
> Eschew Obfuscation
>
>
> 

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:327270