On 05/01/2013 03:22 PM, André Warnier wrote:
Dirk-Willem van Gulik wrote:
On 1 May 2013, at 13:31, Graham Leggett <[email protected]> wrote:
The evidence was just explained - a bot that does not get an answer quickly enough gives up and looks elsewhere.
The key words are "looks elsewhere".
For what it is worth - I've been experimenting with this (up until about 6 months ago) on a machine of mine, having the 200, 403, 404, 500 etc. determined by an entirely unscientific 'modulo' of the IP address, both on the main URL as well as on a few PHP/plesk hole URLs, and behaving normally for any source IP that has (ever) fetched robots.txt from the same IP masked by the first 20 bits.
That showed that bots indeed slow down / do not come back so soon if you give them a 403 or similar - but I saw no difference as to which non-200 you give them (I did not try a slow reply or no reply). Do note, though, that I was focusing on naughty bots that do not fetch robots.txt.
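(For illustration only, a minimal Python sketch of the scheme described above - the per-IP 'modulo' status and the /20 robots.txt whitelist. The names ROBOTS_FETCHERS, record_request() and choose_status() are hypothetical, not the code actually used.)

import ipaddress

# Hypothetical sketch, not the original setup.
STATUSES = [200, 403, 404, 500]
ROBOTS_FETCHERS = set()   # /20 networks that have ever fetched robots.txt

def network_20(ip: str):
    """Mask a source address down to its first 20 bits."""
    return ipaddress.ip_network(f"{ip}/20", strict=False)

def record_request(ip: str, path: str) -> None:
    """Remember networks that behave and fetch robots.txt."""
    if path == "/robots.txt":
        ROBOTS_FETCHERS.add(network_20(ip))

def choose_status(ip: str) -> int:
    """Serve well-behaved networks normally; vary the status for the rest."""
    if network_20(ip) in ROBOTS_FETCHERS:
        return 200
    # Entirely unscientific modulo of the integer form of the address.
    return STATUSES[int(ipaddress.ip_address(ip)) % len(STATUSES)]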
For what it's worth also, thank you.
This kind of response really helps, even if/when it contradicts the proposal that I am trying to push. It helps because it provides some *evidence* which I am having difficulty collecting myself, and which would allow one to *really* judge the proposal on its merits, not just on unsubstantiated opinions.
At another level, I would add this: if implementing my proposal turns out to have no effect, or only a very small effect, on the Internet at large, but effectively helps the server where it is active to avoid some of these scans, then I believe that, considering the ease and very low cost of implementing it, it would still be worth the trouble.
If the majority of web servers start slowing down the bots, this will simply prompt the bot authors to make their bots stick with each IP for longer. Once something becomes the standard, they can very easily adapt to it.