Tom Evans wrote:
On Wed, May 1, 2013 at 1:47 AM, André Warnier <[email protected]> wrote:
Christian Folini wrote:
Hey André,

I do not think your protection mechanism is very good (for reasons
mentioned before). But you can try it out for yourself easily with 2-3
ModSecurity rules and the "pause" directive.
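(For reference, a rough sketch of the kind of rule this suggests, assuming ModSecurity 2.x and its "pause" action; the rule id, status match and delay value below are placeholders, not anything prescribed here:

    # Pause for 2 seconds before returning any 404 response (sketch only)
    SecRule RESPONSE_STATUS "@streq 404" "id:1000404,phase:3,pass,log,pause:2000"
)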

Regs,

Christian

Hi Christian.

With respect, I think that you misunderstood the purpose of the proposal.
It is not a protection mechanism for any server in particular.
And installing the delay on one server is not going to achieve much.


Putting in any kind of delay means using more resources to deal with
the same number of requests, even if you use a dedicated 'slow-down'
worker just for this purpose.

The truth of the matter is that these sorts of spidering requests are
irrelevant noise on the internet. It's not a targeted attack, it is
simply someone looking for easy access to any machine.

I agree with the last statement.
But why is this "irrelevant noise"? It is noise, and like all noise it is at least annoying, as it interferes with the normal flow of information. It is exactly like spam, which has at times been estimated to account for up to 50% of total Internet bandwidth.

My 25 unremarkable servers, collectively, have been on the receiving end of such noise for years, at an aggregate rate of several hundred to several thousand requests per day. That is 25 servers out of an Internet total of about 600 million. If my servers are not being specially targeted - and in principle I cannot imagine why they would be - then I have to assume that, in aggregate over the Internet, we are talking about several hundred million such HTTP requests per day. Is that "irrelevant noise"?


It is something that, if it is installed on enough webservers on the
Internet, may slow down the URL-scanning bots (hopefully a lot), and thereby
inconvenience their botmasters. Hopefully to the point where they would
decide that it is not worth scanning that way anymore. And if it does not
inconvenience them enough to achieve that, at least it should reduce the
effectiveness of these bots, and diminish the number of systems that they
can scan over any given time period with the same number of bots.


Well, no, actually this is not accurate. You are assuming that these
bots are written using blocking I/O semantics; that if a bot is delayed
by 2 seconds when getting a 404 from your server, it is not able to do
anything else in those 2 seconds. This is just incorrect.
Each bot process could launch multiple requests to multiple unrelated
hosts simultaneously, and select whatever ones are available to read
from. If you could globally add a delay to bots on all servers in the
world, all the bot owner needs to do to maintain the same throughput
is to raise the concurrency level of the bot's requests. The bot does
the same amount of work in the same amount of time, but now all our
servers use extra resources and are slow for clients on 404.
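A minimal sketch (Python, asyncio) of the pattern described here - many probes in flight at once, so a two-second delay on any single 404 does not stall the scanner as a whole. The hosts and path are placeholders, not taken from any real scanner:

    import asyncio

    async def probe(host: str, path: str = "/phpmyadmin/") -> None:
        # Open a connection and send a bare HTTP/1.0 request.
        reader, writer = await asyncio.open_connection(host, 80)
        writer.write(("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode())
        await writer.drain()
        # While this response is being delayed, the event loop services
        # the other probes instead of sitting idle.
        status_line = await reader.readline()
        print(host, status_line.decode(errors="replace").strip())
        writer.close()
        await writer.wait_closed()

    async def scan(hosts):
        # Raising the number of concurrent probes keeps overall throughput
        # constant even if each individual response is artificially delayed.
        await asyncio.gather(*(probe(h) for h in hosts), return_exceptions=True)

    # asyncio.run(scan(["192.0.2.1", "192.0.2.2", "192.0.2.3"]))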


I believe that this line of reasoning is deeply flawed.
If you use blocking I/O, then while your process is waiting, the scheduler can allocate the resources to another process in the meantime. If you do not use blocking I/O, then you use CPU time polling the socket(s) to find out whether they have something to read, and that CPU time cannot be re-allocated to another process. You do not get something for nothing.

Opening 200 sockets to send 200 parallel requests, and then cycling through those 200 sockets to see which one has a response yet, may improve the *apparent* speed at which you process these requests/responses, but it will also dramatically raise the resource-usage profile of such a bot on the host it runs on.
