Tom Evans wrote:
On Wed, May 1, 2013 at 1:47 AM, André Warnier <[email protected]> wrote:
Christian Folini wrote:
Hey André,

I do not think your protection mechanism is very good (for reasons
mentioned before). But you can try it out for yourself easily with 2-3
ModSecurity rules and the "pause" directive.
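(For reference, a rough sketch of the kind of rule this suggests, assuming ModSecurity 2.x and its "pause" action; the rule id, status match and delay value below are placeholders, not anything prescribed here:

    # Pause for 2 seconds before returning any 404 response (sketch only)
    SecRule RESPONSE_STATUS "@streq 404" "id:1000404,phase:3,pass,log,pause:2000"
)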

Regs,

Christian

Hi Christian.

With respect, I think that you misunderstood the purpose of the proposal.
It is not a protection mechanism for any server in particular.
And installing the delay on one server is not going to achieve much.


Putting in any kind of delay means using more resources to deal with
the same number of requests, even if you use a dedicated 'slow-down'
worker just for this purpose.

The truth of the matter is that these sorts of spidering requests are
irrelevant noise on the internet. It's not a targeted attack, it is
simply someone looking for easy access to any machine.

I agree with the last statement.
But why is this "irrelevant noise"? It is noise, and like all noise it is at least annoying, as it interferes with the normal flow of information. It is exactly like spam, which has at times been estimated to account for up to 50% of total Internet bandwidth.

My 25 unremarkable servers, collectively, have been on the receiving end of such noise for years, at an aggregate rate of several hundred to several thousand requests per day. That is 25 servers out of an Internet total of about 600 million. If my servers are not being specially targeted - and in principle I cannot imagine why they would be - then I have to assume that, in aggregate over the Internet, we are talking about several hundred million such HTTP requests per day. Is that "irrelevant noise"?


It is something that, if it is installed on enough webservers on the
Internet, may slow down the URL-scanning bots (hopefully a lot), and thereby
inconvenience their botmasters. Hopefully to the point where they would
decide that it is not worth scanning that way anymore. And if it does not
inconvenience them enough to achieve that, at least it should reduce the
effectiveness of these bots, and diminish the number of systems that they
can scan over any given time period with the same number of bots.


Well, no, actually this is not accurate. You are assuming that these
bots are written using blocking I/O semantics; that if a bot is delayed
by 2 seconds when getting a 404 from your server, it is not able to do
anything else in those 2 seconds. This is just incorrect.
Each bot process could launch multiple requests to multiple unrelated
hosts simultaneously, and select whatever ones are available to read
from. If you could globally add a delay to bots on all servers in the
world, all the bot owner needs to do to maintain the same throughput
is to raise the concurrency level of the bot's requests. The bot does
the same amount of work in the same amount of time, but now all our
servers use extra resources and are slow for clients on 404.
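A minimal sketch (Python, asyncio) of the pattern described here - many probes in flight at once, so a two-second delay on any single 404 does not stall the scanner as a whole. The hosts and path are placeholders, not taken from any real scanner:

    import asyncio

    async def probe(host: str, path: str = "/phpmyadmin/") -> None:
        # Open a connection and send a bare HTTP/1.0 request.
        reader, writer = await asyncio.open_connection(host, 80)
        writer.write(("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode())
        await writer.drain()
        # While this response is being delayed, the event loop services
        # the other probes instead of sitting idle.
        status_line = await reader.readline()
        print(host, status_line.decode(errors="replace").strip())
        writer.close()
        await writer.wait_closed()

    async def scan(hosts):
        # Raising the number of concurrent probes keeps overall throughput
        # constant even if each individual response is artificially delayed.
        await asyncio.gather(*(probe(h) for h in hosts), return_exceptions=True)

    # asyncio.run(scan(["192.0.2.1", "192.0.2.2", "192.0.2.3"]))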


I believe that this line of reasoning is deeply flawed.
If you use blocking I/O, then while your process is waiting, the scheduler can allocate the resources to another process in the meantime. If you do not use blocking I/O, then you use CPU time polling the socket(s) to find out whether they have something to read, and that CPU time cannot be re-allocated to another process. You do not get something for nothing.

Opening 200 sockets to send 200 parallel requests, and then cycling through those 200 sockets to see which one has a response yet, may improve the *apparent* speed at which you process these requests/responses, but it will also dramatically raise the resource-usage profile of such a bot on the host it runs on.
