Graham Leggett wrote:
On 30 Apr 2013, at 12:03 PM, André Warnier <a...@ice-sa.com> wrote:

The only cost would be a relatively small change to the Apache webservers, which is what my suggestion consists of: adding a variable delay (say, between 100 ms and 2000 ms) to any 404 response.

This would have no real effect.

Bots are patient, slowing them down isn't going to inconvenience a bot in any 
way. The simple workaround if the bot does take too long is to simply send the 
requests in parallel. At the same time, slowing down 404s would break real 
websites, as 404 isn't necessarily an error, but rather simply a notice that 
says the resource isn't found.

Hello.
Thank you for your response.
You make several points above, and I would like to respond to them separately.

1) "This would have no real effect."
A: yes, it would.

This is a facetious response, of course. I am making it only to illustrate a kind of objection which I have encountered before: an "a priori" objection, without real justification. So I am responding in kind. This was just for illustration; I hope you don't mind.

But you /do/ provide some arguments to justify it, so let me discuss them:

2) "Bots are patient, slowing them down isn't going to inconvenience a bot in any 
way"

A: I beg to disagree.
First, I would make a distinction between "the bot" (which is just a program running somewhere, and can obviously not be inconvenienced) and the "owner" of the bot, usually called the "bot-master". I believe that the bot-master can be seriously inconvenienced, and through the bots, he is the real target.

Here are my reasons for believing that he can be inconvenienced:

It may seem that creating a bot, distributing it and running it for malicious purposes is free. But that is not true: it has a definite cost. Most countries now have laws defining this as criminal behaviour, and many countries now have dedicated officials trying to track down "bot-masters" and bring them to justice. So the very first cost of running a botnet is the risk of getting caught, paying a big fine and maybe going to prison. And this is not just theory: there have been several items in the news over the last few years that show it to be true. Search Google for "botmaster jailed", for example.

As a second argument, I would state that if it did not cost anything to create and run a botnet, then nobody would pay for one. And people do pay. Nowadays one can purchase bot code, or rent an existing botnet - or even part of one - for a price, and the price is not trivial: renting a botnet of several thousand bots for a week can cost several thousand US dollars. And obviously, there is a market. See here: http://www.zdnet.com/blog/security/study-finds-the-average-price-for-renting-a-botnet/6528 or search Google for equivalent information.

If it does cost something to create and run a malicious botnet, then whoever does it is after a return on the investment. The kind of desired return can vary (think of Anonymous, or of some intelligence services), but it is obvious to me that whoever is running the botnets which *do* scan my servers (and most servers on the Internet) for vulnerable URLs is not doing it for the simple pleasure of it. They expect a return, or else they wouldn't do it. The faster they can scan servers and identify likely targets for further mischief, the better the return compared to the costs.
As long as the likely return outweighs the costs, they will continue.
Raise the cost or lower the return below a certain threshold, however, and it becomes uneconomical, and they will stop.
At what point that would happen, I can't tell.
But I do know one thing: what I am suggesting /would/ slow them down, so it goes in the right direction: raising their cost and/or diminishing their return.
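
To put a deliberately rough number on that cost, consider what a rented botnet costs per bot-hour. The rental price and botnet size below are illustrative assumptions, in the order of magnitude of the figures in the article above, not measured data:

#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions, not measured figures. */
    double rent_per_week_usd = 2000.0;  /* assumed weekly rental price   */
    double bots              = 10000.0; /* assumed size of rented botnet */
    double hours_per_week    = 7.0 * 24.0;

    double cost_per_bot_hour = rent_per_week_usd / (bots * hours_per_week);
    printf("cost per bot-hour: $%.5f\n", cost_per_bot_hour);

    /* If delayed 404s stretch a one-week scan to two weeks, the renter
     * must pay for twice as many bot-hours to cover the same servers. */
    printf("same scan at half speed: $%.0f instead of $%.0f\n",
           2.0 * rent_per_week_usd, rent_per_week_usd);
    return 0;
}

The point is not the exact figures, but that doubling the scan time doubles the renter's bill for the same coverage.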

3) "The simple workaround if the bot does take too long is to simply send the requests in parallel."

A: I already addressed that point in my original suggestion and tried to show that it doesn't really matter, but let me add another aspect:

The people who run bots do not use their own computers or their own bandwidth for this. That would be really uneconomical, and really dangerous for them. Instead, they rely on discreetly "infecting" computers belonging to other people, and then using those computers and their bandwidth to run their operation.

If your computer has been infected and is running a bot in the background, you may not notice it, as long as the bot uses only a small amount of resources. But if the bot running on your computer starts to use any significant amount of CPU or bandwidth, the probability of you noticing it increases. And if you notice it, you will kill it, won't you? And if you do, there is one bot fewer in the botnet.

What I am saying is that one cannot increase forever the amount of parallelism in the scans that a bot performs. There is a limit to the amount of resources that a bot can use on its host while remaining discreet. My original sample calculations used individual bots, each issuing 200 requests in 2 seconds. How many more can one bot issue and remain discreet?

So really, if you admit that the suggestion, if implemented, would slow down the scanning of a given set of servers, then the only practical way to keep scanning the same number of servers in the same time is to increase the number of bots doing the scanning. And then we run back to the argument above: it increases the cost, as the rough numbers below show.
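
Here is the arithmetic, using the figures from my earlier example (200 probe URLs per server in 2 seconds) plus the mean of the proposed 100-2000 ms delay; these are assumptions, not measurements:

#include <stdio.h>

int main(void)
{
    /* Assumed figures from the earlier example in this thread. */
    double probes     = 200.0;  /* probe URLs tried per target server   */
    double base_time  = 2.0;    /* seconds: scanning one server today   */
    double avg_delay  = 1.05;   /* seconds: mean of a 100-2000 ms delay */

    /* Sequential scan time for one server once every 404 is delayed. */
    double delayed = probes * avg_delay + base_time;
    printf("sequential scan of one server: %.0f s instead of %.0f s\n",
           delayed, base_time);

    /* Concurrent connections needed to get back to the original time. */
    printf("parallel requests needed to hide the delay: ~%.0f\n",
           delayed / base_time);
    return 0;
}

Roughly a hundred concurrent connections per target, per bot: hardly the behaviour of a bot trying to stay unnoticed on somebody's home machine.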

4) "At the same time, slowing down 404s would break real websites, as 404 isn't necessarily an error, but rather simply a notice that says the resource isn't found."

A: I believe that this is a trickier objection.
I agree: a 404 is just an indication that the resource isn't found.

But I have been trying to figure out a real use case where expecting 404 responses in the course of legitimate application or website access would be a normal thing, and I admit that I haven't been able to think of one. Can you come up with an example where this would really be a use case, and where delaying 404 responses would really "break something"?

I would also like to offer a clarification: my suggestion is to make this an *optional* feature, one that can easily be tuned or disabled by a webserver administrator (similarly to a number of other security-minded configuration directives in Apache httpd).
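
To make the proposal concrete, here is a minimal sketch of how the delay could be implemented as a small output filter module. This is untested illustration code, not a patch: the module name mod_delay404, the filter name and the hard-coded 100-2000 ms range are all mine, and a real module would make the range configurable through a directive and use a proper random source:

/* mod_delay404.c - illustration only.
 * Pauses 100-2000 ms before any 404 response leaves the server.
 */
#include <stdlib.h>

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "util_filter.h"
#include "apr_time.h"

static const char delay404_name[] = "DELAY404";

static apr_status_t delay404_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    /* The handler has already run by the time the first brigade arrives
     * here, so f->r->status is final. Sleep once, then leave the chain. */
    if (f->r->status == HTTP_NOT_FOUND) {
        int delay_ms = 100 + (rand() % 1901);     /* 100 .. 2000 ms; rand()
            is predictable, acceptable only for a sketch */
        apr_sleep(apr_time_from_msec(delay_ms));  /* blocks this worker */
    }
    ap_remove_output_filter(f);
    return ap_pass_brigade(f->next, bb);
}

static void delay404_insert_filter(request_rec *r)
{
    ap_add_output_filter(delay404_name, NULL, r, r->connection);
}

static void delay404_register_hooks(apr_pool_t *p)
{
    ap_register_output_filter(delay404_name, delay404_filter, NULL,
                              AP_FTYPE_CONTENT_SET);
    ap_hook_insert_filter(delay404_insert_filter, NULL, NULL,
                          APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA delay404_module = {
    STANDARD20_MODULE_STUFF,
    NULL, NULL, NULL, NULL, NULL,  /* no config structures or directives */
    delay404_register_hooks
};

One design point to weigh: apr_sleep() blocks the worker thread (or process) serving the request, so on a busy server the delay itself ties up a scarce resource. That is part of why it should remain the administrator's choice.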

It would just be a lot more effective if it were enabled by default in the standard configuration of the standard Apache httpd distributions. The reason for that is again a numbers game: there are about 600 million webservers in total, and at least 60% of them (360,000,000) are "Apache" servers.
Of these 360 million, how many would you say are professionally installed and managed? (How many competent webserver administrators are there in the world, and how many webservers can each one of them take care of?) If I were to venture a number, I would say that the number of Apache webservers that are professionally installed and managed is probably no higher than a few million, maybe 10% of the above. That leaves many more millions which are not, and those are the target of the suggestion. If this were a default option, then over time, as new Apache httpd webservers are installed - or older ones upgraded - the proportion of servers where the option is active would automatically increase, without any further intervention.

And as I have already tried to show, every additional percent of the installed webservers on which this is active increases the total URL scan time by several million seconds. No matter how parallel the scan is, that number doesn't change. The arithmetic is below, for anyone who wants to check it.
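
This uses the server counts above together with the same assumed per-request figures as before:

#include <stdio.h>

int main(void)
{
    /* Server count from above; delay and probe figures are assumptions. */
    double total_servers = 600e6;  /* approx. webservers worldwide    */
    double one_percent   = total_servers * 0.01;
    double avg_delay_s   = 1.05;   /* mean of a 100-2000 ms delay     */
    double probes        = 200.0;  /* probe URLs per scanned server   */

    printf("extra bot-seconds per +1%% adoption, 1 probe/server:    %.2e\n",
           one_percent * avg_delay_s);
    printf("extra bot-seconds per +1%% adoption, 200 probes/server: %.2e\n",
           one_percent * probes * avg_delay_s);
    return 0;
}

Even if each scanner sent only one probe per server, one extra percent of adoption adds over six million bot-seconds; with a 200-URL probe list it is over a billion.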

I hope to have provided convincing arguments in my responses to your objections.
And if not, I'll try harder.

There is also a limit for me, though: I have neither the skills nor the resources to actually set up a working model of this. I cannot create (or rent) a real botnet and thousands of target servers in order to really prove my arguments. But maybe someone could think of a way to really prove or disprove this? Whatever the result, I would be really delighted.
