On Mon, Sep 11, 2006 at 05:19:04PM -0400, Daniel Ouellet wrote:
> Joachim Schipper wrote:
> >Your worries about losing proxies are correct; it looks like you have
> >that problem mostly covered. I'm not sure it would help much with
> >bandwidth hogs, though - I don't have any numbers on what programs are
> >most often used, but something like wget certainly does respect
> >robots.txt.
>
> Actually it does. There are many attacks going on right now, as you
> know, but if you put them into categories, one is the tons of
> variations on the user/pass value sanity check that you can now see on
> SecurityFocus. They have released over a dozen in the last three days
> so far. Even more by now, I am sure. I saw that it started a few weeks
> ago if you look into the archive, but that's irrelevant anyway. The
> other is a virus that spreads the same way, or similarly. In that case
> they actually call big content page(s) on your site. By big content I
> don't mean images, etc., but text. The reason is that their virus does
> not process the content and would need to be bigger to do so. This way
> it stays small, and the web server sees the request as legit and will
> reply. But if you have pages with, say, 0.5 MB of text on them coming
> from a database back end, then they hope to bring your server down and
> your SQL back end down, and if not, to make you waste as much bandwidth
> as possible. I noticed it first from the HUGE increase in the GB of
> transfer each day. Just so you get a picture of the effect: I have
> logged over 300,000 sources of the virus doing this type of attack on
> my servers so far, and they pull a series of pages that are pretty big
> in text content, between 150 KB and 750 KB or so, excluding any other
> content. Each of the offending sources will pull that content many
> times a day. I mean, just think about it.
>
> So, just for fun, take an example of only one request an hour from
> each source, each accessing an average page of 500 KB. You get a
> wasted transfer for that day alone of:
> 300,000 sources * 24 requests * 500,000 bytes = 3,600,000,000,000 bytes
> 3,600,000,000,000 bytes * 8 bits/byte = 28,800,000,000,000 bits
> 28,800,000,000,000 bits / (60 seconds * 60 minutes * 24 hours) =
> 333,333,333 bits/sec needed in capacity, just for this wasted stuff!
>
> And this is based on only one query per hour! You get the picture and
> the size of the problem. (:>
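
Just to double-check the arithmetic, here is a quick Python sketch using
your figures above (300,000 sources, one request per hour each, ~500 KB
of text per page):

    # Back-of-the-envelope check of the transfer figures quoted above.
    sources = 300000              # logged attacking sources (figure above)
    requests_per_day = 24         # one request per source per hour
    page_bytes = 500 * 1000       # ~500 KB of text per page

    bytes_per_day = sources * requests_per_day * page_bytes
    bits_per_second = bytes_per_day * 8 / (60 * 60 * 24)

    print("%.1f TB/day" % (bytes_per_day / 1e12))             # ~3.6 TB/day
    print("%.0f Mbit/s sustained" % (bits_per_second / 1e6))  # ~333 Mbit/s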
>
> So, what I put into place to counter that doesn't stop it, as you
> can't stop the sources from coming in, but you need to sort the good
> out from the bad, and my reply to a bad one happens to be only 5 bytes
> instead, plus the log entry anyway.
>
> All this is without even counting the overhead, etc.
>
> So, yes, it's a BIG help for "bandwidth hogs"!
>
> And don't forget that's per destination under attack! (:>
>
> So, yes, it can become totally unmanageable if not stopped from the
> start and on a big scale.
I think we are misunderstanding each other. What I am saying is that
wget does respect robots.txt (so does not get blacklisted, unless
someone explicitly turns this off), and also follows 302 responses (so
does get at the real page even with your greylisting mechanism);
therefore, while your defenses do work against the kind of DoS you are
facing, they do not help against humans recursively wgetting your site
or similar - which is what I thought you meant with 'bandwidth hogs'.
It has, however, become clear that that's not what you meant; so my
comment simply doesn't make sense. Sorry!
> >>3. DDoS GET attacks & Bandwidth suckers defense. Multiple approaches.
> >>
> >>3.1 Good user-supplied data check.
> >>
> >>So far most/all of the variations of attacks on web sites are with
> >>scripts trying to inject themselves into your servers. Well, you need
> >>to do sanity checks in your code. Nothing can really protect you from
> >>that if you don't check what you expect to receive from user input.
> >>So, I have nothing for that. No idea anyway on how to, other than
> >>maybe limiting the size of the argument a GET can send, but even that
> >>is a bad idea I think.
> >
> >This is not applicable to DDoS, really - though you are otherwise right,
> >of course.
>
> I provided a very simple way that doesn't remove the problem, but at a
> minimum stops you from getting infected by all the latest series of
> SecurityFocus variations, and it also has the benefit of pointing you
> to anything bad that might already have been installed on your servers
> as well.
>
> Very simple really.
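
For illustration only, a minimal sketch of the kind of user/pass value
sanity check being talked about here; the pattern and the limits are
made up, it is not your actual check:

    import re

    # Hypothetical sanity check: accept only short values made of
    # harmless characters; everything else (quotes, spaces, shell
    # metacharacters, absurd lengths) is rejected before it reaches the
    # back end.
    USER_RE = re.compile(r'^[A-Za-z0-9_.-]{1,32}$')

    def sane_user_value(value):
        return bool(USER_RE.match(value))

    assert sane_user_value("daniel")
    assert not sane_user_value("x' OR '1'='1")
    assert not sane_user_value("a" * 4096)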
>
> >>3.2 Greylisting idea via 302 temporary return code.
>
> >This could be effective, indeed - though I am not sure it would block
> >many attackers.
>
> Works like a charm in real life so far. See the numbers above for
> results. It's been used successfully for a few weeks now and still no
> bad side effects, just HUGE benefits! And the servers don't even break
> a sweat yet!
Well, good to know. Not that I'm likely to be facing a DoS anytime soon,
but I'm pretty sure you'd have said the same a couple of weeks ago...
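
For anyone reading along, here is a toy sketch of the 302 greylisting
idea as I understand it; your real setup is not shown in the thread and
is surely different:

    # Toy 302 "greylist": first contact from an IP gets a temporary
    # redirect back to the same URL.  Real browsers follow it and get
    # whitelisted; the simple attack scripts described above do not, so
    # they only ever cost a tiny reply instead of the big pages.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    whitelist = set()     # IPs that have proven they follow redirects

    class GreylistHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ip = self.client_address[0]
            followed = "greylist=ok" in self.path
            if ip not in whitelist and not followed:
                sep = "&" if "?" in self.path else "?"
                self.send_response(302)
                self.send_header("Location", self.path + sep + "greylist=ok")
                self.end_headers()
                return
            whitelist.add(ip)
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"real (big) content would go here\n")

    if __name__ == "__main__":
        HTTPServer(("", 8080), GreylistHandler).serve_forever()

The in-memory set never expires, of course; a real setup would track and
expire that state in the web server or firewall rather than in a toy
Python process.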
> >>3.4 What about the compromised user computer itself, or the proxy server.
> >>
> >
> >Faking those headers is easily done, though; ideally, you'd want to
> >cross-check p0f and the headers. I'm not entirely sure it would hurt
> >an attacker more than it hurts you, though, and privileged code is
> >always scary, and doubly so when close to essentially untrusted web
> >apps.
>
> True for sure. But you still need a way to tell the difference between
> the good and the bad passing through a proxy, or you lose too much.
> Here, obviously, I go with the fact that so far, yes, these headers are
> faked and that's trivial to do as well, but none of the attacks so far
> generate random headers. In which case it would be useless, obviously.
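
For illustration, a small sketch of looking at the usual proxy headers
(X-Forwarded-For / Via) to tell "many users behind one proxy IP" apart
from a single bad client; which headers you actually check isn't
spelled out here, and as we both said they are trivial to fake:

    # Illustration only: summarise what the proxy headers claim about a
    # request.  A well-behaved proxy appends the client to X-Forwarded-For
    # and identifies itself in Via; an attack script that never varies
    # these headers is easy to spot.
    def proxy_info(headers):
        """headers: any dict-like object of request headers."""
        forwarded = headers.get("X-Forwarded-For", "")
        via = headers.get("Via", "")
        clients = [ip.strip() for ip in forwarded.split(",") if ip.strip()]
        return {
            "behind_proxy": bool(clients or via),
            "claimed_clients": clients,   # left-most is the original client
            "proxy_chain": via,
        }

    # A request relayed through one proxy:
    print(proxy_info({"X-Forwarded-For": "192.0.2.7", "Via": "1.1 cache1"}))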
>
> >>4. What about more intelligent attacks.
>
> >You *should* consider some unconventional browsers before going too
> >far down this lane, though. Notably, your 1x1 image will show up
> >quite readably on text-mode browsers; be sure to, at least, add a
> >'don't click' alt attribute.
>
> I know about the text-mode ones and tested with Lynx to see, but I did
> forget that I should add the "Do NOT CLICK HERE... Bot trap WARNING"
> stuff, so I will do that.
>
> >Also, neither text-based browsers nor most legitimate bots will request
> >images.
>
> And that was the point. Allow legitimate bots, if you choose to,
> obviously, and ban the bad ones!
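
A toy sketch of the image-based whitelisting/trap being discussed; the
paths and the in-memory tables are made up, not your implementation:

    # IPs that fetch the 1x1 image behave like real browsers; IPs that
    # follow the hidden "Do NOT CLICK" trap link behave like bots.
    browsers = set()   # requested /pixel.gif      (hypothetical path)
    banned = set()     # requested /bot-trap.html  (hypothetical path)

    def classify_request(ip, path):
        """Update the tables and return whether the request may be served."""
        if path == "/pixel.gif":
            browsers.add(ip)        # loads images: looks like a browser
        elif path == "/bot-trap.html":
            banned.add(ip)          # clicked the trap: treat as a bot
        return ip not in banned

    # As noted above, legitimate bots and text-mode browsers request
    # neither URL, so they have to be let in some other way (robots.txt,
    # a user-agent whitelist, ...) rather than by the image check.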
I understood from the text you first wrote,
> >> It's possible that more intelligent attacks would be developed so
> >> as to read the incoming request and do the redirect. In which case,
> >> most of the above would be useless. So, what could be done then? (...)
> >> Having an image on the page requested, all pages, and waiting until
> >> the request for that image comes in, and then whitelisting the request.
that you were planning on blocking any browser/bot that does not request
a certain image; however, as I pointed out, while the mainstream
legitimate browsers will request this image and your attackers most
likely won't, neither will legitimate bots or some less common browsers.
Joachim