OK, I went one better.  I now have CPU-percentage-based throttling.
The big problem on my site was not bandwidth, but how quickly the
load average would climb when I got hammered by things like "Teleport Pro".
Hey, if you haven't seen that, go see it.  Be afraid.  Be Very Afraid.
See <URL:http://www.tenmax.com/teleport/pro/home.htm> -- and think
about what happens when something like that slams every one of your
Mason or Embperl pages for your shopping cart catalog.

So, I modified my throttler to look at the recent CPU usage over a
window for a given remote host.  If the percentage exceeds a threshold,
BOOM, they get a 503 error and a correct "Retry-After:" to tell them
how long they're banned.
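
To make the threshold arithmetic concrete, here is a minimal standalone sketch. The helper name over_limit and the sample numbers are made up for illustration and are not part of the module below. CPU is counted in hundredths of a second, so a 15-second window at 5 percent allows 15 * 5 = 75 hundredths.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only.
# $samples is an array ref of [epoch_seconds, cpu_hundredths] pairs.
sub over_limit {
    my ($samples, $now, $window, $percent) = @_;
    my $start = $now - $window;
    my $total = 0;                    # hundredths of a CPU second
    for my $s (@$samples) {
        next if $s->[0] < $start;     # older than the window, ignore
        $total += $s->[1];
    }
    # $window seconds at $percent percent is $window * $percent hundredths
    return $total >= $window * $percent;
}

my @samples = ([100, 30], [105, 30], [110, 20], [80, 500]);
print over_limit(\@samples, 112, 15, 5) ? "block\n" : "allow\n";
```

With these numbers the three samples inside the window add up to 80 hundredths, which is over the 75 allowed, so this prints "block".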

I do this in two parts:

    an access handler that starts a CPU timer ticking, sets up a
    cleanup handler to do most of the dirty work, then checks for a
    "currently blocked" condition, returning 503 if needed.

    a cleanup handler that notes the elapsed CPU, storing it into a
    file.  Then, if not blocked already, counts the recent CPU usage,
    and starts or stops blocking based on that.
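
The "CPU timer" is nothing more than two snapshots of Perl's built-in times and a subtraction. A minimal sketch outside of Apache, where the busy loop is just an assumed stand-in for serving a request:

```perl
use strict;
use warnings;

# Snapshot (user, system, child-user, child-system) CPU seconds.
my @start = times;

# Stand-in for the real work of a request: burn a little CPU.
my $x = 0;
$x += sqrt($_) for 1 .. 200_000;

# Subtract the snapshot element by element to get CPU used since then.
my @now   = times;
my @delta = map { $now[$_] - $start[$_] } 0 .. $#now;

# User plus system time of this process, in hundredths of a second.
# The extra 0.01 matches the module's rounding: every request costs at least 1.
my $cpu_hundred = int(100 * ($delta[0] + $delta[1] + 0.01));
print "used $cpu_hundred/100 CPU seconds\n";
```

In the real handler the first snapshot happens in the access phase and the second in the cleanup phase, so the difference covers everything in between.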

Nice thing is: no file locking (everything works no matter how many
people are doing things in parallel).  Also, by pushing most of the
logic down to the post-content phase, we keep the response time zippy!
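
My reading of why this gets by without locks (an assumption on my part, not something measured): each request appends one small fixed-size record with a single write, and readers only ever sum whole records. A standalone sketch of that record format, using a temp file and the hypothetical helper names record_cpu and window_cpu:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $historyfile) = tempfile(UNLINK => 1);   # stand-in history file
close $tmp;

# Append one 8-byte record: epoch time and CPU in hundredths of a second.
sub record_cpu {
    my ($file, $time, $cpu_hundred) = @_;
    open my $fh, ">>", $file or return;
    binmode $fh;
    syswrite $fh, pack("LL", $time, $cpu_hundred);  # one small write
    close $fh;
    return 1;
}

# Sum the CPU of every complete record at or after $startwindow.
sub window_cpu {
    my ($file, $startwindow) = @_;
    open my $fh, "<", $file or return 0;
    binmode $fh;
    my $total = 0;
    while ((read $fh, my $buf, 8) == 8) {
        my ($time, $cpu) = unpack "LL", $buf;
        $total += $cpu if $time >= $startwindow;
    }
    close $fh;
    return $total;
}

record_cpu($historyfile, 1000, 40);
record_cpu($historyfile, 1010, 25);
record_cpu($historyfile, 1020, 30);
print window_cpu($historyfile, 1005), "\n";   # sums the records at 1010 and 1020
```

This prints 55. Note that the file only ever grows in this sketch, so something else would have to prune old records.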

So, here's the source.  Peer review requested - I'm probably turning
this in for my next WebTechniques column...

    package Stonehenge::Throttle;
    use strict;

    ## usage: PerlAccessHandler Stonehenge::Throttle

    my $HISTORYDIR = "/home/merlyn/lib/Apache/Throttle";

    my $WINDOW = 15;                # seconds of interest
    my $DECLINE_CPU_PERCENT = 5; # CPU percent in window before we 503 error

    use vars qw($VERSION);
    $VERSION = (qw$Revision: 2.0 $ )[-1];

    use Apache::Constants qw(OK DECLINED);
    use Apache::File;
    use Apache::Log;

    use Stonehenge::Reload;

    sub handler {
      goto &handler if Stonehenge::Reload->reload_me;

      my $r = shift;                # closure var
      return DECLINED unless $r->is_initial_req;
      my $log = $r->server->log;    # closure var

      my $host = $r->get_remote_host; # closure var
      return DECLINED if $host =~ /\.(holdit|stonehenge)\.com$/;
      $host = "googlebot.com" if $host =~ /\.googlebot\.com$/;

      my $historyfile = "$HISTORYDIR/$host-times"; # closure var
      my $blockfile = "$HISTORYDIR/$host-blocked"; # closure var
      my @delta_times = times;      # closure var
      my $fh = Apache::File->new;   # closure var

      $r->register_cleanup
        (sub {

           ## record this CPU usage
           @delta_times = map { $_ - shift @delta_times } times;
           my $cpu_hundred = int 100*($delta_times[0] + $delta_times[1] + 0.01);
           ## $log->notice("throttle: $host got $cpu_hundred/100 in this slot"); # DEBUG
           open $fh, ">>$historyfile" or return DECLINED;
           my $time = time;
           syswrite $fh, pack "LL", $time, $cpu_hundred;
           close $fh;

           my $startwindow = $time - $WINDOW;

           if (my @stat = stat($blockfile)) {
             if ($stat[9] > $startwindow) {
               ## $log->notice("throttle: $blockfile is already blocking"); # DEBUG
               return OK;           # nothing further to see... move along
             } else {
               ## $log->notice("throttle: $blockfile is old, ignoring"); # DEBUG
             }
           }

           # figure out if we should be blocking
           my $totalcpu = 0;        # scaled by 100

           open $fh, $historyfile or return DECLINED;
           while ((read $fh, my $buf, 8) > 0) {
             my ($time, $cpu) = unpack "LL", $buf;
             next if $time < $startwindow;
             $totalcpu += $cpu;
           }
           close $fh;

           if ($totalcpu < $WINDOW * $DECLINE_CPU_PERCENT) {
             ## $log->notice("throttle: $host got $totalcpu/100 CPU in $WINDOW secs"); # DEBUG
             unlink $blockfile;
             return OK;
           }

           ## about to be nasty... let's see how bad it is:
           my $loadavg = "unknown";  # stays "unknown" where /proc is missing
           if (open $fh, "/proc/loadavg") {
             chomp($loadavg = <$fh>);
             close $fh;
           }

           my $useragent = $r->header_in('User-Agent') || "unknown";

           $log->notice("throttle: $host got $totalcpu/100 CPU in $WINDOW secs, enabling block [loadavg $loadavg, agent $useragent]");
           open $fh, ">$blockfile";
           close $fh;

           return OK;
         });

      ## back in the access handler:

      if (my @stat = stat($blockfile)) {
        if ($stat[9] > time - $WINDOW) {
          $log->warn("throttle access: $blockfile is blocking");
          $r->header_out("Retry-After", $WINDOW);
          return 503;               # Service Unavailable
        } else {
          ## $log->notice("throttle access: $blockfile is old, ignoring"); # DEBUG
          return DECLINED;
        }
      }

      return DECLINED;
    }
    1;



-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
