I was very disappointed when Jonathan Helfman's research disappeared
from the Web; it was there a week or two ago.  I tried to browse it
through the Wayback Machine, but its link rewriting didn't work for
me: it relies on browser-side JavaScript to rewrite links and inline
image URLs, and I was using a browser that didn't support JavaScript.

So I wrote this (single-threaded, slow, stupid, fragile,
HTTP/1.0-only, but interesting) non-caching HTTP proxy.  It lets you
browse the web through Google's cache or the Wayback Machine by
rewriting your requests on the fly.
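
The rewriting amounts to two substitutions on the request line.  As a
minimal sketch, separate from the proxy script that follows, the helper
below applies them to a single line so the effect is easy to see.  The
name rewrite_request_line and the www.example.com URLs are made up for
illustration; the config values are the ones the proxy uses.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of the proxy script): rewrite one
# request line the way the proxy does.
sub rewrite_request_line {
  my ($reqline, $config) = @_;
  # Point absolute URLs at the cache, unless the request is already
  # addressed to the cache host itself.
  $reqline =~ s| http://| $config->{replacement}|
    unless $reqline =~ m| $config->{nosub}|;
  # Downgrade to HTTP/1.0; matching only the version number leaves any
  # line terminator intact.
  $reqline =~ s|HTTP/1\.\d+|HTTP/1.0|;
  return $reqline;
}

my $google  = { nosub => quotemeta('http://www.google.com'),
                replacement => '/search?q=cache:' };
my $wayback = { nosub => quotemeta('http://web.archive.org'),
                replacement => '/web/19970205193002/' };

print rewrite_request_line(
  "GET http://www.example.com/index.html HTTP/1.1", $google), "\n";
# prints: GET /search?q=cache:www.example.com/index.html HTTP/1.0

print rewrite_request_line(
  "GET http://www.example.com/index.html HTTP/1.1", $wayback), "\n";
# prints: GET /web/19970205193002/www.example.com/index.html HTTP/1.0

print rewrite_request_line(
  "GET http://www.google.com/intl/en/about.html HTTP/1.1", $google), "\n";
# prints: GET http://www.google.com/intl/en/about.html HTTP/1.0
```

The third case shows why nosub exists: requests that already name the
cache host are passed through with only the protocol version changed.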

#!/usr/bin/perl -w
use strict;
use IO::Socket;

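# Arguments: [port] [mode] [timestamp].  Mode 'google' (the default) uses
# Google's cache; anything else uses the Wayback Machine, with an optional
# 14-digit timestamp (YYYYMMDDhhmmss).
my $port = $ARGV[0] || 8080;
$ARGV[1] ||= 'google';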

my $config;

if ($ARGV[1] eq 'google') {
  $config = { PeerAddr => 'www.google.com:80',
              nosub => quotemeta('http://www.google.com'),
              replacement => '/search?q=cache:'};
} else {
  my $date = $ARGV[2] || '19970205193002';
  $config = { PeerAddr => 'web.archive.org:80', 
              nosub => quotemeta('http://web.archive.org'),
              replacement => "/web/$date/"};
}

my $server = IO::Socket::INET->new(LocalPort => $port, Listen => 42,
                                   Reuse => 1)
  or die "Can't listen on port $port: $@\n";

$| = 1;

my $brokenpipe = 0;
sub brokenpipe {
  $brokenpipe = 1;
}
$SIG{PIPE} = \&brokenpipe;

for (;;) {
  $brokenpipe = 0;
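  my $socket = $server->accept() or next;
  my $outsocket = IO::Socket::INET->new(PeerAddr => $config->{PeerAddr});
  if (not $outsocket) {
    print "Can't connect to $config->{PeerAddr}: $@\n";
    close $socket;
    next;
  }
  my $reqline = <$socket>;
  if (not defined $reqline) {   # client connected but sent nothing
    close $outsocket;
    close $socket;
    next;
  }
  # Point absolute URLs at the cache, unless the request is already
  # addressed to the cache host itself.
  $reqline =~ s| http://| $config->{replacement}| 
    unless $reqline =~ m| $config->{nosub}|;
  # Match only the version number so the trailing \r\n survives.
  $reqline =~ s|HTTP/1\.\d+|HTTP/1.0|;
  print $reqline;
  print $outsocket $reqline;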
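  # Forward the request headers, pointing Host: at the cache server,
  # up to and including the blank line that ends them.  Request bodies
  # are not forwarded.
  while (not $brokenpipe and defined($_ = <$socket>)) {
    $_ = "Host: $config->{PeerAddr}\r\n" if /^Host: .*/;
    #print;
    print $outsocket $_;
    last if /^\r?$/;
  }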
  print "-- now reading from outsocket\n";
  my $gotstuff = 0;
  while (not $brokenpipe and defined($_ = <$outsocket>)) {
    $gotstuff++;
    print ".";
    print $socket $_;
  }
  print "\n";
  if (not $gotstuff) {
    print "No data; brokenpipe is $brokenpipe and errno is $!\n";
  }
  close $outsocket;
  close $socket;
}
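
To try the proxy without a browser, something like the following
client might do.  It is only a sketch: proxy_request is a made-up
helper, and localhost:8080 assumes the proxy is running on the same
machine with its default port.  It sends the absolute-URL form of
request that browsers use when talking to a proxy, which is what the
" http://" substitution above looks for, plus a Host: header for the
proxy to rewrite.

```perl
#!/usr/bin/perl -w
use strict;
use IO::Socket;

# Hypothetical helper: build the request a browser would send to a
# proxy for the given http:// URL.
sub proxy_request {
  my ($url) = @_;
  my ($host) = $url =~ m|^http://([^/]+)|
    or die "not an http:// URL: $url\n";
  return "GET $url HTTP/1.0\r\nHost: $host\r\n\r\n";
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
  my $url = shift @ARGV;
  my $proxy = shift(@ARGV) || 'localhost:8080';  # assumed proxy address
  my $sock = IO::Socket::INET->new(PeerAddr => $proxy)
    or die "Can't connect to $proxy: $@\n";
  print $sock proxy_request($url);
  print while <$sock>;   # dump the response, headers included
  close $sock;
}
```

Run it with a URL as the first argument and, optionally, the proxy's
host:port as the second.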

-- 
<[EMAIL PROTECTED]>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
I don't do .INI, .BAT, .DLL or .SYS files. I don't assign apps to files. I 
don't configure peripherals or networks before using them. I have a computer 
to do all that. I have a Macintosh, not a hobby. -- Fritz Anderson

