Hey guys - I've got an idea for a new feature I'd like to add to wget: a way to specify a program that filters URLs just before they are fetched. I'd like this so that I could use wget to do recursive retrievals against Google's web cache. This would be useful for restoring deleted web sites, reading sites under heavy load, etc.

Something like this was my first shot:

    wget -r "http://www.google.com/search?q=cache:www.tregar.com/"

That works fine for the first page, but the page that comes back contains links that refer to www.tregar.com, not Google's cache. My solution, given the proposed feature, would be something like:

    wget -r --url-filter=google.pl \
         "http://www.google.com/search?q=cache:www.tregar.com/"

Where google.pl would be something like this (assuming each URL comes in on STDIN and goes out on STDOUT, and minus error checking):

    #!/usr/bin/perl
    while (<STDIN>) {
        chomp;                    # strip the trailing newline
        s!^http://!!;             # drop the scheme
        print "http://www.google.com/search?q=cache:$_\n";
    }

Another possible implementation would be to include a regex engine in wget and allow the user to specify the filter as a regex. That obviously makes for less powerful filters, but it might be more UNIXy.

Reactions?

-sam
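For what it's worth, the regex-only variant of the filter could be prototyped today in the shell with sed; this is just a sketch of the rewrite rule the hypothetical --url-filter would apply, not actual wget behavior:

```shell
# Rewrite an incoming URL into a Google-cache query URL,
# mirroring what google.pl does: strip the scheme, then
# prepend the cache search prefix.
echo "http://www.tregar.com/" \
  | sed -e 's!^http://!!' \
        -e 's!^!http://www.google.com/search?q=cache:!'
# prints: http://www.google.com/search?q=cache:www.tregar.com/
```

A regex-based --url-filter option could take exactly such an s!!! expression as its argument, at the cost of losing the full generality of an external program.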