It just occurred to me that since wget will perform this task properly if
it gets the rule from robots.txt, maybe this issue could be worked around
by proxying or spoofing the remote site's robots.txt file locally?  That
is, I would write

User-agent: *
Disallow: /wgettest/links2.html

into a file, save it in my home directory, and then somehow tell wget that
davidskalinder.com/robots.txt is actually located at
/home/user/robots.txt?

Does anybody know a convenient way of doing this?  Or is there an easier
workaround I'm overlooking?
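
In case it helps to show what I'm picturing, here is an untested sketch of
the sort of proxy I mean: it answers requests for that one robots.txt from
my local file and relays everything else unchanged.  The host, file path,
and port below are just placeholders for my setup.

#!/usr/bin/env python3
# Untested sketch: serve my own robots.txt for one host, relay the rest.

import http.server
import urllib.request

SPOOF_HOST = "davidskalinder.com"       # site whose robots.txt I want to override
LOCAL_ROBOTS = "/home/user/robots.txt"  # file holding my own rules
PORT = 8080                             # arbitrary local port

class RobotsProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # When wget goes through a proxy it sends the absolute URL as the
        # request path, so we can match on the full http://host/robots.txt.
        if self.path.startswith("http://" + SPOOF_HOST + "/robots.txt"):
            with open(LOCAL_ROBOTS, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            return
        # Anything else: fetch the real URL and relay status, headers, body.
        # (No error handling; a missing page upstream will just raise here.)
        with urllib.request.urlopen(self.path) as upstream:
            body = upstream.read()
            self.send_response(upstream.status)
            for name, value in upstream.getheaders():
                if name.lower() not in ("transfer-encoding", "connection"):
                    self.send_header(name, value)
            self.end_headers()
            self.wfile.write(body)

if __name__ == "__main__":
    http.server.HTTPServer(("localhost", PORT), RobotsProxy).serve_forever()

Then, if I understand the wgetrc commands correctly, running something like

wget -e use_proxy=on -e http_proxy=http://localhost:8080/ -r http://davidskalinder.com/wgettest/

ought to make wget read my rules instead of the real ones.  But that seems
like a lot of machinery for such a small thing, hence the question.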

