Hi,

Paul Wratt wrote:
> if it does not obey - server admins will ban it
>
> the work around:
> 1) get single html file first - edit out meta tag - re-get with
>    --no-clobber (usually only in landing pages)
> 2) empty robots.txt (or allow all - search net)
>
> possible solutions:
> A) command line option
> B) ./configure --disable-robots-check
>
> Paul
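For what it's worth, GNU Wget already exposes a run-time switch for this via the `robots` wgetrc variable, so no rebuild should be needed; a sketch of both forms (the URL is only a placeholder):

```
# Per-invocation: disable the robots.txt / META robots check for one run
wget -e robots=off --page-requisites http://example.com/

# Or globally, by adding this line to ~/.wgetrc:
robots = off
```

That covers option A without patching; the `--no-clobber` re-get trick above is then unnecessary.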
The best solution is surely for wget, when fetching page requisites, to always ignore robots.txt (and <META NAME="ROBOTS" ...> in the HTML). It would still obey robots.txt by default when downloading anything other than page requisites.

After all, if you go to the URL using a web browser, the browser fetches all the page requisites, so wget would not be downloading any more than the web site owner expects.

--
Mark
