in paths

Micah Cowan Thu, 20 Aug 2009 11:06:46 -0700

Nigel Horne wrote:
> Use your favourite browser to visit the page
> http://web.archive.org/web/20080207072124/http://barry-white.members.beeb.net/
> and look at its source and you'll find URLs such as
> http://web.archive.org/web/20080207072124/http://barry-white.members.beeb.net/registers/pr_birch_c1.pdf
> 
> Now run wget -m -k -K -E
> http://web.archive.org/web/20080207072124/http://barry-white.members.beeb.net
> and look at the index.html that's been retrieved and
> you'll find that the above URL has been changed to
> http://barry-white.members.beeb.net.wstub.archive.org/registers/pr_birch_c1.pdf
> which is entirely different.


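Guess what? Wget's not doing that; archive.org is. In fact, if you look
closer at the page source, you'll see that the HTML BASE tag is already
set to the value Wget sees. archive.org inserts JavaScript that replaces
that tag after the fact, so a browser (which runs the script) ends up
with different URLs than Wget (which does not run JavaScript).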

Please also see http://www.archive.org/about/faqs.php#28
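
To illustrate the point about the BASE tag, here is a minimal Python
sketch of how a non-JavaScript client resolves relative links. The
SAMPLE markup is a hypothetical reconstruction of what the served page
might contain, not a copy of archive.org's actual output, and the
helper names (BaseAndLinks, resolve_links) are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseAndLinks(HTMLParser):
    """Collect the first <base href> and every <a href> from static HTML."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and self.base is None:
            self.base = href
        elif tag == "a":
            self.links.append(href)


def resolve_links(html, page_url):
    """Resolve links the way a client that ignores JavaScript would."""
    parser = BaseAndLinks()
    parser.feed(html)
    # A BASE tag, when present, overrides the page URL for relative links.
    base = urljoin(page_url, parser.base) if parser.base else page_url
    return [urljoin(base, link) for link in parser.links]


# Hypothetical markup (assumption): a BASE tag pointing at the wstub host
# and one relative link, as the report suggests the raw source contains.
SAMPLE = """<html><head>
<base href="http://barry-white.members.beeb.net.wstub.archive.org/">
</head><body>
<a href="registers/pr_birch_c1.pdf">Birch</a>
</body></html>"""

PAGE_URL = ("http://web.archive.org/web/20080207072124/"
            "http://barry-white.members.beeb.net/")

resolved = resolve_links(SAMPLE, PAGE_URL)
print(resolved[0])
```

Under that assumed markup, the relative link resolves against the BASE
tag rather than the page URL, which would account for the wstub URL
quoted in the report.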

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/




Bug#542581: wget: Mirror command gets confused by http:// in paths
