Dear wget developers,

it seems that "wget -r -k" is a bit careless when creating relative
URLs that start with “something:”, which is then misinterpreted
as the scheme (protocol) part of a URL.

For example, downloading these two files:

/tmp/wget/input $ head *
==> file:with:colon.html <==
<html>
<body>
<a href="./file:with:colon.html">Foo</a>
<a href="./file_without_colon.html">Bar</a>
</body>
</html>

==> file_without_colon.html <==
<html>
<body>
<a href="./file:with:colon.html">Foo</a>
<a href="./file_without_colon.html">Bar</a>
</body>
</html>

with "wget -k -r" produces this output:

==> localhost:8000/file:with:colon.html <==
<html>
<body>
<a href="file:with:colon.html">Foo</a>
<a href="file_without_colon.html">Bar</a>
</body>
</html>

==> localhost:8000/file_without_colon.html <==
<html>
<body>
<a href="file:with:colon.html">Foo</a>
<a href="file_without_colon.html">Bar</a>
</body>
</html>

and the browser will not be able to follow the link to Foo, because it
reads the leading "file:" as a URL scheme rather than as part of a
relative file name.
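
To illustrate the misinterpretation, here is a small sketch using
Python's standard urllib.parse (not wget or a browser, so only an
approximation of what a browser does). BASE and resolve() are names I
made up for this example; BASE assumes the page was served from
localhost:8000 as in the listing above.

```python
from urllib.parse import urljoin, urlsplit

# Assumed URL of the page that contains the links (illustrative only).
BASE = "http://localhost:8000/file_without_colon.html"

def resolve(href: str) -> str:
    """Resolve href against BASE the way a generic URL parser would."""
    return urljoin(BASE, href)

for href in ("file:with:colon.html",
             "./file:with:colon.html",
             "file_without_colon.html"):
    # An empty scheme means the reference is treated as relative.
    print(repr(href), "scheme:", repr(urlsplit(href).scheme),
          "resolves to:", resolve(href))
```

The converted link is parsed with the scheme "file", so it is never
resolved against the page it appears in, while the original "./" form
has no scheme and resolves to the mirrored file.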

This is a practical problem when trying to mirror a MediaWiki
installation, where page names with namespace prefixes such as "Talk:"
contain colons.

I suggest avoiding the issue by prefixing relative links with "./",
either always (why not?), or whenever the relative file name starts with
something that looks like “foo:”.
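
A minimal sketch of the second variant, written in Python rather than
wget's C and not taken from the wget source. The function name
make_safe_relative() is hypothetical. It assumes that its argument is
already a relative link, and it prefixes "./" whenever the first path
segment contains a colon, which is slightly broader than checking for a
valid scheme name but errs on the safe side.

```python
import re

def make_safe_relative(link: str) -> str:
    """Prefix "./" if the link could be mistaken for "scheme:rest".

    Assumes `link` is a relative link (hypothetical helper, not wget code).
    """
    # The first path segment ends at the first "/", "?" or "#".
    first_segment = re.split(r"[/?#]", link, maxsplit=1)[0]
    # A colon in that segment would be read as the end of a URL scheme.
    if ":" in first_segment:
        return "./" + link
    return link

for link in ("file:with:colon.html",
             "file_without_colon.html",
             "sub/file:with:colon.html"):
    print(link, "->", make_safe_relative(link))
```

Links whose colon only appears after the first "/" are left alone,
since a parser cannot take them for a scheme.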


Thanks,
Joachim
-- 
Joachim “nomeata” Breitner
  [email protected] • http://www.joachim-breitner.de/
  Jabber: [email protected]  • GPG-Key: 0xF0FBF51F
  Debian Developer: [email protected]
