Hi Joachim,
could you please test whether the attached patch works for you?
Could someone else please review it as well?
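
For reviewers who want to try the idea outside of wget, here is a minimal
standalone sketch of the check. It is not the patch itself: the helper names
looks_like_scheme and protect_relative_link are made up for illustration, and
it uses plain malloc instead of wget's xmalloc. A relative link needs the
"./" prefix when a ':' appears before the first '/', because the first path
segment would otherwise be read as a scheme name (RFC 3986, section 4.2,
recommends "./" for exactly this case). The patch only does this when no
"../" prefix is emitted, since a leading "../" already disambiguates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of wget: return 1 if LINKFILE has a ':'
   before its first '/', so a browser would parse it as "scheme:rest".  */
static int
looks_like_scheme (const char *linkfile)
{
  const char *p;

  assert (linkfile != NULL);
  p = strpbrk (linkfile, "/:");   /* first '/' or ':', whichever comes first */
  return p != NULL && *p == ':';
}

/* Hypothetical helper: return a malloc'ed copy of LINKFILE, prefixed with
   "./" when needed.  Returns NULL if allocation fails; caller frees.  */
static char *
protect_relative_link (const char *linkfile)
{
  size_t len = strlen (linkfile);
  size_t prefix = looks_like_scheme (linkfile) ? 2 : 0;
  char *link = malloc (prefix + len + 1);

  if (link == NULL)
    return NULL;
  if (prefix)
    memcpy (link, "./", 2);
  memcpy (link + prefix, linkfile, len + 1);   /* copies the trailing NUL too */
  return link;
}
```

With the two files from Joachim's report, file:with:colon.html becomes
./file:with:colon.html, while file_without_colon.html is left unchanged.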
Tim
On Monday 26 October 2015 13:42:41 Joachim Breitner wrote:
> Dear wget developers,
>
> it seems that "wget -r -k" is a bit careless when creating relative
> URLs that start with “something:”, which would then be misinterpreted
> as the protocol specification of a URL.
>
> For example, downloading these two files:
>
> /tmp/wget/input $ head *
> ==> file:with:colon.html <==
> <html>
> <body>
> <a href="./file:with:colon.html">Foo</a>
> <a href="./file_without_colon.html">Bar</a>
> </body>
> </html>
>
> ==> file_without_colon.html <==
> <html>
> <body>
> <a href="./file:with:colon.html">Foo</a>
> <a href="./file_without_colon.html">Bar</a>
> </body>
> </html>
>
> with "wget -k -r" produces this output:
>
> ==> localhost:8000/file:with:colon.html <==
> <html>
> <body>
> <a href="file:with:colon.html">Foo</a>
> <a href="file_without_colon.html">Bar</a>
> </body>
> </html>
>
> ==> localhost:8000/file_without_colon.html <==
> <html>
> <body>
> <a href="file:with:colon.html">Foo</a>
> <a href="file_without_colon.html">Bar</a>
> </body>
> </html>
>
> and the browser will not be able to follow the link to Foo.
>
> This is a practical problem when trying to mirror a MediaWiki
> installation.
> I suggest avoiding the issue by prefixing relative links with "./",
> either always (why not?) or only when the relative file name starts with
> something that looks like “foo:”.
>
>
> Thanks,
> Joachim
From b14eeb5aeeae709b73dba39d8439e0d46c4f10a0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tim=20R=C3=BChsen?= <[email protected]>
Date: Tue, 27 Oct 2015 13:13:54 +0100
Subject: [PATCH] Fix URL conversion for colons in filenames
* src/convert.c (construct_relative): Prepend './' to filename
* tests/Test-k.px: Amend test to succeed
---
 src/convert.c   | 20 +++++++++++++++-----
 tests/Test-k.px |  2 +-
 2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/convert.c b/src/convert.c
index 8e9aa60..df8d58d 100644
--- a/src/convert.c
+++ b/src/convert.c
@@ -441,11 +441,21 @@ construct_relative (const char *basefile, const char *linkfile)
         ++basedirs;
     }
-  /* Construct LINK as explained above. */
-  link = xmalloc (3 * basedirs + strlen (linkfile) + 1);
-  for (i = 0; i < basedirs; i++)
-    memcpy (link + 3 * i, "../", 3);
-  strcpy (link + 3 * i, linkfile);
+  if (!basedirs && (b = strpbrk (linkfile, "/:")) && *b == ':')
+    {
+      link = xmalloc (2 + strlen (linkfile) + 1);
+      memcpy (link, "./", 2);
+      strcpy (link + 2, linkfile);
+    }
+  else
+    {
+      /* Construct LINK as explained above. */
+      link = xmalloc (3 * basedirs + strlen (linkfile) + 1);
+      for (i = 0; i < basedirs; i++)
+        memcpy (link + 3 * i, "../", 3);
+      strcpy (link + 3 * i, linkfile);
+    }
+
   return link;
 }
diff --git a/tests/Test-k.px b/tests/Test-k.px
index 1258e14..9005c5f 100755
--- a/tests/Test-k.px
+++ b/tests/Test-k.px
@@ -25,7 +25,7 @@ my $converted = <<EOF;
<title>Index</title>
</head>
<body>
- <a href="site%3Bsub:.html">Site</a>
+ <a href="./site%3Bsub:.html">Site</a>
</body>
</html>
EOF
--
2.6.2