"Gisle Vanem" <[EMAIL PROTECTED]> writes: > I have to use the ACE form www.xn--troms-zua.no > which is a bit of a pain. > Ref. http://www.norid.no/domenenavnbaser/ace/?language=en > > Why is wget munging the hostname here? Seem it calls > reencode_escapes() on the hostname part. Why I don't know.
Calling reencode_escapes() is correct anywhere in the URL; what Wget
needs to do is unescape the host part of the URL before using it
further.

> If it where not for the "Host:" header, the name could remain
> un-escaped. I don't know what the standard say about this case.
> Should the header contain "Host:www.xn--troms-zua.no" ?

The Host header is (I think) not URL-escaped, so we can simply send
the 8-bit characters as we received them.

Here's a patch; please let me know if it works for you.

2004-03-19  Hrvoje Niksic  <[EMAIL PROTECTED]>

	* url.c (url_parse): Decode %HH sequences in host name.

Index: src/url.c
===================================================================
RCS file: /pack/anoncvs/wget/src/url.c,v
retrieving revision 1.110
diff -u -r1.110 url.c
--- src/url.c	2003/12/15 10:22:54	1.110
+++ src/url.c	2004/03/19 20:57:43
@@ -999,6 +999,15 @@
 
   host_modified = lowercase_str (u->host);
 
+  /* Decode %HH sequences in host name. This is important not so much
+     to support %HH sequences, but to support binary characters (which
+     will have been converted to %HH by reencode_escapes). */
+  if (strchr (u->host, '%'))
+    {
+      url_unescape (u->host);
+      host_modified = 1;
+    }
+
   if (params_b)
     u->params = strdupdelim (params_b, params_e);
   if (query_b)
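
For anyone who wants to see what the unescape step in the patch amounts to, here is a minimal standalone sketch of in-place %HH decoding. It is not Wget's url_unescape; the names decode_percent_in_place and hex_value are hypothetical and exist only for this illustration. The sketch assumes that a '%' not followed by two hex digits should be copied through unchanged, which may or may not match what Wget does.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: numeric value of one hex digit.
   The caller must already have checked isxdigit (c). */
static int
hex_value (unsigned char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return tolower (c) - 'a' + 10;
}

/* Hypothetical stand-in for an unescape routine: decode %HH sequences
   in S in place.  The result is never longer than the input, so no
   allocation is needed.  Returns 1 if anything was decoded, else 0.
   Note that "%00" decodes to a NUL byte, which ends the C string early;
   a real implementation might want to reject it.  */
static int
decode_percent_in_place (char *s)
{
  char *src = s;
  char *dst = s;
  int changed = 0;

  assert (s != NULL);

  while (*src)
    {
      /* The && chain short-circuits, so src[2] is only read when
         src[1] is a hex digit and therefore not the terminator.  */
      if (src[0] == '%'
          && isxdigit ((unsigned char) src[1])
          && isxdigit ((unsigned char) src[2]))
        {
          *dst++ = (char) (hex_value ((unsigned char) src[1]) * 16
                           + hex_value ((unsigned char) src[2]));
          src += 3;
          changed = 1;
        }
      else
        *dst++ = *src++;
    }
  *dst = '\0';
  return changed;
}
```

The return value plays the same role as host_modified in the patch: the caller can use it to tell whether the host string differs from what was originally parsed.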