Thanks, I pushed your changes to master.

Tim
On Tuesday 15 December 2015 18:52:01 Eli Zaretskii wrote:
> > From: Tim Ruehsen <[email protected]>
> > Cc: Eli Zaretskii <[email protected]>
> > Date: Tue, 15 Dec 2015 11:02:21 +0100
> >
> > I pushed a conversion fix to master.
>
> Thanks!
>
> > There is another bug in wget that comes out with
> > wget -d --local-encoding=cp1255
> > 'http://he.wikipedia.org/wiki/%F9._%F9%F4%F8%E4'
> >
> > Wget double escapes/converts to UTF-8... Maybe you can address this when
> > you are working on the code !?
>
> You mean, because http redirects to https? Yes, I've seen that
> already. The simple patch below fixes that. The problem seems to be
> that wget assumes the redirected URL to be encoded in the same
> encoding as the original one (which, as described earlier, starts with
> the local encoding), whereas it is much more reasonable to use the
> value provided by --remote-encoding.
>
> And if the 'if' in the patch looks strange to you, it's rightfully
> so. Look at this strange logic in set_uri_encoding:
>
>   /* Set uri_encoding of struct iri i. If a remote encoding was specified,
>      use it unless force is true. */
>   void
>   set_uri_encoding (struct iri *i, const char *charset, bool force)
>   {
>     DEBUGP (("URI encoding = %s\n", charset ? quote (charset) : "None"));
>     if (!force && opt.encoding_remote)
>       return;
>
> I understand the reason to prefer opt.encoding_remote when the 'force'
> flag is false -- the user-provided remote encoding should take
> preference. But why return without making sure the URI's encoding is
> in fact set to that?? I guess there's some assumption that
> iri->uri_encoding is already set to opt.encoding_remote, but this
> assumption is certainly false in this case. So I think this function
> should be changed to actually use opt.encoding_remote, if non-NULL,
> and otherwise use 'charset' even if 'force' is false. Then the patch
> below could be simplified to avoid the test. WDYT?
>
> Here's the patch I promised. With it, wget survives redirection from
> http to https and successfully retrieves that page.
>
>
> diff --git a/src/retr.c b/src/retr.c
> index a6a9bd7..6af26a0 100644
> --- a/src/retr.c
> +++ b/src/retr.c
> @@ -872,9 +872,11 @@ retrieve_url (struct url * orig_parsed, const char *origurl, char **file,
>            xfree (mynewloc);
>            mynewloc = construced_newloc;
>
> -          /* Reset UTF-8 encoding state, keep the URI encoding and reset
> +          /* Reset UTF-8 encoding state, set the URI encoding and reset
>               the content encoding. */
>            iri->utf8_encode = opt.enable_iri;
> +          if (opt.encoding_remote)
> +            set_uri_encoding (iri, opt.encoding_remote, true);
>            set_content_encoding (iri, NULL);
>            xfree (iri->orig_url);
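
For reference, the set_uri_encoding change proposed in the quoted message could look roughly like the sketch below. It is a standalone model, not wget's code: struct iri_sketch, opt_sketch, dup_or_null and set_uri_encoding_sketch are hypothetical stand-ins, and the handling of 'force' reflects one reading of the proposal (a forced call stores 'charset'; an unforced call stores the remote encoding when one was given, and 'charset' otherwise).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for wget's struct iri and global opt; only the
   fields mentioned in the thread are modelled. */
struct iri_sketch
{
  char *uri_encoding;           /* heap-allocated, or NULL if unset */
};

struct options_sketch
{
  const char *encoding_remote;  /* value of --remote-encoding, or NULL */
};

static struct options_sketch opt_sketch;

/* Portable helper: duplicate S on the heap, passing NULL through. */
static char *
dup_or_null (const char *s)
{
  char *copy;
  size_t len;

  if (!s)
    return NULL;
  len = strlen (s) + 1;
  copy = malloc (len);
  assert (copy != NULL);
  memcpy (copy, s, len);
  return copy;
}

/* Assumed reading of the proposal: without FORCE, a user-supplied remote
   encoding wins and is actually stored instead of returning early;
   in every other case CHARSET is stored. */
void
set_uri_encoding_sketch (struct iri_sketch *i, const char *charset, bool force)
{
  const char *chosen = charset;
  char *copy;

  assert (i != NULL);
  if (!force && opt_sketch.encoding_remote)
    chosen = opt_sketch.encoding_remote;

  /* Copy before freeing, in case CHOSEN aliases the old value. */
  copy = dup_or_null (chosen);
  free (i->uri_encoding);
  i->uri_encoding = copy;
}
```

In this model, with opt_sketch.encoding_remote set to "UTF-8", an unforced call with "CP1255" stores "UTF-8", while a forced call stores "CP1255". Whether the real function should behave exactly this way is for the wget maintainers to decide.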
