Hi,

we support the meta charset tag in Wget2.

It is best to build it from source. If you don't feel comfortable with that, some GNU/Linux distros have wget2 packaged (e.g. Debian).

See here for a tarball build:
https://gitlab.com/gnuwget/wget2/-/blob/master/README.md#downloading-and-building-from-tarball

Regards, Tim

On 15.07.20 13:39, Sho Amano wrote:
> Hi! I've been using the first version of wget for a long time and first of
> all,
> I want to say thank you to all of the maintainers and contributors of
> this project!
>
> I was looking at the code recently to find that it doesn't support
> "<meta charset=...>" tag yet.
> I don't see any issues in bug tracker related to this, so I created a patch.
> I'm hoping it helps.
>
>
> I also provide sample HTML files to verify.
> https://www.dropbox.com/s/zgicd05jdppq781/charset_test_utf_8.html?dl=0
> https://www.dropbox.com/s/001pklu3itad6sw/charset_test_shift_jis.html?dl=0
>
> The first file specifies a Japanese file name in UTF-8. The second file does
> in Shift-JIS. Save these files, serve them on localhost:8080, and let wget
> follow the link. (e.g. `wget -d --recursive --level=2
> http://localhost:8080/charset_test_shift_jis.html`)
> Verify that in both cases, wget tries to download
> http://localhost:8080/%E6%97%A5%E6%9C%AC%E8%AA%9E.html.
>
> Thanks!
> Sho Amano
>
> ---
>  src/html-url.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/src/html-url.c b/src/html-url.c
> index b80cf269..5324d244 100644
> --- a/src/html-url.c
> +++ b/src/html-url.c
> @@ -182,6 +182,7 @@ static const char *additional_attributes[] = {
>    "http-equiv", /* used by tag_handle_meta */
>    "name", /* used by tag_handle_meta */
>    "content", /* used by tag_handle_meta */
> +  "charset", /* used by tag_handle_meta */
>    "action", /* used by tag_handle_form */
>    "style", /* used by check_style_attr */
>    "srcset", /* used by tag_handle_img */
> @@ -191,7 +192,7 @@ static struct hash_table *interesting_tags;
>  static struct hash_table *interesting_attributes;
>
>  /* Will contains the (last) charset found in 'http-equiv=content-type'
> -   meta tags */
> +   or 'charset' meta tags */
>  static char *meta_charset;
>
>  static void
> @@ -574,6 +575,7 @@ tag_handle_meta (int tagid _GL_UNUSED, struct
> taginfo *tag, struct map_context *
>  {
>    char *name = find_attr (tag, "name", NULL);
>    char *http_equiv = find_attr (tag, "http-equiv", NULL);
> +  char *charset = find_attr (tag, "charset", NULL);
>
>    if (http_equiv && 0 == c_strcasecmp (http_equiv, "refresh"))
>      {
> @@ -673,6 +675,20 @@ tag_handle_meta (int tagid _GL_UNUSED, struct
> taginfo *tag, struct map_context *
>              }
>          }
>      }
> +  else if (charset)
> +    {
> +      /* Handle stuff like:
> +         <meta charset="CHARSET">
> +         If charset is acquired from http-equiv then it is overwritten. */
> +
> +      /* Do a minimum check on the charset value */
> +      if (check_encoding_name (charset))
> +        {
> +          char *mcharset = xstrdup (charset);
> +          xfree (meta_charset);
> +          meta_charset = mcharset;
> +        }
> +    }
>  }
>
>  /* Handle the IMG tag. This requires special handling for the srcset attr,
>
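
For anyone reading along, the "last charset found wins" behaviour described in the patch comments can be sketched in isolation. This is only a minimal sketch under stated assumptions, not the wget code: `is_plausible_charset_name`, `remember_meta_charset` and `last_meta_charset` are hypothetical names, the name check is an assumed stand-in for `check_encoding_name` (whose actual rules may differ), and plain `malloc`/`free` replace `xstrdup`/`xfree`.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for check_encoding_name(): accept only non-empty
   names made of ASCII letters, digits and a few punctuation characters.
   This rule is an assumption made for the sketch. */
static int
is_plausible_charset_name (const char *s)
{
  if (!s || !*s)
    return 0;
  for (; *s; s++)
    if (!isalnum ((unsigned char) *s) && !strchr ("-_.:", *s))
      return 0;
  return 1;
}

/* Last charset seen in any <meta> tag, or NULL if none has been stored. */
static char *last_meta_charset;

/* Store VALUE as the document charset if it passes the minimal check.
   A later call overwrites an earlier one, whichever kind of meta tag the
   value came from. Returns 1 if the stored value was replaced, else 0. */
static int
remember_meta_charset (const char *value)
{
  char *copy;
  size_t len;

  if (!is_plausible_charset_name (value))
    return 0;

  len = strlen (value) + 1;
  copy = malloc (len);
  if (!copy)
    return 0;
  memcpy (copy, value, len);

  /* Copy first, then free the old value, as the patch does. */
  free (last_meta_charset);
  last_meta_charset = copy;
  return 1;
}
```

Calling `remember_meta_charset ("UTF-8")` and then `remember_meta_charset ("Shift_JIS")` leaves `Shift_JIS` stored, while a value that fails the check (for example one containing a space) leaves the previously stored charset untouched.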
