On Monday, 13 November 2017 18:32:46 CET, Eli Zaretskii wrote:
> > Cc: [email protected], [email protected]
> > From: Tim Rühsen <[email protected]>
> > Date: Mon, 13 Nov 2017 16:36:39 +0100
> >
> > > I don't think it's a Gnulib issue. The problem is that on Windows,
> > > the implicit call at the beginning of Wget
> > >
> > >   setlocale (LC_ALL, "C");
> >
> > Why is there an explicit call with "C"? There is an explicit call with
> > "".
>
> I said "implicit", not "explicit". Such an implicit call is made at
> the beginning of every C program, per the ANSI C Standard. Right?
>
> The MSDN documentation says it clearly:
>
>   At program startup, the equivalent of the following statement is
>   executed:
>
>     setlocale( LC_ALL, "C" );
>
> > From the man page:
> > "If locale is an empty string, "", each part of the locale that should
> > be modified is set according to the environment variables."
>
> The call with a locale of "" is only made in a build that has
> ENABLE_NLS defined. I was talking about a build which didn't define
> ENABLE_NLS.
>
> > > is not good enough to work in multibyte locales of the Far East,
> > > because the Windows runtime assumes a single-byte locale after that
> > > call. And since Wget happens to need to display text and create files
> > > with non-ASCII characters, it gets hit more than other programs.
> >
> > I (hopefully) understand why this doesn't work. NTFS uses UTF-16 for
> > file names. If your environment specifies a single-byte encoding
> > (e.g. C) and at some point we use a multibyte encoding (e.g. UTF-8),
> > then any automatic conversion to UTF-16 file names is likely to fail.
> > For me the question is: a) does wget have a bug (e.g. creating a file
> > name from a wrongly encoded string), or b) does the Windows API have
> > a problem?
> >
> > > The proposed solution is to add a special call to setlocale which gets
> > > this right on Windows.
> >
> > Why can't we just convert the file name string into the correct
> > encoding and then create the file? What am I missing?
>
> I guess you are missing a short introduction to the Windows l10n/i18n
> mess. Let me try.
>
> First, the fact that NTFS uses UTF-16 is not really relevant. Wget
> uses 'char *' strings, not 'wchar_t *' strings, to store file names and
> to call C library functions that accept file names. So we cannot use
> the UTF-16 encoding of non-ASCII file names directly. Instead, we use
> the locale's codepage (the C library and the OS APIs then convert to
> UTF-16 before hitting the disk, but that's not important now).
>
> Next, creating and opening files is not the only problem: we also need
> to display these file names and URLs, and that also needs to use the
> encoding expected by the Windows console.
>
> Now, in any locale which uses a single-byte encoding of non-ASCII
> characters, the C locale will support those characters, both for I/O
> and for functions like strcmp, strlen, strcoll, etc. But not in the
> double-byte locales of the Far East: there, you must explicitly call
> setlocale with the correct codepage to have the local character set
> supported. This support includes manipulating file names, calling C
> library functions to access files, and displaying non-ASCII text, such
> as file names and URLs, on the console.
>
> IOW, this is a Windows runtime subtlety that unfortunately needs to be
> fixed in the application code.
>
> (UTF-8 is not relevant at all here, because Windows doesn't support
> UTF-8 as the locale's codeset; if you try to call setlocale to set
> UTF-8 as the codeset, setlocale will simply fail. So if we have a
> UTF-8 encoded URL or file name inside wget, we must convert it to the
> current codepage by calling libiconv functions.)
>
> Does the above make sense? Let me know if I need to explain some
> more.
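Eli's last point, converting a UTF-8 string to the current codepage with libiconv, can be sketched with the POSIX iconv interface, which GNU libiconv also provides. This is an illustrative sketch under stated assumptions, not Wget's code: utf8_to_codepage is a hypothetical helper, and the caller is assumed to already know the target codepage name (for example "CP932" on a Japanese Windows system); how to obtain that name is not covered here.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string UTF8 into
   the encoding TOCODE (e.g. a Windows codepage name). Returns a malloc'd,
   NUL-terminated buffer that the caller frees, or NULL if TOCODE is
   unknown, memory runs out, or iconv reports a conversion error. */
static char *utf8_to_codepage (const char *utf8, const char *tocode)
{
  assert (utf8 != NULL && tocode != NULL);

  iconv_t cd = iconv_open (tocode, "UTF-8");   /* iconv_open (to, from) */
  if (cd == (iconv_t) -1)
    return NULL;

  /* POSIX declares iconv's input as char **; some libiconv builds use
     const char ** instead (see ICONV_CONST), which may need another cast. */
  char *in = (char *) utf8;
  size_t inleft = strlen (utf8);
  size_t cap = 2 * inleft + 16;   /* first guess, doubled on E2BIG */
  size_t used = 0;
  int flushing = 0;               /* set once all input has been consumed */
  char *out = malloc (cap);

  while (out != NULL)
    {
      char *outp = out + used;
      size_t outleft = cap - used - 1;   /* keep one byte for the NUL */
      size_t r = flushing
        ? iconv (cd, NULL, NULL, &outp, &outleft)   /* emit final shift state */
        : iconv (cd, &in, &inleft, &outp, &outleft);
      used = (size_t) (outp - out);

      if (r != (size_t) -1)
        {
          if (flushing)
            break;                /* input converted and state flushed */
          flushing = 1;
          continue;
        }
      if (errno != E2BIG)         /* EILSEQ / EINVAL: bad or unconvertible input */
        {
          free (out);
          out = NULL;
          break;
        }
      char *grown = realloc (out, 2 * cap);   /* output buffer full: grow it */
      if (grown == NULL)
        {
          free (out);
          out = NULL;
          break;
        }
      out = grown;
      cap *= 2;
    }

  iconv_close (cd);
  if (out != NULL)
    out[used] = '\0';
  return out;
}
```

For instance, utf8_to_codepage ("\xE6\x97\xA5\xE6\x9C\xAC", "SHIFT_JIS") should turn the six UTF-8 bytes of 日本 ("Japan") into the four Shift_JIS bytes 93 FA 96 7B, two bytes per character, which is the double-byte case discussed above. The result must be released with free().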
Thank you, Eli.

I just wonder if we have the same problem on the Linux console as well. I mean, *not* calling setlocale(LC_ALL, "") (when ENABLE_NLS is undefined) would leave the program in the C locale, even if the console/environment specifies something else. But no one has complained so far...

So my question: did you test the patch, and does it work for you? If yes, I am going to apply it.

Regards,
Tim
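Tim's point about builds without ENABLE_NLS can be shown with a minimal C sketch, assuming ENABLE_NLS is not defined. init_locale, locale_is_c and locale_is_single_byte are hypothetical names for illustration, not Wget functions. On glibc and musl, as on the Windows runtime Eli describes, the startup "C" locale is single-byte (MB_CUR_MAX is 1), and it stays that way because nothing calls setlocale (LC_ALL, "").

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Query (without changing) the current LC_ALL locale and report whether it
   is still the "C" locale that ISO C installs implicitly at startup. */
static int locale_is_c (void)
{
  const char *cur = setlocale (LC_ALL, NULL);   /* NULL argument: query only */
  assert (cur != NULL);                         /* a query should always succeed */
  return strcmp (cur, "C") == 0;
}

/* MB_CUR_MAX is the longest multibyte character the current locale allows;
   1 means the runtime treats every byte as one character. */
static int locale_is_single_byte (void)
{
  return MB_CUR_MAX == 1;
}

/* Hypothetical startup hook mirroring the two build flavours discussed in
   this thread: only NLS builds adopt the locale from the environment. */
static void init_locale (void)
{
#ifdef ENABLE_NLS
  setlocale (LC_ALL, "");   /* use LC_ALL, LC_* and LANG from the environment */
#endif
  /* Without ENABLE_NLS nothing runs here, so the implicit "C" locale stays. */
}
```

Compiling with -DENABLE_NLS makes init_locale adopt the locale named by LC_ALL, LC_* or LANG instead, after which both checks depend on the environment the program runs in.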
