> Date: Tue, 18 Aug 2015 22:28:21 +0200
> From: "Andries E. Brouwer" <[email protected]>
> Cc: "Andries E. Brouwer" <[email protected]>, [email protected],
>         [email protected]
>
> > What is needed to have a full Unicode support in wget on Windows is to
> > provide replacements for all the file-name related libc functions
> > ('fopen', 'open', 'stat', 'access', etc.) which will accept file names
> > encoded in UTF-8, convert them internally into UTF-16, and call the
> > wchar_t equivalents of those functions ('_wfopen', '_wopen', '_wstat',
> > '_waccess', etc.) with the converted file name.  Another thing that is
> > needed is similar replacements for 'printf', 'puts', 'fprintf',
> > etc. when they are used for writing file names to the console --
> > because we cannot write UTF-8 sequences to the Windows console.
>
> Aha. That reminds me of a patch by I think Aleksey Bykov.
> Yes - see http://lists.gnu.org/archive/html/bug-wget/2014-04/msg00080.html
>
> There we had a similar discussion, and he wrote mswindows.diff with
>
> +int
> +wc_utime (unsigned char *filename, struct _utimbuf *times)
> +{
> +  wchar_t *w_filename;
> +  int buffer_size;
> +
> +  buffer_size = sizeof (wchar_t) * MultiByteToWideChar(65001, 0, filename, -1, w_filename, 0);
> +  w_filename = alloca (buffer_size);
> +  MultiByteToWideChar(65001, 0, filename, -1, w_filename, buffer_size);
> +  return _wutime (w_filename, times);
> +}
>
> and similar for stat, open, etc. Something similar is what would be needed on
> Windows?
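For illustration, here is a minimal sketch of one such replacement, assuming a Windows CRT that provides `_wfopen`; the names `utf8_fopen` and `utf8_to_wide` are hypothetical and not part of wget.  Unlike the quoted `wc_utime`, it gives MultiByteToWideChar a buffer size counted in wchar_t units (the quoted second call passes `buffer_size`, which is a byte count), rejects invalid UTF-8, and uses malloc instead of alloca.  On non-Windows systems it just calls fopen, since the file name bytes are passed through unchanged there.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Convert a NUL-terminated UTF-8 string into a malloc'ed UTF-16
   (wchar_t) string.  Returns NULL on invalid UTF-8 or out of memory.  */
static wchar_t *
utf8_to_wide (const char *s)
{
  /* With cbMultiByte == -1 the returned count includes the
     terminating NUL and is measured in wchar_t units, not bytes.  */
  int n = MultiByteToWideChar (CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
  if (n == 0)
    return NULL;

  wchar_t *w = malloc ((size_t) n * sizeof (wchar_t));
  if (w == NULL)
    return NULL;

  /* The last argument is the buffer size in wchar_t units.  */
  if (MultiByteToWideChar (CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) == 0)
    {
      free (w);
      return NULL;
    }
  return w;
}
#endif

/* Hypothetical fopen replacement that accepts a UTF-8 file name.  */
FILE *
utf8_fopen (const char *filename, const char *mode)
{
  assert (filename != NULL && mode != NULL);
#ifdef _WIN32
  wchar_t *wname = utf8_to_wide (filename);
  wchar_t *wmode = utf8_to_wide (mode);
  FILE *fp = NULL;

  if (wname == NULL || wmode == NULL)
    errno = EINVAL;               /* conversion failed */
  else
    fp = _wfopen (wname, wmode);  /* wide-character CRT entry point */

  free (wname);
  free (wmode);
  return fp;
#else
  /* POSIX systems take the UTF-8 bytes as-is; no conversion needed.  */
  return fopen (filename, mode);
#endif
}
```

The same convert, call, free pattern would apply to `open`/`_wopen`, `access`/`_waccess`, and so on; `stat` needs a little extra care because `_wstat` fills a `struct _stat` rather than a `struct stat`.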
Yes, thanks for pointing out those patches.  Is there any reason they
weren't accepted back then?

> Is his patch usable?

It needs some minor polishing, but in general it should do the job,
yes.

I admit that I don't understand the need for the url.c patch.  Why do
we need to convert to wchar_t when the locale's codeset is already
UTF-8?  (I could understand that for non-UTF-8 locales, but the patch
explicitly performs the conversion to wchar_t and back only in UTF-8
locales, where the normal string functions should do the job.)  Is
this only for converting to upper/lower case?

There's still the part with writing UTF-8 encoded file/URL names to
the Windows console; that will have to be added.
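A rough sketch of that console part, assuming that output should go through `WriteConsoleW` only when the stream really is a console (detected with `GetConsoleMode`), while output redirected to a file or pipe keeps the raw UTF-8 bytes.  The name `utf8_fputs` is hypothetical; the Win32 and CRT calls (`_get_osfhandle`, `_fileno`, `GetConsoleMode`, `MultiByteToWideChar`, `WriteConsoleW`) are the standard ones.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#endif

/* Hypothetical helper: write a UTF-8 string (e.g. a file name or URL)
   to FP.  Returns 0 on success, EOF on failure.  */
int
utf8_fputs (const char *s, FILE *fp)
{
  assert (s != NULL && fp != NULL);
#ifdef _WIN32
  HANDLE h = (HANDLE) _get_osfhandle (_fileno (fp));
  DWORD mode;

  /* GetConsoleMode succeeds only for a real console handle; for
     pipes and files we fall through and write the UTF-8 bytes.  */
  if (h != INVALID_HANDLE_VALUE && GetConsoleMode (h, &mode))
    {
      /* Size in wchar_t units, including the terminating NUL.  */
      int n = MultiByteToWideChar (CP_UTF8, 0, s, -1, NULL, 0);
      if (n == 0)
        return EOF;
      wchar_t *w = malloc ((size_t) n * sizeof (wchar_t));
      if (w == NULL)
        return EOF;
      MultiByteToWideChar (CP_UTF8, 0, s, -1, w, n);

      /* WriteConsoleW bypasses stdio buffering, so flush first to
         keep the output in order.  */
      fflush (fp);
      DWORD written;
      BOOL ok = WriteConsoleW (h, w, (DWORD) (n - 1), &written, NULL);
      free (w);
      return ok ? 0 : EOF;
    }
#endif
  return fputs (s, fp) >= 0 ? 0 : EOF;
}
```

Callers that print file names would use this in place of `fputs`; a `printf`-style variant would format into a buffer first and then hand the result to the same helper.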
