On 8/5/2016 6:16 AM, Sergei Nikulov wrote:
2016-08-05 12:11 GMT+03:00 Rod Widdowson<[email protected]>:
>Aside, but curious minds need to know.
>
>As a newcomer here - can someone help me understand what "Unicode for Windows" means?  I have to assume it is in URL handling, not files?  The word UTF8 has to be the give-away since UTF8 is a pretty alien concept for Windows at the k-mode interface (where I mostly hang out).
+1

UTF-16 (wide character) encoding, which is the most common encoding of
Unicode and the one used for native Unicode encoding on Windows
operating systems.

So I am also wondering how it can encode UTF-8 in file names.


Supporting Unicode in Windows has been discussed in #345 [1]. While I acknowledge UTF-16 is the native choice, I thought it would be easier to pass around UTF-8 in the library; that way we wouldn't have to implement a bunch of sister libcurl functions for wide characters. The problem is that UTF-8 is not properly supported as a locale (except maybe in Cygwin) by the underlying MS C runtime (CRT), so the CRT won't do the conversions automatically. For example, before we call a function like fopen with a UTF-8 filename we'd have to convert the name to UTF-16 stored in wide chars and call _wfopen [2] instead, since there is no way to set the locale to UTF-8. We'd have to handle that for a lot of CRT functions, basically making a layer over the CRT, which is also painful. It seems like either way we'd have to create a bunch of functions, but I suspected the CRT layer would be easier to maintain since those functions are essentially just wrappers.

But how do we know in many of our library functions whether a string we're passed is UTF-8 or just ANSI? That's another problem. And another one is displaying Unicode characters in the console, which didn't always work well, although with Consolas it has gotten better.
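
To make the fopen example concrete, here is a rough sketch of what one such wrapper might look like. The name fopen_utf8 is made up for illustration and is not an existing libcurl or CRT function. It assumes the caller always passes UTF-8, which is exactly the assumption questioned above. On non-Windows builds it just forwards to fopen.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of libcurl): open a file whose name is
   assumed to be UTF-8 encoded. */
static FILE *fopen_utf8(const char *name, const char *mode)
{
#ifdef _WIN32
  FILE *fp = NULL;
  wchar_t wmode[16]; /* mode strings such as "rb" or "a+" are short */
  wchar_t *wname;
  int len;

  /* With a zero-sized output buffer the call only reports how many wide
     chars are needed. Passing -1 as the input length makes that count
     include the terminating null. */
  len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, name, -1,
                            NULL, 0);
  if(len <= 0)
    return NULL; /* name is not valid UTF-8 */

  wname = malloc((size_t)len * sizeof(wchar_t));
  if(!wname)
    return NULL;

  /* Convert both the name and the mode to UTF-16, then call the wide
     character variant of fopen. */
  if(MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, name, -1,
                         wname, len) &&
     MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, mode, -1,
                         wmode, 16))
    fp = _wfopen(wname, wmode);

  free(wname);
  return fp;
#else
  /* Elsewhere the CRT takes the bytes as they are. */
  return fopen(name, mode);
#endif
}
```

A caller would use it like fopen, for example fopen_utf8("caf\xc3\xa9.txt", "rb"). When the name is not valid UTF-8 this sketch simply fails. Falling back to plain fopen would treat the name as ANSI, but whether that guess is safe is the open question mentioned above.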

A few people have shown interest in this, but it waned. Make no mistake, it will take a lot of time to implement properly in a way that is maintainable, which is very important. The issues have essentially been abandoned because nobody has the time, but feel free to resurrect them if you want to do the work. Going forward, I think it is important that we all reach some consensus on a design before any other work is put in. It could be done piecemeal, for example for filenames only at first, but we should agree on some sort of ultimate plan first.


[1]: https://github.com/curl/curl/issues/345
[2]: https://msdn.microsoft.com/en-us/library/yeby3zcb.aspx

-------------------------------------------------------------------
List admin: https://cool.haxx.se/list/listinfo/curl-library
Etiquette:  https://curl.haxx.se/mail/etiquette.html
