2016-08-05 22:27 GMT+03:00 Ray Satiro via curl-library <[email protected]>:

> On 8/5/2016 6:16 AM, Sergei Nikulov wrote:
>
> 2016-08-05 12:11 GMT+03:00 Rod Widdowson <[email protected]>:
>
>> Aside, but curious minds need to know.
>>
>> As a newcomer here - can someone help me what "Unicode for windows" means?
>> I have to assume it is in URL handling, not files? The word UTF8 has to be
>> the give-away since UTF8 is a pretty alien concept for windows at the k-mode
>> interface (where I mostly hang out).
>
> +1
>
> UTF-16 (wide character) encoding, which is the most common encoding of
> Unicode and the one used for native Unicode encoding on Windows
> operating systems.
>
> So I also wondering how it can encode UTF-8 in file names.
>
>
> Supporting Unicode in Windows has been discussed in #345 [1]. While I
> acknowledge UTF-16 is the native choice I thought it would be easier to pass
> around UTF-8 in the library, that way we wouldn't have to implement a bunch
> of sister libcurl functions for wide characters. The problem with that is
> because UTF-8 is not properly supported as a locale (except maybe cygwin) by
> the underlying MS C runtime (CRT) it won't do the conversions automatically.
> For example before we call a function like fopen with a UTF-8 filename we'd
> have to convert to UTF-16 stored in wide chars and instead call _wfopen [2]

All of the conversion used to be done automatically by defining UNICODE and
_UNICODE on Windows.

So my idea is simple: typedef some kind of CURL_CHAR and use it instead of
plain char. This typedef would be plain char on Unix/Linux variants and TCHAR
on Windows.

This would also save some typing, for example in ldap.c, where I see a lot of
#ifdef WIN32 ...

If you need UTF-8 on Windows, you should build with Windows Unicode
(-DUNICODE -D_UNICODE) and use WCHAR -> UTF-8 code page conversion.

> since there is no way to set the locale to UTF-8. We'd have to handle that
> for a lot of CRT functions basically making a layer over the CRT and doing
> something also painful. It seems like either way we'd have to create a bunch
> of functions, but I suspected the latter would be easier to maintain since
> they're essentially just wrappers. But how do we know in many of our library
> functions whether a string we're passed is UTF-8 or just ANSI? That's
> another problem. And another one is displaying Unicode characters in the
> console, which didn't always work well, although with Consolas it has gotten
> better.
>
> A few people have shown interest in this but it waned. Make no mistake it
> will take a lot of time to implement properly in a way that is maintainable,
> which is very important. The issues have essentially been abandoned because
> nobody has the time, but feel free to resurrect them if you want to do the
> work. Going forward, I think it is important that we all have some consensus
> on a design before any other work is put in. It could be done in a way that
> is piecemeal, like only for filenames first, but we should agree on some
> sort of ultimate plan first.
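
For illustration, the wrapper idea quoted above (convert a UTF-8 filename to
UTF-16 and call _wfopen instead of fopen) could be sketched roughly as
follows. This is only a minimal sketch under assumptions: fopen_utf8 and
utf8_to_wide are hypothetical names, not existing libcurl functions, and the
sketch assumes the caller already knows the string is UTF-8, which is exactly
the open question raised above. On non-Windows systems it simply falls back
to fopen.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   allocated UTF-16 string. Returns NULL on invalid input or allocation
   failure. The caller frees the result. */
static wchar_t *utf8_to_wide(const char *utf8)
{
  wchar_t *wide;
  /* First call only asks for the required length in wide characters,
     including the terminating NUL (because the input length is -1). */
  int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                utf8, -1, NULL, 0);
  if(len <= 0)
    return NULL;
  wide = malloc((size_t)len * sizeof(wchar_t));
  if(wide && MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                 utf8, -1, wide, len) <= 0) {
    free(wide);
    wide = NULL;
  }
  return wide;
}
#endif

/* Hypothetical wrapper, not part of libcurl: open a file whose name is
   given as UTF-8. On Windows the name and mode are converted to UTF-16
   and passed to _wfopen; elsewhere the bytes go straight to fopen. */
FILE *fopen_utf8(const char *utf8_name, const char *mode)
{
  assert(utf8_name != NULL && mode != NULL);
#ifdef _WIN32
  {
    FILE *fp = NULL;
    wchar_t *wname = utf8_to_wide(utf8_name);
    wchar_t *wmode = utf8_to_wide(mode);
    if(wname && wmode)
      fp = _wfopen(wname, wmode);
    free(wname);
    free(wmode);
    return fp;
  }
#else
  return fopen(utf8_name, mode);
#endif
}
```

A caller would then use fopen_utf8(name, "rb") where it calls fopen today.
Every other CRT function that takes a filename would need a similar wrapper,
which is the maintenance cost discussed above.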
>
> [1]: https://github.com/curl/curl/issues/345
> [2]: https://msdn.microsoft.com/en-us/library/yeby3zcb.aspx
>
> -------------------------------------------------------------------
> List admin: https://cool.haxx.se/list/listinfo/curl-library
> Etiquette: https://curl.haxx.se/mail/etiquette.html

--
Best Regards,
Sergei Nikulov
