Hi Andrew,

I still think that we are on the right path. So, the most promising situation is:
> Action: So then I set UTF8MODE=Yes and UTF8MODE_FILE=Yes and I get the following:
> Result: All Unicode characters display properly but I still cannot open the file with fopen() using the filename provided by IupFileDlg().

The problem, for now, comes down to converting the string returned by IupFileDlg to a format that fopen can successfully use.

When UTF8MODE_FILE=Yes, the string returned by IupFileDlg is in UTF-8, so you need to convert it to the filesystem encoding. When UTF8MODE_FILE=No, the string returned by IupFileDlg is in the filesystem encoding, so there is no need for conversion, but you cannot display the returned value in an interface element, so you need to convert it to UTF-8.

Either way, you need conversion functions to/from UTF-8 and the filesystem encoding. It would be nice to have a solution for that inside IUP, but for now we still don't have one.

The GTK documentation describes this in:
https://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html

Best,
Scuri

On Fri, Nov 22, 2019 at 19:16, Andrew Robinson <arobinso...@cox.net> wrote:

> Antonio,
>
> I have a file on a Windows 7 computer (with the default code page for
> English), that I rename by cutting and pasting a UTF-8 encoded character
> from an Internet page into the filename. What happens then? The browser
> copies and converts it to UTF-16 and places it in the clipboard, which I
> then paste into the filename.
>
> *Good News*: Even though my version of Windows is English only, the filename
> displays properly in Windows
>
> *Bad News*: IUP will not display Unicode properly in all textboxes
> *Comment*: IUP displays the ellipsis (… U+2026) correctly, but the
> textarea doesn't display the right-pointer arrow (➔ U+279c) properly
>
> *Action*: What happens when I set UTF8MODE=Yes?
> *Result*: The first textbox incorrectly displays the ellipsis character
> as unknown/error, but now the textarea displays the correct
> right-pointer character, although the character incorrectly overlaps other
> characters.
>
> *Action*: If I set UTF8MODE_FILE=Yes, I get the following:
> *Bad News*: No Unicode characters display correctly anywhere. Also, I
> cannot open the file with fopen() using the filename provided by
> IupFileDlg().
>
> *Action*: So then I set UTF8MODE=Yes and UTF8MODE_FILE=Yes and I get the
> following:
> *Result*: All Unicode characters display properly but I still cannot
> open the file with fopen() using the filename provided by IupFileDlg().
>
> *Discussion*: Windows NTFS supports UTF-16 and only UTF-16 filenames, so
> no matter what your code page is set to, Windows will
> always translate filenames from UTF-16 to your code page encoding (and back
> again). If I retrieve that filename using IUP's very convenient
> IupFileDlg(), it returns the filename string in ANSI or UTF-8, but Windows
> fopen() does not accept UTF-8 or UTF-16, it only accepts the current ANSI
> code page. I can use _wfopen(), but only if the string is encoded as
> UTF-16, which Iup does not support for this. It would be nice to have
> everything in the Windows version of Iup in native UTF-16, as there are
> hundreds of internal functions that directly support that format. I would
> only have to worry about translating filenames or certain file contents
> when supporting other languages and code pages.
>
> Per https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8, it
> explains how, "Microsoft Windows has a code page designated for UTF-8, code
> page 65001.
> Prior to Windows 10 insider build 17035 (November 2017), it was
> impossible to set the locale code page to 65001, leaving this code page
> only available for (a) explicit conversion functions such as
> MultiByteToWideChar and/or (b) the Win32 console command chcp 65001 to
> translate stdin/out between UTF-8 and UTF-16.
>
> Microsoft said that a UTF-8 locale might break some functions (a possible
> example is _mbsrev) as they were written to assume multibyte encodings used
> no more than 2 bytes per character, thus code pages with more bytes such as
> GB 18030 (cp54936) and UTF-8 could not be set as the locale.
>
> This means that "narrow" functions, in particular fopen (which opens
> files), cannot be called with UTF-8 strings, and in fact there is no way to
> open all possible files using fopen no matter what the locale is set to
> and/or what bytes are put in the string, as none of the available locales
> can produce all possible UTF-16 characters. This problem also applies to
> all other api that takes or returns 8 bit strings, including Windows ones
> such as SetWindowText"
>
> *Recommendation*: Get rid of UTF-8 Windows support since it isn't very
> useful (Linux should be okay though). Use Microsoft's internal default of
> UTF-16. There are hundreds of functions in Windows that support ASCII or
> UTF-16, but there are none that natively support any other encoding. Doing
> this will allow me to use GB 18030 encoding because all filenames are
> encoded as UTF-16 in Windows and it is easy to translate between UTF-16 and
> UTF-GB.
> _______________________________________________
> Iup-users mailing list
> Iup-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/iup-users
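
For the conversion step discussed in this thread (a UTF-8 filename from IupFileDlg that fopen() cannot use on Windows), the following is a minimal sketch only, not something IUP provides. The helper name `fopen_utf8` is hypothetical, and the sketch assumes UTF8MODE_FILE=Yes so that the incoming string is UTF-8. On Windows it converts the name to UTF-16 with MultiByteToWideChar and calls _wfopen(); on other systems it assumes the filesystem accepts UTF-8 bytes and simply calls fopen().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <wchar.h>
#include <windows.h>
#endif

/* Hypothetical helper (not part of IUP): open a file whose name is UTF-8.
 * Returns NULL if the name is not valid UTF-8 or the file cannot be opened. */
static FILE *fopen_utf8(const char *utf8_name, const char *mode)
{
    assert(utf8_name != NULL && mode != NULL);
#ifdef _WIN32
    /* First pass: ask for the buffer sizes, in wchar_t units including the NUL. */
    int name_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                       utf8_name, -1, NULL, 0);
    int mode_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                       mode, -1, NULL, 0);
    if (name_len <= 0 || mode_len <= 0)
        return NULL;

    wchar_t *wname = malloc((size_t)name_len * sizeof(wchar_t));
    wchar_t *wmode = malloc((size_t)mode_len * sizeof(wchar_t));
    FILE *fp = NULL;

    /* Second pass: do the UTF-8 to UTF-16 conversion, then use the wide API. */
    if (wname != NULL && wmode != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8_name, -1, wname, name_len) > 0 &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            mode, -1, wmode, mode_len) > 0)
        fp = _wfopen(wname, wmode);

    free(wname);
    free(wmode);
    return fp;
#else
    /* Assumption: the filesystem encoding is UTF-8, so no conversion is needed. */
    return fopen(utf8_name, mode);
#endif
}
```

With the filename string taken from the dialog (for example the VALUE attribute read with IupGetAttribute), a call such as fopen_utf8(filename, "rb") would replace the plain fopen() call. The opposite direction, for UTF8MODE_FILE=No, would need WideCharToMultiByte and is not shown here.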