Re: [Iup-users] Request

Andrew Robinson Sat, 30 Nov 2019 11:50:54 -0800

Hi Antonio,

Thanks for the professional reply.

"The problem, for now, comes down to converting the string returned by
IupFileDlg to a format that fopen can successfully use"

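The problem there is using fopen instead of _wfopen. Those are the only two
basic choices in the C runtime on Windows (fopen is standard C; _wfopen is a
Microsoft extension that takes a wide-character path, which is UTF-16 on
Windows).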
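To make the fopen/_wfopen distinction concrete, here is a minimal sketch, assuming the path arrives as UTF-8 (as IupFileDlg returns it with UTF8MODE_FILE=Yes). On Windows it converts the path with MultiByteToWideChar(CP_UTF8, ...) and opens it with _wfopen; on other platforms it assumes the filesystem accepts UTF-8 bytes and simply calls fopen. The names fopen_utf8 and utf8_to_wide are hypothetical, not IUP functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   allocated UTF-16 (wchar_t) string; the caller frees it. Returns NULL on
   invalid UTF-8 or allocation failure. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call with a NULL buffer asks for the required length,
       including the terminating NUL (because the input length is -1). */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n == 0)
        return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) == 0) {
        free(w);
        w = NULL;
    }
    return w;
}

/* Windows: NTFS names are UTF-16, so open through _wfopen. */
static FILE *fopen_utf8(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = (wpath != NULL && wmode != NULL) ? _wfopen(wpath, wmode) : NULL;
    free(wpath); /* free(NULL) is a no-op */
    free(wmode);
    return f;
}
#else
/* Elsewhere (e.g. Linux), assume the filesystem takes UTF-8 bytes as-is. */
static FILE *fopen_utf8(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
    return fopen(path, mode);
}
#endif
```

Usage, assuming dlg is an IupFileDlg that has already been shown with IupPopup: `FILE *f = fopen_utf8(IupGetAttribute(dlg, "VALUE"), "rb");`. The conversion is done once, at the point where the filename crosses into the C runtime.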

"When ... you need to convert it"

My point exactly! Why do all that back-and-forth encoding and decoding when
you can just keep everything in UTF-16 and have hundreds of Unicode-specific
functions at your disposal in all versions of Windows, with no extra encoding
and decoding needed? That way, the only other time you have to worry about
Unicode is when reading and writing foreign text files.

"The GTK documentation describe this"

GTK isn't Windows. In Windows, a great many functions end in either an "A" or
a "W", such as AddAtom, AddConsoleAlias, AddLocalAlternateComputerName,
BeginUpdateResource, BuildCommDCBAndTimeouts, BuildCommDCB, CallNamedPipe,
CheckNameLegalDOS8Dot3, CommConfigDialog, CompareString, CopyFileEx, CopyFile,
CreateActCtx, etc. Windows is UTF-16 centric while Linux is UTF-8 centric,
but I'm only interested in the Windows API for now, since Linux represents
less than 3% of the desktop market. Internally, Windows works with either the
ANSI code page (the "A" functions) or UTF-16 (the "W" functions), and I want
to work with Windows, not against it. So: UTF-16 for Windows and UTF-8 for
Linux. I think this would be a huge step toward making IUP internationally
friendly as well, as it would cut down the considerable time that language
translations can sometimes take, especially for GB 18030.

I'm not sure about menu items, as I haven't experimented with that. Yet. I'll
let you know unless you beat me to it.

Much Thanks,
Andrew

On 2019-11-30 at 11:49 AM, Antonio Scuri <[email protected]> wrote:
  Hi Andrew, 

  I still think that we are on the right path. So, the most promising
situation is:

> Action: So then I set UTF8MODE=Yes and UTF8MODE_FILE=Yes and I get the
following:
> Result: All Unicode characters display properly, but I still cannot open the
file with fopen() using the filename provided by IupFileDlg().

  The problem, for now, comes down to converting the string returned by
IupFileDlg to a format that fopen can successfully use.

  When UTF8MODE_FILE=Yes, the string returned by IupFileDlg is in UTF-8, so
you need to convert it to the filesystem encoding.
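On Windows the filesystem encoding is UTF-16, so this conversion means decoding UTF-8 into UTF-16 code units, which MultiByteToWideChar(CP_UTF8, ...) does in one call. The portable sketch below shows roughly what that involves, including surrogate pairs for characters above U+FFFF; the function name is hypothetical and validation is deliberately minimal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into UTF-16 code units.
   Writes at most cap units (terminating 0 included) to out and returns the
   number of units written, excluding the terminator, or -1 on malformed
   input or insufficient space. For brevity, overlong encodings and encoded
   surrogates are NOT rejected; a production decoder should reject them. */
static ptrdiff_t utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;
    while (*s) {
        uint32_t cp;
        int extra;                                  /* continuation bytes */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;                             /* invalid lead byte */
        s++;
        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80) return -1;     /* truncated sequence */
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (cp > 0x10FFFF) return -1;
        if (cp >= 0x10000) {                        /* needs a surrogate pair */
            if (n + 2 >= cap) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return -1;
            out[n++] = (uint16_t)cp;
        }
    }
    if (n >= cap) return -1;                        /* no room for terminator */
    out[n] = 0;
    return (ptrdiff_t)n;
}
```

For example, the UTF-8 bytes E2 9E 9C decode to the single UTF-16 unit 0x279C, while a character outside the Basic Multilingual Plane becomes two units.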

  When UTF8MODE_FILE=No, the string returned by IupFileDlg is in the
filesystem encoding, so there is no need for conversion to open the file, but
you cannot display the returned value in an interface element without first
converting it to UTF-8.
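The reverse direction, for showing a UTF-16 filename in an IUP control when UTF8MODE=Yes, can be sketched the same way (on Windows, WideCharToMultiByte(CP_UTF8, ...) is the built-in equivalent). Again this is a hypothetical helper, assuming a 0-terminated array of UTF-16 code units as input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a 0-terminated UTF-16 string as UTF-8.
   Writes at most cap bytes (terminating NUL included) to out and returns the
   number of bytes written, excluding the NUL, or -1 on an unpaired
   surrogate or insufficient space. */
static ptrdiff_t utf16_to_utf8(const uint16_t *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    while (*in) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {         /* high surrogate */
            if (*in < 0xDC00 || *in > 0xDFFF) return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                              /* stray low surrogate */
        }
        unsigned char b[4];
        size_t len;
        if (cp < 0x80)         { b[0] = (unsigned char)cp;                  len = 1; }
        else if (cp < 0x800)   { b[0] = (unsigned char)(0xC0 | (cp >> 6));  len = 2; }
        else if (cp < 0x10000) { b[0] = (unsigned char)(0xE0 | (cp >> 12)); len = 3; }
        else                   { b[0] = (unsigned char)(0xF0 | (cp >> 18)); len = 4; }
        /* Continuation bytes carry 6 bits each, most significant first. */
        for (size_t i = 1; i < len; i++)
            b[i] = (unsigned char)(0x80 | ((cp >> (6 * (len - 1 - i))) & 0x3F));
        if (n + len >= cap) return -1;              /* keep room for the NUL */
        for (size_t i = 0; i < len; i++)
            out[n++] = (char)b[i];
    }
    if (n >= cap) return -1;
    out[n] = '\0';
    return (ptrdiff_t)n;
}
```

With both helpers, the string can be kept in whichever form the next consumer needs: UTF-16 for the filesystem, UTF-8 for IUP's display attributes.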

  Either way, you need functions to convert to and from UTF-8 and the
filesystem encoding. It would be nice to have a solution for that inside IUP,
but for now we still don't have one.

The GTK documentation describes this in:

https://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html  

Best,
Scuri

On Fri, Nov 22, 2019 at 19:16, Andrew Robinson <[email protected]>
wrote:

Antonio,

I have a file on a Windows 7 computer (with the default code page for
English) that I renamed by copying and pasting a UTF-8 encoded character from
an Internet page into the filename. What happens then? The browser converts
the character to UTF-16 and places it on the clipboard, and I then paste it
into the filename.

Good News: Even though my version of Windows is English only, the filename
displays properly in Windows.

Bad News: IUP does not display Unicode properly in every textbox.

Comment: IUP displays the ellipsis (… U+2026) correctly, but the textarea
does not display the right-pointing arrow (➔ U+279C) properly.

Action: What happens when I set UTF8MODE=Yes?

Result: The first textbox now incorrectly displays the ellipsis as an
unknown/error character, but the textarea displays the right-pointing arrow
correctly, although the arrow overlaps adjacent characters.

Action: If I set UTF8MODE_FILE=Yes, I get the following:

Bad News: No Unicode characters display correctly anywhere. Also, I cannot
open the file with fopen() using the filename provided by IupFileDlg().

Action: So then I set UTF8MODE=Yes and UTF8MODE_FILE=Yes and I get the
following:

Result: All Unicode characters display properly, but I still cannot open the
file with fopen() using the filename provided by IupFileDlg().

Discussion: Windows NTFS stores filenames in UTF-16, and only UTF-16, so no
matter what your code page is set to, Windows always translates filenames
from UTF-16 to your code page encoding (and back again). If I retrieve that
filename using IUP's very convenient IupFileDlg(), it returns the filename
string in ANSI or UTF-8, but on Windows fopen() accepts neither UTF-8 nor
UTF-16; it only accepts strings in the current ANSI code page. I can use
_wfopen(), but only if the string is encoded as UTF-16, which IUP does not
provide here. It would be nice to have everything in the Windows version of
IUP in native UTF-16, as there are hundreds of Windows functions that
directly support that format. I would then only have to worry about
translating filenames or certain file contents when supporting other
languages and code pages.

As https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8 explains,
"Microsoft Windows has a code page designated for UTF-8, code
page 65001. Prior to Windows 10 insider build 17035 (November 2017), it was
impossible to set the locale code page to 65001, leaving this code page only
available for (a) explicit conversion functions such as MultiByteToWideChar
and/or (b) the Win32 console command chcp 65001 to translate stdin/out between
UTF-8 and UTF-16.

Microsoft said that a UTF-8 locale might break some functions (a possible
example is _mbsrev) as they were written to assume multibyte encodings used no
more than 2 bytes per character, thus code pages with more bytes such as GB
18030 (cp54936) and UTF-8 could not be set as the locale.

This means that "narrow" functions, in particular fopen (which opens files),
cannot be called with UTF-8 strings, and in fact there is no way to open all
possible files using fopen no matter what the locale is set to and/or what
bytes are put in the string, as none of the available locales can produce all
possible UTF-16 characters. This problem also applies to all other api that
takes or returns 8 bit strings, including Windows ones such as SetWindowText"

Recommendation: Drop UTF-8 support on Windows, since it isn't very useful
there (Linux should be okay though), and use Microsoft's internal default of
UTF-16. There are hundreds of functions in Windows that support the ANSI code
page or UTF-16, but none that natively support any other encoding. Doing this
will allow me to use GB 18030 encoding, because all filenames are encoded as
UTF-16 in Windows and it is easy to translate between UTF-16 and GB 18030.
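On Windows, translating between UTF-16 and GB 18030 is a WideCharToMultiByte or MultiByteToWideChar call with code page 54936 (the cp54936 mentioned in the quote above). As a portable illustration, here is a sketch using POSIX iconv instead, assuming the iconv implementation provides the "UTF-16LE" and "GB18030" converters (glibc does; other implementations may not). The helper name is hypothetical.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert len bytes of UTF-16LE text to GB 18030 with
   POSIX iconv. Returns the number of output bytes, or -1 if the converter is
   unavailable, the input is invalid or incomplete, or out is too small. */
static ptrdiff_t utf16le_to_gb18030(const char *in, size_t len, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    iconv_t cd = iconv_open("GB18030", "UTF-16LE");   /* to, from */
    if (cd == (iconv_t)-1)
        return -1;                                    /* converter missing */
    char *src = (char *)in;      /* iconv takes char **, but does not write input */
    char *dst = out;
    size_t in_left = len, out_left = cap;
    size_t rc = iconv(cd, &src, &in_left, &dst, &out_left);
    /* Flush any pending shift state (a no-op for stateless encodings). */
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1 || in_left != 0)
        return -1;
    return (ptrdiff_t)(cap - out_left);
}
```

For example, the UTF-16LE bytes 2D 4E (U+4E2D, 中) become the two GB 18030 bytes D6 D0; characters outside the old GBK range take four bytes.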

_______________________________________________
Iup-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/iup-users
