Hi, Egmont,
The example you quote from Markus's page is "source code" written in ASCII,
apart from a C-style static string encoded in UTF-8.
There is nothing wrong with that code!
However, if you try to write some code like this:
void Écrire(const char *myCString); // Function name has Latin-1 chars *in UTF-8 encoding*
void åå(const char *myCString); // Function name has Chinese chars *in UTF-8 encoding*
... instead of:
void myWriteFunction(const char *myCString); // Function name *limited to basic ASCII Latin*
... THEN you will get into trouble, not only with GCC but probably with other
compilers as well.
So:
1. Keep your code (every part of it that the compiler actually parses) limited
   to ASCII. Most people suggest writing the code in English, with English
   comments, for world-wide comprehension.
2. The strings in your program can be in any encoding you want, but UTF-8
   certainly makes the most sense.
I have real-life production code containing message strings encoded in UTF-8
that compiles and executes just fine on numerous platforms. I have never had a
problem with this code under GCC or Intel's ICC on Linux, GCC on other free
*nix platforms such as FreeBSD and OpenBSD, or Sun's Forte compiler on
Solaris 8. I *never* use special compiler #pragmas, nor do I resort to
wide-character (wchar_t) strings. I always just use UTF-8 encoding in simple
C-style "char *" strings or, for C++ code, in the standard C++ std::string
class.
- Ed Trager
On Friday 2004.11.12 18:45:08 +0100, Egmont Koblinger wrote:
> Hi,
>
> I was reading Markus's page and found the example:
> printf("%ls\n", L"Schöne Grüße");
> and noticed that gcc always interprets the source code according to Latin-1.
>
> Then I googled a bit and found this reported to the gcc folks by Markus:
> http://sources.redhat.com/ml/libc-alpha/2000-09/msg00337.html
>
> However, this happened four years ago, and I haven't found more recent
> pieces of information on this topic.
>
> So my questions:
>
> - Is there a proper solution where I can write my source code in UTF-8?
> I have linux with gcc 3.3.4 and it's not necessary for the code to be
> portable to older or different systems.
>
> - Some people were discussing a cpp #pragma charset. Is it already
> implemented? If yes, where can I find docs about it?
>
> - Does recompiling gcc with --enable-c-mbchar solve this issue? Will gcc
> then honour my locale settings? Is it a stable, ready-for-production-use
> option of gcc?
>
> - Are there any applications which are known to miscompile with a c-mbchar
> gcc if I have a non-Latin1 (e.g. Latin-2 or UTF-8) locale settings?
>
>
>
> thanks,
>
> Egmont
>
> --
> Linux-UTF8: i18n of Linux on all levels
> Archive: http://mail.nl.linux.org/linux-utf8/