[Bug c/67224] UTF-8 support for identifier names in GCC

ejolson at unr dot edu Mon, 17 Aug 2015 11:46:10 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224


--- Comment #7 from Eric <ejolson at unr dot edu> ---
Please look at the Raspberry Pi forum post linked in the original report for
more information about testing this patch.  As the text describes there, the
command line options 

    -finput-charset=UTF-8 -fextended-identifiers

are both needed in order to compile a UTF-8 input file containing Unicode
identifiers.  I have included a small test program as another attachment.
Searching for "UTF-8 identifiers in GCC" will turn up a number of people asking
for this feature and additional example code that uses UTF-8 identifiers.  The
document "Unicode for the PCC C99 Compiler" available at

    http://pcc.ludd.ltu.se/documentation/

also contains example UTF-8 C99 input files which can be used to test the
compiler.  The one-line patch submitted above has also been tested in the sense
that the compiler still bootstraps and has no trouble compiling thousands of
lines of standard ASCII C input.
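
For readers without access to the attachment, a minimal sketch of what such a
UTF-8 test input might look like follows.  It is not the attached program; the
function and variable names are made up for illustration, and the file is
assumed to be saved as UTF-8 with precomposed characters.

```c
/* Hypothetical UTF-8 test input; not the attachment from this report. */
#include <assert.h>

/* Sum of 1..límite; the parameter name contains U+00ED. */
int suma_hasta(int límite)
{
    int total = 0;
    for (int número = 1; número <= límite; número++)
        total += número;
    return total;
}

/* Area of a circle; the constant is named with U+03C0 (Greek pi). */
double área_círculo(double radio)
{
    const double π = 3.14159265358979;
    return π * radio * radio;
}

/* Self-check that a test driver could call after compiling this file. */
void comprobar(void)
{
    assert(suma_hasta(4) == 10);
    assert(área_círculo(1.0) > 3.14 && área_círculo(1.0) < 3.15);
}
```

With the patch applied, a file like this would be compiled using the two
options given above.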

The patch inserts "C99" in only one place because SOURCE_CHARSET is used for
conflicting purposes, and changing every use to "C99" doesn't yield a working
solution.  In particular, the "C99" in _cpp_convert_input should not be
considered the source character set appearing in the input files but rather an
internal character set suitable for later parsing.  As iconv is already a
well-debugged library, it would appear the risks of this patch are minor.
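
To make the idea of an internal character set concrete, here is a rough,
hand-written sketch of the kind of rewriting involved: each non-ASCII
character in UTF-8 input is spelled as a UCN in plain ASCII.  This is not the
iconv implementation and not code from the patch; the function names are
hypothetical, and the decoder is simplified (it does not reject overlong forms
or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decode one UTF-8 sequence at s into *cp.  Returns the number of bytes
   consumed, or 0 if the sequence is malformed or truncated. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (i = 1; i < len; i++) {
        /* A terminating NUL fails this check, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Copy NUL-terminated UTF-8 text to out as ASCII, writing each non-ASCII
   character as a 4-digit or 8-digit UCN.  Returns 0 on success, -1 on
   malformed input or if out is too small. */
int utf8_to_ucn(const char *in, char *out, size_t outsize)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0) return -1;
    out[0] = '\0';

    while (*s) {
        unsigned long cp;
        size_t n = decode_utf8(s, &cp);
        int w;
        if (n == 0) return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsize - used, "%c", (int)cp);
        else if (cp <= 0xFFFF)
            w = snprintf(out + used, outsize - used, "\\u%04lX", cp);
        else
            w = snprintf(out + used, outsize - used, "\\U%08lX", cp);
        if (w < 0 || (size_t)w >= outsize - used) return -1;
        used += (size_t)w;
        s += n;
    }
    return 0;
}
```

After such a rewrite the lexer only ever sees ASCII plus UCNs, which is the
form -fextended-identifiers already handles.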

Note, however, the following problem:  "C99" is probably not correct for
EBCDIC hosts.  In that case it might be possible to write UCNs using trigraphs
of the form ??/uXXXX and ??/UXXXXXXXX.  However, as the number of people wanting
to compile C source files with identifiers encoded in UTF-EBCDIC is likely
zero, the easiest solution going forward is to modify the patch so that it only
applies to non-EBCDIC hosts.  As there are already #ifdef's in the code to
check for this, doing so adds no new complexity to the code base.
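
A rough sketch of such a guard is shown here.  The macro and function names
are hypothetical and only illustrate the shape of the check; the guards
already present in libcpp may be spelled differently, and the EBCDIC branch
simply assumes the existing behaviour is left alone.

```c
/* Hypothetical host check: ASCII hosts have these character values. */
#if 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20
#  define SKETCH_HOST_IS_ASCII 1
#else
#  define SKETCH_HOST_IS_ASCII 0
#endif

/* Name of the charset the input would be converted to before parsing. */
const char *sketch_internal_charset(void)
{
#if SKETCH_HOST_IS_ASCII
    return "C99";         /* patched: non-ASCII characters become UCNs */
#else
    return "UTF-EBCDIC";  /* assumed: EBCDIC hosts stay unpatched */
#endif
}
```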
