https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #7 from Eric <ejolson at unr dot edu> ---

Please look at the Raspberry Pi forum post linked in the original report for more information about testing this patch. As the text there describes, the command line options -finput-charset=UTF-8 and -fextended-identifiers are both needed in order to compile a UTF-8 input file containing Unicode identifiers. I have included a small test program as another attachment.

Searching for "UTF-8 Identifiers in GCC" will turn up a number of people asking for this feature, along with additional example programs that use UTF-8 identifiers. The document "Unicode for the PCC C99 Compiler", available at http://pcc.ludd.ltu.se/documentation/, also contains example UTF-8 C99 input files which can be used to test the compiler.

The one-line patch submitted above has also been tested in the sense that the compiler still bootstraps and has no trouble compiling thousands of lines of standard ASCII C input. The patch inserts "C99" in only one place because the uses of SOURCE_CHARSET conflict with one another, and changing them all to "C99" doesn't yield a working solution. In particular, the "C99" in _cpp_convert_input should not be considered the source character set appearing in the input files, but rather an internal character set suitable for later parsing. As iconv is already a well-debugged library, the risks of this patch appear to be minor.

Note, however, the following problem: "C99" is probably not the correct choice for EBCDIC hosts. In that case it might be possible to write UCNs using trigraphs of the form ??/uXXXX and ??/UXXXXXXXX. However, as the number of people wanting to compile C source files with identifiers encoded in UTF-EBCDIC is likely zero, the easiest solution going forward is to modify the patch so that it only applies to non-EBCDIC hosts. As there are already #ifdef's in the code to check for this, doing so does not add any new complexity to the code base.
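
To make the idea of "an internal character set suitable for later parsing" more concrete, here is a rough sketch of the kind of rewriting I understand a "C99" conversion to perform: ASCII bytes pass through unchanged, and every other character is replaced by a UCN of the form \uXXXX or \UXXXXXXXX. This is not the libcpp or iconv code. The function names utf8_to_ucn and decode_utf8 are made up for this illustration, and the use of lowercase hex digits is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s.
   Stores the code point in *cp and returns the number of bytes
   consumed, or 0 if the sequence is malformed. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;
    unsigned long min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; min = 0x10000; }
    else return 0;

    for (i = 1; i < len; i++) {
        /* A continuation byte must look like 10xxxxxx; this test also
           stops at an unexpected terminating NUL. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Hypothetical illustration: copy the NUL-terminated UTF-8 string `in`
   to `out`, replacing each non-ASCII character by a UCN.
   Returns 0 on success, -1 on malformed input or if `out` is too small. */
int utf8_to_ucn(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0) return -1;
    out[0] = '\0';

    while (*p != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(p, &cp);
        int w;

        if (n == 0) return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsz - used, "%c", (int)cp);
        else if (cp <= 0xFFFF)
            w = snprintf(out + used, outsz - used, "\\u%04lx", cp);
        else
            w = snprintf(out + used, outsz - used, "\\U%08lx", cp);

        /* snprintf reports the length it needed; fail on truncation. */
        if (w < 0 || (size_t)w >= outsz - used) return -1;
        used += (size_t)w;
        p += n;
    }
    return 0;
}
```

Passing the UTF-8 bytes for "café" through utf8_to_ucn yields the pure ASCII text caf\u00e9, which a parser that already understands UCNs can handle without knowing anything about the original encoding. On an EBCDIC host the pass-through of ASCII bytes would not be appropriate, which is the reason for restricting the patch to non-EBCDIC hosts.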