https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #14 from Eric <ejolson at unr dot edu> ---

While there may not be current demand for gcc to accept UTF-8 identifiers, the fact that clang and Visual Studio support this C99 feature means source code using Greek and accented characters in variable names is likely to become more prevalent over time.

I have done a little testing to check whether string literals can, by default, contain arbitrary 8-bit data. This is used, for example, in legacy code which directly includes graphics characters from CP437. The original preprocessor specifies "UTF-8" as the default input character set and "UTF-8" as the internal character set. Because the input and internal character sets are then identical, no translation is done and arbitrary 8-bit data is passed through cleanly.

A slight modification to my patch needs to be made to retain the same behavior. In particular, the patch now specifies both the internal and default input character sets to be "C99", so no translation is done by default. The improved patch also includes consideration of EBCDIC hosts.

As iconv was installed on every GNU/Linux system I've tried, I'm not sure what is wrong with using the C99 mode present in newer releases. This achieves exactly the suggested result of converting all UTF-8 input to UCNs in the preprocessor while directly allowing other potentially useful conversions. Perhaps the configure script should be modified to check for a compatible version of iconv and, if one is not found, resort to a manual conversion.

Testing is still underway. After the standard regression tests are finished I will create new tests utf8id-.*, which will be versions of the ucnid-.* tests for native UTF-8 files. I will also include a new test for arbitrary 8-bit string literals, to verify further compatibility.
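
For illustration only, the "manual conversion" fallback mentioned above could look roughly like the sketch below. It is not taken from the patch or from libcpp; the function name utf8_to_ucn and its interface are hypothetical. It copies ASCII unchanged and rewrites every multi-byte UTF-8 sequence as a \uXXXX or \UXXXXXXXX UCN, which is the same idea as an iconv conversion to "C99". It assumes an ASCII host and does not reject overlong encodings or surrogate code points, which a real implementation would need to do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libcpp or of the patch.
   Rewrites the NUL-terminated UTF-8 string `in` into `out`, replacing
   each non-ASCII character with a C99 universal character name.
   Returns 0 on success, -1 on malformed input or if `out` is too small. */
int utf8_to_ucn(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    while (*p) {
        unsigned long cp;
        int extra;

        /* Decode the lead byte: payload bits and continuation count. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;               /* stray continuation or invalid byte */
        p++;

        /* Fold in the continuation bytes, each of the form 10xxxxxx. */
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return -1;            /* also catches a truncated sequence */
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < 0x80) {
            /* ASCII passes through untouched. */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = (char)cp;
        } else {
            /* Four hex digits for the BMP, eight above it. */
            int n = snprintf(out + used, outsz - used,
                             cp <= 0xFFFF ? "\\u%04lx" : "\\U%08lx", cp);
            if (n < 0 || (size_t)n >= outsz - used)
                return -1;
            used += (size_t)n;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With this sketch, the input "double π = 3;" stored as UTF-8 becomes "double \u03c0 = 3;". Note that it converts non-ASCII bytes everywhere, including inside string literals, so on its own it would not preserve the arbitrary 8-bit pass-through behavior discussed above.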