https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #18 from Eric <ejolson at unr dot edu> ---
Thanks Joseph for the clarification about the two different versions of iconv. I was admittedly confused about this until moments ago. Anyway, I just discovered that libiconv doesn't support conversions to and from the IBM1047 EBCDIC character set, and this causes some of the regression tests to fail. Coupled with the fact that the C99 encoding isn't supported by the glibc version of iconv, this creates a little problem with my patch.

You mention a bigger problem which I had not thought about: the C++ semantics of raw strings. Processing UCNs in C++ code apparently requires surprisingly deep syntactic analysis. Raw literals seem to appear in the gnu99 and gnu11 extensions to C as well.

Amusingly, if I understand the C++ specifications

    www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf

trigraphs are supposed to be interpreted before any other processing takes place. However, the simple code

    #include <stdio.h>
    int main(){
        char p1[]="??/u00E4";
        char p2[]=R"(??/u00E4)";
        char p3[]=R"(\u00E4)";
        printf("%s or %s or %s\n",p1,p2,p3);
        return 0;
    }

compiled with

    $ g++ -std=c++11 pp.c

produces the output

    ä or ??/u00E4 or \u00E4

which illustrates that g++ does not process trigraphs inside raw string literals. Admittedly I'm looking at the draft standard, but I don't think this is something which changed suddenly in the final draft.

Clearly, my patch makes a further mess of raw string literals in gcc. My first reaction is that raw string literals were not well thought out, but I suppose bad standards are sometimes better than no standards. At any rate, there appears to be no easy way of supporting both UTF-8 identifiers and raw string literals.

My plan for now is to take a break and keep my UTF-8 identifier support as a one-line patch reliant on libiconv, which breaks EBCDIC encodings and raw string literals.
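
To make the difficulty concrete, here is a minimal sketch of a pre-pass that replaces trigraphs everywhere except inside raw string literals, which matches the g++ output shown above. It is not taken from gcc or from my patch: the names trigraph_replacement and replace_trigraphs_outside_raw are hypothetical, and the scanner deliberately ignores comments and character literals. Even this simplified version has to track whether it is inside an ordinary string and has to match the raw-literal delimiter, which is why a blind character-set conversion over the whole file cannot get raw literals right.

```cpp
#include <cstddef>
#include <string>

// Map the third character of a trigraph to its replacement, or 0 if the
// character does not complete a trigraph.
static char trigraph_replacement(char c) {
    switch (c) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return 0;
    }
}

// Hypothetical helper (an illustration, not gcc's implementation):
// replace trigraphs in src, but copy raw string literals of the form
// R"delim( ... )delim" through untouched.
// Simplifying assumptions: comments and character literals are not
// recognised, and any R followed by a double quote outside an ordinary
// string is taken as the start of a raw literal.
std::string replace_trigraphs_outside_raw(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    bool in_string = false;  // inside an ordinary "..." literal
    bool escaped = false;    // previous character was a backslash in a string

    while (i < src.size()) {
        // Start of a raw literal: find its delimiter and matching close.
        if (!in_string && src[i] == 'R' && i + 1 < src.size() && src[i + 1] == '"') {
            std::size_t open = src.find('(', i + 2);
            if (open != std::string::npos) {
                std::string close = ")" + src.substr(i + 2, open - (i + 2)) + "\"";
                std::size_t end = src.find(close, open + 1);
                if (end != std::string::npos) {
                    end += close.size();
                    out.append(src, i, end - i);  // copy the literal verbatim
                    i = end;
                    continue;
                }
            }
            // Malformed raw literal: fall through and treat it as plain text.
        }

        // Decode one character, collapsing a trigraph if one starts here.
        char c = src[i];
        std::size_t len = 1;
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            char r = trigraph_replacement(src[i + 2]);
            if (r != 0) {
                c = r;
                len = 3;
            }
        }
        out.push_back(c);

        // Track ordinary string state so that R followed by a quote inside
        // an ordinary string is not mistaken for a raw literal.
        if (escaped) {
            escaped = false;
        } else if (c == '\\') {
            escaped = in_string;
        } else if (c == '"') {
            in_string = !in_string;
        }
        i += len;
    }
    return out;
}
```

Feeding the p1 line from the test program through replace_trigraphs_outside_raw turns the trigraph into a backslash, so the literal becomes "\u00E4", while the p2 and p3 lines come back unchanged. Note that the code above avoids writing two adjacent question marks anywhere, including in comments, so that it compiles the same way whether or not trigraphs are enabled.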