https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #13 from Manuel López-Ibáñez <manu at gcc dot gnu.org> ---
(In reply to Eric from comment #12)
> I'm glad to know people like Joseph are working on UTF-8 in gcc.

I think at the moment, neither Joseph nor anyone else is planning to work on this. There does not seem to be sufficient demand for this feature for companies to fund it or for volunteers to step up and implement it (you are the first person I am aware of to attempt it).

> I spent a week adding UTF-8 input support to pcc. At that time Microsoft
> Studio and clang already supported UTF-8 input files and I expected that gcc
> would do so in the next release.

Unfortunately, GCC has very few developers compared to Microsoft or Clang. Many things in GCC will never get done unless new people contribute to its development. That is why, if you want to see this feature, you are the best and perhaps the only person to make it happen. The problem is that this cannot be fixed by a one-line patch, otherwise it would have been fixed a long time ago.

* GCC cannot rely on libiconv always being present. It has to work with glibc's iconv, which is what is used on GNU/Linux.

* Even if glibc's iconv supported the C99 conversions, this would break other things.

* You need to add tests explicitly for the various cases (see Joseph's comments). The tests will be added to the GCC testsuite to prove that your patch works as it should and to make sure future changes do not break it.

* At a minimum, look at all the gcc.dg/cpp/ucnid-*.c and g++.dg/cpp/ucnid-*.c tests and see what happens if you replace the \uNNNN escapes with the actual extended characters.

* Joseph thinks that the best approach is to do the conversion from UTF-8 to UCNs "manually" within cpplib, so that you can handle all the corner cases of C/C++ (quoted strings, \µ, macro names, ...).