Lijuan Hai wrote:
>
> I have a plan to convert UCN to alphabet instead of UTF8 in
> GCC-4.2.0, and already handled it in libcpp.

I would like to offer advice, but I don't understand what you are
trying to do.  You say you want to "convert UCN[s] to [an] alphabet
instead of UTF8" but that doesn't make any sense.  Alphabets are
abstract sets of glyphs commonly used to write a language.  They are
not alternatives to UTF8 (a scheme for encoding integers as sequences
of bytes) or even to Unicode (a mapping from integers to glyphs).
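To make that distinction concrete, here is a tiny standalone program
(my example, nothing to do with your patch) showing all three layers
for one character: the UCN spelling in the source, the code point it
names, and the UTF-8 bytes the compiler stores for it.  Compile with
-std=c99 so the UCN is accepted:

-------------------cut-------------------
/* Illustration only: one character at three levels.
   U+1234 is ETHIOPIC SYLLABLE SEE.  */
#include <stdio.h>
#include <string.h>

int main (void)
{
  const char *s = "\u1234";   /* the UCN names code point 0x1234 */
  size_t i;

  /* Under the default execution charset (UTF-8), code point 0x1234
     is stored as the three bytes E1 88 B4.  */
  for (i = 0; i < strlen (s); i++)
    printf ("%02X ", (unsigned char) s[i]);
  printf ("\n");              /* prints: E1 88 B4 */
  return 0;
}
-------------------cut-------------------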

The only thing I can guess is that you want to convert UCNs to some
specific character set other than Unicode, like EUC-JP or ISO8859.n.
In that case the first thing I must ask you is to read up on the
-fexec-charset option, and to explain why that doesn't do what you
need it to do.
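To illustrate what -fexec-charset already does (my example, and it
assumes a GCC built with iconv support): the same source line produces
different bytes in the object file depending on the option.

-------------------cut-------------------
/* The UCN names HIRAGANA LETTER A (U+3042).  */
const char *a = "\u3042";

/* gcc -std=c99 -c t.c
       -> literal bytes E3 81 82  (UTF-8, the default)
   gcc -std=c99 -fexec-charset=EUC-JP -c t.c
       -> literal bytes A4 A2     (EUC-JP)  */
-------------------cut-------------------

If what you want is "UCNs come out in encoding X in the object file",
that option may already be the whole job.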

> But I encountered a problem when compiling the code like following:
> -------------------cut-------------------
> 1:  #define str(t) #t
> 2:  int foo()
> 3:  {
> 4:    char* cc = str(\u1234);
> 5:    if (!strcmp(cc, "\u1234"))
> 6:      abort();
> 7: }
> -------------------cut-------------------
>   With my changes, \u1234 is converted to alphabet in line 4 but
> kept unchanged in line 5.  It's incorrect, and also unexpected, to
> convert it in line 4, for the '#' makes it different from plain
> identifiers.

As I don't know what you mean by "converted to alphabet", I can't say
for sure, but if I had to guess, I'd say you inserted your code into
the routines for scanning identifiers?  But at that point there is no
way to know that there is a '#' in effect.  You need to postpone the
conversion, whatever it is, until much later: the point where cpplib
hands off identifiers to the compiler proper, or perhaps even the
assembly output macros, depending on your goal.
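To make the "postpone it" suggestion concrete, here is a rough sketch.
This is not libcpp code: the function name and the choice of UTF-8 as
the target encoding are placeholders for whatever your conversion
actually is.  The idea is that the identifier's spelling is left alone
until hand-off, so stringizing still sees the original characters.

-------------------cut-------------------
#include <stddef.h>

/* Value of a hex digit; assumes C is a valid hex digit.  */
static unsigned int
hexval (unsigned char c)
{
  return c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10;
}

/* Copy identifier ID (length LEN) into OUT, expanding each \uXXXX
   escape to UTF-8.  OUT must be large enough; error checking and
   \UXXXXXXXX are omitted for brevity.  Returns the output length.  */
static size_t
convert_identifier_at_handoff (const unsigned char *id, size_t len,
                               unsigned char *out)
{
  size_t i = 0, n = 0;

  while (i < len)
    {
      if (i + 6 <= len && id[i] == '\\' && id[i + 1] == 'u')
        {
          unsigned int cp = 0;
          int k;

          for (k = 2; k < 6; k++)
            cp = (cp << 4) | hexval (id[i + k]);

          /* Encode CP as UTF-8 (code points below U+10000 only).  */
          if (cp < 0x80)
            out[n++] = cp;
          else if (cp < 0x800)
            {
              out[n++] = 0xC0 | (cp >> 6);
              out[n++] = 0x80 | (cp & 0x3F);
            }
          else
            {
              out[n++] = 0xE0 | (cp >> 12);
              out[n++] = 0x80 | ((cp >> 6) & 0x3F);
              out[n++] = 0x80 | (cp & 0x3F);
            }
          i += 6;
        }
      else
        out[n++] = id[i++];
    }
  return n;
}
-------------------cut-------------------

Called only at hand-off, this leaves line 4 of your test case alone
(the '#' stringizes the untouched spelling) while the string literal
in line 5 is handled by the normal charset machinery.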

(Have you read the long comment at the top of libcpp/charset.c?  Do
you understand all of the fine distinctions made there?)

zw
