Joseph S. Myers writes:
> since there is no documentation I can't tell what the patch is
> supposed to do.
Their documentation is in the same directory, but in a proprietary
format; you will find it appended below.
> Systems for string literals in specified character sets have been
> discussed on the WG14 reflector, but AFAICT without any working papers yet
> even in the WG14 document register
At least their patch is a step toward multilingual strings that are
portably written yet still legible.
The L"wide string literal" syntax suffers from non-portability across
systems and across locales, because ISO C fails to mandate that
wchar_t is 32-bit ISO 10646.
But the 'u' prefix would be better used for UTF-8 string literals than
for UTF-16 string literals. So I'm proposing the following syntax:
u"UTF-8 string literal"
This way no extra 16-bit string functions are needed; the 8-bit str*
functions in libc will do.
Bruno
Title: Modification of the GNU Compiler to support UTF-16 string literals
Purpose
Support u'c' and u"UTF-16 string literal", analogous to L'c' and L"wide string literal".
Specification
(see subclause 6.1.4, String literals, of the ISO C90 standard, ISO/IEC 9899:1990)
u-string-literal:
    u" s-char-sequence(opt) "
s-char-sequence:
    s-char
    s-char-sequence s-char
s-char:
    any member of the source character set except
        the double-quote ", backslash \, or new-line character
    escape-sequence
Implementation
Our approach was to find all places where L-literals were handled explicitly and to add analogous code for UTF-16 string literals.
Step 1: Scanning u-literals
We identified the place in the compiler where L-literals were handled by the scanner and added analogous handling for u-literals.
As the type of the scanned u-literals we used an alias of unsigned short for character literals and an array of that alias for string literals.
Step 2: Parser
In the parser there was only one location where special handling of L-literals occurred:
the routine that concatenated several strictly adjacent literals into one literal.
Step 3: Semantic analysis
In the semantic analysis we found three places where special handling of L-literals occurred:
- the check whether char/wide char pointers are initialized with a string literal of the proper type
- the permission to convert string literals implicitly to non-const char/wide char pointers
- the initialization of string arrays without the terminating 0 of the string literal (allowed only in C, not in C++)
Step 4: Code generation
In code generation and optimization there was no location where special handling of L-literals occurred.
