[EMAIL PROTECTED] (Bruno Haible) wrote on 13.07.01 in
<[EMAIL PROTECTED]>:
> Christoph Rohland writes:
>
> > Yes, but perhaps we could try to make that standard?
>
> There is a chance to make the u"..." syntax(es) standard.
That would be nice. One syntax each for strings/chars in
* "native" char * encoding
* UTF-8
* UTF-16
* UTF-32
* Do we need a "native" wide char encoding, too (mostly for Win32 where
it's UTF-16, but possibly also some Asian thing)?
Also, GCC folks might want to have a way to use each of these for
Objective-C string objects - currently, only the first works with a @"..."
syntax. (Prefacing every version with @ might work.)
As for string concatenation, whatever C99 says, reasonably extended to
cover the new syntaxes, should usually work.
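To make the list above a bit more concrete, here is a minimal sketch. It
assumes the prefixes are spelled u8"...", u"..." and U"..." (the spellings
C11 later adopted; nothing in this thread fixes them) and needs a compiler
in C11 mode or later. The array names and the UNITS macro are made up for
illustration. The least-width types from <stdint.h> stand in for char16_t
and char32_t so the sketch does not depend on <uchar.h> being available.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of code units in a string literal array, excluding the
   terminating zero. UNITS is a hypothetical helper name. */
#define UNITS(s) (sizeof(s) / sizeof((s)[0]) - 1)

/* The same two characters, U+00E9 and U+1F600, in each encoding.
   Universal character names keep the source file pure ASCII. */
const char           s_utf8[]  = u8"\u00e9\U0001F600"; /* 2 + 4 bytes */
const uint_least16_t s_utf16[] = u"\u00e9\U0001F600";  /* 1 + 2 units */
const uint_least32_t s_utf32[] = U"\u00e9\U0001F600";  /* 1 + 1 units */

/* Concatenation: an unprefixed literal next to a prefixed one takes
   on the prefix, so this is one UTF-16 string of six code units. */
const uint_least16_t s_joined[] = u"abc" "def";
```

The existing L"..." syntax would cover the "native" wide char case. What
happens when two different prefixes meet in one concatenation is a
separate question that this sketch does not try to answer.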
> > No, I think one of the biggest mistakes in the C standard is that
> > char/wchar_t is not fixed. We need an exact 16 bit type with a defined
> > encoding.
>
> Joseph Myers explained why you won't get such a type (and why ISO C 99
> section 7.18.1.1.(3) says that uint8_t, uint16_t and uint32_t are
> optional): Some hardware has a word size of 9, 16, 32, or 36 bit, and
> GCC and C99 support such hardware.
You won't get a C standard exact type, but then you don't need that. What
you do need is "the type used on this platform for this application, or if
there is none, a compiler-specific type".
Just picking UTF-16 as an example:
On a reasonable platform, that will be an exact type. If you're trying to
have UTF-16 on a 36 bit platform, well, you'll get *something*, and
hopefully other people using UTF-16 on that platform will get the same
something, but it probably won't be an exact type - nor would anyone
expect it to be.
This means it's not necessarily reasonable to define these as "least" types,
either, as the "common" type for this application (if there is one) might
not be the same as the relevant "least" type.
So, you'll want something like
    utf16_t   unsigned integer type with at least the range 0-65535;
              preferably matches the usual way to encode UTF-16 on this
              platform (but the standard doesn't guarantee that)
(Do we also need a signed type that includes all of these plus -1 for
return values?)
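A rough sketch of how a header might pick such a type, under stated
assumptions: PLATFORM_UTF16_TYPE is a hypothetical macro through which a
platform or compiler names its usual UTF-16 code unit type, and
utf16_int_t, UTF16_EOF and utf16_next are made-up names for the signed
companion type and one possible use of it. None of these exist in any
standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: the platform's customary UTF-16 unit type wins
   if one is declared, whether or not it is the "least" type. */
#if defined(PLATFORM_UTF16_TYPE)
typedef PLATFORM_UTF16_TYPE utf16_t;
#elif defined(UINT16_MAX)
typedef uint16_t utf16_t;        /* an exact 16-bit type exists */
#else
typedef uint_least16_t utf16_t;  /* e.g. hardware with 9- or 36-bit words */
#endif

/* Signed companion: holds every value 0-65535 plus -1.
   Choosing int_least32_t here is an assumption. */
typedef int_least32_t utf16_int_t;
#define UTF16_EOF ((utf16_int_t)-1)

/* Return the next code unit and advance *pos, or return UTF16_EOF
   once the buffer is exhausted. */
utf16_int_t utf16_next(const utf16_t *buf, size_t len, size_t *pos)
{
    assert(buf != NULL && pos != NULL);
    if (*pos >= len)
        return UTF16_EOF;
    return (utf16_int_t)buf[(*pos)++];
}
```

The signed type plays the same role for utf16_t that int plays for
unsigned char in getc(): every valid code unit stays distinguishable
from the -1 marker.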
MfG Kai
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/