Re: Proposal for 2 Byte Unicode implementation in gcc and glibc

Marcin 'Qrczak' Kowalczyk Tue, 08 Aug 2000 09:28:14 -0700

> > >>       For utf16_t literals, we suggest the prefix u (similar to the
> > >>       prefix L for the type wchar_t):
> > >>  
> > >>          utf16_t s[] = u"someText"; 
> > >>          utf16_t c = u's'; 

IMHO the whole thing adds far more complexity than the problems it
solves. It's hard enough with two kinds of characters (char and wchar_t).

> > 16-bit Unicode is being used in existing software. Java is 16-bit
> > Unicode.  On AIX and Windows NT, wchar_t has 16 bits.

But on Linux wchar_t has 32 bits, and UTF-8 can encode values of up to
31 bits.

Although the characters currently proposed outside the 16-bit range are
not common, I don't expect many programs to bother handling UTF-16
properly - they would simply ignore surrogates. That's why I don't think
that promoting 16-bit chars is a good idea: it would mostly work, but it
would eventually fail in the rare cases where higher characters are
needed, and such failures would be too rare to convince authors that
fixing them is worthwhile. OTOH 32 bits will be enough forever, at
least for what we currently call 'characters'.
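
To make the surrogate problem concrete, here is a minimal sketch of how
a character above U+FFFF ends up as two 16-bit units. The typedef names
and both helper functions are made up for illustration (utf16_t is
borrowed from the proposal, and unsigned short is assumed to be 16 bits
wide); this is not code from gcc or glibc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned short utf16_t;  // assumption: 16-bit unit, named after the proposal's type
typedef unsigned long  ucs4_t;   // at least 32 bits, so it holds any code point

// Hypothetical helper: append one code point to a UTF-16 sequence.
void append_utf16(std::vector<utf16_t>& out, ucs4_t cp) {
    assert(cp <= 0x10FFFFul);                // UTF-16 cannot reach beyond this value
    assert(cp < 0xD800ul || cp > 0xDFFFul);  // surrogate values are not characters
    if (cp < 0x10000ul) {
        // Fits in 16 bits: one unit per character.
        out.push_back(static_cast<utf16_t>(cp));
    } else {
        // Needs a surrogate pair: 20 payload bits split into 10 + 10.
        cp -= 0x10000ul;
        out.push_back(static_cast<utf16_t>(0xD800ul + (cp >> 10)));
        out.push_back(static_cast<utf16_t>(0xDC00ul + (cp & 0x3FFul)));
    }
}

// Hypothetical helper: count characters rather than units by skipping
// the low (second) half of every surrogate pair.
std::size_t count_code_points(const std::vector<utf16_t>& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] < 0xDC00 || s[i] > 0xDFFF) ++n;
    }
    return n;
}
```

A program that treats every 16-bit unit as one character gets s.size()
where it wanted count_code_points(s). The two agree until the first
character above U+FFFF shows up, which is exactly the "mostly works"
failure mode.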

> > But there are no literals. The programmer has to write something like
> > 
> >   unsigned short s[] = {'H', 'e', 'l', 'l', 'o', 0 };
> >   myfunc( (unsigned short*)"H\000e\000l\000l\000o\000\000" );

In many languages, including C++, literals can be written in any
unambiguous encoding (e.g. UTF-8) and converted to the right string
type by appropriate functions.
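
A rough sketch of that approach, assuming the literal is stored as
UTF-8 bytes in an ordinary char string. The function name
utf16_from_utf8 and the utf16_t typedef are hypothetical, and
validation is deliberately minimal: overlong forms and UTF-8-encoded
surrogates are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned short utf16_t;  // assumption: unsigned short is 16 bits wide

// Hypothetical helper: convert a UTF-8 byte string to UTF-16 units.
std::vector<utf16_t> utf16_from_utf8(const std::string& in) {
    std::vector<utf16_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        unsigned long cp;   // code point being assembled
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::invalid_argument("bad UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("bad UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFFul)
            throw std::invalid_argument("code point not representable in UTF-16");
        if (cp < 0x10000ul) {
            out.push_back(static_cast<utf16_t>(cp));
        } else {
            // Above 16 bits: emit a surrogate pair.
            cp -= 0x10000ul;
            out.push_back(static_cast<utf16_t>(0xD800ul + (cp >> 10)));
            out.push_back(static_cast<utf16_t>(0xDC00ul + (cp & 0x3FFul)));
        }
    }
    assert(i == in.size());  // every input byte was consumed exactly once
    return out;
}
```

With something like this, a call such as
myfunc(&utf16_from_utf8("Hello")[0]) would take the place of the
hand-written array, with a terminating 0 pushed onto the vector first
if myfunc expects one. The conversion happens at run time, which is the
price for not adding a new literal syntax to the compiler.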

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
