On Mon, Jun 04, 2001 at 04:01:51PM +0000, Marcin 'Qrczak' Kowalczyk wrote:
> Sun, 3 Jun 2001 04:04:05 -0400, Michael B. Allen <[EMAIL PROTECTED]> wrote:
>
> > I have written a string ADT that normalizes ASCII, ISO-8859-1,
> > UCS-2, UCS-2LE, and UTF-8 to uint16_t UCS codes(UCS-2 or UCS-2LE
> > depending on the byte order of the machine) using mainly iconv.
>
> How does it deal with characters above U+FFFF?
Okay, I guess I'd better explain why I'm only doing UCS-2 when others
on this list have also suggested that I should be using a 32-bit
type. The application I'm working on is an SMB server. Now, Microsoft
cannot switch to UCS-4 because it would thoroughly break all their
existing clients and piss off EMC, NetApp, and a whole bunch of other
influentials to no end. UTF-16 maybe, but there's no space in the wire
format for anything more.
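
For reference, here is a minimal sketch of how UTF-16 fits a character
above U+FFFF into 16-bit units using a surrogate pair. The function name
utf16_encode is hypothetical and not part of the ADT; it only illustrates
the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2),
 * or 0 if cp is not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

A wire format that already carries 16-bit units can transport the pair
unchanged; only code that counts or indexes characters has to know that
two units may form one character.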
However, I would like this ADT to be useful outside of my SMB
encoding/decoding library, so if you guys can tell me that wchar_t won't
slow things down much then I'd be happy to use wchar_t. Is it rendered
useless without a 32-bit type? I thought the high codes were for exotic
characters?
> How it should be used when I have a text in e.g. ISO-8859-2?
By adding about 4 lines of code. I left out the other ISO-8859s because
they're just as easy as ISO-8859-1 and I just wanted to see the basics
work first.
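
To show why another ISO-8859 variant costs so little, here is a rough
sketch of an iconv-based conversion where the source charset is just a
string parameter. The function to_ucs2 and its fixed 512-byte scratch
buffer are assumptions for illustration, not the ADT's actual code, and
whether a given charset name is accepted depends on the local iconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert srclen bytes of src, encoded in the
 * charset named by `from`, into 16-bit UCS-2 code units in dst.
 * Returns the number of units written, or -1 on any failure
 * (unknown charset, invalid input, or input too long for the buffer). */
long to_ucs2(const char *from, const char *src, size_t srclen,
             uint16_t *dst, size_t dstcap)
{
    unsigned char buf[512];             /* scratch space, little-endian bytes */
    char *in = (char *)src;             /* iconv() takes a non-const pointer */
    char *out = (char *)buf;
    size_t inleft = srclen;
    size_t outleft = sizeof buf;
    size_t rc, n, i;

    iconv_t cd = iconv_open("UCS-2LE", from);
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    n = (sizeof buf - outleft) / 2;
    if (n > dstcap)
        return -1;

    /* Assemble host-order units from the little-endian byte pairs. */
    for (i = 0; i < n; i++)
        dst[i] = (uint16_t)(buf[2 * i] | (buf[2 * i + 1] << 8));
    return (long)n;
}
```

With this shape, supporting ISO-8859-2 is a matter of passing
"ISO-8859-2" instead of "ISO-8859-1" as the `from` argument.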
Now what I am worried about is combining characters mucking with the
str_length and str_size functions. Is that something I need to worry
about?
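
To make the concern concrete, here is a small sketch that counts a
zero-terminated UCS-2 string two ways: by code units, and by code units
that are not combining marks. The names count_units and count_base_chars
are hypothetical and stand in for whatever str_length ends up meaning.
The check only covers the Combining Diacritical Marks block
(U+0300..U+036F); Unicode defines combining characters in other ranges
too, so treat this as an approximation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: only U+0300..U+036F is treated as combining here. */
int is_combining_mark(uint16_t c)
{
    return c >= 0x0300 && c <= 0x036F;
}

/* Number of 16-bit code units before the terminating zero. */
size_t count_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Number of code units that are not combining marks, which is
 * closer to what a user would call the number of characters. */
size_t count_base_chars(const uint16_t *s)
{
    size_t n = 0;
    for (; *s != 0; s++)
        if (!is_combining_mark(*s))
            n++;
    return n;
}
```

For "e" followed by U+0301 (combining acute accent) and then "x", the
first count is 3 and the second is 2, so the two functions disagree as
soon as a combining sequence appears.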
Mike
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/