> > I wonder, how many people really want to use Unicode codepoints
> > beyond U+FFFF?
>
> I don't want to make it incorrect by design just because cases it
> doesn't handle are rare.
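(For context, the codepoints in question are exactly those that UTF-16
encodes with surrogate pairs; a minimal sketch in Python, just to show
what "beyond U+FFFF" costs an implementation:)

```python
# Codepoints above U+FFFF need a surrogate pair in UTF-16.
ch = "\U0001F600"          # U+1F600, outside the BMP
encoded = ch.encode("utf-16-be")
assert len(encoded) == 4   # two 16-bit code units, not one

high = int.from_bytes(encoded[:2], "big")
low = int.from_bytes(encoded[2:], "big")
assert 0xD800 <= high <= 0xDBFF  # high (leading) surrogate
assert 0xDC00 <= low <= 0xDFFF   # low (trailing) surrogate
```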

It's unnecessary to handle ALL cases.  You could address only the
issues your end users actually encounter or expect.  IMHO, it is more
important to keep an application light-weight and working in 99% of
cases.  Otherwise, you may find your language used by, say, 10000
people, none of whom uses the extra features on which you spent 40% of
your development labour.  And it is always possible to refactor
later. --- However, it is your freedom; I am just sharing my thoughts.

> > '\0' is not a valid character; it is a byte.  Don't confuse them.
>
> It can occur in a file.  The library can't just silently truncate a
> line when it encounters that - again, although it's rare, it would be
> broken by design, so I won't do that.  I won't be mistake-compatible
> with C.
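(The quoted truncation risk is easy to demonstrate; a hypothetical
sketch, with made-up file contents:)

```python
# A NUL byte is a legal byte in a file, even in the middle of a line.
data = b"hello\x00world\n"

# C-style string handling stops at the first NUL, silently losing data:
truncated = data.split(b"\x00")[0]
assert truncated == b"hello"

# A length-aware reader preserves the whole line:
line = data.splitlines()[0]
assert line == b"hello\x00world"
```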

What if something occurs in the file that does not form a valid, say,
UTF-16 sequence?  If you handle things this way, you will have a lot
of headaches.  Western Visual Basic programmers often use characters
to represent bytes, which makes their applications break when the
default encoding changes from Latin-1 to UTF-8 or some DBCS.
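(Both failure modes are easy to reproduce; a minimal sketch, with byte
values invented for illustration:)

```python
# 1. A lone high surrogate is not a valid UTF-16 sequence:
lone_surrogate = b"\x3d\xd8"  # U+D83D with no trailing low surrogate
try:
    lone_surrogate.decode("utf-16-le")
    valid = True
except UnicodeDecodeError:
    valid = False
assert not valid

# 2. Bytes used as Latin-1 "characters" break when the encoding
#    changes to UTF-8:
byte = b"\xe9"                           # 'é' under Latin-1
assert byte.decode("latin-1") == "\u00e9"
try:
    byte.decode("utf-8")                 # not a valid UTF-8 sequence
    decodes = True
except UnicodeDecodeError:
    decodes = False
assert not decodes
```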

Best regards,

Wu Yongwei

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
