> > I wonder, how many people really want to use Unicode codepoints beyond
> > U+FFFF?
>
> I don't want to make it incorrect by design just because cases it doesn't
> handle are rare.
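(Just to make concrete what "beyond U+FFFF" means: such codepoints live in
the supplementary planes, so in UTF-16 they take a surrogate pair and in
UTF-8 four bytes. A minimal sketch of my own, not taken from any particular
library:)

/* Sketch only: encode one supplementary-plane codepoint (U+1F600) by hand
 * to show the cost of going beyond U+FFFF. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t cp = 0x1F600;              /* example codepoint > U+FFFF */

    /* UTF-16: split into a high/low surrogate pair. */
    uint32_t v  = cp - 0x10000;
    uint16_t hi = (uint16_t)(0xD800 + (v >> 10));
    uint16_t lo = (uint16_t)(0xDC00 + (v & 0x3FF));
    printf("UTF-16: %04X %04X\n", (unsigned)hi, (unsigned)lo);  /* D83D DE00 */

    /* UTF-8: four bytes for codepoints in U+10000..U+10FFFF. */
    unsigned char u8[4] = {
        (unsigned char)(0xF0 | (cp >> 18)),
        (unsigned char)(0x80 | ((cp >> 12) & 0x3F)),
        (unsigned char)(0x80 | ((cp >> 6) & 0x3F)),
        (unsigned char)(0x80 | (cp & 0x3F)),
    };
    printf("UTF-8 : %02X %02X %02X %02X\n",
           u8[0], u8[1], u8[2], u8[3]);                         /* F0 9F 98 80 */
    return 0;
}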
It's unnecessary to handle ALL cases. You could address only the issues
encountered or expected by your end users. IMHO, it is more important to
make an application light-weight and able to run in 99% of cases. Or you
may find your language used by, say, 10,000 people, and none of them uses
the extra features on which you spent 40% of your development labour. And
it is always possible to refactor later.

However, it is your freedom. I am just giving my thoughts.

> > '\0' is not a valid character; it is a byte. Don't confuse them.
>
> It can occur in a file. The library can't just silently truncate a line
> when it encounters that - again, although it's rare, it would be broken
> by design, so I won't do that. I won't be mistake-compatible with C.

What if something occurs in the file that does not form a valid, say,
UTF-16 sequence? If you do things this way, you will have a lot of
headaches. Western Visual Basic programmers often use characters to
represent bytes, which makes applications break when the default encoding
changes from Latin-1 to UTF-8 or to some DBCS.
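To make the byte-vs-character point concrete, here is a minimal sketch of
my own (the function name is mine, not from any existing library): keep the
data as length-counted bytes, so an embedded '\0' is just another byte, and
explicitly validate that the bytes form UTF-8 before treating them as
characters.

/* Sketch only: check that a length-counted byte buffer is well-formed
 * UTF-8.  Because the length is explicit, an embedded '\0' byte never
 * truncates the data. */
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if buf[0..len) is valid UTF-8, 0 otherwise; on failure *bad
 * gets the offset of the sequence that failed. */
static int utf8_valid(const unsigned char *buf, size_t len, size_t *bad)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t n;                       /* length of this sequence */
        unsigned long cp;

        if (b < 0x80)        { n = 1; cp = b; }
        else if (b < 0xC2)   { goto fail; }   /* stray continuation / overlong lead */
        else if (b < 0xE0)   { n = 2; cp = b & 0x1F; }
        else if (b < 0xF0)   { n = 3; cp = b & 0x0F; }
        else if (b < 0xF5)   { n = 4; cp = b & 0x07; }
        else                 { goto fail; }

        if (i + n > len) goto fail;
        for (size_t k = 1; k < n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) goto fail;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* reject overlong forms, surrogates, and out-of-range values */
        if ((n == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (n == 4 && (cp < 0x10000 || cp > 0x10FFFF)))
            goto fail;

        i += n;
    }
    return 1;
fail:
    if (bad) *bad = i;
    return 0;
}

int main(void)
{
    /* 7 bytes: "A", an embedded NUL, U+1F600, then a stray continuation byte. */
    const unsigned char data[] = { 'A', 0x00, 0xF0, 0x9F, 0x98, 0x80, 0x80 };
    size_t bad;
    if (!utf8_valid(data, sizeof data, &bad))
        printf("invalid UTF-8 at byte offset %zu\n", bad);   /* offset 6 */
    return 0;
}

The same idea applies to UTF-16: check for unpaired surrogates instead of
assuming the input is well-formed.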
Best regards,

Wu Yongwei

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/