On 5/27/2016 11:27 AM, Andrei Alexandrescu wrote:
> On 5/27/16 1:11 PM, Walter Bright wrote:
>> They mean code units.
>
> Always valid or potentially invalid as well? -- Andrei

Some years ago I would have said always valid. Experience, however, says that Unicode is often dirty and code should be tolerant of that.

Consider Unicode in a text editor. You can't have it throwing exceptions, silently changing things to replacement characters, etc., when there are a few invalid sequences in it. You also can't just say "the file isn't Unicode" and refuse to display the Unicode in it.

It isn't hard to deal with invalid Unicode in a user-friendly manner.
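
To illustrate the editor case, here is a minimal sketch in Python (not D, and not code from this thread). It assumes the buffer is UTF-8 and uses Python's standard `surrogateescape` error handler, so invalid bytes neither raise nor get replaced, and saving writes back exactly the bytes that were loaded. The names `load_text`, `save_text` and `display_form` are hypothetical, chosen only for this example.

```python
def load_text(raw: bytes) -> str:
    """Decode UTF-8 without throwing on invalid input.

    Each undecodable byte 0xNN is kept as the lone surrogate U+DCNN,
    so nothing is lost or silently rewritten.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def save_text(text: str) -> bytes:
    """Encode back to bytes, restoring the original invalid bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def display_form(text: str) -> str:
    """Rendering only: show each preserved invalid byte as <XX>."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Escaped byte: recover its value and make it visible.
            out.append(f"<{cp - 0xDC00:02X}>")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    raw = b"caf\xc3\xa9 \xff\xfe ok"   # valid UTF-8 with two stray bytes
    text = load_text(raw)
    print(display_form(text))          # café <FF><FE> ok
    assert save_text(text) == raw      # the file round-trips unchanged
```

The point of the sketch is that the valid text stays fully usable, the bad bytes are shown to the user instead of hidden, and an unedited file is saved byte for byte as it was read.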
