On 03/13/2011 04:43 PM, ZY Zhou wrote:
If an invalid UTF-8 or UTF-16 code unit sequence needs to be converted to UTF-32, then it should be
converted to invalid UTF-32; that's why D800~DFFF are marked as invalid code points
in the Unicode standard.

You are wrong on both points.
First, the Unicode standard defines no conversion of invalid source data into another format/encoding; invalid input is simply to be treated as invalid, that's all. A language or string-processing library should certainly *not* provide any way to perform such a conversion; instead, it should just signal the invalidity, by crashing or throwing. Second, the range you mention is not intended for application use; it is reserved for special use by UTF-16 (as surrogate code units) and is, as such, invalid as code points.
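
As a concrete illustration of what I mean by "signal the invalidity", here is a minimal D sketch (assuming Phobos' std.utf.isValidDchar and validate, which report or throw rather than "convert" anything):

    import std.utf : UTFException, isValidDchar, validate;
    import std.exception : assertThrown;

    void main()
    {
        // 0xD800 lies in the surrogate range, so it is not a valid code point.
        assert(!isValidDchar(cast(dchar) 0xD800));

        // A dchar[] holding a surrogate is invalid UTF-32; a conforming
        // library signals this (here, by throwing) instead of producing
        // some kind of "invalid UTF-32" output.
        dchar[] s = [cast(dchar) 0xD800];
        assertThrown!UTFException(validate(s));
    }

There is no way, and there should be no way, to get an "invalid UTF-32" result out of this.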

Since the beginning of this thread, you have been demanding that D standard features (the *string types and *char[] arrays) be bent to your particular needs of the moment, to do your job for you; at the price of every other use of those features potentially becoming insecure or incorrect, of crashing loads of existing code which relies on correct behaviour, and of breaking the standard.
Strange.

Denis

== Quote from spir (denis.s...@gmail.com)'s article
This is not a good idea, imo. Surrogate values /are/ invalid code points. (For
those wondering: they are a range of /code unit/ values used by UTF-16 to encode
code points > 0xFFFF.) They should never appear in a dchar[] string;
and a string of char[] code units should never encode a non-code point in the
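
To make the code unit / code point distinction concrete, here is a minimal D sketch (using std.conv.to for the transcoding) of how a code point above 0xFFFF becomes a surrogate pair in UTF-16, which is precisely why D800~DFFF can never be code points themselves:

    import std.conv : to;

    void main()
    {
        // U+1F600 lies above U+FFFF, so UTF-16 must encode it as a
        // surrogate pair taken from the reserved D800~DFFF range.
        wstring w = to!wstring("\U0001F600");
        assert(w.length == 2);
        assert(w[0] == 0xD83D && w[1] == 0xDE00);

        // As a code point, the same character is a single dchar.
        dstring d = to!dstring("\U0001F600");
        assert(d.length == 1 && d[0] == 0x1F600);
    }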



--
_________________
vita es estrany
spir.wikidot.com
