On 03/13/2011 04:43 PM, ZY Zhou wrote:
If an invalid UTF-8 or UTF-16 code unit sequence needs to be converted to UTF-32, then it should be
converted to invalid UTF-32; that's why D800~DFFF are marked as invalid code points
in the Unicode standard.

You are wrong on both points.
First, the Unicode standard defines no conversion of invalid source data into another format/encoding; invalid input is simply to be treated as invalid, that's all. A language or string-processing library should certainly *not* provide any way to perform such a conversion; instead, it should just signal the invalidity, by crashing or throwing. Second, the range you mention is not intended for application use; it is reserved for special use by UTF-16 (as surrogate code units) and is, as such, invalid as code points.
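
As a concrete illustration of what I mean by "signal the invalidity", here is a minimal D sketch (assuming Phobos' std.utf.isValidDchar and validate, which report or throw rather than "convert" anything):

    import std.utf : UTFException, isValidDchar, validate;
    import std.exception : assertThrown;

    void main()
    {
        // 0xD800 lies in the surrogate range, so it is not a valid code point.
        assert(!isValidDchar(cast(dchar) 0xD800));

        // A dchar[] holding a surrogate is invalid UTF-32; a conforming
        // library signals this (here, by throwing) instead of producing
        // some kind of "invalid UTF-32" output.
        dchar[] s = [cast(dchar) 0xD800];
        assertThrown!UTFException(validate(s));
    }

There is no way, and there should be no way, to get an "invalid UTF-32" result out of this.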

Since the beginning of this thread, you have been demanding that D standard features (the *string types and *char[] arrays) be bent to your particular needs of the moment, to do your job for you; at the price of every other use of those features potentially becoming insecure or incorrect, of crashing loads of existing code which relies on correct behaviour, and of breaking the standard.
Strange.

Denis

== Quote from spir (denis.s...@gmail.com)'s article
This is not a good idea, imo. Surrogate values /are/ invalid code points. (For
those wondering: they are a range of /code unit/ values used by UTF-16 to encode
code points > 0xFFFF.) They should never appear in a dchar[] string;
and a string of char[] code units should never encode a non-code point in the
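
To make the code unit / code point distinction concrete, here is a minimal D sketch (using std.conv.to for the transcoding) of how a code point above 0xFFFF becomes a surrogate pair in UTF-16, which is precisely why D800~DFFF can never be code points themselves:

    import std.conv : to;

    void main()
    {
        // U+1F600 lies above U+FFFF, so UTF-16 must encode it as a
        // surrogate pair taken from the reserved D800~DFFF range.
        wstring w = to!wstring("\U0001F600");
        assert(w.length == 2);
        assert(w[0] == 0xD83D && w[1] == 0xDE00);

        // As a code point, the same character is a single dchar.
        dstring d = to!dstring("\U0001F600");
        assert(d.length == 1 && d[0] == 0x1F600);
    }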



--
_________________
vita es estrany
spir.wikidot.com
