Re: string need to be robust

spir Sun, 13 Mar 2011 05:51:51 -0700

On 03/13/2011 01:25 PM, ZY Zhou wrote:

> but I think that it's completely unreasonable to expect
> all of the string-based and/or range-based functions to be able to handle
> invalid unicode.

As I explained in my first mail, if the UTF-8 parser converted every invalid UTF-8 byte
to a low surrogate code point (0x80~0xFF -> 0xDC80~0xDCFF), other string-related
functions would still work fine, and you could also handle these errors if you want:


string s = "\xa0";
foreach(dchar d; s) {
   if (isValidUnicode(d)) {
     process(d);
   } else {
     handleError(d);
   }
}
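A minimal D sketch of the escaping decoder proposed above, for illustration only. `decodeEscaping` and `isEscapedByte` are hypothetical helpers, not Phobos functions. The sketch relies only on std.utf.decode, which throws a UTFException on malformed UTF-8. Any byte that fails to decode is necessarily in 0x80..0xFF (ASCII bytes always decode), so it becomes 0xDC00 + byte. Python's "surrogateescape" error handler uses the same byte-to-surrogate mapping.

```d
import std.utf : decode, UTFException;

// Hypothetical helper (not part of Phobos): decode UTF-8, but turn each byte
// that cannot be decoded into a lone low surrogate 0xDC00 + byte,
// i.e. 0x80..0xFF -> 0xDC80..0xDCFF, as proposed in the quoted mail.
dchar[] decodeEscaping(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        size_t j = i; // work on a copy so i still points at the bad byte on failure
        try
        {
            result ~= decode(s, j);
            i = j;
        }
        catch (UTFException)
        {
            // s[i] is >= 0x80 here, since ASCII bytes always decode fine.
            result ~= cast(dchar)(0xDC00 + cast(ubyte) s[i]);
            ++i;
        }
    }
    return result;
}

// Hypothetical check: true if d is one of the escape values, i.e. the
// opposite of isValidUnicode() in the quoted snippet for these inputs.
bool isEscapedByte(dchar d)
{
    return d >= 0xDC80 && d <= 0xDCFF;
}

void main()
{
    import std.stdio : writefln;

    string s = "a\xa0b";
    foreach (d; decodeEscaping(s))
    {
        if (isEscapedByte(d))
            writefln("invalid byte 0x%02X", d - 0xDC00);
        else
            writefln("code point U+%04X", cast(uint) d);
    }
}
```

For "a\xa0b" this reports U+0061, the invalid byte 0xA0, then U+0062. The catch is the one raised in the reply: every escaped value is a surrogate, so the resulting dchar[] is not valid UTF-32.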

This is not a good idea, imo. Surrogate values /are/ invalid code points (they are not Unicode scalar values). (For those who wonder: surrogates are a range of /code unit/ values used to encode, in UTF-16, code points > 0xFFFF.) They should never appear in a dchar[] string, and a char[] string of UTF-8 code units should never encode a value in the surrogate range.
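The point about surrogates can be checked with std.utf. In this small sketch, U+1F600 is just an arbitrary code point above 0xFFFF: UTF-16 needs two code units from the surrogate range for it, std.utf.isValidDchar rejects a bare surrogate such as 0xDC80, and std.utf.encode throws a UTFException when asked to encode one as UTF-8.

```d
import std.utf : encode, isValidDchar, UTFException;

void main()
{
    import std.stdio : writefln;

    // A code point above 0xFFFF: UTF-16 encodes it as a surrogate pair.
    dchar c = 0x1F600;
    wchar[2] buf16;
    size_t n = encode(buf16, c);
    writefln("U+%X -> %s UTF-16 code units: %04X %04X", cast(uint) c, n,
             cast(uint) buf16[0], cast(uint) buf16[1]);

    // A surrogate value on its own is not a valid dchar...
    writefln("isValidDchar(0xDC80) = %s", isValidDchar(0xDC80));

    // ...and std.utf refuses to encode one as UTF-8.
    char[4] buf8;
    try
    {
        encode(buf8, cast(dchar) 0xDC80);
    }
    catch (UTFException e)
    {
        writefln("encode refused: %s", e.msg);
    }
}
```

The pair printed for U+1F600 is D83D DE00, both inside 0xD800..0xDFFF, which is why those values are reserved for UTF-16 and never valid on their own.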


Denis
--
_________________
vita es estrany
spir.wikidot.com
