On 03/13/2011 01:25 PM, ZY Zhou wrote:
> but I think that it's completely unreasonable to expect all of the
> string-based and/or range-based functions to be able to handle invalid
> unicode.
As I explained in the first mail, if the UTF-8 parser converts every invalid
UTF-8 byte to a low surrogate code point (0x80~0xFF -> 0xDC80~0xDCFF), other
string-related functions will still work fine, and you can also handle these
errors if you want:

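string s = "\xa0";   // a lone 0xA0 byte: invalid UTF-8
foreach (dchar d; s) {
    // under the proposal, d would be 0xDCA0 here rather than a decoding error
    if (isValidUnicode(d)) {    // isValidUnicode, process, handleError: hypothetical helpers
        process(d);
    } else {
        handleError(d);         // d - 0xDC00 recovers the original byte
    }
}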
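A minimal sketch of the proposed decoding, assuming it is done as a library helper rather than inside the language's own foreach decoding. `decodeEscaping` is a hypothetical name, not part of Phobos; it relies on `std.utf.decode`, which throws `UTFException` on invalid input, and escapes exactly one byte per failure. Phobos's `std.utf.isValidDchar` returns false for surrogates, so it could play the role of `isValidUnicode` above.

```d
import std.stdio : writefln;
import std.utf : decode, isValidDchar, UTFException;

/// Hypothetical helper (not in Phobos): decodes UTF-8, mapping each byte
/// that does not start a valid sequence to a lone low surrogate
/// 0xDC80..0xDCFF, as in the 0x80~0xFF -> 0xDC80~0xDCFF proposal.
dchar[] decodeEscaping(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        try
        {
            // decode advances i past one valid UTF-8 sequence
            result ~= decode(s, i);
        }
        catch (UTFException)
        {
            // Invalid byte: escape it and skip exactly one code unit.
            result ~= cast(dchar)(0xDC00 + cast(ubyte) s[start]);
            i = start + 1;
        }
    }
    return result;
}

void main()
{
    foreach (d; decodeEscaping("a\xa0b"))
    {
        if (isValidDchar(d))
            writefln("char U+%04X", cast(uint) d);
        else
            writefln("escaped byte 0x%02X", cast(uint) d - 0xDC00);
    }
}
```

Since ASCII bytes always decode successfully, only bytes 0x80..0xFF can ever be escaped, which is why the target range fits exactly in 0xDC80..0xDCFF.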

This is not a good idea, imo. Surrogate values /are/ invalid code points. (For those wondering: they are a range of /code unit/ values, 0xD800~0xDFFF, used in UTF-16 to encode code points > 0xFFFF.) They should never appear in a dchar[] string, and the char[] code units of a string should never encode a value in the surrogate range.
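To illustrate the point about code units, here is a small sketch of the standard UTF-16 surrogate-pair computation; `toSurrogates` is a hypothetical helper name used only for this example. It shows that a value like 0xDD1E exists only as half of a pair, and that `std.utf.isValidDchar` rejects it as a code point.

```d
import std.stdio : writefln;
import std.utf : isValidDchar;

/// Hypothetical helper: splits a code point above 0xFFFF into the UTF-16
/// surrogate pair (high 0xD800..0xDBFF, low 0xDC00..0xDFFF) that encodes it.
wchar[2] toSurrogates(dchar c)
{
    assert(c > 0xFFFF && c <= 0x10FFFF, "only supplementary code points need a pair");
    immutable v = c - 0x10000;                    // 20-bit offset
    wchar[2] pair;
    pair[0] = cast(wchar)(0xD800 + (v >> 10));    // top 10 bits -> high surrogate
    pair[1] = cast(wchar)(0xDC00 + (v & 0x3FF));  // low 10 bits -> low surrogate
    return pair;
}

void main()
{
    auto pair = toSurrogates('\U0001D11E');       // MUSICAL SYMBOL G CLEF
    writefln("%04X %04X", cast(uint) pair[0], cast(uint) pair[1]); // D834 DD1E
    // On its own, a surrogate is a code unit, not a valid code point:
    writefln("%s", isValidDchar(cast(dchar) pair[1]));             // false
}
```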

Denis
--
_________________
vita es estrany
spir.wikidot.com
