On 03/13/2011 01:25 PM, ZY Zhou wrote:
> > but I think that it's completely unreasonable to expect
> > all of the string-based and/or range-based functions to be able to handle
> > invalid unicode.
>
> As I explained in the first mail, if the UTF-8 parser converts all invalid
> UTF-8 chars to low surrogate code points (0x80~0xFF -> 0xDC80~0xDCFF), other
> string-related functions will still work fine, and you can also handle these
> errors if you want:
>
>     string s = "\xa0";
>     foreach (dchar d; s) {
>         if (isValidUnicode(d)) {
>             process(d);
>         } else {
>             handleError(d);
>         }
>     }

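For illustration, here is a minimal sketch of the mapping proposed above. It
assumes Phobos' std.utf.decode, which throws UTFException on malformed input.
The names decodeLenient and isValidUnicode are hypothetical and not part of
any library; each byte that cannot be decoded is mapped to 0xDC00 + byte,
which gives 0xDC80~0xDCFF for the bytes 0x80~0xFF.

```d
import std.utf : decode, UTFException;

/// Hypothetical helper: false for the escape range 0xDC80 .. 0xDCFF that
/// stands in for raw undecodable bytes.
bool isValidUnicode(dchar d)
{
    return d < 0xDC80 || d > 0xDCFF;
}

/// Hypothetical lenient decoder: instead of throwing, it maps every byte
/// that does not start a well-formed UTF-8 sequence to 0xDC00 + byte.
dchar[] decodeLenient(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        try
        {
            // On success, decode advances i past one complete sequence.
            result ~= decode(s, i);
        }
        catch (UTFException)
        {
            // Escape the offending byte and resume at the next one.
            result ~= cast(dchar)(0xDC00 + cast(ubyte) s[start]);
            i = start + 1;
        }
    }
    return result;
}
```

Under these assumptions, decodeLenient("\xa0") yields the single value 0xDCA0,
which isValidUnicode rejects, so the loop above would reach handleError.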
This is not a good idea, imo. Surrogate values /are/ invalid code points. (For
those wondering, they are a range of /code unit/ values that UTF-16 uses, in
pairs, to encode code points > 0xFFFF.) They should never appear in a dchar[]
string, and a char[] string of code units should never encode a value in the
surrogate range.
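
A small sketch of how this shows up in Phobos, assuming the std.utf functions
isValidDchar and encode: the whole surrogate block 0xD800 .. 0xDFFF is rejected
as a dchar value, the UTF-8 encoder refuses such a value, and UTF-16 uses the
block only as pairs of code units.

```d
import std.exception : assertThrown;
import std.utf : encode, isValidDchar, UTFException;

void main()
{
    // Every value in the surrogate block 0xD800 .. 0xDFFF is rejected.
    foreach (uint v; 0xD800 .. 0xE000)
        assert(!isValidDchar(cast(dchar) v));

    // The values just outside the block are accepted.
    assert(isValidDchar(cast(dchar) 0xD7FF));
    assert(isValidDchar(cast(dchar) 0xE000));

    // A value from the proposed escape range cannot be encoded as UTF-8.
    char[4] utf8;
    assertThrown!UTFException(encode(utf8, cast(dchar) 0xDCA0));

    // In UTF-16, a code point > 0xFFFF becomes a high/low surrogate pair.
    wchar[2] utf16;
    assert(encode(utf16, cast(dchar) 0x10000) == 2);
    assert(utf16[0] == 0xD800 && utf16[1] == 0xDC00);
}
```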
Denis
--
_________________
vita es estrany
spir.wikidot.com