> but I think that it's completely unreasonable to expect
> all of the string-based and/or range-based functions to be able to handle
> invalid unicode.

As I explained in the first mail, if the UTF-8 parser converts every invalid
UTF-8 byte to a low-surrogate code point (0x80~0xFF -> 0xDC80~0xDCFF), the
other string-related functions will still work fine, and you can still handle
these errors if you want to:

    string s = "\xa0";
    foreach (dchar d; s) {
        if (isValidUnicode(d)) {
            process(d);
        } else {
            handleError(d);
        }
    }

== Quote from Jonathan M Davis (jmdavisp...@gmx.com)'s article
> On Sunday 13 March 2011 04:34:24 ZY Zhou wrote:
> > std.utf throw exception instead of crash the program. but you still need to
> > add try/catch everywhere.
> >
> > My point is: this simple code should work, instead of crash, it is supposed
> > to leave all invalid codes untouched and just process the valid parts.
> >
> > Stream file = new BufferedFile("sample.txt");
> > foreach(char[] line; file) {
> >     string s = line.idup.tolower;
> > }
> I think that it's completely unreasonable to expect all string functions to
> worry about whether they're dealing with valid unicode or not. And a lot of
> string stuff would involve ranges which would require converting each code
> point to UTF-32. And how is it supposed to do _that_ with invalid UTF-8?
> I don't know how you expect to really be able to do anything with invalid
> UTF-8 anyway. There may be something that could be added to std.utf to help
> better handle the situation, but I think that it's completely unreasonable to
> expect all of the string-based and/or range-based functions to be able to
> handle invalid unicode.
> - Jonathan M Davis
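
For reference, the byte-to-low-surrogate mapping described at the top of this mail can be sketched roughly as below. This is only an illustration under stated assumptions, not std.utf code: `decodeLenient` is a hypothetical name, and `isValidUnicode` is the helper assumed in the snippet above, given one possible definition here. Any byte that does not start a well-formed UTF-8 sequence is emitted as 0xDC00 + byte, and decoding resumes at the next byte.

```d
import std.stdio : writefln;

/// Hypothetical helper: true for any scalar value, i.e. anything that is
/// neither a surrogate (U+D800..U+DFFF) nor above U+10FFFF.
bool isValidUnicode(dchar d)
{
    return d <= 0x10FFFF && !(d >= 0xD800 && d <= 0xDFFF);
}

/// Hypothetical lenient decoder: well-formed UTF-8 sequences decode as usual;
/// a byte that cannot start a well-formed sequence becomes 0xDC00 + byte.
dchar[] decodeLenient(const(ubyte)[] bytes)
{
    dchar[] result;
    size_t i = 0;
    while (i < bytes.length)
    {
        immutable ubyte b = bytes[i];
        size_t len;   // sequence length implied by the lead byte (0 = bad lead)
        uint cp;      // code point being assembled
        uint minCp;   // smallest value allowed for this length (rejects overlong forms)

        if (b < 0x80)                { len = 1; cp = b;        minCp = 0; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; minCp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; minCp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; minCp = 0x10000; }
        else                         { len = 0; }

        // The sequence must fit in the input and consist of continuation bytes.
        bool ok = len != 0 && i + len <= bytes.length;
        for (size_t k = 1; ok && k < len; ++k)
        {
            immutable ubyte c = bytes[i + k];
            if ((c & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, encoded surrogates and values > U+10FFFF.
        if (ok && (cp < minCp || !isValidUnicode(cast(dchar) cp)))
            ok = false;

        if (ok)
        {
            result ~= cast(dchar) cp;
            i += len;
        }
        else
        {
            // Invalid byte (always >= 0x80 here): map 0x80~0xFF to 0xDC80~0xDCFF.
            result ~= cast(dchar)(0xDC00 + b);
            i += 1;
        }
    }
    return result;
}

void main()
{
    // 'a', a stray 0xA0 byte, then U+00E9 encoded as C3 A9.
    immutable ubyte[] input = [0x61, 0xA0, 0xC3, 0xA9];
    foreach (dchar d; decodeLenient(input))
    {
        if (isValidUnicode(d))
            writefln("valid code point U+%04X", cast(uint) d);
        else
            writefln("invalid byte 0x%02X", cast(uint) d - 0xDC00);
    }
}
```

With this input the stray 0xA0 comes out as 0xDCA0, so a caller can recover the original byte by subtracting 0xDC00, while the valid characters on either side decode normally and no exception is thrown.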