On Wednesday, November 20, 2013 11:45:57 Lars T. Kyllingstad wrote:
> On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
> > (c) A variety of text functions currently suffer because we don't make
> > the difference between validated UTF strings and potentially invalid
> > ones.
>
> I think it is fair to always assume that a char[] is a valid UTF-8
> string, and instead perform the validation when creating/filling the
> string from a non-validated source.
That doesn't work when strings are being created via concatenation and the
like inside the program rather than simply coming from outside the program.

> Take std.file.read() as an example; it returns void[], but has a
> validating counterpart in std.file.readText().
>
> I think we should use ubyte[] to a greater extent for data which is
> potentially *not* valid UTF.

Well, we've already discussed the possibility of using ubyte[] to indicate
ASCII strings, and that makes a lot more sense IMHO, because then no decoding
occurs (which is precisely what you want for ASCII), whereas with a string
that's potentially invalid UTF, it's not that we don't want to decode it.
It's just that we need to validate it when decoding it. So, I'd argue that
ubyte[] should be used when you want to operate on code units rather than
code points, not because it has anything to do with validating code points.

- Jonathan M Davis
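
For illustration, here is a minimal sketch of that distinction, assuming
Phobos' std.utf.validate; the hand-built char[] and its byte values are just
an example, not anything taken from the discussion above:

import std.stdio;
import std.utf : validate, UTFException;

void main()
{
    // A char[] assembled inside the program (e.g. via slicing or
    // concatenation of raw bytes) isn't guaranteed to hold valid UTF-8.
    char[] s = ['h', 'i', cast(char) 0xFF]; // 0xFF never occurs in UTF-8

    // Code-unit view: reinterpret as ubyte[]. No decoding happens,
    // so no validation is needed or wanted.
    auto units = cast(ubyte[]) s;
    writeln("code units: ", units.length); // prints 3

    // Code-point view: decoding is where validation belongs.
    try
    {
        validate(s); // throws UTFException on the 0xFF byte
    }
    catch (UTFException e)
    {
        writeln("invalid UTF-8: ", e.msg);
    }
}

The point being that the ubyte[] view sidesteps decoding entirely, whereas
the char[] view pays for validation exactly where decoding happens.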
