On Fri, 07 Feb 2014 21:04:08 -0800, Jonathan M Davis <[email protected]> wrote:

> On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
> > I guess we just have two use cases here. One where invalid
> > encoding is not an error (e.g. for sanitizing purposes) and
> > one where you don't want to lose information and have to
> > enforce correct encoding.
> > Name the first one "decodeSubst" maybe and have decode call
> > that and check for 0xFFFD?
>
> I think that that would call for us to have 3 related but distinct functions:
>
> 1. decode, which throws on invalid Unicode. We already have this.
>
> 2. isValidUnicode, which returns whether the string is valid Unicode and does
> not throw. We don't yet have this. Rather, we have validate which does the
> same job and then throws instead of returning bool.

Yes, that's the one that needs to be added.

> 3. sanitizeUnicode (or whatever would be a good name for it), which replaces
> invalid Unicode with 0xFFFD (or whatever the appropriate character is) so
> that it can be operated on without causing decode to throw in spite of the
> fact that it was invalid Unicode. We don't have anything like this yet.

And oh wonder, we actually have that already! Problem solved:
http://dlang.org/phobos/std_encoding.html#.sanitize
(Not that I knew that beforehand *cough*)

Or does someone have a need to also sanitize code point by code point?

> - Jonathan M Davis

-- 
Marco
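
For illustration, here is a minimal sketch of the three behaviours discussed above (throwing check, non-throwing check, sanitizing). It assumes current Phobos, where std.utf.validate throws UTFException, std.encoding.sanitize is the function linked above, and std.encoding.isValid appears to provide the bool-returning check. The input string is made up for the example.

```d
import std.encoding : isValid, sanitize;
import std.stdio : writeln;
import std.utf : UTFException, validate;

void main()
{
    // Hypothetical input: 'a', a stray continuation byte 0x80, 'b'.
    // This is not well-formed UTF-8.
    string bad = "a\x80b";

    // 1. Throwing check: std.utf.validate raises UTFException on bad input.
    bool threw = false;
    try
    {
        validate(bad);
    }
    catch (UTFException)
    {
        threw = true;
    }
    assert(threw);

    // 2. Non-throwing check: returns bool instead of throwing.
    assert(!isValid(bad));

    // 3. Sanitizing: invalid sequences are replaced, so the result is valid
    //    and can be passed on to code that would otherwise throw.
    string clean = sanitize(bad);
    assert(isValid(clean));
    validate(clean); // does not throw any more

    writeln("code units before: ", bad.length, ", after: ", clean.length);
}
```

Note that sanitizing loses information (the original bad bytes are gone), which is why it only fits the first of the two use cases.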
