Let's see:

- Conversion to UTF-8: If the string isn't well-formed, you wouldn't refuse to convert it, so isValid doesn't really help. You still have to look at all code units, and convert unpaired surrogates to the UTF-8 sequence for U+FFFD.
- Conversion from UTF-8: For security reasons, you have to check for well-formedness before conversion, in particular to catch non-shortest forms [1].
- HTML form data: Same situation as conversion to UTF-8.
- Base64 encodes binary data, so UTF-16 well-formedness rules don't apply.

I don't think we'd add API just to flag an issue - that's what documentation is for.

Norbert

[1] http://www.unicode.org/reports/tr36/#UTF-8_Exploit


On Mar 25, 2012, at 1:57 , Roger Andrews wrote:

> I use something like String.isValid functionality in a transcoder that
> converts Strings to/from UTF-8, HTML Formdata (MIME type
> application/x-www-form-urlencoded -- not the same as URI encoding!), and
> Base64.
>
> Admittedly these currently use 'encodeURI' to do the work, or it just drops
> out naturally when considering UTF-8 sequences.
>
> (I considered testing the regexp
> /^(?:[\u0000-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/
> against the input string.)
>
> Maybe the function is too obscure for general use, although its presence does
> flag up the surrogate-pair issue to developers.
>
> --------------------------------------------------
> From: "Norbert Lindenberg" <[email protected]>
>>
>> It's easy to provide this function, but in which situations would it be
>> useful? In most cases that I can think of you're interested in far more
>> constrained definitions of validity:
>> - what are valid ECMAScript identifiers?
>> - what are valid BCP 47 language tags?
>> - what are the characters allowed in a certain protocol?
>> - what are the characters that my browser can render?
>>
>> Thanks,
>> Norbert
>>
>>
>> On Mar 24, 2012, at 12:12 , David Herman wrote:
>>
>>> On Mar 23, 2012, at 11:45 AM, Roger Andrews wrote:
>>>
>>>> Concerning UTF-16 surrogate pairs, how about a function like:
>>>> String.isValid( str )
>>>> to discover whether surrogates are used correctly in 'str'?
>>>>
>>>> Something like Array.isArray().
>>>
>>> No need for it to be a class method, since it only operates on strings.
>>> We could simply have String.prototype.isValid(). Note that it would work
>>> for primitive strings as well, thanks to JS's automatic promotion
>>> semantics.
>>>
>>> Dave
>>> _______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
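
For illustration only, here is a rough sketch of the UTF-related points in the reply at the top of this message, in plain ECMAScript. The function names (isWellFormedUTF16, toUTF8Bytes, isWellFormedUTF8) are made up for this example and are not proposed or existing API. The regexp is the one Roger quotes. Byte sequences are assumed to be plain arrays of integers in the range 0..255.

```javascript
// Hypothetical helpers for illustration; none of these names are real API.

// Well-formedness test using the regexp quoted in the thread: every code
// unit is either a non-surrogate or half of a high/low surrogate pair.
var WELL_FORMED = /^(?:[\u0000-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;

function isWellFormedUTF16(str) {
  return WELL_FORMED.test(str);
}

// Conversion to UTF-8 that never refuses its input: it walks all code
// units and encodes each unpaired surrogate as U+FFFD (bytes EF BF BD).
function toUTF8Bytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var cp = str.charCodeAt(i);
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Valid pair: combine into one supplementary code point.
        cp = 0x10000 + ((cp - 0xD800) << 10) + (next - 0xDC00);
        i++;
      } else {
        cp = 0xFFFD; // high surrogate without a following low surrogate
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD; // low surrogate without a preceding high surrogate
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      bytes.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    } else {
      bytes.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  }
  return bytes;
}

// Check to run before converting from UTF-8: rejects non-shortest forms,
// encoded surrogates, values above U+10FFFF, and truncated sequences.
function isWellFormedUTF8(bytes) {
  var i = 0;
  while (i < bytes.length) {
    var b0 = bytes[i], need, min, cp;
    if (b0 < 0x80) { i++; continue; }
    if (b0 >= 0xC0 && b0 <= 0xDF) { need = 1; min = 0x80; cp = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { need = 2; min = 0x800; cp = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF7) { need = 3; min = 0x10000; cp = b0 & 0x07; }
    else return false; // stray continuation byte or invalid lead byte
    if (i + need >= bytes.length) return false; // sequence cut short
    for (var k = 1; k <= need; k++) {
      var b = bytes[i + k];
      if ((b & 0xC0) !== 0x80) return false; // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min) return false; // non-shortest form
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    i += need + 1;
  }
  return true;
}

// isWellFormedUTF16("a\uD83D")     -> false (unpaired high surrogate)
// toUTF8Bytes("\uD83D")            -> [0xEF, 0xBF, 0xBD]
// isWellFormedUTF8([0xC0, 0xAF])   -> false (non-shortest form of "/")
```

Note that toUTF8Bytes does the surrogate check inline while encoding, which is the reason a separate isValid pass adds little in that direction. The decoding direction is where an up-front check earns its keep.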

