On Sun, Mar 18, 2018 at 10:43 AM, C. Scott Ananian <[email protected]> wrote:
> On Sun, Mar 18, 2018, 10:30 AM Anders Rundgren <
> [email protected]> wrote:
>
>> Violently agree but do not understand (I guess I'm just dumb...) why (for
>> example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal
>> (although the result would differ).
>>
>
> Because there are JavaScript strings which do not form valid UTF-16 code
> units. For example, the one-character string '\uD800'. On the input
> validation side, there are 8-bit strings which can not be decoded as
> UTF-8. A complete sorting spec needs to describe how these are to be
> handled. For example, something like WTF-8:
> http://simonsapin.github.io/wtf-8/
>

Let's get terminology straight.

"\uD800" is a valid string of UTF-16 code units. It is also a valid string
of codepoints. It is not a valid string of scalar values.

http://www.unicode.org/glossary/#code_point :
Any value in the Unicode codespace; that is, the range of integers from 0
to 10FFFF₁₆.

http://www.unicode.org/glossary/#code_unit :
The minimal bit combination that can represent a unit of encoded text for
processing or interchange.

http://www.unicode.org/glossary/#unicode_scalar_value :
Any Unicode code point except high-surrogate and low-surrogate code points.
In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆
inclusive.
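
To make the three terms concrete, here is a rough sketch in plain JavaScript. The helper names (isScalarValueString, compareByCodeUnit, compareByCodePoint) are made up for illustration and are not part of any proposal. It only shows that a lone surrogate is a fine string of code units and code points but not of scalar values, and that code-unit order and code-point order can disagree.

```javascript
// Hypothetical helper: true when every code point in s is a Unicode scalar
// value, i.e. s contains no unpaired surrogate.
function isScalarValueString(s) {
  // The string iterator walks code points; a lone surrogate is yielded as-is.
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
  }
  return true;
}

// Hypothetical helper: the built-in relational operators compare strings
// code unit by code unit.
function compareByCodeUnit(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper: compare by code point instead. Lone surrogates are
// treated as ordinary code points in D800..DFFF.
function compareByCodePoint(a, b) {
  const ca = Array.from(a, ch => ch.codePointAt(0));
  const cb = Array.from(b, ch => ch.codePointAt(0));
  const n = Math.min(ca.length, cb.length);
  for (let i = 0; i < n; i++) {
    if (ca[i] !== cb[i]) return ca[i] < cb[i] ? -1 : 1;
  }
  return ca.length - cb.length;
}

const lone = '\uD800';       // one code unit, one code point, no scalar value
const pair = '\uD83D\uDE00'; // two code units, one code point (U+1F600)
const tilde = '\uFF5E';      // one code unit, one code point (U+FF5E)

console.log(lone.length, [...lone].length, isScalarValueString(lone));
console.log(pair.length, [...pair].length, isScalarValueString(pair));
console.log(compareByCodeUnit(tilde, pair), compareByCodePoint(tilde, pair));
```

U+FF5E sorts after U+1F600 by code unit (FF5E > D83D) but before it by code point, which is the "result would differ" case from the quoted message.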
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

