Hi all,

In the distant past, SpiderMonkey APIs consumed source text as two-byte UCS-2 or one-byte |const char*|. Was one-byte text ASCII? UTF-8? EBCDIC? Something else? Who could say; no one thought about text encodings then. *By happenstance* one-byte JS text was Latin-1: a byte is a code point. And so lots of people used Latin-1 for JS purely because SpiderMonkey's carelessness made it easy.
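To make "a byte is a code point" concrete, here is a minimal sketch of how the same four bytes read under Latin-1 versus UTF-8. It is an illustration only, not SpiderMonkey's implementation: the helper names inflateLatin1 and decodeUtf8 are hypothetical, and it assumes an environment with a global TextDecoder (a browser or a recent Node.js). The Latin-1 side is done by hand because the "latin1" label in TextDecoder actually selects windows-1252, which differs in the 0x80-0x9F range.

```javascript
// The UTF-8 encoding of U+1F4A9 is the four bytes F0 9F 92 A9.
const bytes = new Uint8Array([0xF0, 0x9F, 0x92, 0xA9]);

// Hypothetical helper: Latin-1 "inflation". Each byte becomes the
// code point with the same numeric value, so 4 bytes -> 4 characters.
function inflateLatin1(input) {
  let s = "";
  for (const b of input) {
    s += String.fromCharCode(b);
  }
  return s;
}

// Hypothetical helper: strict UTF-8 decoding. With `fatal: true`,
// invalid input throws a TypeError instead of yielding U+FFFD.
function decodeUtf8(input) {
  return new TextDecoder("utf-8", { fatal: true }).decode(input);
}

const asLatin1 = inflateLatin1(bytes);
const asUtf8 = decodeUtf8(bytes);

console.log(asLatin1.length);                    // 4
console.log(asLatin1 === "\xF0\x9F\x92\xA9");    // true
console.log(asUtf8.length);                      // 2 (one surrogate pair)
console.log(asUtf8.codePointAt(0).toString(16)); // 1f4a9
```

Read as Latin-1, the bytes yield four separate characters; read as UTF-8, they yield a single non-ASCII code point.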
SpiderMonkey's UTF-8 source support is far better and clearer now. Most single-byte source users use UTF-8. So I'm changing the remaining Gecko Latin-1 users to UTF-8.

The following scripts/script loaders now use exclusively UTF-8:

* JS components/modules (bug 1492932)
* subscripts via mozIJSSubScriptLoader.loadSubScript{,WithOptions} (bug 1492937)
* mochitest-browser scripts, because they're subscripts (bug 1492937)
* SJS scripts executed by httpd.js, because they're subscripts (bug 1513152, bug 1492937) [0]

Also, proxy autoconfig scripts may now be valid UTF-8 (bug 1492938). (For compatibility reasons, invalid UTF-8 is treated as Latin-1, by inflating to UTF-16 and compiling that.)

Every affected script in the tree used UTF-8, so this just makes reality match expectation. But it sometimes changes behavior and may affect patch backports:

* You may use non-ASCII code points directly in scripts (even outside comments) without needing escape sequences.
* If you *intend* to construct a string of the constituent UTF-8 code units of a non-ASCII code point, you must use hexadecimal escapes: "\xF0\x9F\x92\xA9".

Another step toward fewer text encodings. \o/

Jeff

0. Note that until bug 1514075 lands, SJS scripts used in Android test runs will be interpreted as Latin-1 there (and only there). Hopefully we can fix that quickly!

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform