On Fri, Sep 20, 2013 at 6:28 AM, Erik Corry <[email protected]> wrote:

> Just to be clear, V8 does not generate CESU-8 if you give it well formed
> UTF-16.

Sure.

> If you give it broken UTF-16 with unpaired surrogates you can either break
> the data or emit CESU-8. In the first case, you overwrite the unpaired
> surrogates with some sort of error character code. In the second case you
> can generate three-byte UTF-8 sequences that are not strictly legal. The
> second option will preserve the data if you round-trip it into V8 again (or
> feed it to other apps that are liberal in what they accept), so that's what
> V8 currently does.

That's a bug. A UTF-8 encoder should never emit byte sequences that are not valid UTF-8. You should emit U+FFFD as a byte sequence instead for lone surrogates or terminate processing. Lone surrogates should not round-trip through the encoding layer as you can create down-level security bugs in unsuspecting decoders.

--
http://annevankesteren.nl/
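
To make the suggested behavior concrete, here is a minimal sketch of an encoder that replaces lone surrogates with U+FFFD instead of passing them through. The function name `encodeUtf8Strict` is hypothetical and this is not V8's code; it only illustrates the replacement option described above (the other option, terminating processing, would throw at the same points).

```typescript
// Hypothetical helper for illustration only; not V8's implementation.
// Encodes a string (a sequence of UTF-16 code units) as UTF-8 bytes and
// replaces every unpaired surrogate with U+FFFD (bytes EF BF BD).
function encodeUtf8Strict(input: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let cp = input.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: only valid when a low surrogate follows it.
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++; // consume the low surrogate as well
      } else {
        cp = 0xfffd; // lone high surrogate
      }
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      cp = 0xfffd; // low surrogate with no preceding high surrogate
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return out;
}
```

With this sketch, a lone U+D800 comes out as EF BF BD rather than the three-byte sequence ED A0 80, which is not valid UTF-8, while a well-formed pair such as U+1F600 still encodes to the usual four bytes F0 9F 98 80.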

