Thanks. I looked through StringDecoder, and apart from detecting character boundaries it seems to rely on buffer.toString to decode the UTF-8. I think buffer.toString ultimately relies on V8 to do the decoding, but I'm not sure.
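For anyone following along, here is a minimal sketch of that split between boundary detection and decoding. It assumes a reasonably current Node.js (`Buffer.from` and `subarray` are available), and the variable names are only for illustration. It shows the documented behaviour, not how StringDecoder is implemented internally.

```javascript
const { StringDecoder } = require('string_decoder');

// U+20AC (euro sign) is three bytes in UTF-8: E2 82 AC.
const euro = Buffer.from([0xE2, 0x82, 0xAC]);

// Split the character across two chunks, as a stream might deliver it.
const head = euro.subarray(0, 1);
const tail = euro.subarray(1);

// StringDecoder holds back an incomplete sequence until it is completed.
const decoder = new StringDecoder('utf8');
const first = decoder.write(head);  // nothing decodable yet
const second = decoder.write(tail); // the buffered byte plus the rest

// Buffer#toString keeps no state, so each chunk is decoded on its own
// and the split character is lost.
const naive = head.toString('utf8') + tail.toString('utf8');

console.log(JSON.stringify(first), JSON.stringify(second));
console.log(first + second === euro.toString('utf8'));
console.log(naive === euro.toString('utf8'));
```

So for a browser port, the boundary handling and the actual byte-to-code-point decoding can be treated as two separate pieces.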
I got hold of a good invalid UTF-8 test data set, and Node passes everything with only 3 exceptions:

U+110000 (invalid code point, and disallowed in UTF-8 per RFC 3629):
Decoding '\xF4\x90\x80\x80' does not equal '\uFFFD\uFFFD\uFFFD\uFFFD'.

U+DBFF U+DC00:
Decoding '\xED\xAE\x80\xED\xBF\xBF' does not equal '\uDBFF\uDC00'.

U+DBFF U+DFFF:
Decoding '\xED\xAF\xBF\xED\xBF\xBF' does not equal '\uDBBF\uDFFF'.

I'm working on a JavaScript decoder to match Node on this suite.

On Tuesday, April 1, 2014 4:12:05 PM UTC+2, mscdex wrote:
> On Tuesday, April 1, 2014 2:13:32 AM UTC-4, Joran Dirk Greef wrote:
>> I am writing a UTF8 decoder for browser use to decode a typed array into
>> a string.
>>
>> I want it to handle invalid UTF8 in the same way as Node for various
>> invalid inputs, as client and server need to produce identical output, for
>> syncing and testing purposes.
>
> node has StringDecoder:
> https://github.com/joyent/node/blob/master/lib/string_decoder.js

--
You received this message because you are subscribed to the Google Groups "nodejs" group.
