[nodejs] Re: How does Node decode invalid UTF8?

Joran Dirk Greef Wed, 02 Apr 2014 00:38:19 -0700

Thanks. I looked through StringDecoder, and apart from detecting character 
boundaries it seems to rely on buffer.toString to decode the UTF8. I think 
buffer.toString ultimately relies on V8 to do the decoding, but I'm not 
sure.
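
To make the division of labour concrete, here is a small sketch of what I mean 
by "detecting character boundaries". The helper names (decodeWithStringDecoder, 
decodeNaively) are mine, purely for illustration. It splits the three bytes of 
U+20AC across two chunks and decodes them both ways:

```javascript
const { StringDecoder } = require('string_decoder');

// Decode chunk by chunk with StringDecoder, which holds back an incomplete
// multi-byte sequence until the remaining bytes arrive.
function decodeWithStringDecoder(chunks) {
  const decoder = new StringDecoder('utf8');
  let out = '';
  for (const chunk of chunks) out += decoder.write(chunk);
  return out + decoder.end();
}

// Decode each chunk on its own with buffer.toString, ignoring boundaries.
function decodeNaively(chunks) {
  return chunks.map((chunk) => chunk.toString('utf8')).join('');
}

const euro = Buffer.from([0xE2, 0x82, 0xAC]); // UTF-8 bytes of U+20AC
const chunks = [euro.subarray(0, 1), euro.subarray(1)];

console.log(decodeWithStringDecoder(chunks) === '\u20AC');
console.log(decodeNaively(chunks) === '\u20AC');
```

The first comparison should hold and the second should not, since the naive 
version sees a truncated sequence in each chunk.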


I got hold of a good test data set of invalid UTF8, and Node passes 
everything except these 3 cases:

U+110000 (invalid code point + disallowed in UTF-8 per RFC 3629):
Decoding '\xF4\x90\x80\x80' does not equal '\uFFFD\uFFFD\uFFFD\uFFFD'.

U+DBFF U+DC00
Decoding '\xED\xAE\x80\xED\xBF\xBF' does not equal '\uDBFF\uDC00'.

U+DBFF U+DFFF
Decoding '\xED\xAF\xBF\xED\xBF\xBF' does not equal '\uDBFF\uDFFF'.
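
For anyone who wants to see what their own Node version produces for these 
three inputs, a rough sketch follows. codeUnits is a hypothetical helper of 
mine that prints each UTF-16 code unit of the decoded string. I have left out 
the expected output on purpose, since it depends on the Node version:

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as U+XXXX.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).toUpperCase();
    units.push('U+' + hex.padStart(4, '0'));
  }
  return units;
}

// The three byte sequences from the failing cases above.
const cases = [
  [0xF4, 0x90, 0x80, 0x80],
  [0xED, 0xAE, 0x80, 0xED, 0xBF, 0xBF],
  [0xED, 0xAF, 0xBF, 0xED, 0xBF, 0xBF],
];

for (const bytes of cases) {
  const decoded = Buffer.from(bytes).toString('utf8');
  const hexBytes = bytes.map((b) => b.toString(16).toUpperCase()).join(' ');
  console.log(hexBytes, '=>', codeUnits(decoded).join(' '));
}
```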

I'm working on a JavaScript decoder to match Node on this suite.
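
As a starting point, here is a minimal sketch of such a decoder for a typed 
array. It is not what Node does internally. I am assuming one particular 
error policy: emit U+FFFD for each invalid or truncated sequence and then 
resume at the offending byte, with the second-byte ranges from RFC 3629 so 
that overlongs, surrogates and anything above U+10FFFF are rejected. The 
name decodeUtf8 is my own, and the policy would need adjusting wherever Node 
turns out to differ:

```javascript
// Sketch of a strict UTF-8 decoder for a Uint8Array (or any byte array).
// Assumed policy: one U+FFFD per invalid or truncated sequence, then resume
// decoding at the byte that broke the sequence.
function decodeUtf8(bytes) {
  let out = '';
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let needed = 0;   // continuation bytes still expected
    let cp = 0;       // code point being assembled
    let lower = 0x80; // allowed range for the next continuation byte
    let upper = 0xBF;

    if (lead <= 0x7F) {
      out += String.fromCharCode(lead);
      i++;
      continue;
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      needed = 1; cp = lead & 0x1F;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      needed = 2; cp = lead & 0x0F;
      if (lead === 0xE0) lower = 0xA0; // reject overlong 3-byte forms
      if (lead === 0xED) upper = 0x9F; // reject surrogates U+D800..U+DFFF
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      needed = 3; cp = lead & 0x07;
      if (lead === 0xF0) lower = 0x90; // reject overlong 4-byte forms
      if (lead === 0xF4) upper = 0x8F; // reject code points above U+10FFFF
    } else {
      // Stray continuation byte, 0xC0, 0xC1, or 0xF5..0xFF.
      out += '\uFFFD';
      i++;
      continue;
    }

    let j = i + 1;
    let ok = true;
    for (let k = 0; k < needed; k++, j++) {
      if (j >= bytes.length || bytes[j] < lower || bytes[j] > upper) {
        ok = false;
        break;
      }
      cp = (cp << 6) | (bytes[j] & 0x3F);
      lower = 0x80; // only the first continuation byte has a narrowed range
      upper = 0xBF;
    }

    out += ok ? String.fromCodePoint(cp) : '\uFFFD';
    i = j; // on failure j points at the offending byte, which is re-examined
  }
  return out;
}
```

Under this policy '\xF4\x90\x80\x80' decodes to four U+FFFD, and each of the 
two surrogate cases decodes to six U+FFFD, because the surrogates are 
rejected instead of being passed through.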

On Tuesday, April 1, 2014 4:12:05 PM UTC+2, mscdex wrote:
>
> On Tuesday, April 1, 2014 2:13:32 AM UTC-4, Joran Dirk Greef wrote:
>>
>> I am writing a UTF8 decoder for browser use to decode a typed array into 
>> a string.
>>
>> I want it to handle invalid UTF8 in the same way as Node for various 
>> invalid inputs, as client and server need to produce identical output, for 
>> syncing and testing purposes.
>>
>>
> node has StringDecoder: 
> https://github.com/joyent/node/blob/master/lib/string_decoder.js
>
