How should node buffers/streams be handled when utf8-encoded text data 
needs to be parsed or converted?

This simple test shows that naively converting buffer chunks to strings 
and then concatenating them fails when a multi-byte character is split 
across two chunks:

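var str = "Hälöö!";

// 9 bytes in utf8: "ä" and "ö" take two bytes each
var b = Buffer.from(str, "utf8");

// Split after byte 5, which falls in the middle of the first "ö"
var b1 = b.slice(0, 5);
var b2 = b.slice(5);

console.log("b: " + b.toString());
console.log("b1: " + b1.toString());
console.log("b2: " + b2.toString());

// Decoding each half separately corrupts the split character
var str2 = b1.toString() + b2.toString();

console.log("str2: " + str2);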

 
So assume that an http response containing utf8-encoded text has to be 
processed before it is forwarded. To convert it to a string, the chunks 
must be concatenated first. Or is text manipulation better carried out 
directly on the buffer chunks? That can of course be pretty hard compared 
to simply applying regexes or substring manipulation to complete strings.
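
For illustration, the "concatenate first" approach could look roughly like 
the sketch below. It assumes the whole response fits in memory, and 
decodeChunks is just a made-up helper name, not something node provides:

```javascript
// Sketch: collect raw chunks and decode once, after all bytes have arrived.
// `decodeChunks` is a hypothetical helper name, not part of node.
function decodeChunks(chunks) {
  // Buffer.concat joins the raw bytes, so a character that was split
  // across two chunks is whole again before it gets decoded.
  return Buffer.concat(chunks).toString("utf8");
}

var b = Buffer.from("Hälöö!", "utf8");

// Same split as in the test above: in the middle of the first "ö"
var chunks = [b.slice(0, 5), b.slice(5)];

var text = decodeChunks(chunks);
console.log("text: " + text);
```

The drawback is that nothing can be forwarded until the last chunk has 
arrived.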

A quick fix is of course to filter all chunks through some decoder that 
keeps track of trailing utf8 sequences that are incomplete. 
Maybe that's what the undocumented string decoder does? 
http://nodejs.org/api/string_decoder.html
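
A minimal sketch of that decoder idea, assuming the string_decoder module 
behaves the way its API page describes (write() returns only complete 
characters and holds back a trailing partial sequence until the next call):

```javascript
var StringDecoder = require("string_decoder").StringDecoder;

var decoder = new StringDecoder("utf8");
var b = Buffer.from("Hälöö!", "utf8");

// The first chunk ends with the first byte of "ö"; the decoder keeps
// that byte internally instead of emitting a broken character.
var s1 = decoder.write(b.slice(0, 5));

// The second chunk starts with the remaining byte of that "ö".
var s2 = decoder.write(b.slice(5));

// end() returns whatever is still buffered (nothing in this case).
var str2 = s1 + s2 + decoder.end();

console.log("s1: " + s1);
console.log("s2: " + s2);
console.log("str2: " + str2);
```

This way each chunk can be processed as a string as soon as it arrives, 
although a regex match can still straddle a chunk boundary.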

//Mattias
