On Thu, May 17, 2012 at 9:56 PM, Mattias Ernelli <[email protected]> wrote:
> How should node buffers/streams be handled when utf8-encoded text data
> has to be parsed or converted?
>
> This simple test shows that naive concatenation or processing of buffers
> will fail:
>
> var str = "Hälöö!";
>
> var b = new Buffer(str);
>
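> // The string is 9 bytes in utf8; cutting at byte 5 splits the 2-byte
> // sequence of the first "ö" between b1 and b2.
> var b1 = b.slice(0, 5);
> var b2 = b.slice(5);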
>
> console.log("b: " + b.toString());
> console.log("b1: " + b1.toString());
> console.log("b2: " + b2.toString());
>
> var str2 = b1.toString() + b2.toString();
>
> console.log("str2: " + str2);
>
>
> So assume that an HTTP response containing utf8-encoded text is to be
> processed before it is forwarded. To convert it to a string, the chunks
> must be concatenated first. Or is text manipulation better carried out
> directly on the buffer chunks? That can of course be pretty hard compared
> to simply applying regexes/substring manipulation to complete strings.
>
> A quick fix is of course to filter all chunks through some decoder that
> keeps track of trailing utf8 sequences that are incomplete. Maybe that's
> what the undocumented string decoder does?
> http://nodejs.org/api/string_decoder.html

Yes, string_decoder will do what you want. Example usage:

  var StringDecoder = require('string_decoder').StringDecoder;
  var sd = new StringDecoder('utf8');

  var buf = new Buffer('Hälöö!');
  var buf1 = buf.slice(0, 5);
  var buf2 = buf.slice(5);

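  // write() returns only complete characters; the dangling first byte
  // of "ö" at the end of buf1 is held back until buf2 supplies the rest.
  var str1 = sd.write(buf1);  // "Häl"
  var str2 = sd.write(buf2);  // "öö!"
  var str3 = str1 + str2;     // "Hälöö!"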
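
The same idea extends to any number of chunks. What follows is only a
minimal sketch, not the one right way to do it: decodeChunks is a
hypothetical helper name, and the code assumes a Node version that has
Buffer.from and StringDecoder#end (on older versions, new Buffer(...) as
above takes the place of Buffer.from). It also computes the naive
per-chunk toString() result for comparison.

```javascript
var StringDecoder = require('string_decoder').StringDecoder;

// Hypothetical helper: decode an array of utf8 chunks into one string.
function decodeChunks(chunks) {
  var sd = new StringDecoder('utf8');
  var out = '';
  for (var i = 0; i < chunks.length; i++) {
    // write() returns only complete characters; an incomplete trailing
    // sequence is kept inside the decoder until the next chunk arrives.
    out += sd.write(chunks[i]);
  }
  // end() returns whatever is still buffered once the input is over.
  return out + sd.end();
}

var buf = Buffer.from('Hälöö!');
var chunks = [buf.slice(0, 5), buf.slice(5)];

// Naive approach: each chunk is decoded on its own, so the split "ö"
// turns into replacement characters.
var naive = chunks[0].toString() + chunks[1].toString();

// Decoder approach: the split character is reassembled.
var decoded = decodeChunks(chunks);

console.log('naive:   ' + naive);
console.log('decoded: ' + decoded);
```

With a real stream you would call sd.write(chunk) in the 'data' handler
and sd.end() in the 'end' handler instead of looping over an array.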

Alternatively, you can use node-buffertools (`npm install
buffertools`) to concatenate the buffers efficiently.
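
For illustration, here is the concatenate-then-decode approach sketched
with Buffer.concat rather than buffertools. Note the swap: this is not
the buffertools API, and it assumes a Node version that provides
Buffer.concat and Buffer.from, which not every release does.

```javascript
var buf = Buffer.from('Hälöö!');

// Two chunks, split in the middle of a multi-byte character.
var parts = [buf.slice(0, 5), buf.slice(5)];

// Join the raw bytes first, then decode once, so no character is ever
// decoded from a partial byte sequence.
var whole = Buffer.concat(parts);
var text = whole.toString('utf8');

console.log('text: ' + text);
```

The trade-off is that the whole body sits in memory before any text
processing can start, whereas the decoder works chunk by chunk.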
