Re: [nodejs] buffer toString with partial utf8 character?

Alex Kocharin Mon, 08 Sep 2014 08:35:34 -0700

05.09.2014, 13:32, "Mark Hahn" <[email protected]>:

So if I find \uFFFD as the last character of a valid but truncated utf8 buffer and I strip it, I should always end up with a valid string, right?

That was an awkward sentence. Let me try in code. If buf is the first 512 bytes of a long utf8 file, will the following always produce a valid string?

str = buf.toString();
if (str[str.length-1] is '\uFFFD') str = str.slice(0, -1);

Nope, that's a syntax error: the line mixes JavaScript and CoffeeScript syntax. In JavaScript it should be:

if (str[str.length-1] === '\uFFFD') str = str.slice(0, -1)

Or, in CoffeeScript:

if (str[str.length-1] is '\uFFFD') then str = str.slice(0, -1)

Other than that... If you're converting a buffer to a utf8 string, you *always* get a valid string in the output: bytes that can't be decoded are replaced with U+FFFD.

But it could contain special characters that you don't want. They can appear in the middle of the string as well if your input isn't valid utf8:

```
> Buffer.from([32,32,32,255,32,32,32]).toString('utf8').charCodeAt(3)
65533
```
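
To make the truncation case concrete, here is a minimal sketch. `decodePrefix` is a hypothetical helper name, not something from node core, and it assumes the only damage you care about is a character cut off at the very end of the buffer. It strips a whole run of trailing U+FFFD rather than a single character, since how many replacement characters a cut produces may depend on the node version.

```javascript
// Hypothetical helper: decode a possibly truncated UTF-8 prefix and drop
// any replacement characters (U+FFFD) left at the very end by the cut.
function decodePrefix(buf) {
  const str = buf.toString('utf8');
  return str.replace(/\uFFFD+$/, '');
}

const full = Buffer.from('héllo €', 'utf8');
// Drop the last byte, which cuts the 3-byte euro sign in the middle.
const cut = full.subarray(0, full.length - 1);

console.log(JSON.stringify(decodePrefix(cut)));

// Invalid bytes in the middle are NOT touched by this helper.
const middle = decodePrefix(Buffer.from([32, 32, 32, 255, 32, 32, 32]));
console.log(middle.charCodeAt(3));
```

Note that this also removes a genuine U+FFFD if the original text happened to contain one right at the cut, so it is only a rough fix.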

Also a BOM at the beginning, though that's rare now.
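
If the BOM bothers you, something along these lines should do. `stripBom` is a hypothetical helper name; the sketch assumes the BOM, when present, shows up as U+FEFF at index 0 of the decoded string.

```javascript
// Hypothetical helper: drop a leading byte order mark (U+FEFF) if present.
function stripBom(str) {
  return str.charCodeAt(0) === 0xFEFF ? str.slice(1) : str;
}

// EF BB BF is the UTF-8 encoding of the BOM, followed by "hi".
const withBom = Buffer.from([0xEF, 0xBB, 0xBF, 0x68, 0x69]).toString('utf8');
console.log(stripBom(withBom));
```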

If you want to stream buffers, there is some tool in node.js core that takes care of characters split across chunk boundaries... I don't remember which one.
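
The core module meant here is most likely `string_decoder`. A minimal sketch, assuming utf8 input that arrives in two chunks with a character split between them: `StringDecoder` holds back the incomplete bytes at the end of a chunk and emits the character once the rest arrives.

```javascript
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');
const euro = Buffer.from('€', 'utf8'); // 3 bytes: e2 82 ac

// The first chunk ends in the middle of the euro sign.
const first = decoder.write(Buffer.concat([Buffer.from('a'), euro.subarray(0, 2)]));
// The second chunk carries the remaining byte.
const second = decoder.write(euro.subarray(2));
// end() flushes whatever is still buffered (nothing in this case).
const rest = decoder.end();

console.log(JSON.stringify([first, second, rest]));
```

For the 512-byte case, a single `decoder.write(buf)` gives you the string without the partial character at the end, so there is no need to look for U+FFFD afterwards.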

