On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:
On 8/23/2014 10:42 AM, Sönke Ludwig wrote:
On 23.08.2014 19:38, Walter Bright wrote:
On 8/23/2014 9:36 AM, Sönke Ludwig wrote:
input types "string" and "immutable(ubyte)[]"

Why the immutable(ubyte)[] ?

I've adopted that basically from Andrei's module. The idea is to allow
processing data with arbitrary character encoding. However, the output will
always be Unicode and JSON is defined to be encoded as Unicode, too, so that
could probably be dropped...

I feel that non-UTF encodings should be handled by adapter algorithms,
not embedded into the JSON lexer, so yes, I'd drop that.

For performance purposes, determining the encoding during lexing is useful. When you know that the original string is ASCII, UTF-8, or some other encoding, you can avoid conversion costs later. The cost during lexing is essentially zero. The cost of storing that state might be a concern, or it might be free if it fits in otherwise unused padding space. The cost of re-scanning strings, which this avoids, is non-trivial.
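
To make the "essentially zero" claim concrete, here is a minimal sketch in C of a string scanner that records encoding facts while it looks for the closing quote. It is only an illustration under assumptions: the `StringToken` type and the `lex_string` function are hypothetical names, not part of any proposed std.json API, and the sketch only distinguishes pure ASCII from "something else" rather than validating UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token type, invented for this sketch. */
typedef struct {
    const char *start;       /* first byte after the opening quote */
    size_t      length;      /* bytes before the closing quote (or end of input) */
    bool        is_ascii;    /* no byte >= 0x80 was seen */
    bool        has_escapes; /* at least one backslash was seen */
    bool        terminated;  /* a closing quote was found */
} StringToken;

/* Scan a JSON string body; `src` points just past the opening '"'.
   The encoding flags come from the same pass that finds the end of
   the string, so no second scan is needed later. */
StringToken lex_string(const char *src, size_t len)
{
    assert(src != NULL);
    StringToken tok = { src, 0, true, false, false };
    unsigned char seen = 0;  /* OR of all bytes; high bit set means non-ASCII */
    size_t i = 0;

    while (i < len) {
        unsigned char c = (unsigned char)src[i];
        if (c == '"') {
            tok.terminated = true;
            break;
        }
        seen |= c;
        i++;
        if (c == '\\') {
            tok.has_escapes = true;
            if (i < len) {
                /* Skip the escaped byte so that \" does not end the string. */
                seen |= (unsigned char)src[i];
                i++;
            }
        }
    }

    tok.length = i;
    tok.is_ascii = (seen & 0x80) == 0;
    return tok;
}
```

The extra work per byte is one OR into a register, and the two flags fit in the padding a token struct usually already has. A consumer that sees `is_ascii` set and `has_escapes` clear can use the slice as-is, with no decoding or copying.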

My past experience with this was in an HTTP parser, where the logic is even more complex than in JSON parsing, but the concepts still apply.
