On 8/22/2014 2:27 PM, Sönke Ludwig wrote:
> On 22.08.2014 20:08, Walter Bright wrote:
>> 1. There's no mention of what will happen if it is passed malformed JSON
>> strings. I presume an exception is thrown. Exceptions are slow and
>> consume GC memory. I suggest emitting an "Error" token instead as an
>> alternative; this would be much like how the UTF decoding algorithms
>> emit a "replacement char" for invalid UTF sequences.
> The latest version now features a LexOptions.noThrow option which causes
> an error token to be emitted instead. After popping the error token, the
> range is always empty.
Having noThrow as a mere option may prevent the functions from being
attributed as "nothrow", since the throwing path remains.
But in any case, to worship at the Altar Of Composability, the error token
could always be emitted, with a separate algorithm provided that passes
through all non-error tokens and throws if it sees an error token.
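Such a pass-through algorithm could be sketched as a range wrapper. The
`Token` struct below is a hypothetical stand-in for the lexer's actual
token type (the real std.data.json token and its error representation may
look different); the wrapper itself is the composable piece:

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Hypothetical token type standing in for the lexer's real one.
struct Token
{
    bool isError;   // true when this is the "Error" token
    string text;    // token text, or an error message
}

// Forwards every non-error token unchanged; throws on an error token.
struct ThrowOnError(R) if (isInputRange!R)
{
    R source;

    @property bool empty() { return source.empty; }

    @property auto front()
    {
        auto t = source.front;
        if (t.isError)
            throw new Exception("JSON lex error: " ~ t.text);
        return t;
    }

    void popFront() { source.popFront(); }
}

auto throwOnError(R)(R range) { return ThrowOnError!R(range); }
```

A non-throwing consumer uses the lexer's output directly; a throwing
consumer just composes `lexer.throwOnError` on top, so the core lexer
never needs a throw path of its own.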
>> 2. The escape-sequenced strings presumably consume GC memory. This will
>> be a problem for high performance code. I suggest either leaving them
>> undecoded in the token stream, and letting higher level code decide what
>> to do about them, or providing a hook that the user can override with
>> his own allocation scheme.
> The problem is that which approach is more efficient really depends on
> the use case and on the type of input stream (storing the escaped
> version of a string might require *two* allocations if the input range
> cannot be sliced and the decoded string is then requested by the
> parser). My current idea is therefore to simply make this configurable,
> too.
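As a rough sketch of that configurability, the lexer could keep the raw
slice by default and decode only on request. `StringMode`, `lexString`,
and the tiny `decodeEscapes` below are all hypothetical illustrations,
not the proposed API; the real decoder must of course cover the full JSON
escape set, including \uXXXX:

```d
import std.array : appender;

// Hypothetical option: keep the raw escaped slice, or decode eagerly.
enum StringMode { raw, decoded }

// Minimal escape decoder for illustration (handles only \n, \", \\ here).
string decodeEscapes(string s)
{
    auto app = appender!string();
    for (size_t i = 0; i < s.length; i++)
    {
        if (s[i] == '\\' && i + 1 < s.length)
        {
            i++;
            switch (s[i])
            {
                case 'n':  app.put('\n'); break;
                case '"':  app.put('"');  break;
                case '\\': app.put('\\'); break;
                default:   app.put(s[i]); break;
            }
        }
        else app.put(s[i]);
    }
    return app.data;
}

// With a sliceable input, StringMode.raw allocates nothing at all;
// decoding allocates once, and only when actually requested.
string lexString(string rawSlice, StringMode mode)
{
    return mode == StringMode.raw ? rawSlice : decodeEscapes(rawSlice);
}
```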
> Enabling the use of custom allocators should be easy to add later on. My
> suggestion would be to hold off on this until we have a finished
> std.allocator module.
I'm worried that std.allocator is stalled and we'll be digging ourselves deeper
into needing to revise things later to remove GC usage. I'd really like to find
a way to abstract the allocation away from the algorithm.