On Sunday, 12 October 2014 at 18:17:29 UTC, Andrei Alexandrescu
wrote:
** The string after lexing is correctly scanned and stored in
raw format (escapes are not rewritten) and decoded on demand.
Problem with decoding is that it may allocate memory, and it
would be great (and not difficult) to make the lexer 100%
lazy/non-allocating. To achieve that, lexer.d should define TWO
"Kind"s of strings at the lexer level: regular string and
undecoded string. The former is lexer.d's way of saying "I got
lucky" in the sense that it didn't detect any '\\' so the raw
and decoded strings are identical. No need for anyone to do any
further processing in the majority of cases => win. The latter
means the lexer lexed the string, saw at least one '\\', and
leaves it to the caller to do the actual decoding.
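The two-kind scheme quoted above could look roughly like the sketch below. The names StringKind, StringToken and scanString are made up for illustration and are not taken from lexer.d; the only point is that the scan records whether a backslash was seen and hands back a slice of the input without allocating.

```d
// Hypothetical token kinds: "plain" means no backslash was seen,
// so the raw slice already is the decoded string.
enum StringKind { plain, undecoded }

struct StringToken
{
    StringKind kind;
    const(char)[] raw; // slice of the input between the quotes, no copy
}

// Hypothetical scanner; `input` must start at the opening quote.
StringToken scanString(const(char)[] input) @safe pure nothrow @nogc
{
    assert(input.length >= 2 && input[0] == '"');
    auto kind = StringKind.plain;
    size_t i = 1;
    while (i < input.length && input[i] != '"')
    {
        if (input[i] == '\\')
        {
            kind = StringKind.undecoded;
            ++i; // skip the escaped character so \" does not end the literal
        }
        ++i;
    }
    assert(i < input.length, "unterminated string literal");
    return StringToken(kind, input[1 .. i]);
}
```

A caller only has to decode when kind is StringKind.undecoded; in the plain case raw can be used directly.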
I'd like to see unescapeStringLiteral() made public. Then I could
unescape multiple strings into the same preallocated destination,
or even unescape in place. In-place decoding is guaranteed to work
because the result is never longer than the input: every escape
sequence takes up more bytes than the character it encodes.
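To illustrate the in-place case, here is a minimal sketch. The function unescapeInPlace is hypothetical and is not the signature of unescapeStringLiteral(); it handles only single-character escapes and leaves out \u sequences for brevity. It relies on the write cursor never overtaking the read cursor.

```d
// Hypothetical helper, NOT the actual unescapeStringLiteral() API.
// Decodes single-character escapes in place and returns the decoded
// prefix of the same buffer.
char[] unescapeInPlace(char[] buf) @safe pure nothrow @nogc
{
    size_t r = 0, w = 0; // read and write cursors; w <= r always holds
    while (r < buf.length)
    {
        char c = buf[r++];
        if (c == '\\' && r < buf.length)
        {
            switch (buf[r++])
            {
                case 'n': c = '\n'; break;
                case 't': c = '\t'; break;
                case 'r': c = '\r'; break;
                case 'b': c = '\b'; break;
                case 'f': c = '\f'; break;
                default:  c = buf[r - 1]; break; // \" \\ \/ map to themselves
            }
        }
        buf[w++] = c;
    }
    return buf[0 .. w];
}
```

The same loop works for a separate preallocated destination by writing to that buffer instead of buf, so several strings can be decoded back to back without any allocation.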