On Tuesday, February 05, 2013 22:51:32 Andrei Alexandrescu wrote:
> I think it would be reasonable for a lexer to require a range of ubyte
> as input, and carry its own decoding. In the first approximation it may
> even require a random-access range of ubyte.

I'd have to think about how you'd handle the Unicode stuff in that case, since I'm not quite sure what you mean by having the lexer handle its own decoding if it's given a range of code units. What I've been working on works with all of the character types and is very careful about decoding so that it avoids any that's unnecessary. That wasn't all that hard as far as the lexer's code goes. The hard part was making std.utf work with ranges of code units rather than just strings, and that was committed months ago.

- Jonathan M Davis
