On 2/5/13 10:29 PM, Jonathan M Davis wrote:
> On Tuesday, February 05, 2013 08:34:29 Andrei Alexandrescu wrote:
>> As far as I could tell the dependencies of the lexer are fairly
>> contained (util, token, identifier) and conversion to input range is
>> immediate.
>
> I don't remember all of the details at the moment, since it's been several
> months since I looked at dmd's lexer, but a lot of the problem stems from the
> fact that it's all written around the assumption that it's dealing with a
> char*. Converting it to operate on string might be fairly straightforward, but
> it gets more complicated when dealing with ranges. Also, both Walter and
> others have stated that the lexer in D should be configurable in a number of
> ways, and dmd's lexer isn't configurable at all. So, while a direct translation
> would likely be quick, refactoring it to do what it's been asked to be able to
> do would not be.
>
> I'm quite a ways along with one that's written from scratch, but I need to find
> the time to finish it. Also, doing it from scratch has had the added benefit of
> helping me find bugs in the spec and in dmd.

I think it would be reasonable for a lexer to require a range of ubyte
as input, and carry its own decoding. In the first approximation it may
even require a random-access range of ubyte.
Andrei
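
To make the suggestion above concrete, here is a minimal sketch (not code from dmd, Phobos, or either poster) of what "a random-access range of ubyte, with the lexer carrying its own decoding" could look like in D. The helper name decodeOne is hypothetical, the module paths assume a current Phobos, and the decoder is deliberately simplified: it does not reject overlong encodings or surrogate code points.

```d
import std.range.primitives : ElementType, hasLength, isRandomAccessRange;
import std.traits : Unqual;

// Hypothetical helper for illustration only: decodes one UTF-8 code point
// starting at index i of a random-access range of ubyte, and advances i
// past the bytes it consumed.
dchar decodeOne(R)(R src, ref size_t i)
if (isRandomAccessRange!R && hasLength!R
    && is(Unqual!(ElementType!R) == ubyte))
{
    immutable ubyte b0 = src[i++];
    if (b0 < 0x80)
        return cast(dchar) b0;            // ASCII fast path, no decoding needed

    size_t extra;                         // continuation bytes still expected
    uint cp;                              // code point being assembled
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else throw new Exception("invalid UTF-8 lead byte");

    foreach (_; 0 .. extra)
    {
        // Each continuation byte must exist and look like 10xxxxxx.
        if (i >= src.length || (src[i] & 0xC0) != 0x80)
            throw new Exception("truncated or invalid UTF-8 sequence");
        cp = (cp << 6) | (src[i++] & 0x3F);
    }
    return cast(dchar) cp;
}

unittest
{
    ubyte[] bytes = [0x61, 0xC3, 0xA9];   // "a" followed by U+00E9 in UTF-8
    size_t i = 0;
    assert(decodeOne(bytes, i) == 'a');
    assert(decodeOne(bytes, i) == 0xE9);
    assert(i == bytes.length);
}
```

The template constraint is the point of the sketch: a plain string does not satisfy isRandomAccessRange in Phobos, so requiring ubyte elements lets the lexer index into the source directly and decode only where a non-ASCII byte actually appears.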