On Saturday, 7 July 2012 at 20:29:26 UTC, Dmitry Olshansky wrote:
And given the backtracking nature of PEGs you'll do your distributed thing many times over or ... spend a lot of RAM to remember not to redo it. I recall lexing takes even more than parsing itself.
I think your conclusions reflect statistical evidence of PEG misuse and of poor PEG parser implementations, not a fundamental problem. My point was that there is nothing inherently worse about having the lexer integrated with the parser; on the contrary, there is a performance advantage in having fewer cases to check when structural information is available: lexSmth can be called exactly where Smth is expected, requiring fewer switch branches if a switch is used.
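To illustrate the point (a minimal sketch in Python, with a hypothetical two-token grammar; the function names lex_ident/lex_number stand in for the lexSmth idea): the parser calls the one lexing routine matching what it expects at the current position, so no dispatch over all token kinds is needed.

```python
import re

SOURCE = "foo 42"

def lex_ident(src, pos):
    """Try to lex an identifier at pos; return (text, new_pos) or None."""
    m = re.match(r"[A-Za-z_]\w*", src[pos:])
    return (m.group(), pos + m.end()) if m else None

def lex_number(src, pos):
    """Try to lex an integer literal at pos; return (text, new_pos) or None."""
    m = re.match(r"\d+", src[pos:])
    return (m.group(), pos + m.end()) if m else None

def skip_ws(src, pos):
    while pos < len(src) and src[pos].isspace():
        pos += 1
    return pos

def parse_call(src):
    """Hypothetical rule expecting 'ident number': each step goes straight
    to the single lexing routine that can match here, instead of a switch
    over every possible token kind."""
    pos = skip_ws(src, 0)
    ident = lex_ident(src, pos)
    if ident is None:
        return None
    pos = skip_ws(src, ident[1])
    num = lex_number(src, pos)
    if num is None:
        return None
    return (ident[0], num[0])

print(parse_call(SOURCE))  # ('foo', '42')
```

A standalone lexer would have to test every token pattern at each position; here the expected-token context prunes that to one test per call.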
As for lexing the same input multiple times: simply keep a free list of already-lexed terminals (aka tokens) and reuse them when backtracking. I still assume the grammar is properly defined, so that there is only one way to split the source into tokens.
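One way to read the token-reuse idea (a sketch, interpreting it as a per-position memo table in packrat style; the token patterns are hypothetical): since the grammar admits only one tokenization, each source position has at most one terminal, so a backtracking parser that revisits a position can fetch the cached token instead of lexing again.

```python
import re

class CachingLexer:
    """Lexes on demand and memoizes each token by its start position,
    so backtracking never re-lexes the same input twice."""
    TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[+*()])")

    def __init__(self, src):
        self.src = src
        self.cache = {}     # start position -> (token_text, end_position)
        self.lex_calls = 0  # counts actual lexing work done

    def token_at(self, pos):
        if pos not in self.cache:
            self.lex_calls += 1
            m = self.TOKEN.match(self.src, pos)
            self.cache[pos] = (m.group(1), m.end()) if m else (None, pos)
        return self.cache[pos]

lexer = CachingLexer("1 + 2 * 3")

# Simulate a backtracking parser revisiting position 0 three times
# (e.g. trying one alternative, failing, and falling back to another).
for _ in range(3):
    tok, end = lexer.token_at(0)

print(tok, lexer.lex_calls)  # prints: 1 1  (three lookups, one lex)
```

The memory cost is one cache entry per distinct token start position, which is bounded by the input length regardless of how much the parser backtracks.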
