On 08-Jul-12 00:09, Andrei Alexandrescu wrote:
On 7/7/12 3:17 PM, Dmitry Olshansky wrote:
I'll have to point out that the whole point about integrated lexing is
moot. It's more of a liability than a benefit. At the very least it's just
an implementation curiosity, not an advantage.

Interesting. I'll have to ask for more details on why.


I think I said this before, but anyway the main point is:
the reason a lexer exists is performance.

It's obvious that one can write a grammar without ever using a lexer. If the notation is good, say regex or the one used in PEG (which turns out to be not quite the same thing), then tokens can easily be described in the grammar itself. Once we do have one, a lexer gives us only 2 things:
- manageability: it's just splitting the grammar in 2 parts that can be maintained separately (in fact some "lexers" are neither DFAs nor regular); sometimes it goes in the opposite direction - it can be better to have one integrated grammar
- speed: the reason it can be faster is that the lexer gets to use a highly deterministic grammar.
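
To make the speed point concrete, here is a rough sketch (just an illustration, the token kinds and names are made up) of what a deterministic lexer does: every token is decided by a single forward scan, no backtracking.

    import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;

    enum TokKind { identifier, number, symbol, eof }

    struct Token { TokKind kind; string text; }

    // Single-pass lexing: look at the current character once and commit.
    Token nextToken(ref string input)
    {
        while (input.length && isWhite(input[0]))
            input = input[1 .. $];
        if (!input.length)
            return Token(TokKind.eof, "");

        size_t i = 1;
        if (isAlpha(input[0]) || input[0] == '_')
        {
            while (i < input.length && (isAlphaNum(input[i]) || input[i] == '_'))
                ++i;
            auto tok = Token(TokKind.identifier, input[0 .. i]);
            input = input[i .. $];
            return tok;
        }
        if (isDigit(input[0]))
        {
            while (i < input.length && isDigit(input[i]))
                ++i;
            auto tok = Token(TokKind.number, input[0 .. i]);
            input = input[i .. $];
            return tok;
        }
        auto tok = Token(TokKind.symbol, input[0 .. 1]);
        input = input[1 .. $];
        return tok;
    }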

Now back to PEGs and packrats. What I see everywhere I look is essentially the integration of a backtracking-NFA lexer, which is neither fast nor particularly convenient. The notation is a whole other matter.
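
For instance, this is roughly what a generic PEG engine ends up doing for a token-level rule like Keyword <- "while" / "for" / "if" (a sketch for illustration, assuming the usual ordered-choice semantics): each alternative restarts from the same position, i.e. a backtracking lexer hidden inside the parser.

    bool matchLiteral(string input, ref size_t pos, string lit)
    {
        if (pos + lit.length <= input.length && input[pos .. pos + lit.length] == lit)
        {
            pos += lit.length;
            return true;
        }
        return false;
    }

    // Keyword <- "while" / "for" / "if"
    bool matchKeyword(string input, ref size_t pos)
    {
        immutable saved = pos;
        foreach (lit; ["while", "for", "if"])
        {
            pos = saved;    // backtrack, then try the next alternative
            if (matchLiteral(input, pos, lit))
                return true;
        }
        pos = saved;
        return false;
    }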

So my view on the matter: whether or not the lexer is integrated is a two-fold question: notation vs. implementation.

E.g. it may make regular, lexer-like things a part of the notation, thus having rules like:
        Identifier  -> Alpha (Alpha|Number|_)*

Yet it may or may not use the same engine for _all_ _rules_; in fact, if the implementation optimizes things by ripping out the regular parts of the grammar into small DFA engines, it would end up with some kind of lexer behind the scenes!
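
E.g. the Identifier rule above is regular, so it could be compiled down to a 2-state DFA and run separately from the general engine (a rough sketch of the idea, the state encoding is made up):

    import std.ascii : isAlpha, isDigit;

    // Identifier -> Alpha (Alpha|Number|_)*  as a tiny DFA.
    // States: 0 = start, 1 = inside an identifier, -1 = reject.
    int step(int state, char c)
    {
        switch (state)
        {
            case 0:  return isAlpha(c) ? 1 : -1;
            case 1:  return (isAlpha(c) || isDigit(c) || c == '_') ? 1 : -1;
            default: return -1;
        }
    }

    // Longest match: run the DFA until it rejects, accept if we ended in state 1.
    size_t matchIdentifier(string input)
    {
        int state = 0;
        size_t len = 0;
        foreach (i, c; input)
        {
            immutable next = step(state, c);
            if (next < 0)
                break;
            state = next;
            len = i + 1;
        }
        return state == 1 ? len : 0;
    }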

Anyway, highly hybrid parsers are all the rage now, and the reasons are obvious - they run fast and can be bent by some magic to fit the quite convoluted grammars of modern languages ;)

--
Dmitry Olshansky

