Parsing

Ian Grant Mon, 15 Sep 2014 10:44:00 -0700

LALR parser generators are not the last word in parsing technology. Many
unambiguous context-free languages are not well suited to LALR parsing. The
C language is an exception: the C grammar from the ISO standard can be
typed almost verbatim into a YACC/Bison file, and it will produce just the
one well-documented shift/reduce conflict in the IF ELSE statement, which
is fortuitously resolved the right way. But this is achieved at some cost,
because the C grammar does not actually constrain the parsed statements to
those which are well-formed. In addition to the grammar, the language
definition includes a slew of extra conditions which need to be checked.
These checks must be coded into a post-parsing phase, because they are not
things that the LALR tables can encode.


Important examples of grammars which are not LALR parsable include ASN.1
and Standard ML. When Claudio Russo added higher order functors to Moscow
ML, he apparently wasn't able to figure out how to get the LALR parser to
parse the grammar for the language he'd invented. So the Moscow ML parser
produces 37 shift/reduce conflicts. I don't think anyone actually knows how
those higher order declarations are parsed in practice.

But I think the problem of parsing has actually been solved. Tom Ridge's
"p3" combinator parsers with oracles will parse any unambiguous CFG in
polynomial time complexity at most O^3, or possibly O^2, I need to check,
in the length of the input. The advantage of combinator parsers is that
they parse top-down, in a way which follows the natural structure of the
grammar. So errors can be explained to the user (and the developer!) in
terms of the actual derivations (parsing decisions) taken on the input,
This is not the case with bottom-up parsers such as LALR.

It would be really good to have a p3-like parser generator working in
Guile. The only technical part is integrating the oracle with the
combinator parsers. Ridge's example uses an Earley parser, but he points
out that any parser generator could be used as the oracle. So perhaps LALR
parsers would do. The grammar the oracles need to be able to parse is much
simpler than the general CFG's the combinators work on: the oracles are
just there to make the decisions about where to split the left-recursive
productions. But if the LALR oracle doesn't work out, then it is easy to
write an Early parser in scheme. See
http://www.cavar.me/damir/charty/scheme/ which is LGPLed.

See Ridge's papers at http://www.tom-ridge.com/parsing.html

Again, one cold implement the recursive parser combinators in
scheme-generated lightning. If this were done abstractly then Guile could
generate fast machine code parsers for any host language.

Ian

Parsing

Reply via email to