So I've been digging into FParsec, looking at the API carefully enough
to wrap my head around what it does. I have to say that I'm more
impressed by combinators (from an engineering perspective) than I
expected to be. Assuming a suitable degree of optimization from the
compiler, it's clearly possible to do relatively high-performance
parsing using a library like this. There are a few places where I think
I would make different choices than FParsec did, mainly down at the
tokenization level, but I'm not sure my choices would actually be
better rather than merely different.

The main thing that still concerns me about combinators is the conflict
between deterministic choice and potential parse ambiguities. I have the
same concern about PEG parsers.

By deterministic choice, I mean that the parser conceptually tries
productions/matches in some particular order specified by the
developer; a concrete FParsec sketch follows the list below. This has
two very nice consequences:

1. Pre-existing ambiguities in the underlying language, if any, can be
resolved by suitable organization of the parser.
2. Parsers can be composed with unambiguous results.
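
For concreteness, here is a minimal FParsec sketch of deterministic
choice. The combinators (pstring, many1Satisfy, <|>, run) are
FParsec's; the parser names are mine:

    open FParsec

    // Deterministic choice: p1 <|> p2 tries p1 first, and tries p2 only
    // if p1 fails without changing the parser state. An input that both
    // alternatives could match is always claimed by the first, so the
    // ordering resolves the overlap by fiat.
    let keyword    : Parser<string, unit> = pstring "let"
    let identifier : Parser<string, unit> = many1Satisfy isLetter
    let token      = keyword <|> identifier

    run token "let"   |> printfn "%A"  // Success: "let"   (keyword wins)
    run token "local" |> printfn "%A"  // Success: "local" (fallback)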

But neither of these is a goal when designing a new language. We don't
need to compose parsers, and we'd like to ensure that parse ambiguities
don't exist in the first place. That is: we'd like the grammar to be an
honest context-free grammar (CFG), one whose productions can be
understood without regard to the order in which they are tried.

It's not that there's any particular magic to CFGs, nor that there is
anything inherently wrong with context dependence that is introduced
intentionally. The concern is that we don't want to introduce context
dependence *unintentionally*. Context dependence requires that we
specify the order of application of the grammar productions in the
language specification. This is one more thing to specify (and
therefore one more thing to possibly get wrong). Context dependence
that goes unnoticed results in a specification failure. Finally, there
is a concern about language evolution: when a new production is
introduced, it's very important to understand whether it "masks" (or is
masked by) the behavior of old productions, as the sketch below shows.
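
To see how evolution can go wrong, suppose a language that has a "<"
operator later grows a "<=" operator, and the new alternative is
naively appended after the old one (relOp is my name; the combinators
are FParsec's):

    open FParsec

    // The "<=" alternative is unreachable: pstring "<" succeeds on any
    // input starting with '<', and <|> never revisits a choice once an
    // alternative has succeeded. A CFG tool would flag this overlap;
    // deterministic choice silently "resolves" it.
    let relOp : Parser<string, unit> = pstring "<" <|> pstring "<="

    run relOp "<=" |> printfn "%A"  // Success: "<" -- "<=" never matches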

Because they adopt deterministic choice, PEG parsers and combinator parsers
suppress discovery of *unintentional* context dependence.

It has occurred to me that this can be resolved, and I'm wondering if
anyone has already considered this approach somewhere.

The construction of an [F]Parsec parser effectively performs
compositions of sub-parsers, with leaf parsers for tokens implemented
by what amount to built-in primitives. The "parser" object is simply a
value of a function type. In the case of FParsec, the Parser type is
defined as:

    type Parser<'TResult, 'TUserState> =
        CharStream<'TUserState> -> Reply<'TResult>
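
For instance, an ordered-choice compositor over this type can be
sketched as follows (orElse is my name, and this is simplified;
FParsec's actual <|> is essentially this plus error-message merging):

    open FParsec

    // Compose two parsers into a new value of the same Parser type.
    let orElse (p1: Parser<'a,'u>) (p2: Parser<'a,'u>) : Parser<'a,'u> =
        fun stream ->
            let stateTag = stream.StateTag
            let reply = p1 stream
            // Fall through to p2 only if p1 failed without changing the
            // parser state (i.e., without consuming input).
            if reply.Status = Error && stream.StateTag = stateTag
            then p2 stream
            else reply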

But there is nothing magical about the Parser definition. While the
developer *can* introduce new compositing operators, they typically do
not. If we take the list of compositing operators as non-extensible,
then it appears to me that we could implement an alternative version of
the library in which (a) the Parser type is defined as a grammar AST
node, and (b) the compositing operators combine their input ASTs into a
suitable output AST. Once an AST for the input grammar is in hand, we
could test it against various parse algorithms to determine whether it
is genuinely context-free, i.e. whether its ordered choices can safely
be read as unordered CFG alternation, with no alternative masking
another. Once we *know* it is context-free, it's perfectly fine to
implement it using recursive descent or any other sensible
implementation.
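
Concretely, the sketch below shows what (a) and (b) might look like;
everything in it is hypothetical, and none of it is FParsec:

    // The "parser" is now a grammar AST; the compositing operators
    // build ASTs instead of composing parsing functions.
    type Grammar =
        | Terminal of string             // leaf/token parser
        | Seq      of Grammar * Grammar  // sequencing, as in .>>.
        | Choice   of Grammar * Grammar  // ordered choice, as in <|>
        | Many     of Grammar            // repetition, as in many
        | Rule     of string             // reference to a named production

    // Same surface syntax, different meaning for the operators:
    let str s        = Terminal s
    let (.>>.) g1 g2 = Seq (g1, g2)
    let (<|>)  g1 g2 = Choice (g1, g2)

    // Writing the grammar reads exactly like writing the parser:
    let relOp = str "<" <|> str "<="

Given the Grammar value, an offline analysis can walk the Choice nodes
looking for alternatives whose languages overlap, before the grammar is
compiled down to recursive descent.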


Has anybody actually tried this?



Jonathan
_______________________________________________
bitc-dev mailing list
bitc-dev@coyotos.org
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to