So I've been digging into FParsec, and actually looking at the API somewhat carefully to wrap my head around what it does. I have to say that I'm more impressed by combinators (from an engineering perspective) than I expected to be. If one assumes a suitable degree of optimization from the compiler, it's clearly possible to do relatively high-performance parsing using a library like this. There are a few places where I think I would make different choices than FParsec did, mainly down at the tokenization level, but I'm not sure my choices are actually better rather than merely different.
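For concreteness, here is roughly what composing a small parser looks like. This is a minimal sketch; the toy grammar is my own invention, but pint32, pchar, spaces, sepBy, and the combinators are actual FParsec primitives:

    open FParsec

    // A toy grammar: a sum of integers, e.g. "1 + 2 + 3".
    // Each leaf parser consumes its token plus any trailing whitespace.
    let number : Parser<int32, unit> = pint32 .>> spaces
    let plusSign : Parser<char, unit> = pchar '+' .>> spaces

    // sepBy composes the leaves; |>> attaches a semantic action.
    let sum : Parser<int32, unit> = sepBy number plusSign |>> List.sum

    // run sum "1 + 2 + 3"  evaluates to  Success: 6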
The main thing that still concerns me about combinators is the conflict between deterministic choice and potential parse ambiguities. I have the same concern about PEG parsers. By deterministic choice, I mean that the parser conceptually attempts productions/matches in a particular order specified by the developer. This has two very nice consequences:

1. Pre-existing ambiguities in the underlying language, if any, can be resolved by suitable organization of the parser.

2. Parsers can be composed with unambiguous results.

But neither of these is a goal when designing a new language. We don't need to compose parsers, and we'd like to ensure that parse ambiguities don't exist in the first place. That is: we'd like the grammar to be a context-free grammar (CFG). It's not that there's any particular magic to CFGs, nor that there is anything inherently wrong with context dependence that is introduced intentionally. The concern is that we don't want to introduce context dependence *unintentionally*. Context dependence requires that we specify the order of application of the grammar productions in the language specification. This is one more thing to specify (and therefore one more thing to possibly get wrong). Context dependence that goes unnoticed results in specification failure. Finally, there is a concern about language evolution. If a new production is introduced, it's very important to understand whether it "masks" behavior of old productions. (For example, under ordered choice a newly added keyword production placed ahead of the identifier production will silently claim the prefix "let" of an identifier such as "letter".) Because they adopt deterministic choice, PEG parsers and combinator parsers suppress discovery of *unintentional* context dependence.

It has occurred to me that this can be resolved, and I'm wondering if anyone has already considered the approach somewhere.

The construction of an [F]Parsec parser effectively performs compositions of sub-parsers, with leaf parsers for tokens that are implemented using what amount to built-in primitives. The "parser" object is simply a value of the library's parser type. In the case of FParsec, the Parser type is defined as:

    type Parser<'TResult, 'TUserState> =
        CharStream<'TUserState> -> Reply<'TResult>

But there is nothing magical about this definition. While the developer *can* introduce new combining operators, they typically do not do so. If we take the list of combining operators as non-extensible, then it appears to me that we could implement an alternative version of the library in which (a) the Parser type is defined as an AST node, and (b) the combinators merge their input ASTs into a suitable output AST. Once an AST for the input grammar is in hand, we could test it against various parse algorithms to determine whether it is context-free. Once we *know* it is context-free, it's perfectly fine to implement it using recursive descent or any other sensible implementation. A rough sketch of what I have in mind appears below.

Has anybody actually tried this?
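To make that concrete, here is a minimal, untested sketch of the idea. These are my own invented types and names, not FParsec's API; semantic actions and result types are elided, and terminals are treated as atomic tokens:

    // The "parser" is now a grammar AST rather than a function.
    type Grammar =
        | Terminal of string             // leaf token parser
        | Seq      of Grammar * Grammar  // sequencing (p1 >>. p2 and friends)
        | Choice   of Grammar * Grammar  // alternation (p1 <|> p2)
        | Many     of Grammar            // repetition (many p)
        | Rule     of string             // reference to a named nonterminal

    // The usual surface operators, redefined to build ASTs instead of
    // executing; these shadow the ordinary combinators.
    let (>>.) p1 p2 = Seq (p1, p2)
    let (<|>) p1 p2 = Choice (p1, p2)
    let pstring s   = Terminal s

    // With the grammar reified, analyses can run before anything parses.
    // A crude example: FIRST sets, which let us flag a Choice node whose
    // alternatives overlap (an LL(1)-style smoke test).
    let rec firstSet g : Set<string> =
        match g with
        | Terminal s    -> Set.singleton s
        | Seq (p, _)    -> firstSet p    // sketch: assumes p is not nullable
        | Choice (p, q) -> Set.union (firstSet p) (firstSet q)
        | Many p        -> firstSet p
        | Rule _        -> Set.empty     // real version needs a rule environment

    let choiceIsSuspect g =
        match g with
        | Choice (p, q) ->
            not (Set.isEmpty (Set.intersect (firstSet p) (firstSet q)))
        | _ -> false

Once the grammar is a value like this, the recognition algorithm becomes a back-end decision, and checks of this sort can run when the parser is constructed rather than when some unlucky input is parsed.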
Jonathan

_______________________________________________
bitc-dev mailing list
bitc-dev@coyotos.org
http://www.coyotos.org/mailman/listinfo/bitc-dev