On Fri, May 15, 2015 at 12:40 PM, Matt Rice <ratm...@gmail.com> wrote:
> On Wed, May 13, 2015 at 7:38 AM, Jonathan S. Shapiro <s...@eros-os.org> > wrote: > > > type Parser<'TResult, 'TUserState> = CharStream<'TUserState> -> > > Reply<'TResult> > > > > But there is nothing magical about this definition. While the developer > > *can* introduce new compositing operators, they typically do not do so. > If > > we take the list of compositing operators as non-extensible, > > I guess i'm not sure how mixfix fits into this... It doesn't. Mixfix divides a conventional parser into a static parser and a dynamic parser. In most cases, the static parser doesn't have ability to deal with dynamically defined operator precedence, and therefore cannot deal with the syntax changes introduced dynamically by the mixfix mechanism. The result is that the static parser merely collects the expression tokens into a sequence (e.g. a list) and the dynamic parser then produces a parse tree for the expression. Now that you have me thinking about it, though, it *would* be possible to design an operator precedence combinator-style parser in which the actual precedence of the operators can be expressed dynamically. I had not previously considered this. I'm initially hesitant, for two mundane reasons and one serious one. The mundane reasons: 1. Making this all work in BitC required introducing some special-case mixfix operators into the master mixfix table so that "built in" expression syntax (mainly application) could be handled in the dynamic parser. I'm not sure how that plays out if we do it with a combinator. Probably easy to check. 2. I already have code that does what I think I want (from the v0 compiler), and there is a degree of "if it ain't broke..." in my mind. Also, I'm not very experienced with combinators and I don't want to screw this up. I should probably look at some existing operator precedence combinators and see if the MixFix engine can be expressed that way. The more serious concern: If we merge the static and dynamic parsers by building a custom combinator, we lose the ability to analyze the grammar for parse ambiguities. [Late addition: I now see how to avoid this issue, and I'm pretty convinced that I can straightforwardly turn the existing v0 mixfix parser into a custom combinator. In fact, turning it into a combinator in the right way resolves the "can't analyze" issue.] The ambiguity analysis issue isn't just a mundane issue. An unambiguous grammar guarantees a linear parse, where a general CFG is O(n^3). That's a difference that actually matters in practice. One of the weaknesses of the current BitC mixfix parser is that it uses backtracking and is therefore [potentially] O(n^3). I actually did some work in there to prune the search space, but I need to do some more work so that AST tree rearrangement in the presence of certain precedence combinations doesn't necessitate backtracking. The necessary restructuring is conceptually similar to what happens with balanced binary trees, but with a different ordering criteria. IIRC, the necessary tree rearrangement is actually described in the early operator precedence parsing papers. I think by Aho, but if not we can use some of the ideas that Vaughan Pratt used in top-down operator precedence parsing. It wasn't worth the bother for a quick implementation, and what with expressions being generally very short I wasn't that worried about it in practice for a v0 compiler. Optimizing mixfix using conventional optimizations seems a tricky business. Brzozozwksi derivatives can be computed incrementally (which is in fact how many modern regexp engines work). They have an extension to CFG's (Might, Derais, and Spiewak), and it *appears* to me that the extension can be made to operate incrementally. Especially in the special case of mixfix, where we know by construction that productions are never nullable. The catch is that the set of productions introduced by mixfix is constantly changing, and every time they change you need to revise any derivatives that you have already built. The simplest thing is to just flush the derivative cache, but at that point you've thrown away most of your optimization. So it doesn't seem at all clear what to do about optimizing that, nor is it clear that it's worth bothering. Might make a good undergraduate project for someone, since it's fairly well self-contained. Whether we optimize it or not, the presence of mixfix introduces a black box into the parse ambiguity analysis. We can do a meta-analysis to show that the sequence of tokens that participate in a mixfix sub-parse can be captured without ambiguity. The existing v0 parser does that already. But we then have to assert a leap of faith that the operator precedence parser is linear time (which we know it can be, so maybe that's not bad). ...it seems like with > mixfix you'd want to check for ambiguity at the time the parser is > compiled, but then also at the time when the mixfix operators have > been collected? does mixfix not fall into this realm of /they > typically do not do so/, or maybe its just a case of reserved words > are used in such a way that mixfix cannot introduce an ambiguity Mixfix cannot introduce ambiguity, but the reason isn't immediately obvious. Here are the pieces: t. The expression sub-grammar is completely partitioned from the surrounding grammar, and (by design) there is no overlap in the associated keywords. Therefore, the main grammar can be examined for ambiguity by viewing the expression non-terminal as "obtained by magic". 2. The mixfix introduction form is (by construction) a fully bracketed expression. This means that the operator precedence table is fixed for the duration of any given expression subtree. Which in turn means that you don't need to consider anything weird like precedence changing on the fly or keywords being rebound on the fly in mid-expression. That's actually pretty important. :-) 3. The mixfix sub-grammars are (a) non-nullable, (b) have (by construction) non-overlapping keywords. The only way you can really get in trouble with them is to have two mixfix definitions with identical precedence. That's generally a design mistake on the part of the programmer, and it can either be resolved by fiat (e.g. by lexical ordering of the tokens) or by signaling a compile-time error. shap
_______________________________________________ bitc-dev mailing list bitc-dev@coyotos.org http://www.coyotos.org/mailman/listinfo/bitc-dev