Re: [bitc-dev] best algorithms / libraries for parsing ambiguous grammars

Geoffrey Irving Tue, 14 Oct 2014 16:17:02 -0700

On Tue, Oct 14, 2014 at 2:57 PM, Jonathan S. Shapiro <[email protected]> wrote:
> On Tue, Oct 14, 2014 at 2:38 PM, Geoffrey Irving <[email protected]> wrote:
>>
>> Perfect, that answers even the more detailed form of my question.  Our
>> grammar will be intentionally ambiguous since we'll be trying to
>> detect and fix broken code.  E.g., we want to do things like parsing
>> f(x,y) as both applying f to two arguments and applying f to a single
>> tuple, then disambiguating using types later on.
>>
>> Thus, GLR seems good, and the IBM Eclipse parser seems like a
>> wonderful place to start.
>
> You may be saying incompatible things here, I'm not quite sure. It depends
> on what you mean by "broken".
>
> If "broken" means "it'll be one of several defined parses, but I don't know
> which one, and I may want to look at it in several ways", then GLR or
> something like it would be your first go-to solution. GLR will actually hand
> you back all possible parse trees for a given input. The intention is to run
> a resolving pass later, preserving the separation between parsing and
> semantics. But you're free to use the multiple trees directly.


Broken means at least the above, so we're good so far.

> GLR in and of itself may not help if you need to deal with input that is
> actually malformed. The way to think about that problem is that you're going
> to get out a sequence that is either (a) a set of valid parse trees in the
> style of GLR, or (b) a set of invalid parses, each of which consists of a
> sequence of parse sub-tree sets (due to possible ambiguities), some of which
> may consist solely of the distinguished "unrecognized token" parse subtree
> that serves as a placeholder for text that can't be assembled into any
> larger unit
>
> The thing you may want to look at in Eclipse is known as the CDT parser.
> Have a look at the talk slides here:
>
> http://wiki.eclipse.org/images/e/ec/McMaster_2012_invited_talk_cdt.pdf

Ew.  I'm extremely glad we're not dealing with a preprocessor.  Java
has at least that going for it.

> Pay particular attention starting at slide 38, where he starts talking about
> incomplete user input (which is to say: malformed input). Then have a look
> at the five-part series on building a CDT-based editor:

The bits about malformed input are very interesting.  Starting with
GLR is probably the way to go, but I'll keep the completion node trick
in mind in case that can't be hacked into a GLR framework later on.

> http://www.ibm.com/developerworks/views/opensource/libraryview.jsp?search_by=CDT+based+editor

Thanks, that's useful.  It seems like it would be pretty hard to get a
CDT-style parser working with the kind of downstream parse forest
filtering we have in mind.

> These days I wonder if GLR wouldn't be a better framework, mainly because it
> tolerates ambiguities and proceeds bottom up. CDT is top-down, so there are
> various kinds of errors at the beginnings of files that can make it
> difficult to get any partial progress. I haven't played with it enough to
> have a sense of how well they do on that sort of thing. The "Completion
> Node" idea may give them enough to get past that sort of thing in practice.

At least at first, we'll probably be applying GLR one line at a time
after chopping things up based on indentation.  This is a hack, but it
should be enough to get us off the ground and experimenting.  It also
makes it reasonably easy to reparse only the changed portion.

Geoffrey
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Re: [bitc-dev] best algorithms / libraries for parsing ambiguous grammars

Reply via email to