Guenther Noack Fri, 22 Jan 2010 06:53:23 -0800

Hi!

On Fri, Jan 22, 2010 at 01:31:07PM +0000, David Chisnall wrote:
> For simplicity, I like the idea of having the methods do the consumption, but 
> having the methods return objects that do the parsing has two really big 
> advantages:
> 
> 1) It makes the whole thing more powerful from the perspective of 
> subclassing.  You can write a grammar for language X that has no actions.  
> You can then subclass this grammar (in Smalltalk or whatever), call super to 
> get the rules, and then just send an -instantiating message to attach actions 
> to them.  If we're using local variables for intermediate results then we 
> don't have any way of accessing them from subclasses.
> 
> It seems that none of the other OMeta implementations really use this much.  
> For example, every implementation of the OMeta grammar itself seems to copy 
> the grammar description entirely and add actions, which, to me, completely 
> defeats the point of having an OO grammar description framework to start with.

I agree that it's tempting to let these variables outlive the rule
method's scope. A downside is that you don't get any static guarantee
that the variables you look up in the dictionary actually exist.
Mapping them down to host-language variables is better in that respect.
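
David's first point, that rule methods returning parser objects let a
subclass call super and attach actions, can be illustrated with a
minimal Ruby sketch. The classes Lit, Digits, Seq and Action and the two
grammars are hypothetical stand-ins, not the ObjMeta or EtoileParser
API; each parser's parse(input, pos) is assumed to return
[value, new_pos] on success and nil on failure.

```ruby
# Hypothetical parser-object classes (not the ObjMeta/EtoileParser API).
# Each #parse(input, pos) returns [value, new_pos] on success, nil on failure.
class Lit
  def initialize(str)
    @str = str
  end

  def parse(input, pos)
    input[pos, @str.length] == @str ? [@str, pos + @str.length] : nil
  end
end

class Digits
  def parse(input, pos)
    m = /\A\d+/.match(input[pos..])
    m ? [m[0], pos + m[0].length] : nil
  end
end

class Seq
  def initialize(*parsers)
    @parsers = parsers
  end

  def parse(input, pos)
    values = []
    @parsers.each do |parser|
      result = parser.parse(input, pos) or return nil
      values << result[0]
      pos = result[1]
    end
    [values, pos]
  end
end

# Wraps another parser and transforms its value: "attaching an action".
class Action
  def initialize(parser, &block)
    @parser = parser
    @block = block
  end

  def parse(input, pos)
    result = @parser.parse(input, pos)
    result && [@block.call(result[0]), result[1]]
  end
end

# Action-free grammar: rule methods return parser objects.
class SumGrammar
  def number
    Digits.new
  end

  def sum
    Seq.new(number, Lit.new("+"), number)
  end
end

# Subclass gets the rules via super and only attaches actions.
class EvaluatingSumGrammar < SumGrammar
  def number
    Action.new(super) { |digits| digits.to_i }
  end

  def sum
    Action.new(super) { |(a, _plus, b)| a + b }
  end
end

p SumGrammar.new.sum.parse("12+30", 0)           # => [["12", "+", "30"], 5]
p EvaluatingSumGrammar.new.sum.parse("12+30", 0) # => [42, 5]
```

Because SumGrammar#sum calls number through normal dynamic dispatch, the
subclass's sum automatically sees the integer-producing number rule.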

I agree that it's strange that the OMeta grammar is copied and modified
for each language-dependent implementation. Here's a suggestion for how
to separate host-language things from the actual grammar:

  Starting from the rule listOf:
  ometa ListOf {
    listOf :itemrule :delimrule
       = itemrule:x (delimrule itemrule)*:xs
       -> [x] + xs
  }

We can get rid of the semantic action [x] + xs by extracting it to
another rule which gets x and xs as arguments:

  ometa ListOf {
    listOf :itemrule :delimrule
      = itemrule:x (delimrule itemrule)*:xs
        hostDependentConcatList(x, xs),

    hostDependentConcatList :x :xs = !([x] + xs)
  }

Move the hostDependentConcatList rule into a (host language dependent)
subclass. Now the ListOf class itself is host language independent.
Simple. (I really should do that for easier bootstrapping myself, I
think. :-))

[Fun fact: Invoking listOf(expr, ',') in a rule translates to a call to
_applyWithArgs('listOf', lambda{...}, lambda{...}), which in turn pushes
the two lambda functions (or other things) passed to it *onto the input
stream* and invokes the listOf method without arguments. listOf then
executes its rules: ':itemrule' is the same as 'anything:itemrule' and
pops the next element from the stream and stores it in the 'itemrule'
variable. Same with ':delimrule'. The listOf rule method thus has no
arguments in the compiled form.]
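
The argument-passing trick described above can be sketched in Ruby.
MiniParser, apply_with_args, digit and exactly are hypothetical names,
and the input is assumed to be an array of characters; the point is
only that list_of takes no Ruby parameters and pops its two arguments
with anything, just as ':itemrule' and ':delimrule' do. Reassigning
@input to a fresh array (instead of mutating it) makes backtracking a
matter of restoring an old reference.

```ruby
class ParseError < StandardError; end

# Hypothetical sketch of passing rule arguments through the input stream.
class MiniParser
  attr_reader :input

  def initialize(input)
    @input = input # array of remaining input elements
  end

  # ':name' in a rule is 'anything:name': pop the next element.
  def anything
    raise ParseError, "unexpected end of input" if @input.empty?
    head, *@input = @input # builds a new array; old references stay valid
    head
  end

  def exactly(expected)
    c = anything
    raise ParseError, "expected #{expected.inspect}" unless c == expected
    c
  end

  def digit
    c = anything
    raise ParseError, "expected a digit" unless c.is_a?(String) && c.match?(/\A\d\z/)
    c
  end

  # Push the arguments in front of the input, then call the rule with no
  # Ruby arguments at all.
  def apply_with_args(rule, *args)
    @input = args + @input
    send(rule)
  end

  # Compiled form of: listOf :itemrule :delimrule
  #   = itemrule:x (delimrule itemrule)*:xs -> [x] + xs
  def list_of
    itemrule = anything
    delimrule = anything
    x = itemrule.call
    xs = []
    loop do
      saved = @input
      begin
        delimrule.call
        xs << itemrule.call
      rescue ParseError
        @input = saved # undo a partial (delimrule itemrule) match
        break
      end
    end
    [x] + xs
  end
end

parser = MiniParser.new("1,2,3;".chars)
p parser.apply_with_args(:list_of, -> { parser.digit }, -> { parser.exactly(",") })
p parser.input # the unconsumed rest of the input
```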

For bootstrapping, the host-dependent things are probably easiest to do
when implemented in the host language itself:

  class RubyListOf < ListOf
    # Rule arguments arrive on the input stream; pop them in order.
    def hostDependentConcatList
      x = anything()  # arg 1: the first item
      xs = anything() # arg 2: the remaining items
      [x] + xs
    end
  end
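
To check that the split works, here is a self-contained variant with a
stand-in ListOf base class. Everything in the base class (the token-array
input, anything, apply_with_args and the simplified list_of, which treats
every token other than ',' as an item) is an assumption made so the
example runs; in practice that class would be generated from the
grammar. The base class raises NotImplementedError until a host-specific
subclass supplies the hook.

```ruby
class ParseError < StandardError; end

# Hypothetical stand-in for the generated, host-independent ListOf class.
class ListOf
  def initialize(tokens)
    @input = tokens
  end

  def anything
    raise ParseError, "unexpected end of input" if @input.empty?
    head, *@input = @input
    head
  end

  def apply_with_args(rule, *args)
    @input = args + @input
    send(rule)
  end

  # Simplified listOf over a token array: item (',' item)*.
  # Building the result is delegated to the host-dependent hook.
  def list_of
    x = anything
    xs = []
    while @input.first == ","
      anything # consume the delimiter
      xs << anything
    end
    apply_with_args(:hostDependentConcatList, x, xs)
  end

  # Host hook: a host-language subclass must provide it.
  def hostDependentConcatList
    raise NotImplementedError, "#{self.class} must implement hostDependentConcatList"
  end
end

class RubyListOf < ListOf
  def hostDependentConcatList
    x = anything  # arg 1
    xs = anything # arg 2
    [x] + xs
  end
end

p RubyListOf.new(["a", ",", "b", ",", "c"]).list_of # => ["a", "b", "c"]
```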

> 2) It's easier to do recursive rules.  For example, my listOf() can take any 
> parsing expression as an argument, while the one from the original OMeta 
> implementation seems to only be able to do single-selector rules or 
> terminals.  This means that I can do a list of arbitrary things much more 
> easily than they can and I can more easily write very complex rules.

As far as I've understood it, the OMeta implementations for Ruby and
JavaScript allow all kinds of objects as rule arguments. Maybe I'm wrong
about that, though. It's possible that the arguments to a parameterized
rule are really host-language expressions, in which case my example
above may not work so well.
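
If rule arguments can indeed be arbitrary host objects, passing closures
is enough to get David's recursive use of listOf. The following sketch
assumes a hypothetical ListParser whose list_of accepts any callable as
item or delimiter, so an item can itself be a sequence or another
list_of. Failures are exceptions, and list_of backtracks over a partial
(delimiter item) match.

```ruby
class ParseError < StandardError; end

# Hypothetical sketch: rule arguments are arbitrary callables, so listOf can
# take any parsing expression, including another list_of.
class ListParser
  attr_reader :pos

  def initialize(text)
    @text = text
    @pos = 0
  end

  def char(expected)
    c = @text[@pos]
    raise ParseError, "expected #{expected.inspect} at #{@pos}" unless c == expected
    @pos += 1
    c
  end

  def digits
    start = @pos
    @pos += 1 while @text[@pos]&.match?(/\d/)
    raise ParseError, "expected digits at #{start}" if @pos == start
    @text[start...@pos]
  end

  # listOf with arbitrary item and delimiter expressions.
  def list_of(item, delim)
    items = [item.call]
    loop do
      saved = @pos
      begin
        delim.call
        items << item.call
      rescue ParseError
        @pos = saved # backtrack over a partial (delim item) match
        break
      end
    end
    items
  end
end

parser = ListParser.new("1,2;3")
row = -> { parser.list_of(-> { parser.digits }, -> { parser.char(",") }) }
p parser.list_of(row, -> { parser.char(";") }) # => [["1", "2"], ["3"]]
```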

> 2.5) It's easier (potentially) to make concurrent.  All of the parsing state 
> is stored on the stack, so we can do 'or' rules in separate threads if we 
> want and see if any of them are matched.  This is really trivial to do with 
> the existing futures code in EtoileThread; memoise the parsing expressions, 
> send some of them an inNewThread message, and then replace the or: rule with 
> one that tries to match them all but doesn't test the results until after 
> sending the messages.  

I understand the idea, but do you think it will really work? At which
AST depth does it start to pay off to run the alternatives of the
top-level 'or' in separate threads? And how would the 'or' method
estimate the remaining AST depth? It seems very unlikely to me that
this will give good results, so I don't think it's worth planning for
that scenario.
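
One detail any concurrent version would have to preserve is that PEG
choice is ordered: the earliest alternative that succeeds wins, even if
a later one finishes first. The sketch below assumes each alternative is
a side-effect-free lambda from a start position to [value, new_pos] or
nil; the helpers lit, ordered_choice and ordered_choice_concurrent are
hypothetical. Note that MRI's global VM lock keeps Ruby threads from
running Ruby code in parallel, so in MRI this only demonstrates the
semantics, not a speedup.

```ruby
# Hypothetical helper: a literal-matching alternative over a fixed text.
# Alternatives are lambdas from a start position to [value, new_pos] or nil.
def lit(text, str)
  ->(pos) { text[pos, str.length] == str ? [str, pos + str.length] : nil }
end

# Sequential PEG ordered choice: the first success wins.
def ordered_choice(alternatives, pos)
  alternatives.each do |alt|
    result = alt.call(pos)
    return result if result
  end
  nil
end

# Same semantics, but every alternative starts in its own thread.
# Results are still inspected in order, so a later alternative that
# finishes first cannot win over an earlier one that also succeeds.
def ordered_choice_concurrent(alternatives, pos)
  threads = alternatives.map { |alt| Thread.new { alt.call(pos) } }
  winner = nil
  threads.each do |thread|
    winner = thread.value # waits for this alternative to finish
    break if winner
  end
  threads.each(&:kill) # discard alternatives that are no longer needed
  winner
end

text = "forward"
alternatives = [lit(text, "for"), lit(text, "forward")]
p ordered_choice(alternatives, 0)            # => ["for", 3]
p ordered_choice_concurrent(alternatives, 0) # => ["for", 3]
```

The ordering requirement means the caller always waits for every earlier
alternative, which limits how much a threaded 'or' can gain.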

> My gut feeling is that we should actually keep most of the existing PEG code 
> (especially now I've got it working mostly how I want it to be working ;-) 
> but remove the combination rules and just inline them into rules.  Then it 
> becomes easier for the rules to collect the actions.  Most importantly, we 
> remove the OMCapturingExpression, and make each rule do the capturing itself, 
> so that each rule has its own capture namespace.

I'm still a big fan of the Ruby implementation. It's very simple and
straightforward: rule methods return parse results, host-language local
variables hold the bound variables, and errors are reported as
exceptions. There is no need for a dictionary or an OMParseResult
class, there is only one method per rule, and the rule methods are easy
to read. Closures (usually Ruby blocks) are sometimes used, but they're
only passed down the stack. The "official" Ruby OMeta implementation
linked from the OMeta web page is only about 400 LOC (not counting the
Ruby-translated metagrammar used for bootstrapping).
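
A minimal sketch of that style, using a hypothetical CalcParser for the
grammar expr = number:a "+" expr:b -> a + b | number. The _or helper,
token and number are assumptions, not the API of the Ruby implementation
mentioned above; the alternatives are lambdas because a Ruby method can
take only one block.

```ruby
class ParseError < StandardError; end

# Hypothetical sketch: one method per rule, results are return values,
# bindings are local variables, failures are exceptions, and closures
# are only passed down the stack.
class CalcParser
  def initialize(text)
    @text = text
    @pos = 0
  end

  # Ordered choice: try each alternative from the same start position.
  def _or(*alternatives)
    start = @pos
    alternatives.each do |alt|
      begin
        return alt.call
      rescue ParseError
        @pos = start # backtrack before trying the next alternative
      end
    end
    raise ParseError, "no alternative matched at #{start}"
  end

  def token(str)
    raise ParseError, "expected #{str.inspect} at #{@pos}" unless @text[@pos, str.length] == str
    @pos += str.length
    str
  end

  # number = digit+ -> integer value
  def number
    m = /\A\d+/.match(@text[@pos..])
    raise ParseError, "expected number at #{@pos}" unless m
    @pos += m[0].length
    m[0].to_i
  end

  # expr = number:a "+" expr:b -> a + b
  #      | number
  def expr
    _or(
      -> { a = number; token("+"); b = expr; a + b },
      -> { number }
    )
  end

  def parse
    result = expr
    raise ParseError, "trailing input at #{@pos}" unless @pos == @text.length
    result
  end
end

p CalcParser.new("1+2+39").parse # => 42
```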

-Günther

> David
> 
> On 22 Jan 2010, at 13:06, Guenther Noack wrote:
> 
> > Hi David,
> > 
> > Ok, no worries. I know these debugging sessions.
> > 
> > It's interesting that you talk about implementing chained parse
> > failures. I also did a little ad-hoc implementation of this in my Ruby
> > experiment. When implementing this, don't forget the case where the OR
> > rule fails. When that happens, you probably want to see an error
> > message like "OR with 3 options failed, reasons were: (list of
> > three other parse failures)". I didn't think about that initially, and
> > this currently makes things a bit complicated.
> > 
> > In both my and the other Ruby implementation, parse failures are
> > exceptions. It's surely debatable whether this is good style, but the
> > generated code for the rules is much more straightforward when it
> > doesn't need to care about failures so much. Failure backtraces on a
> > rule-level (instead of method level) can be done by encapsulating rule
> > calls in an "apply" method, which attaches rule name information to
> > exceptions as they fly by.
> > 
> > It's pretty exciting to see the OMeta stuff working. Since yesterday
> > evening, my little Ruby hack can parse all kinds of fancy recognizers
> > already. It still needs a Rubyish-language parser for the host
> > expressions in the OMeta grammar. Maybe that can be bootstrapped from
> > the JavaScript implementation as well. It's *exciting*! :-)
> > 
> > -Günther
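
The error reporting described in this message could look roughly like
the following sketch. ParseFailure, ReportingParser and the answer, yes
and no rules are hypothetical; apply appends each rule name to a failure
as it propagates, and _or collects the failures of all alternatives into
the message of a new failure.

```ruby
# Hypothetical failure type: remembers the rules it passed through and, for
# a failed OR, the failures of the individual alternatives.
class ParseFailure < StandardError
  attr_reader :rules, :reasons

  def initialize(message, reasons = [])
    super(message)
    @rules = []
    @reasons = reasons
  end
end

class ReportingParser
  def initialize(text)
    @text = text
    @pos = 0
  end

  # All rule calls go through apply, which tags failures with the rule
  # name as they fly by, giving a rule-level backtrace.
  def apply(rule)
    send(rule)
  rescue ParseFailure => e
    e.rules << rule
    raise
  end

  # Ordered choice that reports why every alternative failed.
  def _or(*rules)
    start = @pos
    failures = []
    rules.each do |rule|
      begin
        return apply(rule)
      rescue ParseFailure => e
        failures << e
        @pos = start
      end
    end
    raise ParseFailure.new(
      "OR with #{rules.size} options failed, reasons were: " +
        failures.map(&:message).join("; "),
      failures
    )
  end

  def token(str)
    unless @text[@pos, str.length] == str
      raise ParseFailure, "expected #{str.inspect} at #{@pos}"
    end
    @pos += str.length
    str
  end

  # answer = yes | no
  def answer
    _or(:yes, :no)
  end

  def yes
    token("yes")
  end

  def no
    token("no")
  end
end

begin
  ReportingParser.new("maybe").apply(:answer)
rescue ParseFailure => e
  puts e.message # OR with 2 options failed, reasons were: expected "yes" at 0; expected "no" at 0
  p e.rules      # => [:answer]
end
```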

_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev

Re: [Etoile-dev] [Etoile-cvs] r5857 - in /branches/guenther/ObjMeta/OMTest.tool/Resources: OMGrammar.st OMTest.st