On 22 Jan 2010, at 14:45, Guenther Noack wrote:

> Hi!
>
> On Fri, Jan 22, 2010 at 01:31:07PM +0000, David Chisnall wrote:
>> For simplicity, I like the idea of having the methods do the consumption, but having the methods return objects that do the parsing has two really big advantages:
>>
>> 1) It makes the whole thing more powerful from the perspective of subclassing. You can write a grammar for language X that has no actions. You can then subclass this grammar (in Smalltalk or whatever), call super to get the rules, and then just send an -instantiating message to attach actions to them. If we're using local variables for intermediate results then we don't have any way of accessing them from subclasses.
>>
>> It seems that none of the other OMeta implementations really use this much. For example, every implementation of the OMeta grammar itself seems to copy the grammar description entirely and add actions, which, to me, completely defeats the point of having an OO grammar description framework to start with.
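To make that concrete, here's roughly the sort of thing I mean: rule methods return expression objects, and a subclass calls super and attaches an action to the result. This is only a sketch in Ruby with made-up names (it isn't the existing PEG code, and the real thing would use something like the -instantiating message rather than a Ruby action method):

class Expr
  def initialize(&block)
    @parse  = block
    @action = nil
  end

  # Attach a semantic action; returns self so a subclass can write
  # `super.action { ... }`.
  def action(&block)
    @action = block
    self
  end

  # Match `input` (an array of tokens) at `pos`.
  # Returns [value, new_pos] on success, nil on failure.
  def parse(input, pos = 0)
    result = @parse.call(input, pos)
    return nil unless result
    value, new_pos = result
    value = @action.call(value) if @action
    [value, new_pos]
  end
end

def token(t)
  Expr.new { |input, pos| [t, pos + 1] if input[pos] == t }
end

def seq(*exprs)
  Expr.new do |input, pos|
    values = []
    exprs.each do |e|
      r = e.parse(input, pos) or break
      values << r[0]
      pos = r[1]
    end
    values.length == exprs.length ? [values, pos] : nil
  end
end

def many(expr)
  Expr.new do |input, pos|
    values = []
    while (r = expr.parse(input, pos))
      values << r[0]
      pos = r[1]
    end
    [values, pos]
  end
end

# Host-language-independent grammar: structure only, no actions.
class ListGrammar
  def list_of(item, delim)
    seq(item, many(seq(delim, item)))
  end
end

# The subclass gets the rule from super and just attaches the action.
class ListGrammarWithActions < ListGrammar
  def list_of(item, delim)
    super.action { |(x, xs)| [x] + xs.map { |(_d, i)| i } }
  end
end

p ListGrammarWithActions.new.list_of(token('a'), token(',')).parse(%w[a , a , a])
# => [["a", "a", "a"], 5]

The point is that the subclass never repeats the grammar; it only says what to do with the captures.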
> I agree that it sounds tempting to have these variables live beyond rule method scopes. A drawback is that you don't get any static guarantees that the variables you're using in the dictionary actually exist. That's better when mapping these variables down to host-language variables.

Not really an issue; we can do static checking on the AST trivially to test that.

> I agree that it's strange that the OMeta grammar is copied and modified for each language-dependent implementation. Here's a suggestion for how to separate host-language things from the actual grammar:
>
> Starting from the rule listOf:
>
> ometa ListOf {
>   listOf :itemrule :delimrule
>     = itemrule:x (delimrule itemrule)*:xs
>       -> [x] + xs
> }
>
> We can get rid of the semantic action [x] + xs by extracting it to another rule which gets x and xs as arguments:
>
> ometa ListOf {
>   listOf :itemrule :delimrule
>     = itemrule:x (delimrule itemrule)*:xs
>       hostDependentConcatList(x, xs),
>
>   hostDependentConcatList :x :xs = !([x] + xs)
> }

This strikes me as a very C++ approach. You need to decide explicitly, when you originally write the grammar, which bits might be subclassed later. The point of OO is that you don't need to do that...

> Move the hostDependentConcatList rule into a (host language dependent) subclass. Now the ListOf class itself is host language independent. Simple. (I really should do that for easier bootstrapping myself, I think. :-))
>
> [Fun fact: Invoking listOf(expr, ',') in a rule translates to a call to _applyWithArgs('listOf', lambda{...}, lambda{...}), which in turn pushes the two lambda functions (or other things) passed to it *onto the input stream* and invokes the listOf method without arguments. listOf then executes its rules: ':itemrule' is the same as 'anything:itemrule' and pops the next element from the stream and stores it in the 'itemrule' variable. Same with ':delimrule'. The listOf rule method thus has no arguments in the compiled form.]

This sounds quite horrible. So the parser is a stack machine internally? It seems like they're just implementing a conventional parser in an OO language, rather than translating grammars into a natural OO representation, but maybe I'm missing something.

> For bootstrapping, the host-dependent things are probably easiest to do when implemented in the host language itself:
>
> class RubyListOf < ListOf
>   def hostDependentConcatList
>     x  = anything() # arg 1
>     xs = anything() # arg 2
>     return [x] + xs
>   end
> end

>> 2) It's easier to do recursive rules. For example, my listOf() can take any parsing expression as an argument, while the one from the original OMeta implementation seems to only be able to do single-selector rules or terminals. This means that I can do a list of arbitrary things much more easily than they can, and I can more easily write very complex rules.

> As far as I've understood it, the OMeta implementations for Ruby and JavaScript allow all kinds of objects. Maybe I'm wrong about that, though. It's possible that the actual arguments to a rule with parameters are actually host expressions, so maybe my example above won't actually work so well.

Could be. The grammar says all arguments are hostExprs.
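For reference, here's my reading of the fun fact above as a toy in Ruby. The names are invented and this is nothing like the real OMeta/Ruby source; it only shows the shape of the mechanism as described (arguments pushed onto the input stream, then popped back off with anything()):

class MiniParser
  def initialize(input)
    @input = input            # mutable array used as the input stream
  end

  def anything
    raise 'parse failure' if @input.empty?
    @input.shift
  end

  # Stands in for _applyWithArgs('listOf', item, delim): push the arguments
  # onto the *front of the input stream*, then call the rule with no
  # Ruby-level arguments at all.
  def apply_with_args(rule, *args)
    args.reverse_each { |a| @input.unshift(a) }
    send(rule)
  end

  # ':itemrule' / ':delimrule' compile down to plain anything() calls that
  # pop the "arguments" straight back off the stream.
  def list_of
    itemrule  = anything
    delimrule = anything
    xs = [itemrule.call]
    xs << itemrule.call while delimrule.call
    xs
  end
end

# The rule arguments are closures over the same stream.
input  = %w[a , a , a]
parser = MiniParser.new(input)
item   = lambda { parser.anything }
delim  = lambda { input.first == ',' ? parser.anything : false }
p parser.apply_with_args(:list_of, item, delim)   # => ["a", "a", "a"]

If that's accurate, then it really is treating the input stream as an argument stack.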
>> 2.5) It's easier (potentially) to make concurrent. All of the parsing state is stored on the stack, so we can do 'or' rules in separate threads if we want and see if any of them are matched. This is really trivial to do with the existing futures code in EtoileThread; memoise the parsing expressions, send some of them an inNewThread message, and then replace the or: rule with one that tries to match them all but doesn't test the results until after sending the messages.

> I understand the idea. Do you think that will really work? At which AST depth does it pay off to do multithreading at the top-level 'or'? How does the 'or' method estimate the expected remaining AST depth? It sounds very unlikely to me that this will give good results, so I don't think it's worth planning for that scenario.

Well, since we have a JIT compiler, we can find that out by run-time profiling. If an or: expression usually matches a result other than the first one, then it makes sense (on a multicore system) to do them in parallel and discard the unneeded ones.

>> My gut feeling is that we should actually keep most of the existing PEG code (especially now I've got it working mostly how I want it to be working ;-) but remove the combination rules and just inline them into rules. Then it becomes easier for the rules to collect the actions. Most importantly, we remove the OMCapturingExpression, and make each rule do the capturing itself, so that each rule has its own capture namespace.

> I'm still a big fan of the Ruby implementation. It's just very simple and straightforward.

Well, simple and straightforward is nice...

> Rule methods return parsing results, host-language variables are used for the variables, errors are reported as exceptions.

Reporting errors as exceptions is fine for Ruby, because exceptions are implemented as multiple return values, so they're no more expensive than performing a test in the caller. For us, exceptions are very expensive (they are designed to be used in exceptional circumstances, oddly enough), while parse failures are incredibly common (that's how backtracking works). That means that using exceptions would be very slow.

> No need for a dictionary, no need for the OMParseResult class, only one method per rule, the rule methods are easily readable.

The rule methods are written in OMeta, so they're equally readable however they're implemented. We're going to be emitting native code directly from the rules, so you'll never actually read them as Smalltalk code.

> Sometimes, closures (usually Ruby blocks) are used, but they're only passed down the stack. The "official" Ruby OMeta implementation linked on the OMeta web page has just about 400 LOC (the Ruby-translated Metagrammar for bootstrapping not counted).

Well, we're currently at 1400 lines of code, although that includes some convenience methods for making OpenStep objects look the same as each other.

David

-- 
This email complies with ISO 3103
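P.S. Since 2.5 came up: here's a quick sketch of what a speculative or: could look like, with plain Ruby threads standing in for EtoileThread futures. None of the names are real and this isn't how the PEG code does it today; the point is just that you can start every alternative before looking at any result and still keep ordered-choice semantics, by taking the first alternative in grammar order that matched.

# Each alternative is a callable taking (input, pos) and returning
# [value, new_pos] on success or nil on failure. The input is never
# mutated, so the alternatives are safe to run concurrently.
def speculative_or(input, pos, *alternatives)
  threads = alternatives.map do |alt|
    Thread.new { alt.call(input, pos) }
  end
  # All threads are started before any result is inspected; then take the
  # first match in grammar order and discard the rest.
  threads.map(&:value).find { |r| !r.nil? }
end

# Toy alternatives that each match a single literal token.
literal = lambda do |word|
  lambda { |input, pos| [word, pos + 1] if input[pos] == word }
end

input = %w[else x]
p speculative_or(input, 0, literal['if'], literal['while'], literal['else'])
# => ["else", 1]

Whether spawning the threads is ever cheaper than just trying the alternatives in order is exactly the question the run-time profiling would have to answer.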