Next-gen Marpa DSL

iobass16 Fri, 05 Aug 2016 12:00:59 -0700

I saw a post somewhere that talked about the next-gen Marpa DSL based on 
Lua.  In case it helps: I started with a PEG tool (in Ruby) called Parslet.
I couldn't get rid of the left recursion in my grammar, so I abandoned the 
tool.  But I quite liked the DSL.  Following are some ideas from Parslet 
that may be helpful for Marpa's next-gen DSL.


See [http://kschiess.github.io/parslet/get-started.html]; take a look at 
"class MiniP", for example.

1.  The DSL is executable Ruby code.  You could probably do something 
similar in Lua.  Using an in-language DSL has some disadvantages, e.g. a 
sequence requires an explicit operator (">>") instead of just whitespace 
between elements.  However, I really liked that I could write "macros" as 
simple Ruby methods.

  def comma_sep_list(atom)
    atom >> ( str(',') >> atom ).repeat
  end
  def paren_enclosed(atom)
    str('(') >> atom >> str(')')
  end

I used these in ~20% of my rules, and they significantly improved the 
readability of my grammar.
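
To show the idea without depending on Parslet, here is a minimal sketch of 
the same two helpers as plain Ruby methods that build Regexp objects instead 
of Parslet atoms.  This is a stand-in, not Parslet's API; the "word" pattern 
and the sample strings are made up for the demonstration.

```ruby
# Stand-in for the Parslet helpers above: each "macro" is an ordinary
# method that takes a pattern and returns a larger pattern.
def comma_sep_list(atom)
  /#{atom}(?:,#{atom})*/
end

def paren_enclosed(atom)
  /\(#{atom}\)/
end

word = /[a-z]+/   # hypothetical atom, chosen only for this example

# The helpers compose like any other method calls.
arg_list = /\A#{paren_enclosed(comma_sep_list(word))}\z/

puts arg_list.match?('(a,b,c)')   # true
puts arg_list.match?('(a,,c)')    # false
```

The point is only that a host-language DSL gets parameterized rules for 
free; an external DSL would need its own macro facility to do the same.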

2.  Another advantage of the in-language DSL is that there was no 
restriction on the complexity of rules.  I'm struggling at the moment to 
figure out by trial and error what I can put in a single SLIF rule.  In 
particular, I read Mr. Kegler's tutorial, the POD documentation, and two 
more tutorials before someone mentioned that a repeat construct ("*") must 
be the only thing on the right-hand side of its rule.

For example, consider a VHDL identifier.  It must start with a letter, may 
contain letters, underscores, and digits in the interior, and must end with 
a letter or digit.  No two underscores may appear in a row.  (Yes, it's odd.  
I didn't make it up.)  Here is my best crack at this lexer rule in different 
tools.

Marpa SLIF (I haven't tried to run this yet, so it's probably not valid):

  id_first ~ [a-zA-Z]
  alnum ~ [a-zA-Z0-9]
  id_mid_p1 ~ '_' alnum | alnum
  id_mid ~ id_mid_p1*
  id ~ id_first id_mid

Parslet (essentially direct translation of EBNF from my book):

  rule(:id) {
    match['a-zA-Z'] >> ( str('_').maybe >> match['a-zA-Z0-9'] ).repeat
  }

Regex (direct translation of EBNF, analogous to Parslet):

  /[a-zA-Z](_?[a-zA-Z0-9])*/
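
As a quick sanity check of the regex form, the following Ruby sketch anchors 
the pattern and tries it on a few identifiers.  The anchors (\A and \z), the 
method name vhdl_id?, and the sample strings are my additions for the test, 
not part of the EBNF.

```ruby
# The regex from above, anchored so the whole string must match.
def vhdl_id?(s)
  /\A[a-zA-Z](_?[a-zA-Z0-9])*\z/.match?(s)
end

# Valid: starts with a letter, ends with a letter or digit,
# never has two underscores in a row.
%w[x clk_in a1_b2].each { |s| puts "#{s}: #{vhdl_id?(s)}" }

# Invalid: leading underscore, trailing underscore,
# double underscore, leading digit.
%w[_x x_ a__b 9lives].each { |s| puts "#{s}: #{vhdl_id?(s)}" }
```

The first group should print true and the second false, since every 
underscore has to be followed immediately by a letter or digit.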

I'm happy to try to clarify if desired.  My point is that SLIF feels a 
little like assembly language compared to Parslet or even regular 
expressions where you can freely mix subexpressions and repetition.  In 
addition, I miss the '?' regex construct and regex grouping with 
parentheses (both of which Parslet supports).

Hopefully that's clear enough.  Please understand that I'm not trying to 
demean Marpa or SLIF at all.  I'm simply providing first impressions of a 
Marpa newbie -- my own personal preferences -- in case they might be useful 
for the next-gen DSL.  

One of the things that attracted me to Marpa is the "batteries included" 
parser algorithm.  I don't have to refactor my grammar, eliminate epsilon 
rules and left recursion, etc.  The Marpa algorithm is a fundamental 
technical advance that allows efficient parsing of languages the way we 
think of them -- the way I find a grammar in my textbook or language spec.  
SLIF is a nice, lightweight, clear DSL.  I just wish it were a little more 
sophisticated.

One last quick thought.  Given that regexes are the "competition" and were 
used heavily for lexing pre-SLIF, why not allow arbitrary regexes in the 
lexing rules?  Why limit their use to character classes?  These questions 
are rhetorical, of course.  I assume there are good technical reasons, 
perhaps to avoid bloating the SLIF grammar.  My goal is simply to raise the 
idea to implementors for reconsideration.

Thanks for a great tool!

- Ryan
