Re: Next-gen Marpa DSL

Jeffrey Kegler Tue, 09 Aug 2016 18:43:32 -0700

Thanks for the input.  Some responses:

1.) Rules of more complexity -- in fact, arbitrary complexity -- will be
allowed in Marpa::R3.  (Marpa::R3 is how the Lua-based DSL will appear.)
An often-overlooked point is that with Marpa, unlike with regexes, a rule
or a subrule is expected to be associated with a semantics.  While the
syntax of a regular expression can show the additional complexity quite
conveniently, tagging all the subrules with a name and/or semantics
clutters things up, making for diminishing returns.  Nonetheless, this will
be available in Marpa::R3.
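
To make the clutter concrete, here is a minimal sketch in plain Ruby regexes
(not Marpa or SLIF syntax), using the VHDL identifier from the quoted message
below.  The group names are hypothetical; they stand in for the name and/or
semantics that each subrule would carry.

```ruby
# Illustration only: plain Ruby regexes, not Marpa or SLIF syntax.

# Untagged: compact, but a match yields only the whole string.
PLAIN = /\A[a-zA-Z](?:_?[a-zA-Z0-9])*\z/

# Tagged: each subexpression gets a (hypothetical) name so that an action
# could be attached to it.  Same language, noticeably harder to read.
TAGGED = /\A(?<first>[a-zA-Z])(?<rest>(?:(?<sep>_)?(?<alnum>[a-zA-Z0-9]))*)\z/

m = TAGGED.match("rx_data1")
puts m[:first]   # the leading letter
puts m[:rest]    # everything after the leading letter
```

Both patterns accept the same strings; the tagged form only adds handles for
the pieces, and that is where the readability cost comes from.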


2.) It could be available today if anyone were willing to write a wrapper
around Marpa::R2.  This is where the SLIF came from -- Andrew Rodland and
Peter Stuifzand building on top of earlier versions of Marpa.  They showed
me that there was real interest, as well as how a user might solve some of
the semantic and syntactic issues.

Again, thanks! -- jeffrey

On Fri, Aug 5, 2016 at 12:00 PM, <[email protected]> wrote:

> I saw a post somewhere that talked about the next-gen Marpa DSL based on
> Lua.  In case it helps, I started with a PEG tool (in Ruby) called
> Parslet.  I couldn't get rid of the left recursion in my grammar, so I
> abandoned the tool.  But I quite liked the DSL.  Following are some ideas
> from Parslet that may be helpful for Marpa's next-gen DSL.
>
> [http://kschiess.github.io/parslet/get-started.html], take a look at the
> "class MiniP", for example.
>
> 1.  The DSL is executable Ruby code.  You could probably do something
> similar in Lua.  Using an in-language DSL has some disadvantages, e.g. a
> sequence requires an extra operator (">>") instead of just space
> separated.  However, I really liked that I could write "macros" as simple
> Ruby methods.
>
>   def comma_sep_list(atom)
>     atom >> ( str(',') >> atom ).repeat
>   end
>   def paren_enclosed(atom)
>     str('(') >> atom >> str(')')
>   end
>
> I used these in ~20% of my rules, and they significantly improved the
> readability of my grammar.
>
> 2.  Another advantage of the in-language DSL is that there was no
> restriction on the complexity of rules.  I'm struggling at the moment to
> figure out by trial-and-error what I can put in a single SLIF rule.  In
> particular, I read Mr. Kegler's tutorial, the POD documentation, and two
> more tutorials before someone mentioned that a repeat construct ("*") must
> appear in a rule by itself.
>
> For example, consider a VHDL identifier.  It must start with a letter, may
> contain letters, underscores, and digits in the interior, and must end with
> a letter or digit.  No two underscores may appear in a row.  (Yes, it's
> odd.  I didn't make it up.)  Here is my best crack at this lexer rule in
> different tools.
>
> Marpa SLIF (I haven't tried to run this yet, so it's probably not valid):
>
>   id_first ~ [a-zA-Z]
>   alnum ~ [a-zA-Z0-9]
>   id_mid_p1 ~ '_' alnum | alnum
>   id_mid ~ id_mid_p1*
>   id_last ~ alnum
>   id ~ id_first | id_first id_mid id_last
>
> Parslet (essentially direct translation of EBNF from my book):
>
>   rule(:id) {
>     match['a-zA-Z'] >> ( str('_').maybe >> match['a-zA-Z0-9'] ).repeat
>   }
>
> Regex:
>
>   /[a-zA-Z](_?[a-zA-Z0-9])*/
>
> This is a direct translation of the EBNF, analogous to the Parslet rule.
>
> I'm happy to try to clarify if desired.  My point is that SLIF feels a
> little like assembly language compared to Parslet or even regular
> expressions where you can freely mix subexpressions and repetition.  In
> addition, I miss the '?' regex construct and regex grouping with
> parentheses (both of which Parslet supports).
>
> Hopefully that's clear enough.  Please understand that I'm not trying to
> demean Marpa or SLIF at all.  I'm simply providing first impressions of a
> Marpa newbie -- my own personal preferences -- in case they might be useful
> for the next-gen DSL.
>
> One of the things that attracted me to Marpa is the "batteries included"
> parser algorithm.  I don't have to refactor my grammar, eliminate epsilon
> transitions and left recursion, etc.  The Marpa algorithm is a fundamental
> technical advance that allows efficient parsing of languages the way we
> think of them -- the way I find a grammar in my textbook or language spec.
> SLIF is a nice, lightweight, clear DSL.  I just wish it were a little more
> sophisticated.
>
> One last quick thought.  Given that regexes are the "competition" and were
> used heavily for lexing pre-SLIF, why not allow arbitrary regexes in the
> lexing rules?  Why limit their use to character classes?  These questions
> are rhetorical, of course.  I assume there are good technical reasons,
> perhaps to avoid bloating the SLIF grammar.  My goal is simply to raise the
> idea to implementors for reconsideration.
>
> Thanks for a great tool!
>
> - Ryan
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
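
Two ideas from the quoted message, grammar "macros" written as ordinary
methods and the VHDL identifier rule, can be tried out without Parslet.  The
following is a minimal sketch in plain Ruby that builds regex source strings;
it assumes nothing about Parslet's or Marpa's internals.  The method names are
borrowed from the quoted message, while ID, ID_RE, and ARGS_RE are names made
up for this illustration.

```ruby
# Plain-Ruby sketch; Parslet itself is not used here.

# VHDL-style identifier as a regex source fragment: a letter, then any
# number of (optional underscore + letter-or-digit) pairs.
ID = "[a-zA-Z](?:_?[a-zA-Z0-9])*"

# "Macros": ordinary methods that build larger patterns from smaller ones.
def comma_sep_list(atom)
  "#{atom}(?:,#{atom})*"
end

def paren_enclosed(atom)
  "\\(#{atom}\\)"
end

# Anchored so that the whole string must match.
ID_RE   = Regexp.new("\\A#{ID}\\z")
ARGS_RE = Regexp.new("\\A#{paren_enclosed(comma_sep_list(ID))}\\z")

%w[a rx_data1 a_b a__b a_ _a 1a].each do |s|
  puts "#{s.ljust(10)} #{ID_RE.match?(s) ? 'ok' : 'rejected'}"
end
puts ARGS_RE.match?("(a,b_1,c)")
```

Because the trailing piece of every repetition is a letter or digit, the
pattern cannot end in an underscore or contain two underscores in a row, so no
separate "last character" rule is needed.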
