Thanks for the input. Some responses:

1.) Rules of more complexity, in fact of arbitrary complexity, will be allowed in Marpa::R3. (Marpa::R3 is how the Lua-based DSL will appear.) An often-overlooked point is that with Marpa, unlike with regexes, every rule or subrule is expected to be associated with a semantics. The syntax of a regular expression can show the additional complexity quite conveniently, but tagging every subrule with a name and/or a semantics clutters things up, so the returns diminish. Nonetheless, this will be available in Marpa::R3.
2.) It could be available today if anyone were willing to write a wrapper around Marpa::R2. That is where the SLIF came from: Andrew Rodland and Peter Stuifzand built on top of earlier versions of Marpa. They showed me that there was real interest, and how a user might solve some of the semantic and syntactic issues.

Again, thanks!

-- jeffrey

On Fri, Aug 5, 2016 at 12:00 PM, <[email protected]> wrote:
> I saw a post somewhere that talked about the next-gen Marpa DSL based in
> Lua. In case it helps, I started with a PEG tool (in Ruby) called
> Parslet. I couldn't get rid of the left recursion in my grammar, so I
> abandoned the tool. But I quite liked the DSL. Following are some ideas
> from Parslet that may be helpful for Marpa's next-gen DSL.
>
> [http://kschiess.github.io/parslet/get-started.html], take a look at the
> "class MiniP", for example.
>
> 1. The DSL is executable Ruby code. You could probably do something
> similar in Lua. Using an in-language DSL has some disadvantages, e.g. a
> sequence requires an extra operator (">>") instead of just space
> separated. However, I really liked that I could write "macros" as simple
> Ruby methods.
>
>     def comma_sep_list(atom)
>       atom >> ( str(',') >> atom ).repeat
>     end
>
>     def paren_enclosed(atom)
>       str('(') >> atom >> str(')')
>     end
>
> I used these in ~20% of my rules, and they significantly improved the
> readability of my grammar.
>
> 2. Another advantage of the in-language DSL is that there was no
> restriction on the complexity of rules. I'm struggling at the moment to
> figure out by trial-and-error what I can put in a single SLIF rule. In
> particular, I read Mr. Kegler's tutorial, the POD documentation, and two
> more tutorials before someone mentioned that a repeat construct ("*") must
> appear in a rule by itself.
>
> For example, consider a VHDL identifier. It must start with a letter, may
> contain letters, underscores, and digits in the interior, and end with a
> letter or number.
> And no two underscores in a row. (Yes, it's odd. I
> didn't make it up.) Here is my best crack at this lexer rule in different
> tools.
>
> Marpa SLIF (I haven't tried to run this yet, so it's probably not valid):
>
>     id_first ~ [a-zA-Z]
>     alnum ~ [a-zA-Z0-9]
>     id_mid_p1 ~ '_' alnum | alnum
>     id_mid ~ id_mid_p1*
>     id_last ~ alnum
>     id ~ id_first | id_first id_mid id_last
>
> Parslet (essentially direct translation of EBNF from my book):
>
>     rule(:id) { match['a-zA-Z'] >> ( str('_').maybe >> match['a-zA-Z0-9'] ).repeat }
>
> Regex (direct translation of EBNF, analogous to Parslet):
>
>     /[a-zA-Z](_?[a-zA-Z0-9])*/
>
> I'm happy to try to clarify if desired. My point is that SLIF feels a
> little like assembly language compared to Parslet or even regular
> expressions where you can freely mix subexpressions and repetition. In
> addition, I miss the '?' regex construct and regex grouping with
> parentheses (both of which Parslet supports).
>
> Hopefully that's clear enough. Please understand that I'm not trying to
> demean Marpa or SLIF at all. I'm simply providing first impressions of a
> Marpa newbie -- my own personal preferences -- in case they might be useful
> for the next-gen DSL.
>
> One of the things that attracted me to Marpa is the "batteries included"
> parser algorithm. I don't have to refactor my grammar, eliminate epsilon
> transitions and left recursion, etc. The Marpa algorithm is a fundamental
> technical advance that allows efficient parsing of languages the way we
> think of them -- the way I find a grammar in my textbook or language spec.
> SLIF is a nice, lightweight, clear DSL. I just wish it were a little more
> sophisticated.
>
> One last quick thought. Given that regexes are the "competition" and were
> used heavily for lexing pre-SLIF, why not allow arbitrary regexes in the
> lexing rules? Why limit their use to character classes? These questions
> are rhetorical, of course.
> I assume there are good technical reasons,
> perhaps to avoid bloating the SLIF grammar. My goal is simply to raise the
> idea to implementors for reconsideration.
>
> Thanks for a great tool!
>
> - Ryan
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
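A minimal Ruby sketch, not part of the original exchange, of the VHDL identifier rule discussed in the quoted message. It checks the regex version against a direct encoding of the four constraints listed there: starts with a letter; contains only letters, digits and underscores; ends with a letter or digit; has no two underscores in a row. It assumes Ruby 2.4 or later for `match?`, and the names `VHDL_ID`, `vhdl_id?` and `vhdl_id_by_rules?` are hypothetical. The regex is wrapped in `\A`...`\z` anchors; unanchored, it would accept any string that merely contains an identifier, such as `_a`.

```ruby
# Hypothetical helper names; nothing here comes from Parslet or Marpa.

# Anchored form of the regex quoted in the thread. \A and \z force the
# whole string to match, not just some substring of it.
VHDL_ID = /\A[a-zA-Z](_?[a-zA-Z0-9])*\z/

def vhdl_id?(s)
  VHDL_ID.match?(s)
end

# Direct encoding of the four English constraints, used as a cross-check.
def vhdl_id_by_rules?(s)
  return false if s.empty?
  return false unless s[0].match?(/[a-zA-Z]/)        # starts with a letter
  return false unless s.match?(/\A[a-zA-Z0-9_]*\z/)  # only letters, digits, '_'
  return false unless s[-1].match?(/[a-zA-Z0-9]/)    # ends with a letter or digit
  !s.include?("__")                                  # no two underscores in a row
end

samples = %w[a a1 a_b abc_12 _a 1a a_ a__b]
samples.each do |s|
  printf("%-7s regex=%-5s rules=%s\n", s, vhdl_id?(s), vhdl_id_by_rules?(s))
end
```

On these samples the two checks agree: `a`, `a1`, `a_b` and `abc_12` pass, while `_a`, `1a`, `a_` and `a__b` fail. The same samples are a quick way to test a hand-written SLIF translation. As written, the quoted SLIF rules appear to reject `a_b` and `abc_d`, because `id_mid` can only derive complete `'_' alnum` or `alnum` units and so cannot match the lone underscore left before `id_last`.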
