I saw a post somewhere that talked about the next-gen Marpa DSL based on Lua. In case it helps: I started with a PEG tool (in Ruby) called Parslet. I couldn't get rid of the left recursion in my grammar, so I abandoned the tool. But I quite liked the DSL. Following are some ideas from Parslet that may be helpful for Marpa's next-gen DSL.

See http://kschiess.github.io/parslet/get-started.html and take a look at "class MiniP", for example.

1. The DSL is executable Ruby code. You could probably do something similar in Lua. Using an in-language DSL has some disadvantages, e.g. a sequence requires an extra operator (">>") instead of just being space separated. However, I really liked that I could write "macros" as simple Ruby methods.

    def comma_sep_list(atom)
      atom >> ( str(',') >> atom ).repeat
    end

    def paren_enclosed(atom)
      str('(') >> atom >> str(')')
    end

I used these in ~20% of my rules, and they significantly improved the readability of my grammar.

2. Another advantage of the in-language DSL is that there was no restriction on the complexity of rules. I'm struggling at the moment to figure out by trial and error what I can put in a single SLIF rule. In particular, I read Mr. Kegler's tutorial, the POD documentation, and two more tutorials before someone mentioned that a repeat construct ("*") must appear in a rule by itself.

For example, consider a VHDL identifier. It must start with a letter, may contain letters, underscores, and digits in the interior, and must end with a letter or digit. And no two underscores in a row. (Yes, it's odd. I didn't make it up.) Here is my best crack at this lexer rule in different tools.

Marpa SLIF (I haven't tried to run this yet, so it's probably not valid):

    id_first  ~ [a-zA-Z]
    alnum     ~ [a-zA-Z0-9]
    id_mid_p1 ~ '_' alnum | alnum
    id_mid    ~ id_mid_p1*
    id_last   ~ alnum
    id        ~ id_first | id_first id_mid id_last

Parslet (essentially a direct translation of the EBNF from my book):

    rule(:id) { match['a-zA-Z'] >> ( str('_').maybe >> match['a-zA-Z0-9'] ).repeat }

Regex (direct translation of the EBNF, analogous to Parslet):

    /[a-zA-Z](_?[a-zA-Z0-9])*/

I'm happy to try to clarify if desired. My point is that SLIF feels a little like assembly language compared to Parslet or even regular expressions, where you can freely mix subexpressions and repetition. In addition, I miss the '?'
regex construct and regex grouping with parentheses (both of which Parslet supports). Hopefully that's clear enough.

Please understand that I'm not trying to demean Marpa or SLIF at all. I'm simply providing first impressions of a Marpa newbie -- my own personal preferences -- in case they might be useful for the next-gen DSL.

One of the things that attracted me to Marpa is the "batteries included" parser algorithm. I don't have to refactor my grammar, eliminate epsilon transitions and left recursion, etc. The Marpa algorithm is a fundamental technical advance that allows efficient parsing of languages the way we think of them -- the way I find a grammar in my textbook or language spec. SLIF is a nice, lightweight, clear DSL. I just wish it were a little more sophisticated.

One last quick thought. Given that regexes are the "competition" and were used heavily for lexing pre-SLIF, why not allow arbitrary regexes in the lexing rules? Why limit their use to character classes?

These questions are rhetorical, of course. I assume there are good technical reasons, perhaps to avoid bloating the SLIF grammar. My goal is simply to raise the idea to implementors for reconsideration.

Thanks for a great tool!

- Ryan
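
P.S. In case anyone wants to poke at the VHDL identifier rule described above, here is a minimal plain-Ruby sketch (no Parslet needed) that runs the regex form against a few candidates. The \A and \z anchors, the helper name vhdl_id?, and the sample strings are my own additions for illustration, not part of any tool; treat it as a quick sanity check rather than a reference lexer.

```ruby
# Regex form of the identifier rule from the post. The \A and \z anchors
# are added here (an assumption of this sketch) so the whole string must
# match, not just a prefix.
VHDL_ID = /\A[a-zA-Z](_?[a-zA-Z0-9])*\z/

# Hypothetical helper name, used only for this sketch.
def vhdl_id?(text)
  VHDL_ID.match?(text)
end

# A few made-up candidates: some legal, some breaking one of the rules
# (leading underscore, trailing underscore, double underscore, leading digit).
%w[a rx_data clk2 _clk clk_ rx__data 2clk].each do |candidate|
  verdict = vhdl_id?(candidate) ? 'accepted' : 'rejected'
  puts "#{candidate.ljust(10)} #{verdict}"
end
```

The point of the grouping "(_?[a-zA-Z0-9])*" is that every underscore has to be followed immediately by a letter or digit, which rules out both doubled and trailing underscores in one expression.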
