How many commands? One approach that I just thought of and have never tested is to have fixed-length variable lexemes, each given a lower priority than the commands of that same length.
Off the top of my head:

    var2 ~ <varchar><varchar> priority=>1
    PA_command ~ 'PA' priority=>2
    PR_command ~ 'PR' priority=>2

Include a catch-all var for lengths not specifically accounted for:

    var_catch ~ <varchar>+ priority=>0

<var_catch> will always lose to lexemes of the same length, but will catch those variables whose length is not the same as any command.

A little hack-ish, but should be very fast, and perhaps no more hack-ish than alternatives. Again, not tested, so you'll be a pioneer! If you try it, let me know!

Hope this helps,
jeffrey

On Wed, Dec 2, 2020 at 5:42 PM Dean S <[email protected]> wrote:

> Thank you very much for your response!
>
> 1) Multiline: The language does have a "_" line-joining character, but the grammar wouldn't have to support that - it could be done with a trivial preprocessor. Once joined, commands may not span multiple lines.
>
> 2) Command/variable upper-case: Commands are always upper case, but there are no case restrictions on variables.
>
> So it sounds, however, like there isn't a straightforward grammar or option tweak. That's ok. The language has fancy expressions (algebraic expressions, function calls, strings, comments, and arrays), but its statement structure is extremely simplistic. The terminators (newline and semi-colon) are not allowed anywhere except as terminators (no escapes, not in strings, not in comments). So, as a practical solution, I should be able to dumb-split a program on terminators, look at the first characters of each statement, strip off the command or variable assignment part and parse the rest as an expression - which follows more reasonable rules that the LATM will like.
>
> So, I guess this falls to the "handled in easier faster ways" approach, which I guess should have been obvious but I failed to think of.
>
> Thank you for your time, and a great library!
>
> - Dean
>
> On 12/2/20 4:28 PM, Jeffrey Kegler wrote:
> > I'll first describe your immediate problem, then ask a couple Q's.
> >
> > The problem: Lexing is LATM -- *Longest* Acceptable Token Matching. The lexeme priority is a tie breaker, used when tokens are the same length. When your grammar fails, "PAx" is your longest token, and the only choice at length 3. "PA" is only 2 chars long, and lexemes of different length are not compared for priority.
> >
> > (Btw the reason for this is, as implemented, lexeme priorities can be (and are) tested in a few machine instructions. If Marpa needed to look at earlier possibilities, the logic gets vastly more complex, efficiency goes out the window, and you get into the territory where the grammar can often be handled in easier faster ways.)
> >
> > Now the questions:
> >
> > 1.) I notice statements cannot be multiline. Is that the intent going forward?
> >
> > 2.) In the example, commands always begin with a capital letter, variables never do. Will that continue to be the case? (If so, it points to an easy, fast solution.)
> >
> > Possible solutions, depending, include finding something that distinguishes commands from variables in the lexer; custom lexers; using events to guide custom lexing; and character-by-character lexing, whereby you handle your own whitespace.
>
> --
> You received this message because you are subscribed to the Google Groups "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/marpa-parser/270ecda3-6917-f717-593b-051ded20629d%40gmail.com
> .

--
To view this discussion on the web visit https://groups.google.com/d/msgid/marpa-parser/CA%2B2Wrv-AVsKwhAp7-aVwe96b_-JtrZucu010YmJEFgSO2NrViA%40mail.gmail.com.
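
For anyone who wants to see the tie-breaking rule in action before trying it in Marpa, here is a minimal Python sketch. It is not Marpa and does not use the SLIF; it only models the rule described in the thread: the longest match wins, and priority is consulted only among matches of equal length. The function name next_token, the Lexeme record, and the reading of <varchar> as a single ASCII letter are assumptions made for illustration. The sketch also ignores the "acceptable" part of LATM, that is, it does not ask a parser which tokens are allowed at the current position.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Lexeme(NamedTuple):
    name: str
    pattern: str   # regex standing in for the lexeme's definition
    priority: int  # higher wins, but only among equal-length matches


# Mirrors the untested grammar from the thread.
# ASSUMPTION: <varchar> is one ASCII letter.
LEXEMES = [
    Lexeme("var2", r"[A-Za-z]{2}", 1),
    Lexeme("PA_command", r"PA", 2),
    Lexeme("PR_command", r"PR", 2),
    Lexeme("var_catch", r"[A-Za-z]+", 0),
]


def next_token(text: str, pos: int = 0) -> Optional[Tuple[str, str]]:
    """Return (lexeme name, matched text) at pos, or None if nothing matches."""
    candidates = []
    for lex in LEXEMES:
        m = re.compile(lex.pattern).match(text, pos)
        if m and m.end() > pos:
            # Sort key: length first, then priority as the tie breaker.
            candidates.append((m.end() - pos, lex.priority, lex.name))
    if not candidates:
        return None
    length, _priority, name = max(candidates)
    return name, text[pos:pos + length]


if __name__ == "__main__":
    for sample in ["PA", "PR", "ab", "PAx", "x"]:
        print(sample, "->", next_token(sample))
```

In this model, a two-letter command beats var2 and var_catch at length 2, an ordinary two-letter name falls to var2, and anything of another length (including "PAx") goes to var_catch because no other lexeme matches at that length.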
