Your specific questions, off the top of my head:

1.) You may want to look at lexeme priorities. If not, yes, external lexing may be what you need.

2.) There are several examples of ways to write multi-line comments. One is in the FAQ: http://savage.net.au/Perl-modules/html/marpa.faq/faq.html#q110

3.) Yes, but only via external lexing.

4.) Not sure this answers your question, but L0 rules allow full Marpa syntax.

5.) For a large language, this can be a very hard call. Note that you *can* switch back and forth -- you can use the SLIF for some lexemes, and use events to switch to external processing for others.

Quick answers, but I hope they help,

jeffrey

On Thu, Oct 13, 2016 at 9:24 AM, Jeffrey Kegler <[email protected]> wrote:

> Javascript is not Java I know, but Jean-Damien Durand has written several
> full language parsers, including ECMAScript:
> https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST
>
> On Thu, Oct 13, 2016 at 8:00 AM, Harry <[email protected]> wrote:
>
>> Hello,
>>
>> I'm very new to Marpa but, from its description, it looks extremely
>> awesome.
>>
>> I'm also done playing with the beginner's example of the expression
>> calculator; was also able to make small changes to it. So far, so good.
>>
>> However, now, I'm trying to write a Java 8 Parser using the grammar
>> published here:
>> https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>>
>> While I think I'm able to map the above Oracle grammar spec to the G1
>> rules (if I stub out some of the lexemes referenced the G1 rules) and
>> create an instance of Marpa::R2::Scanless::G, I'm having a hard time
>> writing the L0 lexer rules in SLIF for the Lexer grammar
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>. Some
>> issues that I will need to (but don't know how to) deal with are:
>>
>> 1. Keyword vs Identifier:
>>
>> The Java spec defines Identifier
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8>
>> thus:
>>
>> Identifier:
>>     IdentifierChars
>>     <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-IdentifierChars>
>>     but not a Keyword
>>     <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-Keyword>
>>     or BooleanLiteral
>>     <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-BooleanLiteral>
>>     or NullLiteral
>>     <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-NullLiteral>
>> IdentifierChars:
>>     JavaLetter
>>     <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetter>
>>     {JavaLetterOrDigit
>>     <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetterOrDigit>
>>     }
>> JavaLetter:
>>     any Unicode character that is a "Java letter"
>> JavaLetterOrDigit:
>>     any Unicode character that is a "Java letter-or-digit"
>>
>> *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral"
>> part? In Perl regex, one could do a negative lookahead assertion like so...
>>
>> if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>>     # this is an Identifier
>> }
>>
>> ... but only if Marpa allowed such a rich, Perl regex syntax. Which it
>> doesn't, apparently, in SLIF.
>>
>> 2. Comment (single- and multi-line versions)
>>
>> I could write a bunch of G1 rules to handle the multi-line Java comment,
>> but I'm seeing it becoming very verbose. Is there an easier way to handle
>> stuff like this in SLIF?
>>
>> 3. Since Marpa is Perl-based, is it possible to tap the full power of
>> Perl regex engine, especially for lexing?
>>
>> 4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer
>> grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>...
>> that is written in BNF style instead of a 'flat', regex style. If I were to
>> mechanically replicate the Lexer grammar using G1 rules (instead of L0
>> rules), would it entail a performance and space overhead by creating
>> unnecessary tree nodes for what would otherwise be a flat lexeme in
>> bison/flex?
>>
>> 5. Would Marpa experts recommend using SLIF (internal scanner) for Java
>> 8, or should I abandon it in favor of a custom / external lexer?
>>
>>
>> Regards,
>> /Harry
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
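
On question 1, if lexeme priorities turn out not to be enough, the external-lexing route could look roughly like the plain-Perl sketch below. It matches the longest run of identifier characters first and only then decides whether the word is reserved, which gives the "but not a Keyword or BooleanLiteral or NullLiteral" exclusion without any lookahead. Everything in it is an assumption made for illustration: classify_word() and %RESERVED are hypothetical names, the keyword list is deliberately partial, and the character classes only approximate "Java letter" and "Java letter-or-digit". How the resulting token type is then handed to Marpa is not shown.

```perl
use strict;
use warnings;

# Words that look like identifiers but are reserved.
# Partial list for illustration only; the full set is in the Java spec.
my %RESERVED = (
    ( map { $_ => 'Keyword' } qw(if else while for return class int) ),
    ( map { $_ => 'BooleanLiteral' } qw(true false) ),
    null => 'NullLiteral',
);

# classify_word() is a hypothetical helper, not part of Marpa.
# It takes the longest run of identifier characters at the start of
# $text and returns (token_type, matched_text), or an empty list
# when $text does not start with an identifier character.
sub classify_word {
    my ($text) = @_;

    # Assumption: letters, '_' and '$' may start a word; digits may follow.
    # This is a simplification of the spec's "Java letter" definition.
    return unless $text =~ m/\A ( [\p{L}_\$] [\p{L}\p{Nd}_\$]* ) /x;
    my $word = $1;

    # Longest match first, then the reserved-word check:
    # "iffy" is an Identifier even though it begins with "if".
    return ( $RESERVED{$word} // 'Identifier', $word );
}

for my $input (qw(if iffy true nullable x1)) {
    my ($type) = classify_word($input);
    print "$input => $type\n";
}
```

With the partial table above, "if" comes out as a Keyword and "true" as a BooleanLiteral, while "iffy", "nullable" and "x1" are Identifiers, because the whole word is matched before the table lookup.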
