JavaScript is not Java, I know, but Jean-Damien Durand has written several full language parsers, including one for ECMAScript: https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST
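On the Keyword vs Identifier question (point 1 in the quoted message), the "but not a Keyword or BooleanLiteral or NullLiteral" clause does not have to be a lookahead. A common lexer approach is to take the longest run of IdentifierChars first and only then check the whole token against the reserved words. That also avoids a pitfall of a plain prefix lookahead, which would reject an identifier such as iffy just because it starts with if. Below is a minimal sketch of that idea in Java rather than in SLIF or Perl. The class and method names are my own, purely illustrative, and this is not how Marpa does it internally. It leans on Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, which are how the JLS itself defines "Java letter" and "Java letter-or-digit".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Marpa or of any Java API.
class IdentifierClassifier {

    // Reserved keywords of Java 8 (JLS section 3.9).
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch",
        "char", "class", "const", "continue", "default", "do", "double",
        "else", "enum", "extends", "final", "finally", "float", "for",
        "goto", "if", "implements", "import", "instanceof", "int",
        "interface", "long", "native", "new", "package", "private",
        "protected", "public", "return", "short", "static", "strictfp",
        "super", "switch", "synchronized", "this", "throw", "throws",
        "transient", "try", "void", "volatile", "while"));

    // Length (in chars) of the longest IdentifierChars run at 'start', or 0.
    static int scanIdentifierChars(String src, int start) {
        if (start >= src.length()) {
            return 0;
        }
        int i = start;
        int cp = src.codePointAt(i);
        if (!Character.isJavaIdentifierStart(cp)) {
            return 0; // first character must be a "Java letter"
        }
        i += Character.charCount(cp);
        while (i < src.length()) {
            cp = src.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                break; // stop at the first non "Java letter-or-digit"
            }
            i += Character.charCount(cp);
        }
        return i - start;
    }

    // Classify a complete IdentifierChars token; exclusion is a set lookup.
    static String classify(String token) {
        if (KEYWORDS.contains(token)) {
            return "Keyword";
        }
        if (token.equals("true") || token.equals("false")) {
            return "BooleanLiteral";
        }
        if (token.equals("null")) {
            return "NullLiteral";
        }
        return "Identifier";
    }

    public static void main(String[] args) {
        for (String src : new String[] {"if", "iffy", "true", "null", "nullable"}) {
            int len = scanIdentifierChars(src, 0);
            System.out.println(src + " -> " + classify(src.substring(0, len)));
        }
    }
}
```

Because the check runs on the whole token after the longest match, iffy and nullable come out as Identifier while if and null do not. An external lexer feeding tokens to Marpa could apply the same two-step logic.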
On Thu, Oct 13, 2016 at 8:00 AM, Harry <[email protected]> wrote:

> Hello,
>
> I'm very new to Marpa but, from its description, it looks extremely
> awesome.
>
> I'm also done playing with the beginner's example of the expression
> calculator; was also able to make small changes to it. So far, so good.
>
> However, now, I'm trying to write a Java 8 Parser using the grammar
> published here:
> https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>
> While I think I'm able to map the above Oracle grammar spec to the G1
> rules (if I stub out some of the lexemes referenced the G1 rules) and
> create an instance of Marpa::R2::Scanless::G, I'm having a hard time
> writing the L0 lexer rules in SLIF for the Lexer grammar
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>. Some
> issues that I will need to (but don't know how to) deal with are:
>
> 1. Keyword vs Identifier:
>
> The Java spec defines Identifier
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8>
> thus:
>
>     Identifier:
>         IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
>     IdentifierChars:
>         JavaLetter {JavaLetterOrDigit}
>     JavaLetter:
>         any Unicode character that is a "Java letter"
>     JavaLetterOrDigit:
>         any Unicode character that is a "Java letter-or-digit"
>
> *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral"
> part? In Perl regex, one could do a negative lookahead assertion like so...
>
>     if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>         # this is an Identifier
>     }
>
> ... but only if Marpa allowed such a rich, Perl regex syntax. Which it
> doesn't, apparently, in SLIF.
>
> 2. Comment (single- and multi-line versions)
> I could write a bunch of G1 rules to handle the multi-line Java comment,
> but I'm seeing it becoming very verbose. Is there an easier way to handle
> stuff like this in SLIF?
>
> 3. Since Marpa is Perl-based, is it possible to tap the full power of Perl
> regex engine, especially for lexing?
>
> 4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer
> grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>...
> that is written in BNF style instead of a 'flat', regex style. If I were to
> mechanically replicate the Lexer grammar using G1 rules (instead of L0
> rules), would it entail a performance and space overhead by creating
> unnecessary tree nodes for what would otherwise be a flat lexeme in
> bison/flex?
>
> 5. Would Marpa experts recommend using SLIF (internal scanner) for Java 8,
> or should I abandon it in favor of a custom / external lexer?
>
> Regards,
> /Harry
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
