Re: Java 8 Parser

Jeffrey Kegler Thu, 13 Oct 2016 09:25:35 -0700

JavaScript is not Java, I know, but Jean-Damien Durand has written several
full language parsers, including one for ECMAScript:
https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST


On Thu, Oct 13, 2016 at 8:00 AM, Harry <[email protected]> wrote:

> Hello,
>
> I'm very new to Marpa but, from its description, it looks extremely
> awesome.
>
> I've also finished playing with the beginner's expression calculator
> example and was able to make small changes to it. So far, so good.
>
> Now, however, I'm trying to write a Java 8 parser using the grammar
> published here:
>     https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>
> While I think I can map the above Oracle grammar spec to G1 rules (if I
> stub out some of the lexemes referenced by the G1 rules) and create an
> instance of Marpa::R2::Scanless::G, I'm having a hard time writing the L0
> lexer rules in SLIF for the lexer grammar
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>. Some
> issues that I will need to deal with (but don't know how to) are:
>
> 1. Keyword vs Identifier:
>
>   The Java spec defines Identifier
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8>
> thus:
>
>     Identifier:
>         IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
>
>     IdentifierChars:
>         JavaLetter {JavaLetterOrDigit}
>
>     JavaLetter:
>         any Unicode character that is a "Java letter"
>
>     JavaLetterOrDigit:
>         any Unicode character that is a "Java letter-or-digit"
>
> So, how do I do the "not a Keyword or BooleanLiteral or NullLiteral"
> part? In Perl regex, one could use a negative lookahead assertion like so...
>
> if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>     # this is an Identifier
> }
>
>
> ... but only if Marpa allowed such rich Perl regex syntax, which,
> apparently, it doesn't in SLIF.
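For item 1, a minimal, self-contained sketch of the negative-lookahead idea in plain Perl (not Marpa SLIF). It assumes an ASCII-only approximation of "Java letter" and "Java letter-or-digit"; the JLS defines those via `Character.isJavaIdentifierStart`/`isJavaIdentifierPart`, which accept much more of Unicode. Two details matter compared with the snippet above: the lookahead has to check that the reserved word is not followed by another identifier character (otherwise `intValue` is rejected because it begins with `int`), and the match has to be anchored at both ends (otherwise an unanchored match finds `nt` inside `int` and accepts it). Inside SLIF itself, lexeme priorities (the `priority` adverb on `:lexeme` declarations) are probably the more natural way to prefer keywords; that route is not shown here.

```perl
use strict;
use warnings;

# Java 8 keywords (JLS 3.9).
my @keywords = qw(
    abstract continue for new switch assert default if package synchronized
    boolean do goto private this break double implements protected throw
    byte else import public throws case enum instanceof return transient
    catch extends int short try char final interface static void
    class finally long strictfp volatile const float native super while
);
# true/false are BooleanLiterals and null is the NullLiteral, not keywords,
# but Identifier excludes them too, so they join the same reserved list.
my $reserved = join '|', map { quotemeta } @keywords, qw(true false null);

# ASSUMPTION: ASCII-only approximation of JavaLetter / JavaLetterOrDigit.
my $java_letter          = qr/[A-Za-z_\$]/;
my $java_letter_or_digit = qr/[A-Za-z0-9_\$]/;

# IdentifierChars, but not a reserved word *as a whole*: the inner
# (?!$java_letter_or_digit) keeps "intValue" from being rejected just
# because it starts with "int".
my $identifier = qr/
    (?! (?:$reserved) (?!$java_letter_or_digit) )
    $java_letter $java_letter_or_digit*
/x;

# Anchor both ends so a substring (e.g. "nt" inside "int") cannot match.
sub is_identifier {
    my ($token) = @_;
    return $token =~ /\A$identifier\z/ ? 1 : 0;
}

for my $t (qw(intValue int interface x1 $tmp null nullable 9lives)) {
    printf "%-9s %s\n", $t, is_identifier($t) ? 'Identifier' : 'not an Identifier';
}
```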
>
> 2. Comments (single- and multi-line versions):
> I could write a bunch of G1 rules to handle the multi-line Java comment,
> but I can see that becoming very verbose. Is there an easier way to handle
> things like this in SLIF?
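For item 2, a minimal plain-Perl sketch (again, not SLIF) that removes both Java comment forms with a single substitution. It assumes ASCII source and ignores Java's `\uXXXX` Unicode escapes (JLS 3.3), which a real lexer would translate first. String and character literals are matched first and kept unchanged, so the `//` inside `"http://x"` is not mistaken for a comment, and each comment becomes one space, so `a/**/b` does not fuse into `ab`. Within SLIF, comments can instead be described with L0 rules and handed to a `:discard` rule rather than modeled in G1, which is likely the less verbose route.

```perl
use strict;
use warnings;

# /* ... */ and /** ... */: not nested, ends at the first "*/".
my $traditional_comment = qr{ /\* .*? \*/ }xs;
# // ... up to, but not including, the line terminator.
my $end_of_line_comment = qr{ // [^\r\n]* }x;
# String and char literals, matched only so that their contents are skipped.
my $string_literal = qr{ " (?: [^"\\\r\n] | \\. )* " }x;
my $char_literal   = qr{ ' (?: [^'\\\r\n] | \\. )* ' }x;

sub strip_comments {
    my ($src) = @_;
    # The leftmost match wins, so a literal that starts before a "//" or
    # "/*" swallows it; literals ($1) are kept, comments become one space.
    $src =~ s{ ( $string_literal | $char_literal )
             | $traditional_comment
             | $end_of_line_comment }
             { defined $1 ? $1 : ' ' }gex;
    return $src;
}

my $java = qq{int a = 1; // one\nString s = "http://x"; /* two\nlines */ int b;};
print strip_comments($java), "\n";
```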
>
> 3. Since Marpa is Perl-based, is it possible to tap the full power of the
> Perl regex engine, especially for lexing?
>
> 4. Notice that the Java 8 spec for recognizing tokens is a lexer grammar
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html> written in
> BNF style rather than a 'flat', regex style. If I were to mechanically
> replicate that lexer grammar using G1 rules (instead of L0 rules), would it
> entail performance and space overhead from creating unnecessary tree nodes
> for what would otherwise be a single flat lexeme in bison/flex?
>
> 5. Would Marpa experts recommend using the SLIF (internal scanner) for
> Java 8, or should I abandon it in favor of a custom/external lexer?
>
>
> Regards,
> /Harry
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
