Re: Java 8 Parser

Durand Jean-Damien Mon, 01 May 2017 11:45:07 -0700

Hello,

With Marpa::R2 it is possible to do exclusions at the lexeme level using 
user-defined character classes.


Such an implementation was used in ECMAScript, as indeed mentioned by 
Jeffrey, cf. 
https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST/blob/master/lib/MarpaX/Languages/ECMAScript/AST/Grammar/CharacterClasses.pm
(which I admit is a bit hard to understand stand-alone, without the grammar 
itself, but these are the lexeme implementations with... exclusions).
For example:

sub IsSourceCharacterButNotStarOrLineTerminator { return <<END;
+MarpaX::Languages::ECMAScript::AST::Grammar::CharacterClasses::IsSourceCharacter
-MarpaX::Languages::ECMAScript::AST::Grammar::CharacterClasses::IsStar
-MarpaX::Languages::ECMAScript::AST::Grammar::CharacterClasses::IsLineTerminator
END
}
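
To make the mechanism easier to see outside the grammar, here is a minimal stand-alone sketch of the same idea using plain Perl user-defined character properties, tested with an ordinary regex rather than through Marpa. The package name My::CharClasses, the helper is_comment_body_char, and the simplified code point ranges are assumptions made for illustration; the real definitions are in the module linked above.

```perl
use strict;
use warnings;

package My::CharClasses;    # hypothetical package, not part of MarpaX

# Simplified stand-ins for the real classes (assumed ranges).
sub IsSourceCharacter { return "0\t10FFFF\n" }            # every code point
sub IsStar            { return "2A\n" }                   # '*'
sub IsLineTerminator  { return "A\nD\n2028\t2029\n" }     # LF, CR, LS, PS

# Start from IsSourceCharacter ('+'), then subtract the others ('-').
sub IsSourceCharacterButNotStarOrLineTerminator { return <<'END';
+My::CharClasses::IsSourceCharacter
-My::CharClasses::IsStar
-My::CharClasses::IsLineTerminator
END
}

package main;

# True if $char is exactly one character belonging to the derived class.
sub is_comment_body_char {
    my ($char) = @_;
    return $char =~ /\A\p{My::CharClasses::IsSourceCharacterButNotStarOrLineTerminator}\z/ ? 1 : 0;
}

for my $char ('a', '*', "\n") {
    printf "U+%04X %s\n", ord($char),
        is_comment_body_char($char) ? 'accepted' : 'rejected';
}
```

With these stand-in definitions the loop reports 'a' as accepted and both '*' and the line feed as rejected, which is the "X but not Y" exclusion expressed at the character level.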

Regards, Jean-Damien.

On Thursday, 13 October 2016 at 17:00:29 UTC+2, Harry wrote:
>
> Hello,
>
> I'm very new to Marpa but, from its description, it looks extremely 
> awesome. 
>
> I'm also done playing with the beginner's example of the expression 
> calculator; was also able to make small changes to it. So far, so good.
>
> However, now, I'm trying to write a Java 8 Parser using the grammar 
> published here:
>     https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>
> While I think I'm able to map the above Oracle grammar spec to the G1 
> rules (if I stub out some of the lexemes referenced by the G1 rules) and 
> create an instance of Marpa::R2::Scanless::G, I'm having a hard time 
> writing the L0 lexer rules in SLIF for the Lexer grammar 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>.  Some 
> issues that I will need to (but don't know how to) deal with are:
>
> 1. Keyword vs Identifier: 
>
>   The Java spec defines Identifier 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8> 
> thus:
> Identifier:
> IdentifierChars 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-IdentifierChars>
>  but not a Keyword 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-Keyword>
>  or BooleanLiteral 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-BooleanLiteral>
>  or NullLiteral 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-NullLiteral>
> IdentifierChars:
> JavaLetter 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetter>
>  {JavaLetterOrDigit 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetterOrDigit>
> }
> JavaLetter:
> any Unicode character that is a "Java letter"
> JavaLetterOrDigit:
> any Unicode character that is a "Java letter-or-digit"
>   *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral" 
> part? In Perl regex, one could do a negative lookahead assertion like so...
>     
> if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>     # this is an Identifier
> }
>
>
> ... but only if Marpa allowed such a rich, Perl regex syntax. Which it 
> doesn't, apparently, in SLIF.
>
> 2. Comment (single- and multi-line versions)
> I could write a bunch of G1 rules to handle the multi-line Java comment, 
> but I'm seeing it becoming very verbose. Is there an easier way to handle 
> stuff like this in SLIF?
>  
> 3. Since Marpa is Perl-based, is it possible to tap the full power of Perl 
> regex engine, especially for lexing? 
>
> 4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer 
> grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>... 
> that is written in BNF style instead of a 'flat', regex style. If I were to 
> mechanically replicate the Lexer grammar using G1 rules (instead of L0 
> rules), would it entail a performance and space overhead by creating 
> unnecessary tree nodes for what would otherwise be a flat lexeme in 
> bison/flex?
>
> 5. Would Marpa experts recommend using SLIF (internal scanner) for Java 8, 
> or should I abandon it in favor of a custom / external lexer?
>
>
> Regards,
> /Harry
>
>
