Java 8 Parser

Harry Thu, 13 Oct 2016 08:01:04 -0700

Hello,

I'm very new to Marpa, but from its description it looks extremely
awesome.


I've finished playing with the beginner's expression-calculator example
and was also able to make small changes to it. So far, so good.

Now I'm trying to write a Java 8 parser using the grammar
published here:
    https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html

I think I can map the Oracle grammar above to G1 rules (if I stub
out some of the lexemes referenced by the G1 rules) and create an
instance of Marpa::R2::Scanless::G, but I'm having a hard time writing the
L0 lexer rules in SLIF for the Lexer grammar
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>. Some
issues I will need to deal with (but don't know how to) are:

1. Keyword vs Identifier:

  The Java spec defines Identifier
  <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8> thus:

    Identifier:
        IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral

    IdentifierChars:
        JavaLetter {JavaLetterOrDigit}

    JavaLetter:
        any Unicode character that is a "Java letter"

    JavaLetterOrDigit:
        any Unicode character that is a "Java letter-or-digit"

  So, how do I handle the "but not a Keyword or BooleanLiteral or
  NullLiteral" part? In a Perl regex, one could use a negative lookahead
  assertion like so...
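
    if (m/ \A (?! (?: $Keyword | $BooleanLiteral | $NullLiteral ) \z ) $IdentifierChars \z /x) {
        # this is an Identifier: the whole string is IdentifierChars,
        # and the whole string is not a keyword or literal
    }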


... but only if Marpa allowed such rich Perl regex syntax, which,
apparently, it doesn't in SLIF.

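For comparison, a minimal pure-Perl sketch of the usual alternative to a lookahead: match the longest run of IdentifierChars first, then reject the word if it is reserved. This is plain Perl, not SLIF syntax; `is_identifier`, `%reserved` and `$IdentifierChars` are hypothetical names, and the Unicode classes only approximate the JLS "Java letter" definition.

```perl
use strict;
use warnings;

# Java 8 keywords (JLS 3.9), plus the boolean and null literals (JLS 3.10.3, 3.10.7).
my %reserved = map { $_ => 1 } qw(
    abstract continue for new switch assert default if package synchronized
    boolean do goto private this break double implements protected throw
    byte else import public throws case enum instanceof return transient
    catch extends int short try char final interface static void
    class finally long strictfp volatile const float native super while
    true false null
);

# Approximation of JavaLetter {JavaLetterOrDigit}: the real definition follows
# Character.isJavaIdentifierStart/Part, which also admits other currency
# symbols, connector punctuation and combining marks.
my $IdentifierChars = qr/[\p{L}\p{Nl}\$_][\p{L}\p{Nl}\p{Nd}\$_]*/;

# Returns 1 if $word is an Identifier: IdentifierChars, but not a Keyword,
# BooleanLiteral or NullLiteral; otherwise returns 0.
sub is_identifier {
    my ($word) = @_;
    return 0 unless $word =~ /\A$IdentifierChars\z/;
    return $reserved{$word} ? 0 : 1;
}
```

Checking the whole word after matching it sidesteps a pitfall of the lookahead version: a lookahead such as `(?!if)` that does not also check for the end of the word rejects identifiers that merely begin with a keyword, such as `iffy`.
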
2. Comments (single- and multi-line versions):

  I could write a bunch of G1 rules to handle the multi-line Java comment,
  but I can see that becoming very verbose. Is there an easier way to
  handle constructs like this in SLIF?
 
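In a regex-based lexer, both comment forms fit in a single pattern, because Java comments do not nest (JLS 3.7). A minimal Perl sketch, assuming comments are simply replaced by whitespace before parsing; `strip_comments` is a hypothetical helper, and this is plain Perl rather than SLIF syntax.

```perl
use strict;
use warnings;

# Java comments (JLS 3.7): a traditional comment runs from /* to the first */
# (comments do not nest); an end-of-line comment runs from // to the line end.
my $Comment = qr{
      /\* .*? \*/          # traditional comment, possibly multi-line
    | // [^\r\n]*          # end-of-line comment, stops before the line terminator
}xs;

# Replaces each comment with a single space. Caveat: this naive version knows
# nothing about string or character literals, so "a//b" inside quotes is cut.
sub strip_comments {
    my ($src) = @_;
    (my $out = $src) =~ s/$Comment/ /g;
    return $out;
}
```
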
3. Since Marpa is Perl-based, is it possible to tap the full power of the
Perl regex engine, especially for lexing?

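One common way to use Perl's regex engine as a lexer is the `\G` plus `/gc` idiom: each match is anchored where the previous one stopped, and `/c` keeps `pos` in place when a match fails. The sketch below covers only the lexing side; how its `[type, text]` pairs would be handed to a Marpa recognizer is not shown, and the token table is a hypothetical Java subset.

```perl
use strict;
use warnings;

# Hypothetical token table for a tiny Java subset. The first rule that
# matches at the current position wins, so order matters.
my @rules = (
    [ WS      => qr{\s+} ],
    [ COMMENT => qr{/\*.*?\*/|//[^\r\n]*}s ],
    [ NUMBER  => qr{\d+} ],
    [ WORD    => qr/[\p{L}\$_][\p{L}\p{Nd}\$_]*/ ],
    [ OP      => qr/==|[-+*\/=;(){}.,]/ ],
);

# Scans $src with \G-anchored /gc matches and returns a list of
# [type, text] pairs, dropping whitespace and comments. Dies on input
# that no rule matches.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    TOKEN: while (pos($src) < length $src) {
        for my $rule (@rules) {
            my ($type, $re) = @$rule;
            if ($src =~ /\G($re)/gc) {
                push @tokens, [ $type, $1 ]
                    unless $type eq 'WS' || $type eq 'COMMENT';
                next TOKEN;
            }
        }
        die sprintf "No token matches at offset %d\n", pos($src);
    }
    return @tokens;
}
```

Because the first matching rule wins, the order has to be chosen carefully: `COMMENT` comes before `OP` so that `//` is not read as two `/` operators, and `==` is listed before the single-character operators.
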
4. Notice that the Java 8 spec defines its tokens with a Lexer grammar
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html> that is
written in BNF style rather than a 'flat', regex style. If I were to
replicate that Lexer grammar mechanically using G1 rules (instead of L0
rules), would it incur performance and space overhead by creating
unnecessary tree nodes for what would otherwise be a single flat lexeme
in bison/flex?

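As an illustration of flattening, the BNF rules for DecimalNumeral in JLS 3.10.1 collapse into one regular expression, which a lexer can then treat as a single lexeme with no internal structure. A minimal Perl sketch; `is_decimal_numeral` is a hypothetical helper, and it covers only DecimalNumeral, not the full IntegerLiteral (no `L` suffix, hex, octal or binary forms).

```perl
use strict;
use warnings;

# JLS 3.10.1, written as BNF:
#   DecimalNumeral: 0 | NonZeroDigit [Digits] | NonZeroDigit Underscores Digits
#   Digits:         Digit | Digit [DigitsAndUnderscores] Digit
# Flattened into regexes:
my $Digits         = qr/[0-9](?:[0-9_]*[0-9])?/;  # may not start or end with '_'
my $DecimalNumeral = qr/0|[1-9](?:_*$Digits)?/;   # '_*' covers both [Digits] and Underscores Digits

# Returns 1 if the whole string $s is a DecimalNumeral, otherwise 0.
sub is_decimal_numeral {
    my ($s) = @_;
    return $s =~ /\A$DecimalNumeral\z/ ? 1 : 0;
}
```
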
5. Would Marpa experts recommend using SLIF (internal scanner) for Java 8, 
or should I abandon it in favor of a custom / external lexer?


Regards,
/Harry

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.