Hello,
I'm very new to Marpa but, from its description, it looks extremely
awesome.
I've also worked through the beginner's expression-calculator example and
was able to make small changes to it. So far, so good.
Now, however, I'm trying to write a Java 8 parser using the grammar
published here:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
While I think I can map the above Oracle grammar spec to G1 rules (if I
stub out some of the lexemes referenced by the G1 rules) and create an
instance of Marpa::R2::Scanless::G, I'm having a hard time writing the L0
lexer rules in SLIF for the Lexer grammar
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>. Some
issues that I need to deal with (but don't know how) are:
1. Keyword vs Identifier:
The Java spec defines Identifier
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8> thus:
Identifier:
    IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
    JavaLetter {JavaLetterOrDigit}
JavaLetter:
    any Unicode character that is a "Java letter"
JavaLetterOrDigit:
    any Unicode character that is a "Java letter-or-digit"
*So, how do I express* the "not a Keyword or BooleanLiteral or NullLiteral"
part? With a Perl regex, one could use a negative lookahead assertion like so...
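if (m/ \A (?! (?: $Keyword | $BooleanLiteral | $NullLiteral ) \z ) $IdentifierChars \z /x)
{
    # Assumes $_ holds exactly one candidate token. The anchors make the
    # lookahead reject only whole reserved words, so "classy" still passes.
    # this is an Identifier
}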
... but that only works if Marpa allows such rich Perl regex syntax, which,
apparently, SLIF does not.
2. Comment (single- and multi-line versions)
I could write a bunch of G1 rules to handle the multi-line Java comment,
but that quickly becomes very verbose. Is there an easier way to handle
constructs like this in SLIF?
3. Since Marpa is Perl-based, is it possible to tap the full power of the Perl
regex engine, especially for lexing?
4. Note that the Java 8 spec for recognizing tokens takes the form of a Lexer
grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>
that is written in BNF style rather than a 'flat' regex style. If I were to
mechanically replicate the Lexer grammar using G1 rules (instead of L0
rules), would that incur performance and space overhead by creating
unnecessary tree nodes for what would otherwise be a flat lexeme in
bison/flex?
5. Would Marpa experts recommend using SLIF (internal scanner) for Java 8,
or should I abandon it in favor of a custom / external lexer?
Regards,
/Harry