Your specific questions, off the top of my head:

1.) You may want to look at lexeme priorities.  If not, yes, external
lexing may be what you need.

2.) There are several examples of ways to write multi-line comments.  One
is in the FAQ:
http://savage.net.au/Perl-modules/html/marpa.faq/faq.html#q110

3.) Yes, but only via external lexing.

4.) Not sure this answers your question, but L0 rules allow full Marpa
syntax.

5.) For a large language, this can be a very hard call.  Note that you
*can* switch back and forth -- you can use the SLIF for some lexemes, and
use events to switch to external processing for others.
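
To make (1) and (2) a bit more concrete, here is a minimal sketch, not a
real Java grammar.  The symbol names, the tiny keyword list and the
ASCII-only identifier rule are all assumptions made up for illustration.
It gives the keyword lexeme a higher priority than Identifier, so that a
same-length match such as "if" should be read as a keyword while a longer
match such as "iffy" remains an identifier, and it discards both comment
styles in L0.  I have not run it against the JLS, so treat it as a
starting point only.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar: every symbol name below is hypothetical.
my $dsl = <<'END_OF_DSL';
:default ::= action => [name,values]
lexeme default = latm => 1

Items          ::= Item+
Item           ::= KeywordItem | IdentifierItem
KeywordItem    ::= Keyword
IdentifierItem ::= Identifier

# Among the longest matches, the higher-priority lexeme is kept.
:lexeme ~ Keyword priority => 1
Keyword    ~ 'if' | 'else' | 'true' | 'false' | 'null'

# Simplified: real "Java letters" cover far more than ASCII.
Identifier ~ id_start id_rest
id_start   ~ [A-Za-z_]
id_rest    ~ [A-Za-z0-9_]*

:discard ~ whitespace
whitespace ~ [\s]+

# Single-line comment: '//' up to, but not including, the newline.
:discard ~ <line comment>
<line comment>      ~ '//' <line comment body>
<line comment body> ~ [^\n]*

# Multi-line comment, written as plain L0 rules.
:discard ~ <block comment>
<block comment>     ~ '/*' <comment interior> '*/'
<comment interior>  ~ <non stars> <star segments> <final stars>
<non stars>         ~ [^*]*
<star segments>     ~ <star segment>*
<star segment>      ~ <stars> [^/*] <star free text>
<stars>             ~ [*]+
<star free text>    ~ [^*]*
<final stars>       ~ [*]*
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

my $input = "if iffy /* a ** comment */ null // trailing\nnullable";
my $value_ref = $grammar->parse( \$input );

# The tree is [ 'Items', [ 'Item', [ KIND, TEXT ] ], ... ].
my ( undef, @items ) = @{ ${$value_ref} };
my @tokens = map { my ( $kind, $text ) = @{ $_->[1] }; "$kind: $text" } @items;
print "$_\n" for @tokens;
```

Priorities only decide between lexemes that are acceptable at the same
position, so whether this is enough for the full Java grammar is something
you would have to check; if it is not, external lexing is the fallback.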

Quick answers, but I hope they help, jeffrey

On Thu, Oct 13, 2016 at 9:24 AM, Jeffrey Kegler <
[email protected]> wrote:

> Javascript is not Java I know, but Jean-Damien Durand has written several
> full language parsers, including ECMAScript:
> https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST
>
> On Thu, Oct 13, 2016 at 8:00 AM, Harry <[email protected]> wrote:
>
>> Hello,
>>
>> I'm very new to Marpa but, from its description, it looks extremely
>> awesome.
>>
>> I'm also done playing with the beginner's example of the expression
>> calculator; was also able to make small changes to it. So far, so good.
>>
>> However, now, I'm trying to write a Java 8 Parser using the grammar
>> published here:
>>     https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>>
>> While I think I'm able to map the above Oracle grammar spec to the G1
>> rules (if I stub out some of the lexemes referenced by the G1 rules) and
>> create an instance of Marpa::R2::Scanless::G, I'm having a hard time
>> writing the L0 lexer rules in SLIF for the Lexer grammar
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>.  Some
>> issues that I will need to (but don't know how to) deal with are:
>>
>> 1. Keyword vs Identifier:
>>
>>   The Java spec defines Identifier
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8>
>> thus:
>> Identifier:
>>     IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
>> IdentifierChars:
>>     JavaLetter {JavaLetterOrDigit}
>> JavaLetter:
>>     any Unicode character that is a "Java letter"
>> JavaLetterOrDigit:
>>     any Unicode character that is a "Java letter-or-digit"
>>   *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral"
>> part? In Perl regex, one could do a negative lookahead assertion like so...
>>
>> if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>>     # this is an Identifier
>> }
>>
>>
>> ... but only if Marpa allowed such a rich, Perl regex syntax. Which it
>> doesn't, apparently, in SLIF.
>>
>> 2. Comment (single- and multi-line versions)
>> I could write a bunch of G1 rules to handle the multi-line Java comment,
>> but I'm seeing it becoming very verbose. Is there an easier way to handle
>> stuff like this in SLIF?
>>
>> 3. Since Marpa is Perl-based, is it possible to tap the full power of
>> Perl regex engine, especially for lexing?
>>
>> 4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer
>> grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>...
>> that is written in BNF style instead of a 'flat', regex style. If I were to
>> mechanically replicate the Lexer grammar using G1 rules (instead of L0
>> rules), would it entail a performance and space overhead by creating
>> unnecessary tree nodes for what would otherwise be a flat lexeme in
>> bison/flex?
>>
>> 5. Would Marpa experts recommend using SLIF (internal scanner) for Java
>> 8, or should I abandon it in favor of a custom / external lexer?
>>
>>
>> Regards,
>> /Harry
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
