Rémi,
Of course there are hacks and tricks to parse such a language.
My main point is: it falls outside any established concepts and technology.
And: if you happen to maintain a tool where recovery after syntax errors
and code completion and tons of other features are based on a given parser
generator, any deviation from such established concepts really hurts.
We are not in the position to exchange or rewrite the parser generator,
we have to code around it, which inevitably is a messy endeavor.
We're doing it, don't worry.
It would just be fair to spell out what kind of grammar this is.
I don't have a name for it.
The best description I could come up with so far is:
"restricted keywords are keywords when it helps to interpret them as keywords".
Stephan
On 04.05.2017 00:06, fo...@univ-mlv.fr wrote:
----- Original Message -----
From: "Stephan Herrmann" <stephan.herrm...@berlin.de>
To: jigsaw-dev@openjdk.java.net, "Remi Forax" <fo...@univ-mlv.fr>, "Alex Buckley"
<alex.buck...@oracle.com>
Cc: "Brian Goetz" <brian.go...@oracle.com>, "Dan Smith"
<daniel.sm...@oracle.com>
Sent: Wednesday, 3 May 2017 23:31:14
Subject: Re: Java Platform Module System
On 03.05.2017 20:55, Remi Forax wrote:
It's context-free because a context-free grammar defines its input in terms of
terminals, and the theory does not say how to map a token to a terminal.
Jay is right that it requires either a specific parser generator like
Tatoo [1], the one I wrote 10 years ago (because I wanted the tool to help me
extend a grammar easily), or modifying an existing parser generator so that
the parser can send the production state to the lexer, which then enables or
disables the automata that recognize the associated keywords.
Just feeding parser state into the Lexer doesn't cut it for Java 9,
because the classification keyword / identifier cannot be made at
the time when the stream passes the Lexer.
No, it's done between the lexer and the parser.
Let me remind you of this example:
module foo { exports transitive
How should the poor lexer recognize in this situation that transitive
is an identifier (sic) (if you complete the text accordingly)?
There is a simple solution: treat module, requires, etc. as keywords in the
lexer, and when such a keyword is sent to the parser, downgrade it to an
identifier if you are not at the right dotted production.
It's easy to implement if your lexer/parser is non-blocking, i.e. if you push a
bytebuffer to the lexer and the lexer pushes terminals to the parser.
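A minimal sketch of this first solution, in Java 16+ for illustration. The lexer always reports the restricted keywords of JLS 9 section 3.9 as keywords; a filter between lexer and parser downgrades one to an identifier unless the current parser state accepts it. The names `KeywordDowngrade`, `Token`, and `classify` are hypothetical, and the set of expected keywords is passed in by hand, standing in for what a real parser would derive from its current dotted production.

```java
import java.util.Set;

public class KeywordDowngrade {
    enum Kind { KEYWORD, IDENTIFIER }

    record Token(Kind kind, String text) {}

    // The restricted keywords listed in JLS 9, section 3.9.
    static final Set<String> RESTRICTED = Set.of(
            "open", "module", "requires", "transitive", "exports",
            "opens", "to", "uses", "provides", "with");

    /**
     * Sits between lexer and parser. The lexer always emits restricted
     * keywords as KEYWORD; this downgrades one to IDENTIFIER unless the
     * parser's current state (hypothetically given as expectedKeywords)
     * accepts that keyword as a terminal.
     */
    static Token classify(Token fromLexer, Set<String> expectedKeywords) {
        boolean restricted = fromLexer.kind() == Kind.KEYWORD
                && RESTRICTED.contains(fromLexer.text());
        if (restricted && !expectedKeywords.contains(fromLexer.text())) {
            return new Token(Kind.IDENTIFIER, fromLexer.text());
        }
        return fromLexer; // ordinary tokens pass through unchanged
    }

    public static void main(String[] args) {
        Token t = new Token(Kind.KEYWORD, "transitive");
        // Hypothetical state after "exports": only a package name may follow.
        System.out.println(classify(t, Set.of()));
        // Hypothetical state in which the modifier "transitive" is allowed.
        System.out.println(classify(t, Set.of("transitive")));
    }
}
```

For the `module foo { exports transitive` example, the first call in `main` yields an IDENTIFIER token and the second keeps the KEYWORD, which is the decision the lexer alone could not make.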
The other solution, used by Tatoo, is to work with a list of automata, one
recognizing each token, instead of one giant automaton that recognizes all
tokens, and to activate them or not depending on the parser state.
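A minimal sketch of this second approach in Java 16+, using one `java.util.regex.Pattern` per terminal as a stand-in for each small automaton; Tatoo's actual implementation is not shown here. The terminal names, the `next` method, and the set of active terminals (which a real parser would derive from its state) are hypothetical, and whitespace skipping is left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StateDrivenLexer {
    record Lexeme(String terminal, String text) {}

    // One small recognizer per terminal instead of one combined automaton.
    // Insertion order doubles as priority: keywords are tried before IDENTIFIER.
    static final Map<String, Pattern> RECOGNIZERS = new LinkedHashMap<>();
    static {
        RECOGNIZERS.put("module", Pattern.compile("module\\b"));
        RECOGNIZERS.put("exports", Pattern.compile("exports\\b"));
        RECOGNIZERS.put("transitive", Pattern.compile("transitive\\b"));
        RECOGNIZERS.put("{", Pattern.compile("\\{"));
        RECOGNIZERS.put("IDENTIFIER", Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*"));
    }

    /** Recognizes the next lexeme at pos, trying only the recognizers active in this parser state. */
    static Lexeme next(String input, int pos, Set<String> active) {
        for (Map.Entry<String, Pattern> e : RECOGNIZERS.entrySet()) {
            if (!active.contains(e.getKey())) {
                continue; // recognizer switched off in the current parser state
            }
            Matcher m = e.getValue().matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt()) {
                return new Lexeme(e.getKey(), m.group());
            }
        }
        return null; // no active recognizer matches: a syntax error at pos
    }

    public static void main(String[] args) {
        String src = "module foo { exports transitive";
        int pos = src.indexOf("transitive");
        // Hypothetical state after "exports": only IDENTIFIER is active.
        System.out.println(next(src, pos, Set.of("IDENTIFIER")));
    }
}
```

In `main`, only IDENTIFIER is active after `exports`, so `transitive` comes back as an identifier; activating the `transitive` recognizer as well would make it a keyword. The `\b` in the keyword patterns keeps `module` from matching the prefix of an identifier such as `modules`.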
Aside from specific heuristics
(which are not available to any parser generator),
Any but one :)
we only know about this classification after the parser has matched
an entire declaration.
So I suppose your parser is LR; then you know the classification from the
dotted production just before the terminal is about to be recognized.
When you construct the LR table, you know for each dotted production which
terminals can appear, so the parser generator can keep this information in a
side table and, during parsing, find from the parser state which terminals
can be recognized.
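A minimal sketch of such a side table in Java, assuming an ACTION table is already available as a map from state number to (terminal, action) entries. The table in `main` is a toy with invented state numbers, not one generated from the JLS 9 grammar, and the class and method names are hypothetical. For each state it keeps exactly the terminals whose action is a shift or a reduce.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ExpectedTerminals {
    enum Action { SHIFT, REDUCE, ACCEPT }

    /** Side table: for each LR state, the terminals that lead to a shift or a reduce. */
    static Map<Integer, Set<String>> buildSideTable(Map<Integer, Map<String, Action>> actionTable) {
        Map<Integer, Set<String>> side = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Action>> row : actionTable.entrySet()) {
            Set<String> terminals = new TreeSet<>();
            for (Map.Entry<String, Action> cell : row.getValue().entrySet()) {
                if (cell.getValue() == Action.SHIFT || cell.getValue() == Action.REDUCE) {
                    terminals.add(cell.getKey());
                }
            }
            side.put(row.getKey(), terminals);
        }
        return side;
    }

    public static void main(String[] args) {
        // Toy ACTION table; state numbers and entries are invented for illustration.
        Map<Integer, Map<String, Action>> action = Map.of(
                0, Map.of("open", Action.SHIFT, "module", Action.SHIFT),
                7, Map.of("IDENTIFIER", Action.SHIFT), // e.g. after "exports"
                9, Map.of("transitive", Action.SHIFT, "static", Action.SHIFT,
                          "IDENTIFIER", Action.SHIFT),
                12, Map.of("$", Action.ACCEPT));
        System.out.println(buildSideTable(action));
    }
}
```

A token filter or a state-driven lexer could then consult the entry for the current state to decide whether `transitive` is a keyword there; the table is computed once, when the LR table is built.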
I'm not even sure that theory has a name for this kind of grammar.
Maybe we should speak of a constraint solver rather than a parser.
No need for a constraint solver here; you need to export the terminals that
will lead to a shift or a reduce for each LR state.
Stephan
Rémi
Rémi
[1] http://dl.acm.org/citation.cfm?id=1168057
----- Original Message -----
From: "Alex Buckley" <alex.buck...@oracle.com>
To: "Jayaprakash Arthanareeswaran" <jarth...@in.ibm.com>, "Dan Smith"
<daniel.sm...@oracle.com>, "Brian Goetz"
<brian.go...@oracle.com>
Cc: jigsaw-dev@openjdk.java.net
Sent: Wednesday, 3 May 2017 19:46:54
Subject: Re: Java Platform Module System
On 5/2/2017 3:39 PM, Alex Buckley wrote:
On 5/2/2017 7:07 AM, Jayaprakash Arthanareeswaran wrote:
Chapter 2 in [1] describes context-free grammars. The addition to "3.9
Keywords" defines "restricted keywords", which prevent the grammar for
ModuleDeclaration from being context-free. This prevents compilers from
using common parser generators, since those typically only support
context-free grammars. The lexical/syntactic grammar split defined in
chapter 2 is not of much use for actual implementations of
module-info.java parsers.
The spec at least needs to point out that the given grammar for
ModuleDeclaration is not actually context-free.
The syntactic grammar in JLS8 was not context-free either; the opening
line of Chapter 2 has been false for years. For JLS9, I will remove the
claim that the lexical and syntactic grammars are context-free, and
perhaps a future JLS can discuss the difficulties in parsing the
Jan Lahoda pointed out privately that the syntactic grammar in JLS8 and
JLS9 is in fact context-free; it's just not LL(1). What I should have said
is that the grammar hasn't been LL(1) for a long time.
Alex