On 02/04/2013 00:18, Brian Schott wrote:
I've pretty much finished up my work on the std.d.lexer module. I am
waiting for the review queue to make some progress on the other (three?)
modules being reviewed before starting a thread on it.

In the meantime I've started some work on an AST module for Phobos that
contains the data types necessary to build up a parser module so that we
can have a standard set of code to build D dev tools on. I decided to
work directly from the standard on dlang.org for this to make sure that
my module is correct and that the standard is actually correct.

I've seen several threads on this newsgroup complaining about the state
of the standard and unfortunately this will be another one.

1) Grammar defined in terms of things that aren't tokens. Take, for
example, PropertyDeclaration. It's defined as an "@" token followed
by... what? "safe"? It's not a real token. It's an identifier. You can't
parse this based on checking the token type. You have to check the type
and the value.

2) Grammar references rules that don't exist. UserDefinedAttribute is
defined in terms of CallExpression, but CallExpression doesn't exist
elsewhere in the grammar. BaseInterfaceList is defined in terms of
InterfaceClasses, but that rule is never defined.

3) Unnecessary rules. KeyExpression, ValueExpression,
ScopeBlockStatement, DeclarationStatement, ThenStatement, ElseStatement,
Test, Increment, Aggregate, LwrExpression, UprExpression, FirstExp,
LastExp, StructAllocator, StructDeallocator, EnumTag, EnumBaseType,
EmptyEnumBody, ConstraintExpression, MixinIdentifier, etc... are all
defined in terms of only one other rule.
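
Point 1 can be made concrete with a small sketch. The token model below is hypothetical (it is not the std.d.lexer API), and the set of built-in property names is an assumption based on the spec's property rule; Python is used purely for illustration. The `@(...)` form of user-defined attributes is deliberately left out.

```python
from dataclasses import dataclass


# Hypothetical token model; the kind names are illustrative only.
@dataclass(frozen=True)
class Token:
    kind: str  # e.g. "at", "identifier"
    text: str  # source text of the token


# Assumed list of names the spec allows after "@" as if they were keywords.
BUILTIN_PROPERTIES = {"property", "safe", "trusted", "system", "disable"}


def classify_attribute(tokens, i):
    """Classify the attribute starting at tokens[i], or return None."""
    if i + 1 >= len(tokens) or tokens[i].kind != "at":
        return None
    nxt = tokens[i + 1]
    if nxt.kind != "identifier":
        return None
    # The token kind alone cannot separate @safe from @foo: the text decides.
    return "builtin" if nxt.text in BUILTIN_PROPERTIES else "user-defined"
```

Both `@safe` and `@foo` produce the same pair of token kinds, so the parser has to compare the identifier's text, which is exactly the extra check the grammar forces.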

I think that we need to be able to create a grammar description that:
* Fits into a single file, so that a tool implementer does not need to
collect bits of the grammar from the various pages on dlang.org.
* Can be verified to be correct by an existing tool such as Bison,
Goldie, JavaCC, <your favorite here> with a small number of changes.
* Is part of the dmd/dlang repositories on github and gets updated every
time the language changes.

I'm willing to work on this if there's a good chance it will actually be
implemented. Thoughts?

Interesting thread. I've been working on a hand-written D parser (in Java, for the DDT IDE) and I too have found a slew of grammar spec issues, some of them more serious than the ones you mentioned above. In some cases what the grammar spec says is unclear or downright wrong. For example, here's one from my notes:

  void func(int foo() { } );

The spec says that this is parsable (basically a function declaration in the parameter list), which makes no sense, and DMD doesn't accept it. Other cases are a bit trickier, since it's not clear whether the syntax should be accepted or not (sometimes it might make sense but still not be allowed).

These issues make things a bit harder for the development of tools that require D language parsers. But the whole grammar spec is so messy that I've been unsure whether it's worth filing bug reports or not (would they be addressed?). There is also the problem that even if those issues are fixed now, the spec could very easily fall out of date in the future, unless we have some system to test the spec. Like you mentioned, ideally we would have a grammar spec written for a grammar/parser-generator tool, so that correctness could more easily be verified. (It doesn't guarantee the absence of spec bugs, but it makes it much harder for them to creep in.)
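
Even without a full parser generator, a machine-readable grammar can be linted for the kinds of problems raised in this thread. The following is a minimal sketch, assuming a toy representation (a dict mapping each rule name to its alternatives); the productions are simplified for illustration and are not copied from the spec.

```python
# Toy grammar: rule name -> list of alternatives, each a list of symbols.
# Quoted strings are terminals; everything else is a rule reference.
# These productions are simplified stand-ins, not the real spec text.
GRAMMAR = {
    "UserDefinedAttribute": [['"@"', "CallExpression"]],
    "BaseInterfaceList": [['":"', "InterfaceClasses"]],
    "ThenStatement": [["ScopeStatement"]],
    "ScopeStatement": [['"{"', '"}"']],
}


def is_terminal(symbol):
    """Terminals are written as quoted strings in this toy format."""
    return symbol.startswith('"')


def undefined_rules(grammar):
    """Return rule names that are referenced but never defined."""
    referenced = {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
        if not is_terminal(symbol)
    }
    return sorted(referenced - set(grammar))


def alias_rules(grammar):
    """Return rules whose only production is one reference to another rule."""
    return sorted(
        name
        for name, alternatives in grammar.items()
        if len(alternatives) == 1
        and len(alternatives[0]) == 1
        and not is_terminal(alternatives[0][0])
    )
```

On this toy input, `undefined_rules(GRAMMAR)` reports `CallExpression` and `InterfaceClasses`, and `alias_rules(GRAMMAR)` reports `ThenStatement`. Run on every change to the spec, checks like these would catch dangling references early, although they say nothing about whether the grammar matches what DMD actually accepts.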


--
Bruno Medeiros - Software Engineer
