[il-antlr-interest: 33283] [antlr-interest] on "crap" grammars

Vlad Thu, 21 Jul 2011 08:42:41 -0700

This test grammar was called "crap" by Jim Idle. I am willing to eat humble 
pie and admit that I am an ANTLR novice or that I don't know something about 
grammars, but I am just not seeing the problem in this simple case:


grammar testerrors;

options
{
    language='C';
}

NAME    :   ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+ ;
WS      :   ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;

parse:
    decl ( options { greedy = true; }: ',' decl )* ','? EOF
    ;

decl:
    NAME ':' type
    ;

type:
    'int' | 'float'
    ; 

The start symbol is a comma-delimited list of simple '<name> : <type>' 
declarations, and it allows the list to optionally end in a comma, as some 
languages do (Python, for example). This is a pretty common way to structure 
it. In JavaCC, for example, you'd use a local LOOKAHEAD(2) inside the ()* to 
disambiguate the choice between matching one more decl and ending the list. 
Without it, and with the default k=1, JavaCC emits an ambiguity warning at 
parser generation time. In ANTLR's case, the ambiguity can be dealt with 
similarly, either with a local k=2 option or with the greedy option used above 
(which I borrowed from http://www.antlr.org/grammar/1200715779785/Python.g). 
Without either, ANTLR also emits a warning at parser generation time. All of 
this seems to work as expected.
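
To make the two-token decision concrete, here is a minimal hand-written sketch 
in C of a recognizer for the same list. It is not what ANTLR or JavaCC 
generates; the names (Tok, Parser, tokenize, la, parse_list) and the fixed 
MAX_TOKENS buffer are assumptions made up for illustration. The point is only 
the loop condition: another decl is attempted when the comma is followed by a 
NAME, otherwise the comma is treated as the optional trailing one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds for the tiny declaration-list language (hypothetical names). */
typedef enum { T_NAME, T_INT, T_FLOAT, T_COLON, T_COMMA, T_EOF, T_BAD } Tok;

#define MAX_TOKENS 256  /* arbitrary limit chosen for this sketch */

typedef struct {
    Tok    toks[MAX_TOKENS];
    size_t count;  /* number of tokens, including the trailing T_EOF */
    size_t pos;    /* index of the next unconsumed token */
} Parser;

/* Split the input into tokens; returns 0 if the buffer is too small. */
static int tokenize(const char *s, Parser *p)
{
    p->count = 0;
    p->pos = 0;
    for (;;) {
        Tok t;
        while (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n') s++;
        if (p->count == MAX_TOKENS) return 0;
        if (*s == '\0') { p->toks[p->count++] = T_EOF; return 1; }
        if (isalnum((unsigned char)*s)) {
            const char *start = s;
            size_t len;
            while (isalnum((unsigned char)*s)) s++;
            len = (size_t)(s - start);
            /* Keywords win over NAME when the whole word matches. */
            if (len == 3 && memcmp(start, "int", 3) == 0) t = T_INT;
            else if (len == 5 && memcmp(start, "float", 5) == 0) t = T_FLOAT;
            else t = T_NAME;
        } else if (*s == ':') { t = T_COLON; s++; }
        else if (*s == ',') { t = T_COMMA; s++; }
        else { t = T_BAD; s++; }
        p->toks[p->count++] = t;
    }
}

/* Look ahead k tokens (1-based) without consuming anything. */
static Tok la(const Parser *p, size_t k)
{
    size_t i = p->pos + k - 1;
    return i < p->count ? p->toks[i] : T_EOF;
}

/* Consume the next token if it has kind t. */
static int match(Parser *p, Tok t)
{
    if (la(p, 1) != t) return 0;
    p->pos++;
    return 1;
}

/* decl: NAME ':' type ;   type: 'int' | 'float' ; */
static int parse_decl(Parser *p)
{
    if (!match(p, T_NAME) || !match(p, T_COLON)) return 0;
    return match(p, T_INT) || match(p, T_FLOAT);
}

/* parse: decl ( ',' decl )* ','? EOF ;   returns 1 on success, 0 on error */
int parse_list(const char *input)
{
    Parser p;
    assert(input != NULL);
    if (!tokenize(input, &p)) return 0;
    if (!parse_decl(&p)) return 0;
    /* Two tokens of lookahead: loop only when ',' is followed by NAME. */
    while (la(&p, 1) == T_COMMA && la(&p, 2) == T_NAME) {
        match(&p, T_COMMA);
        if (!parse_decl(&p)) return 0;
    }
    if (la(&p, 1) == T_COMMA) match(&p, T_COMMA);  /* optional trailing ',' */
    return match(&p, T_EOF);
}
```

With this sketch, parse_list("a : int, b : float,") accepts the trailing 
comma, while parse_list("name : bad") is rejected because "bad" is lexed as a 
NAME where a type keyword is required.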

So, what is so obviously wrong with the grammar snippet that deserves the 
"crap" moniker? I am learning ANTLR because I want to add a multi-target parser 
generator tool to my skill set. For Java work, JavaCC is still out there and 
generates fast parsers, has good error handling, and can build ASTs/visitors. 
In C++, I would normally do a simple case like this via boost.spirit but it's a 
bit of a template metaprogramming monster. With ANTLR I am successfully 
compiling my C parser within a larger C++ codebase and the only learning curve 
issues are odd error messages on relatively trivial input errors, where ANTLR 
can't seem to identify the token it is expecting. E.g., input "name : bad" 
results in

-memory-(1)  : error 10 : Unexpected token, at offset 6
    near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:6]
     : Missing <invalid>

I would be happy to get specific pointers to docs and articles on how to 
improve error handling by ANTLR *C* parsers. At least being able to modify the 
stock error display function to tackle the common case of mis-spelling a token 
name would be great.

Thank you,
Vlad

