You can't use a bare .* as a lexer rule, only a single . (which matches exactly one character). The . rule should be the last rule in the lexer; it is there only to catch any character that you have not otherwise matched (which usually indicates a spurious character).
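
A minimal sketch of that layout, assuming ANTLR 3 syntax with the Java target. The grammar name and the rule names COMMENT and ANY are hypothetical, not taken from your grammar, and I have not run this through the tool:

```
lexer grammar FlowgenSketch;   // hypothetical name, for illustration only

// A '%' comment runs up to, but does not consume, the end of the line,
// so the line terminator is still available to NEW_LINE.
COMMENT  : '%' ~('\n'|'\r')* { skip(); } ;

// Line terminator, kept as a real token for a line-oriented parser.
NEW_LINE : '\r'? '\n' ;

// Horizontal whitespace only, so this rule cannot overlap NEW_LINE.
WS       : (' '|'\t')+ { skip(); } ;

// Catch-all: the last lexer rule. It matches exactly one character
// that none of the rules above accepted.
ANY      : . ;
```

Because ANY consumes a single character, the lexer always makes progress, and an ANY token reaching the parser points at the offending character.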
Make sure that your lexer rules are not ambiguous - they must not overlap :-)

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of Crocker Ron-QA1007
> Sent: Monday, February 15, 2010 12:05 PM
> To: [email protected]
> Subject: [antlr-interest] Newbie problem with line-oriented parsing
>
> Hi all -
>
> I'm new here, so be nice to me. Further, let me start by apologizing
> for such a verbose first message.
> I have started porting a DSL, one that I've been supporting for 15+
> years, from lex/yacc based toolset (via a tool called MetaTool) to
> ANTLR.
>
> I've been looking through the various materials available on the net
> and have a copy of The Definitive ANTLR Reference. As I started
> porting the grammar (EBNF ish) I've run into something I don't know
> how to deal with. Unfortunately I need to drag everyone through some
> background to get to the question, however I can start with the
> grammar I'm struggling with and the immediate problem.
>
> <><><><> cut here - flowgen.g <><><><>
> grammar flowgen;
>
> options {
>   language = Java;
> }
>
> /* *********** */
> /* TRANSACTION */
> /* *********** */
> transaction: ( ((KEY_START DEFINE_k) => xdefine*) tran_name message+
>              );
>
> xdefine: KEY_START DEFINE_k ID_name NEW_LINE;
>
> tran_name: ~(KEY_START|NP_START|NEWLINE_) .* NEW_LINE;
>
> message: num1? from_name num2? to_name ((~(NP_START|WHITE|NEWLINE_)) => msg_name?) NEW_LINE;
>
> num1: FLOATnumber;
> num2: FLOATnumber;
>
> from_name: COLUMN_name;
> to_name: COLUMN_name;
>
> msg_name: MSG_name;
>
> // Tokens - keywords
> DEFINE_k: 'DEFINE';
>
> // Tokens - operators
> fragment KEY_START: '$';
> fragment NP_START: '%';
> NEW_LINE: NEWLINE_;
>
> // Tokens - names and numbers
> fragment NUMBER: '0'..'9';
> fragment UPPERCASE: 'A'..'Z';
> fragment VARBASE: UPPERCASE (UPPERCASE|NUMBER|'_')*;
> fragment VARNAME: '$' VARBASE;
> fragment WHITE: ' '|'\t';
> fragment NEWLINE_: '\n'|'\r';
>
> FLOATnumber: NUMBER+ ('.' NUMBER+)?;
>
> ID_name: VARBASE;
> VAR_name: VARNAME;
>
> COLUMN_name: ( (ALPHA|NUMBER) (ALPHA|NUMBER|'_'|'&'|'-')*
>              | VARNAME
>              );
> // name: <([A-Za-z0-9][A-Za-z0-9_&-]*)?(\$[A-Z][A-Z0-9_]*)*>
>
> WS: (WHITE|NEWLINE_)+ {skip();};
> NON_PRINTING_COMMENT: NP_START .* NEWLINE_ {skip();};
>
> MSG_name: .*;
> <><><><> end <><><><>
>
> When I run this through antlr I get the following error:
> Grammar: src/flowgen.g
> error(201): src/flowgen.g:57:12: The following alternatives can never be matched: 1
> |---> MSG_name: .*;
>
> 1 error
>
> BUILD FAIL
> (this is compliments of antlrv3ide plugin for eclipse; similar results
> occur with ANTLRworks)
>
> ************ BEGIN BACKGROUND ************
> This language, flowgen, is used to specify Message Sequence Charts. We
> could be using ITU Z.120 for this, but since our local DSL predates
> Z.120 we have some interest in maintaining this language. The flowgen
> language is a simplified version of Z.120 in that the input language is
> simple and direct, and using the flowgen tools one can create the
> corresponding picture (and even the corresponding Z.120 input). [After
> re-reading that, I'm not sure the background helps OTHER than to note
> that it's an old DSL and there is a solid user base not interested in
> moving to another DSL that is overly-complicated for the particular job
> at hand.]
>
> The format of a flowgen input file is simple: The first non-commented
> line is the "title" of the flow, and all subsequent lines represent
> messages in the flow. Newline's separate the constructs.
>
> Here is an example flowgen input file:
>
> 1. % Here is a comment
> 2. Simple flowgen flow
> 3. % Show a message going from A to B to C and back.
> 4. A B Message 1
> 5. # This is the first message in the sequence
> 6. B C Message 2
> 7. # This is the next message
> 8. C B
> 9. % Note how the above message has no message name
> 10. B A End
>
> And this is the output of "classic" flowgen.
>
> Simple flowgen flow                              Page: 1
>
> A              B              C
> |              |              |
> | [1] Message 1|              |
> |o------------>|              |
> |              |              |
> |   This is the first message in the sequence
> |              |              |
> |              | [2] Message 2|
> |              |o------------>|
> |              |              |
> |              |   This is the next message
> |              |              |
> |              | [3]          |
> |              |<------------o|
> |              |              |
> | [4] End      |              |
> |<------------o|              |
> |              |              |
>
> Some notes:
> Lines 1 and 9 are "comment" lines and are ignored.
>
> In this language, there are several constructs that map well to
> grammar-based solutions.
> * A title is the text associated with the first non-commented line
> * A message is the pair (arrow,comment) where an arrow represents the
>   message moving from one place to another and a comment is optional
>   text used to describe something about the message.
> * An arrow is the triple (from,to,message_text) where from and to are
>   required and represent column names (equivalent to IDs in other
>   pedagogic grammars), while message_text is optional and represents
>   the "name" of the message.
> * A note is associated with an arrow and is a multi-line construct.
>   Each of these lines begins with any number of '#' characters, but it
>   is only the text after the '#'s that comprise the note.
> * A comment starts with the % character and continues to the end of the
>   line [akin to the C++/Java '//' operator]
> * Blank lines are ignored, independent of context.
>
> ************ END BACKGROUND ************
>
> Given this understanding, I created the grammar above. I'm not sure a)
> what to do about the error, but more importantly, b) I'm much more
> concerned about HOW to convince an ANTLR grammar to do what I want it
> to do. In comparison with the prior toolset, the LL vs. LR question
> doesn't bother me. However, the way MetaTool handled restrictions on
> the lexical space was to take advantage of lex's "start states". The
> flowgen grammar has become so complicated [I've only given a snapshot;
> it is much more substantial] that we've broken lex and are about to
> break flex. Similar problem with yacc/bison, hence the desire to
> migrate to something more robust.
>
> Thanks for hearing me out and I look forward to your
> recommendations/suggestions.
>
> Ron Crocker
> Fellow of the Technical Staff
> Motorola
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
