Brilliant! That was it, thank you, Jim! And much cleaner, too.

Karim

---------- Forwarded message ----------
From: Jim Idle <[email protected]>
Date: Tue, Jun 8, 2010 at 12:41 PM
Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
To: "[email protected] interest" <[email protected]>

You need to change your var rule I think. Try this:

var_id : ID (DOT^ ID)*;

This is properly left factored and will also produce a tree that is much
easier to resolve in DOT notation.

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-[email protected]] On Behalf Of Karim Chichakly
> Sent: Tuesday, June 08, 2010 8:41 AM
> To: [email protected] interest
> Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
>
> Hi Jim,
>
> The semantic predicate was a red herring. First of all, no viable alt
> exception is reliably giving me a token, which makes sense to me.
>
> Secondly, changing the grammar is what leads to my problem and I think I
> know why, though I don't know how to get around it. If my grammar appears
> as follows:
>
> var : var_id
>     | var_id LEFT_PAREN args RIGHT_PAREN -> ^(CALL var_id args)
>     ;
>
> and I enter "a ab", the token I get from no viable alt exception is, in
> fact, "ab", as I expect. If I rewrite the grammar as follows:
>
> var : var_id
>       ( LEFT_PAREN args RIGHT_PAREN -> ^(CALL var_id args)
>       | -> var_id
>       )
>     ;
>
> and I enter "a ab", the token I get from no viable alt exception is "".
> Indeed, token->start == token->stop == parser->input_start. It is not as
> surprising to me that token->start == token->stop as that they equal
> parser->input_start. In thinking about this, it seems the parser gets the
> next token, "ab", which successfully matches var_id. It then has to get
> another token (consuming "ab") to distinguish between the cases presented
> before it.
> We are at the end of input, so, of course, the next token it
> gets is "", but I would have expected there to be a difference between
> token->start and parser->input_start of 4.
>
> Tracing through the generated code, I see a different picture. I thought
> the problem was that it consumed the token. The problem is really that it
> never consumed any tokens at all. That is why token->start ==
> parser->input_start. My full definition of var_id is:
>
> var_id : ID
>        | ID (DOT ID)+ -> ^(DOT ID+)
>        | DOT^ ID
>        ;
>
> The code generated for this switches on LA(1) to test for ID or DOT. Within
> case ID, it then switches on LA(2) to decide between the first two
> alternatives. Here it fails because LA(2) is also ID and it throws a no
> viable alt exception. Unfortunately, since it made this decision based on
> LA(1) and LA(2), the actual token that caused the problem has not been
> identified; instead everything still points to the start of the input. Is
> there any way to recover this information?
>
> By the way, for the [first] grammar that properly returns the erroneous
> token, cdfa18.predict is called instead, which does consume tokens. The
> first grammar, however, is unable to disambiguate the cases I have to parse
> ("X" as a 0-argument function call vs. "X" as a variable).
>
> Thanks again,
>
> Karim
>
>
> ---------- Forwarded message ----------
> From: Karim Chichakly <[email protected]>
> Date: Mon, Jun 7, 2010 at 2:46 PM
> Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
> To: "[email protected] interest" <[email protected]>
>
>
> Hi Jim,
>
> Thank you! I did not realize you could write a rule like that in ANTLR.
>
> Re: No viable alt exception: I can see that the parser has no idea about
> what kind of token it is, but didn't the lexer pull a token off? [If not,
> what is the parser trying to match?] Where is that token?
> I am guessing
> that this will be moot after I change the grammar as you suggest since I was
> getting that token (with the same error) before I put the leading predicate
> in.
>
> Thanks,
>
> Karim
>
>
> ---------- Forwarded message ----------
> From: Jim Idle <[email protected]>
> Date: Mon, Jun 7, 2010 at 2:12 PM
> Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
> To: "[email protected] interest" <[email protected]>
>
>
> With no viable alt, there is no token to inspect as there was no token
> missing etc. You can use the bitmap of expected tokens to say what tokens
> could be there at that point. Hence there is no token in the exception as
> there is no specific token that is in error. At least off the top of my head
> that is the case.
>
> You are approaching the problem from the wrong end:
>
> varorFunc
>     : i=IDENT
>       (
>           LPAREN fa=funcArgs? RPAREN
>           { /* you could issue an error here if $i is not a function,
>                or wait until the tree walk */ }
>           -> ^(FUNCTION $i $fa?)
>
>         | -> {isFunction($i)}? ^(FUNCTION $i)
>           -> $i
>       )
>     ;
>
> You can get an IDENT with or without function parameters and the syntax
> (which is what your parser is concerned with) is always valid. Later you can
> verify if the names that were used were valid functions and issue a much
> nicer message than the parser could generate alone.
>
> Jim
>
>
>
> > -----Original Message-----
> > From: [email protected] [mailto:antlr-interest-[email protected]] On Behalf Of Karim Chichakly
> > Sent: Monday, June 07, 2010 10:46 AM
> > To: [email protected] interest
> > Subject: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
> >
> > Hi Jim,
> >
> > Thank you. I am sorry, but I completely missed that on the support page.
> >
> > I understand your point (and thanks for the tip about pANTLR3_STRING),
> > but in your example, what is funcCall?
> > In my full grammar, I also have
> > a branch that looks for var_id(args), so perhaps funcCall : (args)?
> > However, the problem I have is that the grammar I am parsing allows an
> > identifier by itself (i.e., no distinguishing syntactical features, such
> > as parens) to represent either a variable or a zero-argument function
> > call. All function names are reserved, so I can distinguish zero-argument
> > function calls from variables via a symbol table lookup.
> >
> > In the spirit of what you are saying, I think I would have to pass the
> > var_ids through as var_ids and then do the lookup in a follow-on pass
> > that modifies the AST as needed. Is this really the best way, i.e., to
> > add another pass?
> >
> > I enclose my nascent error handler. As you can see, I am trying to
> > supply uniform behavior rather than do different things based on the
> > specific error (all I want is a clear indication of what went wrong and
> > the position where it went wrong). Perhaps this is folly. The error
> > in this case was ANTLR3_NO_VIABLE_ALT_EXCEPTION.
> >
> > Thanks again,
> >
> > Karim
> >
> >
> > ---------- Forwarded message ----------
> > From: Jim Idle <[email protected]>
> > Date: Mon, Jun 7, 2010 at 1:02 PM
> > Subject: Re: [antlr-interest] Semantic predicate losing token/char position on error
> > To: "[email protected] interest" <[email protected]>
> >
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:antlr-interest-[email protected]] On Behalf Of Karim Chichakly
> > > Sent: Monday, June 07, 2010 8:44 AM
> > > To: [email protected] interest
> > > Subject: [antlr-interest] Semantic predicate losing token/char position on error
> > >
> > > Hi,
> > >
> > > Thank you again for your previous help.
> > > I now know about
> > > antlr.markmail.org (perhaps a link from www.antlr.org would help
> > > others)
> >
> > You mean like the one on the support page with a box that you can type
> > your search terms in and a logo saying "Mark mail"? ;-)
> >
> > > If, however, I add a semantic predicate to that grammar (enclosed) to
> > > distinguish between X as a function call and X as a variable (which is
> > > described starting on page 297 of the Definitive ANTLR Reference), I
> > > no longer get a character position. All four of the variables
> > > involved in the position calculation are set to 1, and the start and
> > > stop then become zero.
> > > These values are, by the way, a bit peculiar as these fields usually
> > > hold pointers into the text. I also note that token->input is now
> > > NULL.
> >
> > Well, though this might be shown as an example in the book it isn't
> > really the way to do things. You are trying to make a semantic
> > distinction via syntax rules and that is always going to give you a
> > headache. You should parse as:
> >
> > var_id:
> >     ( funcCall -> ^(FUNCTION var_id funcCall)
> >     | -> var_id
> >     )
> >     ;
> >
> > Then check to see if the function construct really was a function when
> > you walk the tree in a verification pass.
> >
> > I need to see your error reporting function to help you more on the
> > display stuff. It is likely that you are trying to use elements that
> > are not valid for the type of error you are being passed. Not all
> > elements are available for all errors.
> >
> > Finally, do not use the pANTLR3_STRING stuff unless your grammar is
> > just a small single-shot parse as you will create a new string every
> > time you run that predicate! Call a function, use LT() to get the next
> > token, then use the pointers in the token directly. You will use no
> > memory that way!
> >
> > Jim
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
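
For anyone reading this thread later, here is a rough sketch of why the left-factored rule from the top of the thread, var_id : ID (DOT^ ID)*, behaves better. It is a hand-written Python stand-in, not ANTLR output and not the C runtime API: the (type, text) token tuples, the tuple-based tree, and the function name parse_var_id are all made up for illustration. It only shows two things: each DOT becomes the new root over the tree built so far, so a.b.c nests to the left, and the rule consumes tokens one at a time, so it stops exactly at the first token it cannot use.

```python
from typing import List, Tuple, Union

Token = Tuple[str, str]      # hypothetical (type, text) pair, e.g. ("ID", "a")
Tree = Union[str, tuple]     # a leaf's text, or ("DOT", left_subtree, right_text)


def parse_var_id(tokens: List[Token], pos: int = 0) -> Tuple[Tree, int]:
    """Hand-written stand-in for:  var_id : ID (DOT^ ID)* ;

    Returns the tree and the index of the first token NOT consumed.
    """
    if pos >= len(tokens) or tokens[pos][0] != "ID":
        raise SyntaxError(f"expected ID at token index {pos}")
    tree: Tree = tokens[pos][1]
    pos += 1

    # (DOT^ ID)*: one token of lookahead is enough to decide whether to loop.
    while pos < len(tokens) and tokens[pos][0] == "DOT":
        if pos + 1 >= len(tokens) or tokens[pos + 1][0] != "ID":
            raise SyntaxError(f"expected ID at token index {pos + 1}")
        # DOT^ makes DOT the new root over everything matched so far.
        tree = ("DOT", tree, tokens[pos + 1][1])
        pos += 2

    return tree, pos


if __name__ == "__main__":
    dotted = [("ID", "a"), ("DOT", "."), ("ID", "b"), ("DOT", "."), ("ID", "c")]
    print(parse_var_id(dotted))

    bad = [("ID", "a"), ("ID", "ab")]   # the "a ab" input from the thread
    print(parse_var_id(bad))
```

With the "a ab" input, the sketch returns after consuming one token and leaves index 1 (the "ab" token) as the first unconsumed token, so whatever rule calls var_id has a concrete token to report. That matches what Karim saw after switching to the left-factored rule, though the real parser's error reporting involves more than this toy shows.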
