Xie, Linlin wrote:

Hi Jim,

 

Thanks for your reply. We finally figured out that the large number in "expecting" is actually -1, which is EOF.

Yes - I figured as much.

I guess this rules out the possibility of a bug in ANTLR, setting aside whether the message itself is appropriate. In the use case I mentioned in my last email, I would expect start(Rule2), start(Rule3) and ';' all to be the expected tokens, rather than EOF. Is there anything ANTLR can do to make its error messages more relevant? Or should I change my grammar to get more appropriate error messages, and if so, how?

You have to write your own message display routines that make sense for your grammar. The default ones do check for EOF, though. Your issue is that, because everything leading up to EOF is optional, ANTLR assumes those parts are simply not present:

Say start(rule2) is FOO and start(rule3) is BAR.

Then after rule1 it says:

No FOO is there, so skip rule2; it isn't present.
No BAR is there, so skip rule3; it isn't present.

Now, what is the start set of whatever can come next? Only EOF, so it tries to match EOF. That fails, so the expecting token is -1, i.e. EOF.

However, if you do this:

: rule1
    (   rule2
        (   rule3 EOF
        |   EOF
        )
    |   rule3 EOF
    |   EOF
    )
;

Now, after rule1 has been parsed, the follow set is FOO | BAR | EOF, so you get the error straight away, at the point where all three are valid. After rule2 has been parsed, the follow set is BAR | EOF, so again you get the error straight away. After rule3, only EOF is viable.

 

Also, I can see that when displayRecognitionError() checks the recognizer type, it only considers the parser or the tree parser. Why is the lexer not considered here?

1) Lexers can only say "Not expecting character 'y' here", so antlr3lexer.c has its own handler. You should install your own handler, remember?
2) If your lexer is throwing errors, then it is really broken. It should be coded to cope with any input one way or another, although of course that is sometimes difficult. Make sure that your lexer rules can terminate just about anywhere, but raise your own descriptive error about any missing pieces. Then add a final lexer rule:

ANY : . { SKIP(); /* log an error about the unknown character being ignored */ } ;

This moves all of your error handling up to the parser, where you have better context. Similarly, you should move any errors that you can out of the parser and into the tree parser, where once again you have better context. The classic example is trying to encode in the grammar the number of parameters that a particular function can take. Don't do that: accept any number, including 0, then check for validity in your first tree walk.

I can see that a lexer error is treated as a NoViableAlt parser exception, but ANTLR still produces a lexer error report. Where can I find the code that produces the lexer error report? Or how can I intercept lexer errors the way I do parser errors?

Intercept it the same way: install your own displayRecognitionError, but make it say "Internal compiler error - lexer rules bad :-(  all your base belong to us".

Jim




List: http://www.antlr.org/mailman/listinfo/antlr-interest