I'm not very familiar with ANTLR's error recovery mechanisms, but I suspect 
that the generated code for the 'expressions' rule checks whether the next 
token can start an 'expression' before it calls into the 'expression' rule. 
When it doesn't find one, as in your second case, it exits out into the root 
rule, which then checks whether the next token is EOF and fails.

But this is just speculation. Hopefully one of the more experienced ANTLRers 
can give you a better answer.
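
To make that guess concrete, here is a small hand-written Java sketch of the 
behaviour I have in mind. It is not ANTLR-generated code and does not use the 
ANTLR runtime; the class name LoopExitSketch and all of its methods are made 
up for illustration, and it works on characters rather than real tokens. It 
assumes the loop for 'expression*' only enters 'expression' when the lookahead 
is '[', and that a mismatch inside 'expression' consumes one token and leaves 
an error marker, as the overridden recoverFromMismatchedToken() in the grammar 
quoted below does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for a parser of the quoted grammar; an illustration only. */
final class LoopExitSketch {
    private final String input;
    private int pos = 0;

    /** One entry per 'expression' that was entered, with error markers inline. */
    final List<String> expressions = new ArrayList<>();

    /** True if 'root' reached its EOF check while input was still left. */
    boolean stoppedBeforeEof = false;

    private LoopExitSketch(String input) {
        this.input = input;
    }

    // Lookahead: the next character, with '\0' standing in for EOF.
    private char la() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // Stand-in for match() plus the overridden recovery: on a mismatch,
    // consume one character anyway and return an error marker in its place.
    private String match(char expected) {
        char actual = la();
        if (actual != '\0') {
            pos++;
        }
        return actual == expected ? String.valueOf(expected) : "<error:" + expected + ">";
    }

    // expression : '[' 'a' 'x' ']'
    private void expression() {
        expressions.add(match('[') + match('a') + match('x') + match(']'));
    }

    // expressions : expression*
    // Assumption: the loop is entered only while the lookahead is '['.
    private void expressionsRule() {
        while (la() == '[') {
            expression();
        }
    }

    // root : expressions EOF
    static LoopExitSketch parse(String input) {
        LoopExitSketch p = new LoopExitSketch(input);
        p.expressionsRule();
        p.stoppedBeforeEof = p.la() != '\0'; // root expects EOF at this point
        return p;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[ax][kx][ax]", "[ax]sax][ax]"}) {
            LoopExitSketch p = parse(s);
            System.out.println(s + " -> " + p.expressions.size() + " expression(s)"
                    + (p.stoppedBeforeEof ? ", input left before EOF" : ""));
        }
    }
}
```

Under these assumptions, [ax][kx][ax] yields three expressions, because the 
bad 'k' is hit after the rule has already been entered, while [ax]sax][ax] 
yields one: the loop never enters 'expression' for the 's', so 'root' reaches 
its EOF check with input still left.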

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Luchesar Cekov
Sent: June 30, 2010 1:35 PM
Cc: [email protected]
Subject: Re: [antlr-interest] Continue parsing after an error

Hi Gordon,

Thanks for the prompt response.
Adding OTHER as an alternative was what I tried to do in the beginning. 
Unfortunately my use case is a bit more complex. I have worked out a 
better example below.
In this example, the input string [ax][kx][ax] is wrong (k is not 
allowed), but the grammar builds the full AST, so it recovers from 
the error: it generates three expression nodes, the second of which 
contains an ErrorCommonToken as per recoverFromMismatchedToken().
The string [ax]sax][ax], on the other hand, generates only the first part 
of the tree, up to the error: it generates only one expression node.

I do not understand why I get this different behavior - the parser 
recovers if the error happens in the middle of a rule, but not if the 
error is at the beginning of a rule.

Is this a problem in my grammar, or is it just the way ANTLR works?

Thanks,
Luchesar

================
grammar StartOfARuleFailTest;

options {    output=AST;    ASTLabelType=CommonTree; }

tokens { ROOT_TOKEN;ERROR_TOKEN;EXPRESSIONS;EXPRESSION; }

@members {
    @Override
    protected Object recoverFromMismatchedToken(IntStream input, int ttype, BitSet follow)
            throws RecognitionException {
        MismatchedTokenException ex = new MismatchedTokenException(ttype, input);
        input.consume();
        return createErrorToken(ex, ttype);
    }
   
    public static ErrorCommonToken createErrorToken(RecognitionException ex, int ttype) {
        ErrorCommonToken errorCommonToken = new ErrorCommonToken(ex.token);
        errorCommonToken.setType(ttype);
       
        return errorCommonToken;
    }
}

root : expressions  EOF -> ^(ROOT_TOKEN expressions) ;
expressions  : expression* -> ^(EXPRESSIONS expression*) ;
expression : '[' 'a' 'x' ']' -> ^(EXPRESSION '[' 'a' 'x' ']');

OTHER   : . ;
================


Gordon Tyler wrote:
> The grammar you have defined says, roughly:
>
> Parse any number of '[' or ']' until you reach EOF.
>
> It does not describe what to do if something other than '[' or ']' is found 
> before EOF is found.
>
> You have defined a token, OTHER, to match the other stuff, but your parse 
> rules do not reference OTHER. Perhaps something like this would work:
>
> root : (expressions | OTHER)* EOF -> ^(ROOT_TOKEN expressions) ;
>
>
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Luchesar Cekov
> Sent: June 30, 2010 10:10 AM
> To: [email protected]
> Cc: Valerio Malenchino
> Subject: [antlr-interest] Continue parsing after an error
>
> Dear ANTLR enthusiasts,
>
> I am struggling with a problem. The parser jumps to the end of file from 
> the middle of the document.
>
> The setup is as follows:
>     * I have two alternatives followed by EOF
>     * at parse time, in the middle of the document, the next token cannot 
> match the start of either alternative
>
> This leads to parsing termination because the parser jumps to the EndOfFile.
>
> A simple grammar that illustrates the problem is
>
> ===============
> tokens {ROOT_TOKEN;}
> root
>     : expressions EOF -> ^(ROOT_TOKEN expressions) ;
> expressions : ('[' | ']')* ;
> OTHER   : . ;
> ===============
>
> If I then try parsing "[[][]]sdsdf[]][]][", the parsing will stop at the 
> first "s" and will try to recover as if EOF were the next token.
> When looking at the generated Parser, it looks like if there is no viable 
> alternative in the top rule (in this case "root"), the parser will behave 
> as if it reached the EOF and will skip the rest of the tokens.
>
> The resulting AST will contain only children up until the first illegal 
> token "s".
>
> I cannot see where my mistake is. It looks like the parser should not do 
> that. Can you suggest a workaround for the problem?
>
> Thanks in advance,
> Luchesar
>   

-- 

Luchesar Cekov
Software Engineer
+44 (0) 207 239 4949
*Ontology Systems*
www.ontology.com <http://www.ontology.com/>

        


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address


-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.
