I am learning ANTLR, so my answer might not be the very best solution, but here 
goes.

I think you are right that there is a bug in ANTLR. When I copy your grammar 
and run it with your test data, I see the parser receive only one zero when 
several zeros appear together.

That points to the lexer code generated by ANTLR, and in GrammarLogLexer.java 
I find this:
    ...
    public void mTokens() throws RecognitionException {
        // ...GrammarLog.g:1:8: ( REG7_TIPO | REG9_TIPO | HEX_DIGIT )
        int alt1=3;
        int LA1_0 = input.LA(1);
        if ( (LA1_0=='0') ) {
            int LA1_1 = input.LA(2);
            if ( (LA1_1=='0') ) {
                int LA1_3 = input.LA(3);
                if ( (LA1_3=='0') ) {
                    int LA1_4 = input.LA(4);
                    if ( (LA1_4=='7') ) {
                        alt1=1;
                    }
                    else if ( (LA1_4=='9') ) {
                        alt1=2;
                    }
                    else {
                        NoViableAltException nvae =
                            new NoViableAltException("", 1, 4, input);
                        throw nvae;
                    }
                }
                else {
                    NoViableAltException nvae =
                        new NoViableAltException("", 1, 3, input);
                    throw nvae;
                }
            }
            else {
                alt1=3;}
        }
        ...
The way I read this code, once the lexer sees two zeros in a row it commits to 
REG7_TIPO or REG9_TIPO, and it throws a NoViableAltException unless the next 
characters are a third '0' followed by '7' or '9'. I think the two inner else 
blocks should select the HEX_DIGIT alternative (alt1=3) instead of throwing.
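To make the fallback idea concrete, here is a small hand-written Java sketch 
(not ANTLR-generated code, and not an ANTLR API) of the decision I would expect 
mTokens() to make. The class and method names are made up for illustration, 
and it assumes HEX_DIGIT matches a single hex character and that the two 
literals are '0007FFF8' and '0009FFF6'. It checks the whole literal before 
committing, which is slightly stronger than only patching the two else blocks, 
since the generated code decides after four characters of lookahead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT ANTLR-generated code: choose between REG7_TIPO,
// REG9_TIPO and HEX_DIGIT, falling back to HEX_DIGIT instead of throwing.
public class FallbackLexerSketch {
    static final int REG7_TIPO = 1, REG9_TIPO = 2, HEX_DIGIT = 3;

    // Decide which alternative matches at position i (like alt1 in mTokens()).
    static int predict(String input, int i) {
        if (input.startsWith("0007FFF8", i)) return REG7_TIPO;
        if (input.startsWith("0009FFF6", i)) return REG9_TIPO;
        return HEX_DIGIT; // the fallback the generated else blocks lack
    }

    // Turn the input into token names; only a non-hex character is an error.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int alt = predict(input, i);
            if (alt == REG7_TIPO) {
                out.add("REG7_TIPO");
                i += 8;
            } else if (alt == REG9_TIPO) {
                out.add("REG9_TIPO");
                i += 8;
            } else if (Character.digit(input.charAt(i), 16) >= 0) {
                out.add("HEX_DIGIT(" + input.charAt(i) + ")");
                i++;
            } else {
                throw new IllegalArgumentException("not a hex digit at " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three zeros followed by '1' would make the generated lexer throw;
        // here they become four HEX_DIGIT tokens.
        System.out.println(tokenize("0001"));
    }
}
```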

OK, my first attempt was to turn the two tokens into real lexer rules in the 
grammar, placed after the existing HEX_DIGIT rule (and to comment out the 
tokens section near the top). That reorganizes the generated code slightly but 
does not change its behavior; it still has the same problem.

My next option was to put a gated semantic predicate on the two lexer rules 
(placed after the HEX_DIGIT rule). This restricts those two tokens to the 
beginning of a line and turns off their recognition later in the line. This is 
where someone with more experience might be able to suggest a better predicate 
than the one I used here:
        REG7_TIPO    :        {$pos == 0}?=>    '0007FFF8'; 
        REG9_TIPO    :        {$pos == 0}?=>    '0009FFF6';

With that change your test data works (although I had to add six more hex 
digits at the end; I didn't check whether your test string had the correct 
length). The runs of zeros now appear to be picked up correctly.
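For anyone unfamiliar with gated predicates, here is a rough Java model of what 
{$pos == 0}?=> does to the lexer's choices: the two register tokens are only 
candidates at column 0, and everywhere else only HEX_DIGIT is possible. This is 
a hypothetical hand-written sketch (the class name and lexLine() are made up), 
not what ANTLR generates. For simplicity it compares the whole literal before 
committing, which the generated lexer does not necessarily do at column 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the {$pos == 0}?=> gate: REG7_TIPO/REG9_TIPO are only
// considered at column 0 of a line; elsewhere only HEX_DIGIT can match.
public class GatedLexerSketch {
    static List<String> lexLine(String line) {
        List<String> tokens = new ArrayList<>();
        int pos = 0; // column within the line, playing the role of $pos
        while (pos < line.length()) {
            boolean gateOpen = (pos == 0); // the gated predicate
            if (gateOpen && line.startsWith("0007FFF8", pos)) {
                tokens.add("REG7_TIPO");
                pos += 8;
            } else if (gateOpen && line.startsWith("0009FFF6", pos)) {
                tokens.add("REG9_TIPO");
                pos += 8;
            } else if (Character.digit(line.charAt(pos), 16) >= 0) {
                tokens.add("HEX_DIGIT");
                pos++;
            } else {
                throw new IllegalArgumentException("unexpected char at column " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The first "0009FFF6" is REG9_TIPO because it starts the line;
        // the second is lexed as eight HEX_DIGIT tokens.
        System.out.println(lexLine("0009FFF60009FFF6"));
    }
}
```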

I can see two potential problems with this. First, if your language can have 
plain hex digits at the beginning of a line, my predicates would be wrong. 
Second, if the two tokens can appear anywhere else within the "sentence", the 
predicates would disqualify them at those locations. Since this is a 
scaled-down version of your grammar, I don't know whether either of these 
applies.

I also tried having REG7_TIPO and REG9_TIPO start with a non-hex character. 
With that change everything works right away without gated semantic 
predicates, and there is no problem with the tokens appearing in the middle of 
a line. This would actually be the best solution from a code perspective, 
although I am not sure your language allows it.
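The reason a non-hex first character helps is that a single character of 
lookahead then separates the register tokens from HEX_DIGIT, so a run of zeros 
never enters the register path. A hypothetical Java sketch, assuming a made-up 
prefix 'R' (so the literals become "R0007FFF8" and "R0009FFF6"); the class and 
method names are also invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: if the register tokens began with a character that can
// never start a HEX_DIGIT (here the made-up prefix 'R'), one character of
// lookahead is enough to tell them apart from HEX_DIGIT.
public class PrefixLexerSketch {
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == 'R') {
                // Only an 'R' commits to a register token; zeros never get here.
                if (input.startsWith("R0007FFF8", i)) tokens.add("REG7_TIPO");
                else if (input.startsWith("R0009FFF6", i)) tokens.add("REG9_TIPO");
                else throw new IllegalArgumentException("bad register token at " + i);
                i += 9;
            } else if (Character.digit(c, 16) >= 0) {
                tokens.add("HEX_DIGIT");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected char at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Register token in the middle of a line, surrounded by zeros.
        System.out.println(lex("00R0009FFF600"));
    }
}
```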

Hope this helps.
Wayne

PS. I think your other questions that mention multiple zeros may have exactly 
the same cause.

List: http://www.antlr.org/mailman/listinfo/antlr-interest