On Thu, Dec 17, 2009 at 1:37 AM, David-Sarah Hopwood <[email protected]> wrote:
> Marcin Rzeźnicki wrote:
>> 2009/12/14 Marcin Rzeźnicki <[email protected]>:
>>> 2009/12/13 Jim Idle <[email protected]>:
>>>> This usually means that your lexer token numbers are out of sync with your
>>>> parser tokens. Regen in correct order and make sure all tokens have been
>>>> declared.
>>>>
>>> Umm, what if I work with combined grammar? And some of literals are
>>> 'inlined'?
>>
>> I think I know what has been causing this problem but I am scratching
>> my head. It seems that ANTLR lexer is, well, a strange beast.
>> I have a rule, say
>> CLASS
>> :
>> 'class'
>> ;
>>
>> and below
>>
>> IDENTIFIER
>> :
>> {Character.isJavaIdentifierStart(input.LA(1))}?=> . (
>> {Character.isJavaIdentifierPart(input.LA(1))}?=> . )*
>> ;
>>
>> (the latter rule has been questioned here, but bear with me a while, I
>> need it to explain my case)
>>
>> Now, upon seeing input 'class' ANTLR matches IDENTIFIER because of
>> this gating predicate. Well, 'class' would have been a valid
>> identifier, of course but shouldn't it try to match 'class' based on
>> rules precedence?
>
> This seems to be an idiosyncrasy of how ANTLR lexers treat gated semantic
> predicates. Although . can match the 'c' in 'class', it appears that ANTLR
> doesn't recognize that because of the predicate. That is the reason for the
> additional complexity in the rules that I posted earlier:
>

I wonder. It seems that it knows it can match both CLASS and IDENTIFIER
when it sees 'c' in a fresh state. The problem lies, I think, in the fact
that it ignores the latter guard, isJavaIdentifierPart. My conclusion after
debugging the lexer is that it behaves like this:

1: I see 'c', so this can be a CLASS. Good, move on.
2: I see 'l', so this can still be a CLASS; otherwise I would assume
   that I am an ID.
3: ...
4: Now I might be a CLASS, so I look beyond:

   if ( ((LA35_411>='\u0000' && LA35_411<='\uFFFF')) && ((Character.isJavaIdentifierStart(input.LA(1)))))

   (I do not completely understand why it checks isJavaIdentifierStart
   here; it should have checked isJavaIdentifierPart instead.)
5: From the above check I conclude that this is an ID.

Steps 4 and 5 might be a little unclear. I think the input has been
rewound somewhere, hence ANTLR's conclusion. Possibly that is the cause
of the error. I'll investigate further.

Thank you for an interesting idea.

--
Greetings
Marcin Rzeźnicki

List: http://www.antlr.org/mailman/listinfo/antlr-interest
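
P.S. For comparison, the outcome expected from rule precedence can be sketched by hand in plain Java. This is only an illustration under my own assumptions: it is not ANTLR's generated code and says nothing about how ANTLR actually makes the decision. The class name KeywordOrIdentifier and the methods identifierLength and classify are made up for this sketch. It scans the longest run accepted by the two predicates (isJavaIdentifierStart for the first character, isJavaIdentifierPart for the rest) and only then checks whether the matched text is exactly 'class'.

```java
class KeywordOrIdentifier {

    /** Length of the longest identifier-like run starting at pos, or 0 if none. */
    static int identifierLength(String input, int pos) {
        // The first character is guarded by isJavaIdentifierStart.
        if (pos >= input.length()
                || !Character.isJavaIdentifierStart(input.charAt(pos))) {
            return 0;
        }
        int end = pos + 1;
        // Every following character is guarded by isJavaIdentifierPart.
        while (end < input.length()
                && Character.isJavaIdentifierPart(input.charAt(end))) {
            end++;
        }
        return end - pos;
    }

    /** Returns "CLASS", "IDENTIFIER", or "NONE" for the token starting at pos. */
    static String classify(String input, int pos) {
        int len = identifierLength(input, pos);
        if (len == 0) {
            return "NONE";
        }
        String text = input.substring(pos, pos + len);
        // Keyword wins only when the whole run is exactly 'class'
        // (hypothetical one-entry keyword table).
        return "class".equals(text) ? "CLASS" : "IDENTIFIER";
    }

    public static void main(String[] args) {
        System.out.println(classify("class Foo", 0)); // CLASS
        System.out.println(classify("classy", 0));    // IDENTIFIER
        System.out.println(classify("class Foo", 6)); // IDENTIFIER
        System.out.println(classify("1abc", 0));      // NONE
    }
}
```

With this scheme 'class' comes out as CLASS while 'classy' stays an identifier, which is the behaviour I would have expected from the two rules.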
