Hey guys,

A question was posted a few days ago about dealing with an infinite input stream, and the suggestion was to subclass TokenStream so that it didn't read in all of the input up front.
I'm running into a similar problem, but before I run off and subclass things I thought I'd see if there's a "best practice" for my situation. It also overlaps with the "how do I use keywords as identifiers" FAQ <http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741>.

I have a data-file grammar that recognizes strings, numbers, and a ton of keywords. Pretending "VERSION" and "LIMIT" are two keywords, here's (part of) the .g file:

    data_file: 'VERSION' STRING ';'
             | 'LIMIT' NUMBER ';'
             ;

    NUMBER: ('-'|'+')? ('0'..'9')+
          | ('-'|'+')? ('0'..'9')* '.' ('0'..'9')*
          ;

    STRING: ('a'..'z' | 'A'..'Z' | '_' | '.' | '0'..'9')+ ;

Problem input #1:

    VERSION 1.2 ;

The "1.2" is lexed as a number instead of a string, so I get a parse error.

Problem input #2:

    VERSION LIMIT ;

The "LIMIT" is lexed as a keyword instead of a string, so I get a parse error.

I saw the FAQ about keywords-as-identifiers, but I don't think it's helpful for me. For the NUMBER-that-should-be-a-STRING problem, there's no exact string I could pass to input.LT(1).getText().equals(), because it requires a regex to match a NUMBER. The other solution was to make an "identifier" rule to match all possibilities -- is the best solution here really to change the rule to 'VERSION' (STRING | NUMBER) ';'?

For the keyword-that-should-be-a-STRING problem, I'm hesitant to use either of those solutions because of the sheer number of keywords in this grammar.

Ideally what I'd like to do is what I did in Flex and Bison (which I'm porting this grammar from). There, I had the parser control how the lexer interpreted subsequent tokens: I embedded an action in the parser, immediately after the 'VERSION' token, to tell Flex to enter a "force-the-next-token-to-be-a-STRING-no-matter-what" start state. It worked beautifully.
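For anyone who hasn't seen the Flex trick, here's a rough sketch of the idea in plain Java. This is not ANTLR-generated code and not my real grammar: the class ModalLexerSketch, its Lexer and Token types, and forceNextString() are all names made up for illustration, and the token rules are a simplified guess at the ones above. The only point is that the lexer hands out one token per call, so the parser can flip a flag before it asks for the next one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from ANTLR, Flex, or Bison.
class ModalLexerSketch {
    enum Type { KEYWORD, NUMBER, STRING, SEMI, EOF }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    /** Produces one token per call, so the caller can change modes between tokens. */
    static final class Lexer {
        private final String input;
        private int pos = 0;
        private boolean forceString = false; // plays the role of the Flex start state

        Lexer(String input) { this.input = input; }

        /** Called by the parser: treat the next word as a STRING no matter what. */
        void forceNextString() { forceString = true; }

        Token next() {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
            if (pos >= input.length()) return new Token(Type.EOF, "");
            if (input.charAt(pos) == ';') {
                pos++;
                forceString = false; // never let the mode leak into the next statement
                return new Token(Type.SEMI, ";");
            }
            int start = pos;
            while (pos < input.length()
                    && !Character.isWhitespace(input.charAt(pos))
                    && input.charAt(pos) != ';') pos++;
            String text = input.substring(start, pos);
            if (forceString) {
                forceString = false; // the mode applies to exactly one token
                return new Token(Type.STRING, text);
            }
            if (text.equals("VERSION") || text.equals("LIMIT")) return new Token(Type.KEYWORD, text);
            if (text.matches("[-+]?([0-9]+|[0-9]*\\.[0-9]*)")) return new Token(Type.NUMBER, text);
            return new Token(Type.STRING, text);
        }
    }

    /** Parses statements of the form: VERSION <string> ;  or  LIMIT <number> ; */
    static List<String> parse(String input) {
        Lexer lexer = new Lexer(input);
        List<String> out = new ArrayList<>();
        for (Token t = lexer.next(); t.type != Type.EOF; t = lexer.next()) {
            if (t.type != Type.KEYWORD) throw new IllegalStateException("expected keyword, got " + t);
            Type wanted = Type.NUMBER;
            if (t.text.equals("VERSION")) {
                lexer.forceNextString(); // the parser steers the lexer here
                wanted = Type.STRING;
            }
            Token value = lexer.next();
            if (value.type != wanted) throw new IllegalStateException("expected " + wanted + ", got " + value);
            Token semi = lexer.next();
            if (semi.type != Type.SEMI) throw new IllegalStateException("expected ';', got " + semi);
            out.add(t.text + "=" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: [VERSION=STRING(1.2), VERSION=STRING(LIMIT), LIMIT=NUMBER(42)]
        System.out.println(parse("VERSION 1.2 ; VERSION LIMIT ; LIMIT 42 ;"));
    }
}
```

Both problem inputs go through because the lexer hasn't looked at "1.2" or "LIMIT" yet when the parser sets the flag. That only works if tokens are pulled on demand rather than buffered ahead of the parser.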
I got most of the way through implementing that in my ANTLR grammar when I found out that ANTLRFileStream reads all the tokens in before the parser even starts up -- which means the parser can't give the lexer any direction over token interpretation.

Thoughts, suggestions, outrageous flames? Is there a "good" way to do this, or maybe is there a completely different approach I should take?

Thanks!
-Chris
