The rule is used by another rule to represent the part of a string that doesn't contain a reserved word. You're right that the grammar is not well formed, and I should refactor it. Since this rule is only used by the rule that represents strings, I marked it as a fragment lexer rule, and the size of the generated lexer is now acceptable to me. What confused me at first was that the lexer generated for the Java runtime from the original grammar was already an acceptable size. Thanks for answering.
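
For illustration, here is a minimal sketch of what marking the rule as a fragment looks like. The character set is shortened, and the grammar name StringSketch and the calling rule NQSTR are made up for this example; they are not taken from the real grammar. The sketch also matches a single character per fragment call and keeps the loop in the caller, which is an assumption and not what the original rule does.

```
lexer grammar StringSketch;

// Shortened, hypothetical version of the rule from the original post.
// As a fragment it can only be called from other lexer rules and never
// produces a token of its own.
fragment
NQCHAR_NO_ALPHANUM
    : ~('\n'|'\r'|' '|'\t'|'\\'|'"'|'\''|'a'..'z'|'A'..'Z'|'0'..'9')
    ;

// Hypothetical rule that represents strings; it is the only caller of
// the fragment, and the repetition lives here instead of in the fragment.
NQSTR
    : ('a'..'z'|'A'..'Z'|'0'..'9'|NQCHAR_NO_ALPHANUM)+
    ;
```

Because a fragment rule never produces a token by itself, ANTLR 3 does not need to consider it when deciding which token starts at the current character, which is presumably why the generated tables shrink.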

On Mon, Mar 7, 2011 at 1:39 AM, Jim Idle <[email protected]> wrote:
> It means that that rule is not well formed, but without seeing your lexer
> I can't tell you :-) But generally, rules that say "everything but these",
> combined with overly specialized rules will cause you issues. Try to be as
> relaxed as you can in the lexer without generating ambiguities, then check
> valid characters with code; the reason is that the errors you give out
> will be semantic in nature such as "Identifiers cannot contain characters
> like 'x'" instead of: Unexpected character 'x' skipped, rather than the
> ability of ANTLR to work out some way for your rules to be encoded.
>
> Your rule below though looks highly ambiguous based on the sets. Perhaps
> all you are looking for is a final rule in the list of rules that says:
>
> ANYTHINGELSE : . { myErrorMessage(invalid); } ;
>
> Additionally adding + to such a rule will generate huge tables and will
> probably not make sense. Just removing the + will help, but is unlikely to
> be the correct solution. Post what you are trying to do rather than what
> is going wrong with what you have.
>
> Jim
>
>> -----Original Message-----
>> From: [email protected] [mailto:antlr-interest-
>> [email protected]] On Behalf Of Mu Qiao
>> Sent: Sunday, March 06, 2011 5:19 PM
>> To: [email protected]
>> Subject: [antlr-interest] How to reduce the size of the generated
>> lexer?
>>
>> Hi
>>
>> I use c runtime libantlr3c-3.1.3. The generated lexer is bigger than
>> 10 MB full of arrays of integers. I tried to see what was going on and
>> I found there was a rule:
>> NQCHAR_NO_ALPHANUM
>> : ~('\n'|'\r'|' '|'\t'|'\\'|CARET|QMARK|COLON|AT|SEMIC|POUND|SLASH
>> |BANG|TIMES|COMMA|PIPE|AMP|MINUS|PLUS|PCT|EQUALS|LSQUARE|RSQUARE
>> |RPAREN|LPAREN|RBRACE|LBRACE|DOLLAR|TICK|DOT|LT|GT|SQUOTE|QUOTE
>> |'a'..'z'|'A'..'Z'|'0'..'9')+;
>>
>> If I remove the rule, the lexer is only 400 KB.
>>
>> I'm still new to antlr and I'm not sure if there is any way to refactor
>> the rule and reduce the size of lexer. Could anyone please help me out?

--
Best wishes,
Mu Qiao
GnuPG fingerprint: 92B1 B0C4 8D14 F8C4 EFA5 3ACC 30B3 0DE4 17B1 57E9

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
