The huge file size occurs because your lexer/parser is probably trying to do too much, or is asking ANTLR to do a lot of disambiguation, and the complex overlaps are generating huge tables. In the case of the parser, I suspect that you need some single-token predicates to help with keyword disambiguation; have you removed ALL the warnings that ANTLR generates on your grammar? If you do not remove all the warnings, then this sort of thing happens a lot, especially with a language as terrible as the one SQL has morphed into.
The code only LOOKS small in Java because the generated Java uses run-length-encoded strings for the table values, which it must expand at runtime. The C target lays down the exact same tables, but as static data, so that they are set up at compile time. Java is unable to use compile-time initialized tables like this until JDK 1.7, so the Java target must jump through hoops to generate the tables. So in fact generating the C is a better indicator of how efficient your grammar is. You can probably trace the table sizes down to a few key decisions.

Your setText errors are likely because you are not using the SETTEXT macro correctly in some way. Also, I would avoid doing that at lex time and do any manipulation only when you actually use the token in question. I can't help unless I see the lexer code in question though.

Use the 3.4 beta C runtime - there is no difference in the release version except for the API documentation that I keep trying to finish, but my boat keeps winking at me and making me go on the river.

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of Todor Dimitrov
> Sent: Saturday, August 20, 2011 7:39 AM
> To: [email protected]
> Subject: [antlr-interest] Sparql Grammar & Huge C Files
>
> Dear *,
>
> generating the C lexer and parser for the Sparql grammar using the
> options below produces huge files:
>
> options {
>     language = C;
>     output = AST;
>     ASTLabelType = pANTLR3_BASE_TREE;
> }
>
> 2.4K Sparql.tokens
> 85M  SparqlLexer.c <---
> 30K  SparqlLexer.h
> 1.5M SparqlParser.c <---
> 69K  SparqlParser.h
>
> In addition, the files cannot be compiled as it seems that the
> generators have not been updated to reflect the API changes in the
> latest C runtime (or maybe it is the other way round :)). In
> particular, I see errors like these:
>
> SparqlLexer.c:1214276:48: error: member reference type 'pANTLR3_STRING'
> (aka 'struct ANTLR3_STRING_struct *') is a
> pointer; maybe you meant to use '->'?
> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
> ~~~~~~~~~~~~~~~~~~~~~^
> ->
> SparqlLexer.c:1214276:49: error: no member named 'substring' in 'struct
> ANTLR3_STRING_struct'; did you mean 'subString'?
> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
> ^~~~~~~~~
> subString
> ./antlr3string.h:179:8: note: 'subString' declared here
> (*subString) (struct ANTLR3_STRING_struct * string, ANTLR3_UINT32 ...
> ^
> SparqlLexer.c:1214276:83: error: member reference type 'pANTLR3_STRING'
> (aka 'struct ANTLR3_STRING_struct *') is a
> pointer; maybe you meant to use '->'?
> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
> ~~~~~~~~~~~~~~~~~~~~~^
> ->
> SparqlLexer.c:1214276:84: error: no member named 'length' in 'struct
> ANTLR3_STRING_struct'
> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>
> I'm using antlr 3.4, but I have also tested this with antlr 3.3.
> Generating the Java lexer and parser works as expected and the files
> are much smaller:
>
> 2.4K Sparql.tokens
> 582K SparqlLexer.java
> 876K SparqlParser.java
>
> Any suggestions and help are highly appreciated.
>
> Thanks in advance,
>
> Todor
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
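
For reference, the failing action in the quoted errors uses Java string methods, getText().substring(1, getText().length()-1), to drop the first and last character of the token text. A minimal plain-C sketch of that trimming follows, written as a standalone helper so that it can run at the point where the token is actually consumed, as the reply suggests, rather than inside the lexer. The name strip_delimiters is hypothetical and not part of the ANTLR C runtime; inside a lexer action the equivalent would presumably go through the subString member that the compiler note points to and the SETTEXT macro, which is not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of the ANTLR C runtime.
 *
 * Returns a newly allocated copy of `text` without its first and last
 * character, i.e. the C counterpart of Java's
 *     text.substring(1, text.length() - 1)
 *
 * Returns NULL if `text` is NULL, is shorter than two characters, or
 * if allocation fails. The caller owns the result and must free() it. */
char *strip_delimiters(const char *text)
{
    size_t len;
    char *out;

    if (text == NULL) {
        return NULL;
    }

    len = strlen(text);
    if (len < 2) {
        return NULL;            /* nothing to strip on both sides */
    }

    out = malloc(len - 1);      /* (len - 2) characters plus the NUL */
    if (out == NULL) {
        return NULL;
    }

    memcpy(out, text + 1, len - 2);
    out[len - 2] = '\0';
    return out;
}
```

Called on a token text such as "<http://example.org/a>", the helper yields a copy without the surrounding angle brackets. Returning a fresh copy leaves the original token text untouched, which fits doing the manipulation only when the token is used.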
