Begin forwarded message:
> From: Todor Dimitrov <[email protected]>
> Subject: Re: [antlr-interest] Sparql Grammar & Huge C Files
> Date: August 20, 2011 5:52:33 PM GMT+02:00
> To: Jim Idle <[email protected]>
>
> Hi Jim,
>
> this is an open source grammar for the Sparql language that I did not
> write. I have run the ANTLR tool like this:
>
> java -Xms1024m -Xmx1024m -cp antlr-3.4-complete.jar org.antlr.Tool Sparql.g
>
> No warnings were output, and looking at the ANTLR tool options, I don't
> see any switches that would enable or disable warning generation. I'm not
> using the SETTEXT macro and I'm not quite sure where to use it. Are there
> any examples of it? In addition, the Sparql grammar contains only rewrite
> rules, so I'm not sure whether I have to use the SETTEXT macro at all. I've
> attached the grammar file for reference.
>
> Todor
>
>
> On Aug 20, 2011, at 5:36 PM, Jim Idle wrote:
>
>> The huge file size occurs because your lexer/parser is probably trying to
>> do too much, or asking ANTLR to do lots of disambiguation, and the complex
>> overlaps are generating huge tables. In the case of the parser, I suspect
>> that you need some single token predicates to help with keyword
>> disambiguation; have you removed ALL the warnings that ANTLR generates on
>> your grammar? If you do not remove all the warnings then this sort of
>> thing happens a lot, especially on a terrible language such as SQL has
>> morphed into.
>>
>> The code only LOOKS small in Java because the generated Java uses
>> run-length encoded strings for the table values, which it must expand at
>> runtime. The C target lays down the exact same tables, but as static data,
>> so that they are set up at compile time. Java is unable to use compile-time
>> initialized tables like this until JDK 1.7, so the Java target must jump
>> through hoops to generate the tables. So in fact generating the C is a
>> better indicator of how efficient your grammar is.
>> You can probably trace the table sizes down to a few key decisions.
>>
>> Your setText errors are most likely because you are not using the SETTEXT
>> macro correctly in some way. Also, I would avoid doing that at lex time
>> and do any manipulation only when you actually use the token in question.
>> I can't help unless I see the lexer code in question, though.
>>
>> Use the 3.4 beta C runtime; there is no difference in the release version
>> except for the API documentation, which I keep trying to finish, but my
>> boat keeps winking at me and making me go on the river.
>>
>>
>> Jim
>>
>>
>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:antlr-interest-
>>> [email protected]] On Behalf Of Todor Dimitrov
>>> Sent: Saturday, August 20, 2011 7:39 AM
>>> To: [email protected]
>>> Subject: [antlr-interest] Sparql Grammar & Huge C Files
>>>
>>> Dear *,
>>>
>>> generating the C lexer and parser for the Sparql grammar using the
>>> options below produces huge files:
>>>
>>> options {
>>>     language = C;
>>>     output = AST;
>>>     ASTLabelType = pANTLR3_BASE_TREE;
>>> }
>>>
>>> 2.4K  Sparql.tokens
>>> 85M   SparqlLexer.c    <---
>>> 30K   SparqlLexer.h
>>> 1.5M  SparqlParser.c   <---
>>> 69K   SparqlParser.h
>>>
>>> In addition, the files cannot be compiled, as it seems that the
>>> generators have not been updated to reflect the API changes in the
>>> latest C runtime (or maybe it is the other way round :)). In
>>> particular, I see errors like these:
>>>
>>> SparqlLexer.c:1214276:48: error: member reference type 'pANTLR3_STRING'
>>> (aka 'struct ANTLR3_STRING_struct *') is a pointer; maybe you meant to use '->'?
>>> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>>         ~~~~~~~~~~~~~~~~~~~~~^
>>>                              ->
>>> SparqlLexer.c:1214276:49: error: no member named 'substring' in 'struct
>>> ANTLR3_STRING_struct'; did you mean 'subString'?
>>> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>>                               ^~~~~~~~~
>>>                               subString
>>> ./antlr3string.h:179:8: note: 'subString' declared here
>>> (*subString) (struct ANTLR3_STRING_struct * string, ANTLR3_UINT32 ...
>>>   ^
>>> SparqlLexer.c:1214276:83: error: member reference type 'pANTLR3_STRING'
>>> (aka 'struct ANTLR3_STRING_struct *') is a pointer; maybe you meant to use '->'?
>>> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>>                                            ~~~~~~~~~~~~~~~~~~~~~^
>>>                                                                 ->
>>> SparqlLexer.c:1214276:84: error: no member named 'length' in 'struct
>>> ANTLR3_STRING_struct'
>>> setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>>
>>>
>>> I'm using antlr 3.4, but I have also tested this with antlr 3.3.
>>> Generating the Java lexer and parser works as expected and the files
>>> are much smaller:
>>>
>>> 2.4K  Sparql.tokens
>>> 582K  SparqlLexer.java
>>> 876K  SparqlParser.java
>>>
>>> Any suggestions and help are highly appreciated.
>>>
>>> Thanks in advance,
>>>
>>> Todor
>>>
>>>
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
You received this message because you are subscribed to the Google Groups "il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
