Begin forwarded message:

> From: Todor Dimitrov <[email protected]>
> Subject: Re: [antlr-interest] Sparql Grammar & Huge C Files
> Date: August 20, 2011 5:52:33 PM GMT+02:00
> To: Jim Idle <[email protected]>
> 
> Hi Jim,
> 
> This is an open source grammar for the Sparql language that I did not 
> develop myself. I ran the ANTLR tool like this:
> 
> java -Xms1024m -Xmx1024m -cp antlr-3.4-complete.jar org.antlr.Tool Sparql.g
> 
> No warnings were output, and looking at the ANTLR tool options, I don't 
> see any switches that would enable or disable warning generation. I'm not 
> using the SETTEXT macro, and I'm not sure where to use it. Are there any 
> examples of it? In addition, the Sparql grammar contains only rewrite 
> rules, so I'm not sure whether I have to use the SETTEXT macro at all. I've 
> attached the grammar file for reference.
> 
> Todor
> 
> 
> On Aug 20, 2011, at 5:36 PM, Jim Idle wrote:
> 
>> The huge file size probably occurs because your lexer/parser is trying to
>> do too much, or is asking ANTLR to do lots of disambiguation, and the
>> complex overlaps are generating huge tables. In the case of the parser, I
>> suspect that you need some single token predicates to help with keyword
>> disambiguation; have you removed ALL the warnings that ANTLR generates on
>> your grammar? If you do not remove all the warnings then this sort of
>> thing happens a lot, especially on a terrible language such as the one SQL
>> has morphed into.
>> 
>> The code only LOOKS small in Java because the generated Java uses run
>> length encoded strings for the table values, which it must expand at
>> runtime. The C target lays down the exact same tables, but as static data
>> so that they are set up at compile time. Java is unable to use compile time
>> initialized tables like this until JDK 1.7, so the Java target must jump
>> through hoops to generate the tables. So in fact generating the C is a
>> better indicator of how efficient your grammar is. You can probably trace
>> the table sizes down to a few key decisions.
>> 
>> Your setText errors are likely because you are not using the SETTEXT macro
>> correctly in some way. Also, I would avoid doing that at lex time and do
>> any manipulation only if you actually use the token in question. I can't
>> help unless I see the lexer code in question though.
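
As a rough sketch of the "manipulate later" advice, the helper below does in
plain C what the Java-style action in the error output appears to do, namely
drop the first and last character of the token text. The name
strip_delimiters is hypothetical and is not part of the ANTLR C runtime; it
works on an ordinary NUL-terminated string and could be called at the point
where the token text is actually consumed. The runtime's own string calls
(the compiler note mentions subString in antlr3string.h) should be checked
against the header before relying on them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the ANTLR C runtime.
 * Returns a newly allocated copy of text without its first and last
 * character, e.g. the surrounding quotes or angle brackets of a token.
 * Returns NULL if text has fewer than two characters or if allocation
 * fails. The caller frees the result. */
static char *strip_delimiters(const char *text)
{
    assert(text != NULL);

    size_t len = strlen(text);
    if (len < 2)
        return NULL;                 /* nothing to strip */

    size_t inner = len - 2;          /* length without both delimiters */
    char *out = malloc(inner + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, text + 1, inner);    /* skip the leading delimiter */
    out[inner] = '\0';
    return out;
}
```

Keeping the lexer action free of string manipulation means the work is only
done for tokens whose text is really needed, at the cost of one extra
allocation per use.
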
>> 
>> Use the 3.4 beta C runtime - there is no difference in the release version
>> except for the API documentation that I keep trying to finish but my boat
>> keeps winking at me and making me go on the river.
>> 
>> 
>> Jim
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: [email protected] [mailto:antlr-interest-
>>> [email protected]] On Behalf Of Todor Dimitrov
>>> Sent: Saturday, August 20, 2011 7:39 AM
>>> To: [email protected]
>>> Subject: [antlr-interest] Sparql Grammar & Huge C Files
>>> 
>>> Dear *,
>>> 
>>> generating the C lexer and parser for the Sparql grammar using the
>>> options below produces huge files:
>>> 
>>> options {
>>>     language = C;
>>>     output = AST;
>>>     ASTLabelType = pANTLR3_BASE_TREE;
>>> }
>>> 
>>> 2.4K Sparql.tokens
>>> 85M SparqlLexer.c <---
>>> 30K SparqlLexer.h
>>> 1.5M SparqlParser.c <---
>>> 69K SparqlParser.h
>>> 
>>> In addition, the files cannot be compiled as it seems that the
>>> generators have not been updated to reflect the API changes in the
>>> latest C runtime (or maybe it is the other way round :)). In
>>> particular, I see errors like these:
>>> 
>>> SparqlLexer.c:1214276:48: error: member reference type 'pANTLR3_STRING' (aka 'struct ANTLR3_STRING_struct *') is a pointer; maybe you meant to use '->'?
>>>     setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>>             ~~~~~~~~~~~~~~~~~~~~~^
>>>                                  ->
>>> SparqlLexer.c:1214276:49: error: no member named 'substring' in 'struct ANTLR3_STRING_struct'; did you mean 'subString'?
>>>     setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>>                                   ^~~~~~~~~
>>>                                   subString
>>> ./antlr3string.h:179:8: note: 'subString' declared here
>>>     (*subString)    (struct ANTLR3_STRING_struct * string, ANTLR3_UINT32 ...
>>>       ^
>>> SparqlLexer.c:1214276:83: error: member reference type 'pANTLR3_STRING' (aka 'struct ANTLR3_STRING_struct *') is a pointer; maybe you meant to use '->'?
>>>     setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>>                                                ~~~~~~~~~~~~~~~~~~~~~^
>>>                                                                     ->
>>> SparqlLexer.c:1214276:84: error: no member named 'length' in 'struct ANTLR3_STRING_struct'
>>>     setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>>> 
>>> 
>>> I'm using antlr 3.4, but I have also tested this with antlr 3.3.
>>> Generating the Java lexer and parser works as expected and the files
>>> are much smaller:
>>> 
>>> 2.4K Sparql.tokens
>>> 582K SparqlLexer.java
>>> 876K SparqlParser.java
>>> 
>>> Any suggestions and help are highly appreciated.
>>> 
>>> Thanks in advance,
>>> 
>>> Todor
>>> 
>>> 
>>> 
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>> 
>> 
> 



