[il-antlr-interest: 27716] Re: [antlr-interest] ANTLR running out of memory during generation

Jim Idle Sat, 30 Jan 2010 11:44:23 -0800

Ron,

First, you really need to switch off backtracking unless the objective of your 
parser is to analyze SQL (you gave it away when you mentioned 632 keywords that 
can be identifiers). Not as many predicates are required as you might think, so 
long as you left-factor everything.


Your token types should be numbered consecutively so long as you list them 
that way in the lexer.

The problem might well be that although SQL sort of allows all keywords to be 
identifiers, it does not allow all of them, because some would be too ambiguous 
even for a syntax-directed, hand-crafted parser. If you turn on backtracking 
and then try to allow one of these reserved words to be an identifier, you 
will probably mask the issue because all warnings and errors are turned off.

It is entirely feasible to create a full SQL parser without backtracking, with 
very little lookahead and few predicates (all of the one- or two-token 
lookahead type). I have an online demo of T-SQL, for instance, on my web site 
at www.temporal-wave.com (select the 'online demos' link), and Oracle 
SQL/PLSQL will be up there before long too.

So, I think you will need to do the following to have a chance of generating 
the code:

1) Use -Xconversiontimeout 10000
2) Cause switches to be generated rather than ifs: -Xmaxswitchcaselabels 32000 
-Xminswitchalts 1 -Xmaxinlinedfastates 65534
3) Use -Xmx2G when invoking the java command (assuming your JVM allows that)
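
Put together, the three steps amount to a single command line. The following is only a sketch under assumptions: the jar name antlr-3.2.jar and the grammar file MySql.g are placeholders that do not come from this thread, and the option spellings (in particular -Xmaxinlinedfastates) should be checked against the tool's own extended help before relying on them. By default the script only prints the command so it can be inspected first.

```shell
#!/bin/sh
# Placeholder locations (assumptions, not from the thread): adjust both.
ANTLR_JAR="${ANTLR_JAR:-antlr-3.2.jar}"
GRAMMAR="${GRAMMAR:-MySql.g}"

# Default is a dry run that echoes the command instead of executing it.
# Set RUN=command in the environment to actually invoke java.
RUN="${RUN:-echo}"

antlr_generate() {
    # Step 3: 2 GB heap for the JVM running the ANTLR tool.
    # Step 1: longer NFA conversion timeout.
    # Step 2: prefer switches over ifs and raise the inline DFA limit.
    $RUN java -Xmx2G -cp "$ANTLR_JAR" org.antlr.Tool \
        -Xconversiontimeout 10000 \
        -Xmaxswitchcaselabels 32000 \
        -Xminswitchalts 1 \
        -Xmaxinlinedfastates 65534 \
        "$GRAMMAR"
}

antlr_generate
```

Running it as `RUN=command ANTLR_JAR=/path/to/antlr.jar GRAMMAR=YourGrammar.g sh generate.sh` (the script name is likewise hypothetical) would perform the real generation.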

But if you cannot get it going that way, then basically you are masking a 
bigger problem in your grammar that you are not seeing because of global 
backtracking. 

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of Ron Hunter-Duvar
> Sent: Friday, January 29, 2010 8:52 PM
> To: [email protected]
> Subject: [antlr-interest] ANTLR running out of memory during generation
> 
> I'm having a strange problem with ANTLR. I'm building a grammar for a
> language with a huge number (hundreds) of non-reserved keywords. I'm
> using the approach of having the lexer return a different token type for
> each keyword, and then having a parser rule of the form:
> 
>     id : ( ID | QUOTED_ID | KW_A | KW_B | ... | KW_ZZZ );
> 
> This was working great until today. In fact, ANTLR 3.2 generates
> surprisingly clever code for this - all the keywords are assigned
> consecutive token numbers, and generated code just says:
> 
>     if ( (input.LA(1)>=KW_A && input.LA(1)<=KW_ZZZ)
>          ||(input.LA(1)>=ID && input.LA(1)<=QUOTED_ID) ) {
>         input.consume();
>         ...
> 
> This works all the way up to 631 keywords. ANTLR runs in about 20
> seconds, and never uses more than 269MB of memory. When I add a 632nd
> keyword (doesn't matter what the keyword is), and change nothing else,
> ANTLR runs for 2 minutes and runs out of heap space. I kept bumping the
> max space up, but even going to 2GB doesn't make any difference.
> 
> What's really interesting is that I was using ANTLR 3.1 until now. When
> I ran into this I upgraded to 3.2, but both of them fail at exactly the
> same spot, 632 keywords. Not surprisingly, the stack trace varies from
> one run to the next, depending on the exact point it runs out of memory,
> but it always has deeply nested calls to these and other methods:
> 
>     org.antlr.stringtemplate.language.ASTExpr.writeTemplate(ASTExpr.java:750)
>     org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:680)
>     org.antlr.stringtemplate.language.ASTExpr.writeAttribute(ASTExpr.java:660)
>     org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluator.java:86)
>     org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149)
>     org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705)
> 
> I don't know if it makes a difference, but I'm using backtracking
> (otherwise, this approach to non-reserved keywords doesn't work without
> a lot of synpreds), and outputting ASTs.
> 
> Since this is size related, it's hard to narrow it down to a simple
> example. I could try to duplicate it with just the id rule and nothing
> else.
> 
> Any ideas what might be happening here, and whether a fix might be
> possible?
> 
> Thanks,
> Ron
> 
> --
> Ron Hunter-Duvar | Software Developer V | 403-272-6580
> Oracle Service Engineering
> Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5
> 
> All opinions expressed here are mine, and do not necessarily represent
> those of my employer.
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address



