Re: [antlr-dev] Java - Out of heap space when parsing huge file

Mahesh R. Seshan Wed, 14 Jul 2010 11:11:29 -0700

Sam,

Thank you very much for taking time and responding...

I made the changes that you suggested (very valuable and helpful) but ran into the Java error: Out of heap space. Then I tried using the UnbufferedTokenStream, and the approach worked for a file with 300,000 lines. However, when I attempt to parse a file with 1 million lines, I get the Out of heap space error from the ANTLRReaderStream.


Any suggestions?

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

        at org.antlr.runtime.ANTLRReaderStream.load(ANTLRReaderStream.java:78)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:68)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:52)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:48)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:40)

        at test.FileImport.main(FileImport.java:23)

-mahesh
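
The trace above ends in ANTLRReaderStream.load, which suggests the input is read completely into memory before lexing starts. The following is a minimal sketch of that load-everything pattern in plain Java, not the ANTLR runtime's actual code; the class LoadAllDemo, the holder Loaded, and the method loadAll are hypothetical names made up for illustration. It assumes the buffer is a char[] that doubles whenever the next chunk would not fit, so at each resize the old and the new array are both live for a moment.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; this is not ANTLR code.
public class LoadAllDemo {
    /** The filled buffer plus the number of chars actually read into it. */
    static final class Loaded {
        final char[] data;
        final int length;
        Loaded(char[] data, int length) { this.data = data; this.length = length; }
    }

    /** Reads the whole reader into one char[], doubling the array when a chunk would not fit. */
    static Loaded loadAll(Reader r, int initialSize, int chunk) throws IOException {
        chunk = Math.max(1, chunk);
        char[] data = new char[Math.max(1, initialSize)];
        int p = 0;
        while (true) {
            while (p + chunk > data.length) {
                // Old and new arrays coexist here, so peak usage exceeds the final size.
                char[] bigger = new char[data.length * 2];
                System.arraycopy(data, 0, bigger, 0, p);
                data = bigger;
            }
            int n = r.read(data, p, chunk);
            if (n == -1) break; // end of input
            p += n;
        }
        return new Loaded(data, p);
    }

    public static void main(String[] args) throws IOException {
        // Synthetic input shaped loosely like the lines in the grammar under discussion.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) sb.append("\"s\" id 1;\n");
        Loaded in = loadAll(new StringReader(sb.toString()), 1024, 1024);
        System.out.println("chars read: " + in.length);
        System.out.println("buffer capacity: " + in.data.length);
        // A Java char is 2 bytes, so the buffer alone needs roughly this much heap.
        System.out.println("approx. buffer bytes: " + 2L * in.data.length);
    }
}
```

If the input stream really behaves like this, an unbuffered token stream alone would not help, because the character buffer grows with the file size regardless of how tokens are handled. Raising the JVM heap limit with -Xmx would be the usual stopgap.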

On 7/13/2010 6:16 PM, Sam Harwell wrote:

You might try the following:

COMMENT :    '!' .* '\n' {skip();}
        ;

WS      :   ( ' '
        |       '\t'
        |       '\r'
        |       '\n' )+ {skip();}
        ;
1. Use skip() instead of $channel=HIDDEN to prevent the token from ever being created. Setting the channel still creates the token, it just hides it from the parser.
2. Use a + (1 or more) in the WS rule to parse whitespace runs instead of individual characters.
3. Since your code doesn't handle old-style Mac line endings (carriage return '\r' by itself), simplify the COMMENT rule using a wildcard.
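
To make points 1 and 2 concrete, here is a toy model in plain Java. It is not ANTLR's lexer; the class WhitespaceTokenDemo, the enum Mode, and the method tokenize are hypothetical names invented for this sketch, and it simply treats every run of non-whitespace characters as one token. It counts how many token objects end up in the buffer under three strategies: one hidden token per whitespace character (the original WS rule), one hidden token per whitespace run, and no whitespace tokens at all (the skip() behavior).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; this is not ANTLR code.
public class WhitespaceTokenDemo {
    enum Mode { HIDDEN_PER_CHAR, HIDDEN_RUN, SKIP_RUN }

    /** A buffered token: its text and whether the parser would see it. */
    static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Splits input into tokens and returns everything that would be kept in memory. */
    static List<Tok> tokenize(String input, Mode mode) {
        List<Tok> buffer = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (isWs(input.charAt(i))) {
                i++;
                if (mode != Mode.HIDDEN_PER_CHAR) {
                    // The '+' in the WS rule: consume the whole whitespace run at once.
                    while (i < input.length() && isWs(input.charAt(i))) i++;
                }
                if (mode != Mode.SKIP_RUN) {
                    // Hidden channel: the token still exists, the parser just ignores it.
                    buffer.add(new Tok(input.substring(start, i), true));
                }
                // SKIP_RUN: nothing is created, mirroring skip().
            } else {
                while (i < input.length() && !isWs(input.charAt(i))) i++;
                buffer.add(new Tok(input.substring(start, i), false));
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        String input = "\"a\"   x 1 ;\r\n\"b\"   y 2 ;\r\n";
        for (Mode m : Mode.values()) {
            System.out.println(m + ": " + tokenize(input, m).size() + " buffered tokens");
        }
    }
}
```

On whitespace-heavy input the per-character hidden strategy buffers the most tokens and the skip strategy the fewest, which is why the change matters when the token stream buffers every token.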
*From:* [email protected][mailto:[email protected]] *On Behalf Of *Mahesh R. Seshan
*Sent:* Tuesday, July 13, 2010 4:34 PM
*To:* [email protected]
*Subject:* [antlr-dev] Java - Out of heap space when parsing huge file

Greetings,
I am trying to use an ANTLR parser to parse a huge file but run into a Java error indicating out of heap space. The grammar (as follows) itself is relatively simple. After going over some posts, I do not believe that UnbufferedTokenStream is an option because white-space is to be ignored in the input file... Also, UnbufferedTokenStream is not available in ANTLR v3.2, which is what I am using... Any advice is greatly appreciated....
file    :    line+
        ;
line    :    STRING ID data ';'
        ;
data    :    primitive | sequence
        ;
primitive
        :    INTEGER
        ;
sequence
        :    '{' data? (',' data?)* '}'
        ;

INTEGER :    ('0'..'9')+
        ;

STRING  :    '"' ~('"')* '"'
        ;

ID      :       ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')*
        ;

COMMENT :    '!' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
        ;

WS      :   ( ' '
        |       '\t'
        |       '\r'
        |       '\n' ) {$channel=HIDDEN;}
        ;

Following is the Java error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.antlr.runtime.Lexer.emit(Lexer.java:151)
        at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
        at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
        at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
        at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
        at parser.FileImportParser.file(FileImportParser.java:56)
        at test.FileImport.main(FileImport.java:42)

-mahesh

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
