Sam,

Thank you very much for taking the time to respond.

I made the changes that you suggested (very valuable and helpful) but still ran into the Java "out of heap space" error. I then tried UnbufferedTokenStream, and that approach worked for a file with 300,000 lines. However, when I attempt to parse a file with 1 million lines, I get the same out-of-heap-space error, this time from ANTLRReaderStream.

Any suggestions?

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.antlr.runtime.ANTLRReaderStream.load(ANTLRReaderStream.java:78)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:68)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:52)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:48)
        at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:40)
        at test.FileImport.main(FileImport.java:23)

-mahesh
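
One possible workaround for the ANTLRReaderStream failure above, offered as a sketch rather than a tested fix: the trace shows the error inside ANTLRReaderStream.load, which appears to read the whole input into memory before lexing starts. Since the grammar quoted further down is just a sequence of ';'-terminated statements, a pre-pass could cut the file into statements and parse each one with the "line" rule on its own. StatementChunker and its methods are hypothetical names, not ANTLR API. The code assumes, per that grammar, that strings have no escape sequences and that '!' starts a comment running to the end of the line. The lexer class name FileImportLexer mentioned in the comment is also an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of ANTLR): cuts input into ';'-terminated statements. */
final class StatementChunker {
    private final Reader in;

    StatementChunker(Reader in) {
        this.in = in; // wrap a FileReader in a BufferedReader for real files
    }

    /** Returns the next statement including its ';', or null when no statement is left. */
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean inString = false;  // inside "..." (assumed: no escape sequences)
        boolean inComment = false; // inside a '!' comment, which ends at '\n'
        boolean sawCode = false;   // true once anything other than whitespace/comments is seen
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            sb.append(ch);
            if (inComment) {
                if (ch == '\n') inComment = false;
            } else if (inString) {
                if (ch == '"') inString = false;
            } else if (ch == '"') {
                inString = true;
                sawCode = true;
            } else if (ch == '!') {
                inComment = true;
            } else if (ch == ';') {
                return sb.toString(); // a ';' outside strings and comments ends the statement
            } else if (!Character.isWhitespace(ch)) {
                sawCode = true;
            }
        }
        // End of input: hand back an unterminated statement so the parser can report it,
        // but drop a tail that holds only whitespace and comments.
        return sawCode ? sb.toString() : null;
    }

    /** Demo/test convenience only: collects every chunk, which defeats the memory saving. */
    static List<String> split(Reader in) {
        StatementChunker chunker = new StatementChunker(in);
        List<String> out = new ArrayList<>();
        try {
            for (String s = chunker.next(); s != null; s = chunker.next()) {
                out.add(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "! header; not code\n\"a;b\" x1 {1,2};\n\"c\" y 3;\n";
        for (String chunk : split(new StringReader(sample))) {
            // In the real program each chunk would be parsed and then discarded, e.g.
            // (assumed class names) new FileImportParser(new CommonTokenStream(
            //     new FileImportLexer(new ANTLRStringStream(chunk)))).line();
            System.out.println("[" + chunk.trim() + "]");
        }
    }
}
```

In a real run the loop would call next() directly instead of split(), so that only one statement is held in memory at a time. Raising the JVM heap limit with the standard -Xmx option is the other obvious lever, but it only moves the ceiling.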

On 7/13/2010 6:16 PM, Sam Harwell wrote:

You might try the following:

COMMENT :    '!' .* '\n' {skip();}
        ;

WS      :   ( ' '
        |       '\t'
        |       '\r'
        |       '\n' )+ {skip();}
        ;

1. Use skip() instead of $channel=HIDDEN to prevent the token from ever being created. Setting the channel still creates the token, it just hides it from the parser.

2. Use a + (1 or more) in the WS rule to parse whitespace runs instead of individual characters.

3. Since your code doesn't handle old-style Mac line endings (carriage return '\r' by itself), simplify the COMMENT rule using a wildcard.
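
To make points 1 and 2 concrete, here is a toy model of how many token objects each whitespace strategy allocates. It is not ANTLR code: WhitespaceTokenDemo, Mode and countTokens are made-up names, and every run of non-whitespace characters is counted as a single token, which is a simplification of the real grammar.

```java
/** Toy model (not ANTLR): counts token objects allocated under two whitespace strategies. */
final class WhitespaceTokenDemo {
    enum Mode {
        HIDDEN_PER_CHAR, // original WS rule: one hidden-channel token per whitespace character
        SKIP_RUNS        // suggested WS rule: whole runs consumed, no token created
    }

    /** Same four characters as the WS rule in the grammar. */
    static boolean isWs(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    static int countTokens(String input, Mode mode) {
        int tokens = 0;
        int i = 0;
        int n = input.length();
        while (i < n) {
            if (isWs(input.charAt(i))) {
                if (mode == Mode.HIDDEN_PER_CHAR) {
                    tokens++; // the token still exists, it is only hidden from the parser
                    i++;
                } else {
                    while (i < n && isWs(input.charAt(i))) i++; // skip(): nothing is emitted
                }
            } else {
                // Simplification: one non-whitespace run stands in for one real token.
                while (i < n && !isWs(input.charAt(i))) i++;
                tokens++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\"a\"  x   1 ;\n";
        System.out.println("hidden per char: " + countTokens(line, Mode.HIDDEN_PER_CHAR));
        System.out.println("skip runs:       " + countTokens(line, Mode.SKIP_RUNS));
    }
}
```

For the sample line the model counts 11 tokens with per-character hidden whitespace and 4 with skipped runs. The gap matters here because the original trace fails inside CommonTokenStream.fillBuffer, where emitted tokens are being buffered.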

*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Mahesh R. Seshan
*Sent:* Tuesday, July 13, 2010 4:34 PM
*To:* [email protected]
*Subject:* [antlr-dev] Java - Out of heap space when parsing huge file

Greetings,

I am trying to use an ANTLR parser to parse a huge file, but I run into a Java error indicating that it is out of heap space. The grammar (below) is relatively simple. After going over some posts, I do not believe that UnbufferedTokenStream is an option, because whitespace is to be ignored in the input file. Also, UnbufferedTokenStream is not available in ANTLR v3.2, which is what I am using. Any advice is greatly appreciated.

file    :    line+
        ;
line    :    STRING ID data ';'
        ;
data    :    primitive | sequence
        ;
primitive
        :    INTEGER
        ;
sequence
        :    '{' data? (',' data?)* '}'
        ;

INTEGER :    ('0'..'9')+
        ;

STRING  :    '"' ~('"')* '"'
        ;

ID      :       ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')*
        ;

COMMENT :    '!' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
        ;

WS      :   ( ' '
        |       '\t'
        |       '\r'
        |       '\n' ) {$channel=HIDDEN;}
        ;

The Java error is as follows:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.antlr.runtime.Lexer.emit(Lexer.java:151)
        at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
        at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
        at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
        at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
        at parser.FileImportParser.file(FileImportParser.java:56)
        at test.FileImport.main(FileImport.java:42)

-mahesh

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
