Sam,
Thank you very much for taking the time to respond.
I made the changes that you suggested (very valuable and helpful) but
ran into the Java error: Out of heap space. Then I tried using
UnbufferedTokenStream, and that approach worked for a file with 300,000
lines. However, when I attempt to parse a file with 1 million lines, I
get the out-of-heap-space error from ANTLRReaderStream.
Any suggestions?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.antlr.runtime.ANTLRReaderStream.load(ANTLRReaderStream.java:78)
at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:68)
at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:52)
at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:48)
at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:40)
at test.FileImport.main(FileImport.java:23)
-mahesh
On 7/13/2010 6:16 PM, Sam Harwell wrote:
You might try the following:
COMMENT : '!' .* '\n' {skip();}
;
WS : ( ' '
| '\t'
| '\r'
| '\n' )+ {skip();}
;
1. Use skip() instead of $channel=HIDDEN to prevent the token from
ever being created. Setting the channel still creates the token; it
just hides it from the parser.
2. Use a + (1 or more) in the WS rule to parse whitespace runs instead
of individual characters.
3. Since your code doesn't handle old-style Mac line endings (carriage
return '\r' by itself), simplify the COMMENT rule using a wildcard.
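To illustrate points 1 and 2, here is a rough sketch in plain Java. It is not ANTLR code and does not use the ANTLR runtime: the class SkipVsHiddenSketch, the Tok type, and the tokenize method are hypothetical names made up for this example, and the "lexer" only splits on whitespace. The only detail borrowed from ANTLR 3 is the hidden channel number (99). The sketch just counts how many token objects end up in the buffer under each strategy.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipVsHiddenSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99; // same value ANTLR 3 uses for hidden tokens

    /** Hypothetical stand-in for a token object held in the token buffer. */
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /**
     * Toy tokenizer that splits on whitespace only.
     *
     * @param skipWhitespace true: whitespace never becomes a token (like skip());
     *                       false: whitespace becomes a hidden-channel token
     * @param mergeRuns      true: one match per whitespace run (like the + in WS);
     *                       false: one match per whitespace character
     */
    static List<Tok> tokenize(String input, boolean skipWhitespace, boolean mergeRuns) {
        List<Tok> out = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            int start = i;
            if (Character.isWhitespace(input.charAt(i))) {
                i++;
                while (mergeRuns && i < n && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                if (!skipWhitespace) {
                    // The token object is still allocated; it is only hidden from the parser.
                    out.add(new Tok(input.substring(start, i), HIDDEN_CHANNEL));
                }
            } else {
                while (i < n && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                out.add(new Tok(input.substring(start, i), DEFAULT_CHANNEL));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "\"name\"   id1   { 1, 2 } ;\r\n";
        System.out.println("hidden, per character: " + tokenize(line, false, false).size());
        System.out.println("hidden, merged runs:   " + tokenize(line, false, true).size());
        System.out.println("skipped:               " + tokenize(line, true, true).size());
    }
}
```

With a buffering token stream every one of those objects stays on the heap until parsing finishes, so on a whitespace-heavy file with a million lines the difference between the first and last count is likely to be substantial.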
*From:* [email protected]
[mailto:[email protected]] *On Behalf Of *Mahesh R. Seshan
*Sent:* Tuesday, July 13, 2010 4:34 PM
*To:* [email protected]
*Subject:* [antlr-dev] Java - Out of heap space when parsing huge file
Greetings,
I am trying to use an ANTLR parser to parse a huge file, but I run into
a Java error indicating out of heap space. The grammar (as follows)
itself is relatively simple. After going over some posts, I do not
believe that UnbufferedTokenStream is an option because whitespace is
to be ignored in the input file. Also, UnbufferedTokenStream is not
available in ANTLR v3.2, which is what I am using. Any advice is
greatly appreciated.
file : line+
;
line : STRING ID data ';'
;
data : primitive | sequence
;
primitive
: INTEGER
;
sequence
: '{' data? (',' data?)* '}'
;
INTEGER : ('0'..'9')+
;
STRING : '"' ~('"')* '"'
;
ID : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')*
;
COMMENT : '!' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
;
WS : ( ' '
| '\t'
| '\r'
| '\n' ) {$channel=HIDDEN;}
;
Following is the Java error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.antlr.runtime.Lexer.emit(Lexer.java:151)
at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
at parser.FileImportParser.file(FileImportParser.java:56)
at test.FileImport.main(FileImport.java:42)
-mahesh
_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev