Note that the char streams still buffer the whole input; only the token buffers toss stuff out.
Ter
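
One possible way around the char stream holding the whole file, offered as a sketch under assumptions rather than a tested fix: every `line` in the posted grammar ends with ';', so the input could be cut into statements before ANTLR sees it, and each statement parsed on its own from a small ANTLRStringStream. The class below covers only the cutting step, in plain Java with no ANTLR calls. StatementChunker and forEachStatement are made-up names, and the code assumes ';' ends a statement only outside "..." strings and '!' comments, as in the posted grammar.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of ANTLR: splits the input into
// ';'-terminated statements so only one statement is in memory at a time.
final class StatementChunker {

    // Passes each statement (including its ';') to sink and returns how many
    // chunks were emitted. A ';' inside a "..." string or a '!' comment does
    // not end a statement. Non-blank text after the last ';' is emitted too.
    static int forEachStatement(Reader in, Consumer<String> sink) {
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        boolean inComment = false;
        int count = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            int c;
            while ((c = reader.read()) != -1) {
                char ch = (char) c;
                current.append(ch);
                if (inComment) {
                    if (ch == '\n') inComment = false;   // comment runs to end of line
                } else if (inString) {
                    if (ch == '"') inString = false;     // closing quote
                } else if (ch == '"') {
                    inString = true;
                } else if (ch == '!') {
                    inComment = true;
                } else if (ch == ';') {
                    sink.accept(current.toString());
                    current.setLength(0);
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.toString().trim().length() > 0) {
            sink.accept(current.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "\"a;b\" x 1; ! note; here\n\"k\" y {1,2};";
        // In real use the sink would hand each chunk to the lexer and parser.
        forEachStatement(new StringReader(sample),
                s -> System.out.println("[" + s.trim() + "]"));
    }
}
```

The trade-off is that line numbers in error messages become relative to each chunk, and the parser would be entered at the `line` rule rather than `file`. Raising the heap with -Xmx is the simpler thing to try first.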
On Jul 14, 2010, at 11:11 AM, Mahesh R. Seshan wrote:
> Sam,
>
> Thank you very much for taking time and responding...
>
> I made the changes that you suggested (very valuable and helpful) but still
> ran into the Java error "Out of heap space". Then I tried using the
> UnbufferedTokenStream, and that approach worked for a file with 300,000
> lines. However, when I attempt to parse a file with 1 million lines, I get
> the out of heap space error from the ANTLRReaderStream.
>
> Any suggestions?
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at org.antlr.runtime.ANTLRReaderStream.load(ANTLRReaderStream.java:78)
> at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:68)
> at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:52)
> at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:48)
> at org.antlr.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:40)
> at test.FileImport.main(FileImport.java:23)
>
> -mahesh
>
> On 7/13/2010 6:16 PM, Sam Harwell wrote:
>> You might try the following:
>>
>> COMMENT : '!' .* '\n' {skip();}
>> ;
>>
>> WS : ( ' '
>> | '\t'
>> | '\r'
>> | '\n' )+ {skip();}
>> ;
>>
>> 1. Use skip() instead of $channel=HIDDEN to prevent the token from
>> ever being created. Setting the channel still creates the token, it just
>> hides it from the parser.
>> 2. Use a + (1 or more) in the WS rule to parse whitespace runs instead
>> of individual characters.
>> 3. Since your code doesn’t handle old-style Mac line endings (carriage
>> return '\r' by itself), simplify the COMMENT rule using a wildcard.
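
To put rough numbers on points 1 and 2 above, here is a toy model in plain Java. It is not ANTLR code and the class and method names are invented for illustration. It counts how many whitespace tokens a lexer would create if every whitespace character became its own token (the original WS rule) versus one token per run (the WS rule with +). With skip() the count is zero in both cases, because no token object is created at all.

```java
// Toy model only: counts whitespace tokens under two lexer rule shapes.
final class WhitespaceTokenCount {

    // The same four characters the WS rule matches.
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // One token per whitespace character, as with WS : (' '|'\t'|'\r'|'\n').
    static int perChar(String input) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            if (isWs(input.charAt(i))) count++;
        }
        return count;
    }

    // One token per run of whitespace, as with WS : (' '|'\t'|'\r'|'\n')+.
    static int perRun(String input) {
        int count = 0;
        boolean inRun = false;
        for (int i = 0; i < input.length(); i++) {
            boolean ws = isWs(input.charAt(i));
            if (ws && !inRun) count++;   // a new run starts here
            inRun = ws;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "\"name\"   id  {1,\r\n\t2};\n";
        System.out.println("per character: " + perChar(sample));
        System.out.println("per run:       " + perRun(sample));
    }
}
```

On an indented file with a million lines the gap between the two counts can be large, and with $channel=HIDDEN each of those tokens is still an object held in the token buffer.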
>>
>> From: [email protected] [mailto:[email protected]] On
>> Behalf Of Mahesh R. Seshan
>> Sent: Tuesday, July 13, 2010 4:34 PM
>> To: [email protected]
>> Subject: [antlr-dev] Java - Out of heap space when parsing huge file
>>
>> Greetings,
>>
>> I am trying to use an ANTLR parser to parse a huge file, but it runs into a
>> Java error indicating out of heap space. The grammar (as follows) itself is
>> relatively simple. After going over some posts, I do not believe that
>> UnbufferedTokenStream is an option because white-space is to be ignored in
>> the input file. Also, UnbufferedTokenStream is not available in ANTLR v3.2,
>> which is what I am using. Any advice is greatly appreciated.
>>
>> file : line+
>> ;
>> line : STRING ID data ';'
>> ;
>> data : primitive | sequence
>> ;
>> primitive
>> : INTEGER
>> ;
>> sequence
>> : '{' data? (',' data?)* '}'
>> ;
>>
>> INTEGER : ('0'..'9')+
>> ;
>>
>> STRING : '"' ~('"')* '"'
>> ;
>>
>> ID : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')*
>> ;
>>
>> COMMENT : '!' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>> ;
>>
>> WS : ( ' '
>> | '\t'
>> | '\r'
>> | '\n' ) {$channel=HIDDEN;}
>> ;
>>
>> Following is the Java error:
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> at org.antlr.runtime.Lexer.emit(Lexer.java:151)
>> at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
>> at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
>> at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
>> at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
>> at parser.FileImportParser.file(FileImportParser.java:56)
>> at test.FileImport.main(FileImport.java:42)
>>
>> -mahesh
> _______________________________________________
> antlr-dev mailing list
> [email protected]
> http://www.antlr.org/mailman/listinfo/antlr-dev