Jukka Zitting
Tue, 24 Nov 2009 05:42:42 -0800
Hi,

On Tue, Nov 24, 2009 at 1:17 PM, Daniel Knapp
<daniel.kn...@mni.fh-giessen.de> wrote:
> i'm trying to parse about 4GB of data. With the following code it always
> results in an JavaHeapSpace Error. I think there must be a better way to do
> this, but i don't know how.
> Has anybody a hint for me how to solve this problem? I think increasing the
> HeapSpace in Eclipse should not be the solution.
> [...]
> StringWriter textBuffer = new StringWriter();

Instead of buffering the text in memory, you can stream it to a file
or some other place. Where are you planning to put the parse
result?

With Tika 0.5 you could do something as simple as this:

    import java.io.Reader;
    import org.apache.tika.Tika;

    // "file" is the java.io.File you want to parse
    Reader reader = new Tika().parse(file);

You can then read the parse result incrementally from the reader
object, or pass the reader for example to a Lucene Document for
indexing.
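
To illustrate the incremental read, here is a minimal sketch that copies a Reader to a Writer through a fixed-size buffer, so memory use does not grow with the size of the input. The class name StreamCopy and the copy helper are made up for this example, and a StringReader and StringWriter stand in for the Tika reader and the real destination so that the snippet runs without Tika on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper class, not part of Tika.
class StreamCopy {

    // Copies everything from "in" to "out" in 8192-char chunks and
    // returns the number of characters copied. Only one chunk is held
    // in memory at a time.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in real use the reader would come from
        // new Tika().parse(file) and the writer would be file-backed.
        try (Reader reader = new StringReader("extracted text");
             Writer writer = new StringWriter()) {
            long copied = copy(reader, writer);
            System.out.println(copied + " chars copied");
        }
    }
}
```

In real use you would probably replace the StringWriter with a BufferedWriter on top of a FileWriter (or whatever destination you have in mind), which keeps the heap footprint at roughly one buffer regardless of how much text is extracted.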
BR,
Jukka Zitting