Daniel Knapp
Tue, 24 Nov 2009 05:52:24 -0800
On 24.11.2009 at 14:41, Jukka Zitting wrote:

> Hi,
>
> On Tue, Nov 24, 2009 at 1:17 PM, Daniel Knapp
> <daniel.kn...@mni.fh-giessen.de> wrote:
>> i'm trying to parse about 4GB of data. With the following code it always
>> results in an JavaHeapSpace Error. I think there must be a better way to do
>> this, but i don't know how.
>> Has anybody a hint for me how to solve this problem? I think increasing the
>> HeapSpace in Eclipse should not be the solution.
>> [...]
>> StringWriter textBuffer = new StringWriter();
>
> Instead of buffering the text in memory, you can stream it to a file
> or some other place. Where are you planning to put the parse
> result?

I want to send the results to a Solr server (the integrated handler in Solr is not an option for me; the files are on another server).

> With Tika 0.5 you could do something as simple as this:
>
> import org.apache.tika.Tika;
>
> Reader reader = new Tika().parse(file);
>
> You can then read the parse result incrementally from the reader
> object, or pass the reader for example to a Lucene Document for
> indexing.

I've read about that, but I don't know how to check when the end of a file is reached, or how to merge the result with the related metadata.

> BR,
>
> Jukka Zitting
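For the end-of-file part of the question, a minimal sketch may help, assuming the object returned by Tika behaves like any other java.io.Reader: read(char[], int, int) returns -1 once the stream is exhausted, so the text can be consumed in small chunks without ever holding the whole document in memory. The class name ReaderChunks, the helper drain, and the plain Map standing in for the metadata are hypothetical and only for illustration; a StringReader stands in for the parser output so the example runs without Tika.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

class ReaderChunks {

    // Hypothetical helper: reads the reader in chunks of at most chunkSize
    // characters, hands each chunk to the sink, and returns the total number
    // of characters read.
    static long drain(Reader reader, int chunkSize, Consumer<String> sink) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] buffer = new char[chunkSize];
        long total = 0;
        int n;
        // Reader.read returns -1 once the end of the stream is reached.
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            if (n > 0) {
                sink.accept(new String(buffer, 0, n));
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a plain map stands in for whatever metadata belongs to the file.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("resourceName", "example.txt");

        // Assumption: a StringReader stands in for the reader returned by the parser.
        try (Reader reader = new StringReader("extracted text stands in for parser output")) {
            long count = drain(reader, 16,
                    chunk -> System.out.println(metadata.get("resourceName") + ": " + chunk));
            System.out.println("end of stream after " + count + " characters");
        }
    }
}
```

Because the sink sees one chunk at a time, each chunk can be written to a file or forwarded elsewhere together with the metadata kept alongside it, rather than collected in a StringWriter.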