RE: Out of memory error

Rob Staveley (Tom) Thu, 13 Jul 2006 07:23:33 -0700

If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(o
rg.pdfbox.pdmodel.PDDocument), you are going to get a large String and may
need a 1G heap.


If, however, you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText
(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer) to go via a temporary
file, you will not need so much RAM, but you need to use
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html
#Field(java.lang.String,%20java.io.Reader) to construct your Lucene field
(rather than
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html
#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Fi
eld.Store,%20org.apache.lucene.document.Field.Index)).

-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED] 
Sent: 13 July 2006 14:55
To: [email protected]
Subject: Out of memory error

I am indexing different document formats with lucene 1.9. One of the pdf
file I am indexing is 300MG. Whenever the index writer hits that file it
stops the indexing with "Out of Memory" exception. I am using the pdf box
library to index. I have set the following merge factors in my code.

writer.setMergeFactor(1000);
writer.setMaxMergeDocs(9999999);
writer.setMaxBufferedDocs(1000);
writer.setMaxFieldLength(Integer.MAX_VALUE);

I would like any help and suggestions.

thanks,
suba suresh.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

smime.p7s
Description: S/MIME cryptographic signature

RE: Out of memory error

Reply via email to