Definitely. Thanks for both the suggestions. Yes, it is 300MB (the "MG" was a typo).
suba suresh.
Rob Staveley (Tom) wrote:
Let us know how you get on. There are a lot of people fighting very similar
battles on this list.
-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 15:30
To: java-user@lucene.apache.org
Subject: Re: Out of memory error
Thanks.
I am using the getText(PDDocument) method of the PDFTextStripper. I will try
the other suggestion.
suba suresh.
Rob Staveley (Tom) wrote:
If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
you are going to get a large String and may need a 1G heap.
If, however, you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer)
to go via a temporary file, you will not need so much RAM, but you need to use
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.io.Reader)
to construct your Lucene field (rather than
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)).
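A rough sketch of the temporary-file route, using only the JDK so it runs without PDFBox or Lucene on the classpath. The TextExtractor interface and the SpooledText class are hypothetical names made up for illustration; in real code the Writer would presumably be handed to PDFTextStripper.writeText(PDDocument, Writer), and the Reader to the Field(String, Reader) constructor linked above.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/**
 * Hypothetical stand-in for anything that can stream extracted text to a
 * Writer, e.g. a wrapper around PDFTextStripper.writeText(PDDocument, Writer).
 */
interface TextExtractor {
    void writeText(Writer out) throws IOException;
}

/** Hypothetical helper: spools extracted text to disk instead of a String. */
class SpooledText {

    /** Streams the extractor's output into a temp file and returns that file. */
    static File spoolToTempFile(TextExtractor extractor) throws IOException {
        File tmp = File.createTempFile("extracted-", ".txt");
        tmp.deleteOnExit(); // best-effort cleanup when the JVM exits
        try (Writer out = Files.newBufferedWriter(tmp.toPath(), StandardCharsets.UTF_8)) {
            // Text goes to disk in buffer-sized chunks; no large String is built.
            extractor.writeText(out);
        }
        return tmp;
    }

    /**
     * Opens a buffered Reader over the spooled text. This Reader is what would
     * be passed to Field(String, Reader); the caller closes it after indexing.
     */
    static Reader openReader(File spooled) throws IOException {
        return Files.newBufferedReader(spooled.toPath(), StandardCharsets.UTF_8);
    }
}
```

With this shape, the text passes through a small buffer on its way to and from disk rather than being held in the heap as one String. The Reader has to stay open until the document has been added to the index, and the temp file can be deleted once that call returns.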
-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 14:55
To: java-user@lucene.apache.org
Subject: Out of memory error
I am indexing different document formats with Lucene 1.9. One of the
PDF files I am indexing is 300MG. Whenever the index writer hits that
file, it stops the indexing with an "Out of Memory" exception. I am using
the PDFBox library to index. I have set the following merge factors in my
code.
writer.setMergeFactor(1000);
writer.setMaxMergeDocs(9999999);
writer.setMaxBufferedDocs(1000);
writer.setMaxFieldLength(Integer.MAX_VALUE);
I would like any help and suggestions.
thanks,
suba suresh.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]