Let us know how you get on. There are a lot of people fighting very similar battles on this list.
-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 15:30
To: java-user@lucene.apache.org
Subject: Re: Out of memory error

Thanks. I am using the getText(PDDocument) method of the PDFTextStripper. I will try the other suggestion.

suba suresh.

Rob Staveley (Tom) wrote:
> If you are using
> http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
> you are going to get a large String and may need a 1G heap.
>
> If, however, you are using
> http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer)
> to go via a temporary file, you will not need so much RAM, but you need to use
> http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.io.Reader)
> to construct your Lucene field (rather than
> http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)).
>
> -----Original Message-----
> From: Suba Suresh [mailto:[EMAIL PROTECTED]
> Sent: 13 July 2006 14:55
> To: java-user@lucene.apache.org
> Subject: Out of memory error
>
> I am indexing different document formats with Lucene 1.9. One of the
> PDF files I am indexing is 300MB. Whenever the index writer hits that
> file, it stops indexing with an "Out of Memory" exception. I am using
> the PDFBox library to index. I have set the following merge factors in my code:
>
> writer.setMergeFactor(1000);
> writer.setMaxMergeDocs(9999999);
> writer.setMaxBufferedDocs(1000);
> writer.setMaxFieldLength(Integer.MAX_VALUE);
>
> I would like any help and suggestions.
>
> thanks,
> suba suresh.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
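Rob's suggestion is to stream the extracted text through a temporary file instead of building one large String. The sketch below covers only the plain-Java half of that idea: `TempFileSpool` is a hypothetical helper (not part of PDFBox or Lucene) that lets any writer-style extractor, such as `PDFTextStripper.writeText(PDDocument, Writer)`, fill a temp file, and hands back a `Reader` suitable for Lucene's `Field(String, Reader)` constructor. The temp-file name and the UTF-8 encoding are arbitrary choices, and it assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper, not part of PDFBox or Lucene. Spools text produced by
 * a callback into a temporary file and returns a Reader over that file, so the
 * extracted text never has to be held in memory as one large String.
 */
final class TempFileSpool {

    /** Anything that writes its text to a Writer, e.g. a wrapper around PDFTextStripper.writeText. */
    interface TextSource {
        void writeTo(Writer out) throws IOException;
    }

    private TempFileSpool() {}

    /**
     * Writes the source's text to a temp file and returns a buffered Reader over it.
     * The temp file is deleted when the returned Reader is closed.
     */
    static Reader spool(TextSource source) throws IOException {
        final File tmp = File.createTempFile("extracted-text", ".txt");
        tmp.deleteOnExit(); // backstop in case the Reader is never closed

        // Stream the text straight to disk; only the writer's buffer lives on the heap.
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(tmp), StandardCharsets.UTF_8))) {
            source.writeTo(out);
        } catch (IOException | RuntimeException e) {
            tmp.delete(); // don't leave a partial file behind
            throw e;
        }

        // Hand back a Reader that removes the temp file once the consumer closes it.
        return new BufferedReader(new InputStreamReader(
                new FileInputStream(tmp), StandardCharsets.UTF_8)) {
            @Override
            public void close() throws IOException {
                try {
                    super.close();
                } finally {
                    tmp.delete();
                }
            }
        };
    }
}
```

With PDFBox and Lucene 1.9 on the classpath, the call site might look roughly like `Reader text = TempFileSpool.spool(out -> stripper.writeText(pdDoc, out)); doc.add(new Field("contents", text)); writer.addDocument(doc); text.close();`, where `stripper`, `pdDoc`, `doc`, `writer` and the field name `contents` are placeholders. A Reader-valued field is indexed and tokenized but not stored, so storing the full text would have to be handled separately. The lambda and try-with-resources need a modern JVM; on the Java 5 JVMs common when this thread was written, an anonymous `TextSource` class and explicit `finally` blocks do the same job.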