Thanks, Karl. It would be good if maxBufferedDocs could respond dynamically to the available heap. It seems a shame to set it below 10 just for the sake of sporadic large documents. Failing that, it would be nice if we could explicitly pre-flush the buffers when we encounter a big field.
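For reference, this is the sort of thing I'm considering trying. The index path, field name and tuning values below are just placeholders for illustration, not recommendations:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class MtaIndexer {
        public static void main(String[] args) throws IOException {
            // "/tmp/mta-index" is a placeholder path; true = create a new index
            IndexWriter writer = new IndexWriter("/tmp/mta-index",
                    new StandardAnalyzer(), true);

            writer.setMaxBufferedDocs(5);    // write a new segment after 5 buffered docs
            writer.setMergeFactor(5);        // merge segment indices more frequently
            writer.setMaxFieldLength(50000); // index at most 50,000 terms per field

            // stand-in for the real message body pulled from the MTA
            String messageBody = "example message body text";

            Document doc = new Document();
            // unstored, tokenised body field, as in the problem case
            doc.add(new Field("body", messageBody,
                    Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.close();
        }
    }
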
I'm increasingly thinking that mergeFactor is what I need to look at. I currently have it set to the default 10, but bearing in mind that this is a real-time application (indexing messages from an MTA), it makes sense to make it smaller. Is the RAM requirement due to mergeFactor a product of Document size and mergeFactor, or does Document size have no bearing on the RAM requirement due to mergeFactor?

-----Original Message-----
From: karl wettin [mailto:[EMAIL PROTECTED]
Sent: 06 June 2006 10:48
To: java-user@lucene.apache.org
Subject: RE: Avoiding java.lang.OutOfMemoryError in an unstored field

On Tue, 2006-06-06 at 10:43 +0100, Rob Staveley (Tom) wrote:
> You are right there are going to be a lot of tokens. The entire body
> of a text document is getting indexed in an unstored field, but I
> don't see how I can flush a partially loaded field.

Check these out:

http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html

void setMaxBufferedDocs(int maxBufferedDocs)
    Determines the minimal number of documents required before the buffered
    in-memory documents are merged and a new Segment is created.

void setMaxFieldLength(int maxFieldLength)
    The maximum number of terms that will be indexed for a single field in a
    document.

void setMergeFactor(int mergeFactor)
    Determines how often segment indices are merged by addDocument().

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]