Thanks, Karl. It would be good if maxBufferedDocs could respond dynamically
to the available heap. It seems a shame to set it below 10 just for the sake
of sporadic large documents. Failing that, it would be nice if we could
explicitly pre-flush the buffered documents when we encounter a big field.
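
In the meantime, a crude workaround I'm considering is to watch the free
heap and force a flush by closing and reopening the writer before adding
the next document. A rough sketch, assuming the Lucene 1.9/2.0 IndexWriter
API (the threshold, class name and analyzer are just placeholders):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class GuardedIndexer {
    // Illustrative threshold: force a flush when less than ~32 MB of heap remains.
    private static final long MIN_FREE_BYTES = 32L * 1024 * 1024;

    private final Directory dir;
    private IndexWriter writer;

    public GuardedIndexer(Directory dir) throws IOException {
        this.dir = dir;
        this.writer = new IndexWriter(dir, new StandardAnalyzer(), true);
    }

    public void add(Document doc) throws IOException {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long free = rt.maxMemory() - used;
        if (free < MIN_FREE_BYTES) {
            // Closing the writer flushes the buffered in-memory documents to disk;
            // reopening with create=false carries on with the same index.
            writer.close();
            writer = new IndexWriter(dir, new StandardAnalyzer(), false);
        }
        writer.addDocument(doc);
    }

    public void close() throws IOException {
        writer.close();
    }
}

Closing and reopening per large message obviously isn't free, but it would
at least bound the heap used by buffered documents.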

I'm increasingly thinking that mergeFactor is what I need to look at. I
currently have it set to the default of 10, but since this is a real-time
application (indexing messages from an MTA), it makes sense to make it
smaller. Is the RAM requirement due to mergeFactor a product of Document
size and mergeFactor, or does Document size have no bearing on the RAM
required because of mergeFactor?
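
For what it's worth, this is the sort of thing I have in mind (indexDir,
the analyzer and the values are purely illustrative, not recommendations):

IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
writer.setMergeFactor(2);       // merge segments more often than the default 10
writer.setMaxBufferedDocs(10);  // keep only a handful of documents buffered in RAM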


-----Original Message-----
From: karl wettin [mailto:[EMAIL PROTECTED] 
Sent: 06 June 2006 10:48
To: java-user@lucene.apache.org
Subject: RE: Avoiding java.lang.OutOfMemoryError in an unstored field

On Tue, 2006-06-06 at 10:43 +0100, Rob Staveley (Tom) wrote:
> You are right there are going to be a lot of tokens. The entire body
> of a text document is getting indexed in an unstored field, but I
> don't see how I can flush a partially loaded field.

Check these out:

http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html

void setMaxBufferedDocs(int maxBufferedDocs) 
          Determines the minimal number of documents required before the
buffered in-memory documents are merged and a new Segment is created. 

void setMaxFieldLength(int maxFieldLength) 
          The maximum number of terms that will be indexed for a single
field in a document.  

void setMergeFactor(int mergeFactor) 
          Determines how often segment indices are merged by addDocument().
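
For example, all three are plain setters called on the writer before you
start adding documents (dir, analyzer and create are placeholders here, and
the values are only illustrative; 10 and 10,000 are, I believe, the defaults
for mergeFactor/maxBufferedDocs and maxFieldLength):

IndexWriter writer = new IndexWriter(dir, analyzer, create);
writer.setMaxBufferedDocs(10);    // flush buffered documents to a new segment sooner or later
writer.setMaxFieldLength(10000);  // cap the number of terms indexed per field
writer.setMergeFactor(10);        // lower values merge on-disk segments more often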

