Re: suppressing FreqProxPostingsArray

Ken McCracken Tue, 20 Mar 2012 15:21:24 -0700

Hi Mike,

Thanks for the response. We will do some more investigation. We will look to see if there is a clean way to suppress at least the extra 3 array allocations.


Cheers,

-Ken

On Mar 19, 2012, at 5:32 PM, Michael McCandless <[email protected]> wrote:

Hmm, I agree we could be more RAM efficient if the field is DOCS_ONLY.

We shouldn't have to allocate/use docFreqs, lastDocCodes,
lastPositions arrays (3 of the 7); the others are still needed, I
think.
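To put rough numbers on the saving, here is a minimal back-of-the-envelope sketch. It assumes each of the parallel int[] arrays holds one 4-byte slot per buffered unique term; the class name, the helper bytesFor, and the term count are hypothetical and chosen only for illustration. It ignores the temporary extra copy made while the arrays grow, the terms hash, and the term bytes themselves, so real usage will be higher.

```java
public class PostingsArrayEstimate {

    // Bytes consumed by `arrays` parallel int[] arrays, each with one slot
    // per unique term (assumption: no over-allocation, no object headers).
    static long bytesFor(int arrays, long uniqueTerms) {
        return (long) arrays * Integer.BYTES * uniqueTerms;
    }

    public static void main(String[] args) {
        // Hypothetical number of unique terms buffered before a flush.
        long uniqueTerms = 10_000_000L;

        long allSeven = bytesFor(7, uniqueTerms);  // current layout
        long docsOnly = bytesFor(4, uniqueTerms);  // without docFreqs, lastDocCodes, lastPositions

        System.out.println("7 arrays: " + allSeven + " bytes");
        System.out.println("4 arrays: " + docsOnly + " bytes");
        System.out.println("saved:    " + (allSeven - docsOnly) + " bytes");
    }
}
```

Under these assumptions, dropping 3 of the 7 arrays saves 12 bytes per buffered unique term, a bit over 40% of the postings-array footprint.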

But, that said, you shouldn't hit OOME, as long as your max heap size
is large enough (and your IndexWriterConfig's RAMBufferSizeMB is
small enough); Lucene should simply flush a new segment once the
buffered documents are using too much RAM.

Hmm, and as long as you don't index massive documents. How many UUIDs per document?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Mar 19, 2012 at 3:29 PM, Ken McCracken <[email protected]> wrote:
Hi,

I am using lucene-3.5 and getting an OutOfMemoryError on a large indexing task of 100M documents. I am creating an index with 3 UUIDs as separate field values. I am using Store.YES on 1 of them and Store.NO on the others, and Index.NOT_ANALYZED_NO_NORMS on all three. I am also explicitly calling field.setIndexOptions(IndexOptions.DOCS_ONLY) and indexWriterConfig.setTermIndexInterval(termIndexInterval), with termIndexInterval set to 1024. I am trying to index 100M records into my index.

Is there any reason FreqProxTermsWriterPerField.FreqProxPostingsArray needs to be constructed even though I have the positions etc. suppressed? It seems that the reason I get an OutOfMemoryError is that 7 int[] of size proportional to the number of unique fields are being constructed; however, at least some of them are probably wasteful given my indexing configuration.

Any help is appreciated.

Thanks,
-Ken

    [junit] Error:
    [junit] Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
    [junit]     at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:190)
    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
    [junit]     at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
    [junit]     at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
    [junit]     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
    [junit]     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:94)
    [junit]     at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


