Hello all,

We have over 3 billion unique terms in our indexes, and with Solr 3.x we set
the TermIndexInterval to about 8 times its default value in order to index
without OOMs
(http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again).
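To make the memory argument concrete: in Lucene 3.x the in-memory terms index holds roughly one entry per TermIndexInterval terms, so raising the interval shrinks it proportionally. The sketch below assumes the 3.x default interval of 128 and treats "about 8 times" as exactly 8x (1024). The class and method names are illustrative only, and the estimate ignores per-entry overhead and the reader-side index divisor.

```java
// Rough estimate of how many terms a Lucene 3.x-style terms index keeps in RAM:
// about one entry for every `interval` unique terms.
final class TermIndexEstimate {
    // Assumed Lucene 3.x default TermIndexInterval.
    static final int DEFAULT_TERM_INDEX_INTERVAL = 128;

    // Number of index entries, rounded down (hypothetical helper, not a Lucene API).
    static long indexedTerms(long uniqueTerms, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be > 0");
        }
        return uniqueTerms / interval;
    }

    public static void main(String[] args) {
        long uniqueTerms = 3_000_000_000L;                 // "over 3 billion" unique terms
        int raised = 8 * DEFAULT_TERM_INDEX_INTERVAL;      // "about 8 times" taken as exactly 8x
        System.out.println(indexedTerms(uniqueTerms, DEFAULT_TERM_INDEX_INTERVAL)); // 23437500
        System.out.println(indexedTerms(uniqueTerms, raised));                      // 2929687
    }
}
```

Under these assumptions the index drops from about 23.4 million entries to about 2.9 million.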

We are now working with Solr 4 and running into memory issues and are
wondering if we need to do something analogous for Solr 4.

The javadoc for IndexWriterConfig.setTermIndexInterval
(http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29)
indicates that the Lucene 4.1 postings format has some parameters which may
be set:
"...To configure its parameters (the minimum and maximum size for a block),
you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)
(https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29)."
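One thing worth checking before picking values: as far as we recall, Lucene 4.x's BlockTreeTermsWriter rejects some (min, max) pairs outright, and its defaults are 25 and 48. The sketch below mirrors those checks as we understand them; the class and method names are hypothetical, and both the defaults and the constraints are assumptions to verify against the 4.10 source.

```java
// Hypothetical helper mirroring the block-size checks we believe Lucene 4.x's
// BlockTreeTermsWriter applies. This is NOT a Lucene API.
final class BlockSizeCheck {
    // Assumed defaults (DEFAULT_MIN_BLOCK_SIZE / DEFAULT_MAX_BLOCK_SIZE).
    static final int DEFAULT_MIN = 25;
    static final int DEFAULT_MAX = 48;

    // Returns null if the pair passes the assumed checks, otherwise the reason.
    static String validate(int minBlockSize, int maxBlockSize) {
        if (minBlockSize < 2) {
            return "minBlockSize must be >= 2";
        }
        if (maxBlockSize < 1) {
            return "maxBlockSize must be >= 1";
        }
        if (minBlockSize > maxBlockSize) {
            return "minBlockSize must be <= maxBlockSize";
        }
        if (2 * (minBlockSize - 1) > maxBlockSize) {
            return "maxBlockSize must be >= 2*(minBlockSize-1)";
        }
        return null;
    }

    public static void main(String[] args) {
        // Defaults pass.
        System.out.println(validate(DEFAULT_MIN, DEFAULT_MAX));
        // Scaling both defaults by 8 (200/384) fails the 2*(min-1) check.
        System.out.println(validate(8 * DEFAULT_MIN, 8 * DEFAULT_MAX));
        // Raising max to 2*(min-1) makes the pair acceptable.
        System.out.println(validate(200, 2 * (200 - 1)));
    }
}
```

If these checks are right, simply multiplying both defaults by the same factor, by analogy with TermIndexInterval, would be rejected; the maximum has to grow to at least 2*(min-1).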

Is there documentation or discussion somewhere about how to determine
appropriate values for these parameters, or some detail about what setting
maxBlockSize and minBlockSize actually does?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
