The first int to Lucene41PostingsFormat is the min block size (default 25) and the second is the max (default 48) for the block tree terms dict.
The max must be >= 2*(min-1). Since you were using 8X the default before, maybe try min=200 and max=398?

However, block tree should have been more RAM efficient than 3.x's terms index... if you run CheckIndex with -verbose it will print additional details about the block structure of your terms indices...

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West <tburt...@umich.edu> wrote:

> Hello all,
>
> We have over 3 billion unique terms in our indexes and with Solr 3.x we set
> the TermIndexInterval to about 8 times its default value in order to index
> without OOMs.
> (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
>
> We are now working with Solr 4 and running into memory issues and are
> wondering if we need to do something analogous for Solr 4.
>
> The javadoc for IndexWriterConfig
> (http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29)
> indicates that the Lucene 4.1 postings format has some parameters which may
> be set:
> "..To configure its parameters (the minimum and maximum size for a block),
> you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int,
> int)
> <https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>
> "
>
> Is there documentation or discussion somewhere about how to determine
> appropriate parameters or some detail about what setting the maxBlockSize
> and minBlockSize does?
>
> Tom Burton-West
> http://www.hathitrust.org/blogs/large-scale-search
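
For anyone checking the arithmetic behind the min=200 / max=398 suggestion in the reply, here is a minimal sketch in plain Java. It only encodes the defaults (25 and 48) and the max >= 2*(min-1) rule quoted in the reply; the class and method names (BlockSizes, isValid, scaledMin, smallestMaxFor) are made up for illustration and are not part of Lucene. It assumes that "8X the default" means multiplying the default min by 8, which is how the reply's numbers work out.

```java
// Hypothetical helper, not a Lucene class: it only checks the block-size
// rule quoted in the reply (max >= 2*(min-1)) and reproduces the numbers.
final class BlockSizes {
    // Defaults for the block tree terms dict, as stated in the reply.
    static final int DEFAULT_MIN = 25;
    static final int DEFAULT_MAX = 48;

    // True if the pair satisfies max >= 2*(min-1).
    static boolean isValid(int min, int max) {
        return max >= 2 * (min - 1);
    }

    // Scale the default min block size by an integer factor (assumption:
    // this is what "8X the default" translates to for the min).
    static int scaledMin(int factor) {
        return DEFAULT_MIN * factor;
    }

    // Smallest max that the rule allows for a given min.
    static int smallestMaxFor(int min) {
        return 2 * (min - 1);
    }

    public static void main(String[] args) {
        int min = scaledMin(8);          // 25 * 8 = 200
        int max = smallestMaxFor(min);   // 2 * (200 - 1) = 398
        System.out.println("min=" + min + " max=" + max
                + " valid=" + isValid(min, max));
        // The pair would then be passed to the constructor named in the
        // javadoc quoted in the thread: Lucene41PostingsFormat(int, int).
    }
}
```

Note that the defaults sit exactly on the boundary as well (48 = 2*(25-1)), so 398 is simply the smallest max permitted for min=200; a larger max also satisfies the rule.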