The first int argument to the Lucene41PostingsFormat constructor is the
min block size (default 25) and the second is the max block size
(default 48) for the block tree terms dict.

The max must be >= 2*(min-1).
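A quick way to sanity-check a (min, max) pair before handing it to the
constructor: a minimal sketch, assuming only the rule above. The class
BlockSizeCheck and its isValid method are hypothetical (not part of
Lucene), and it checks just this one constraint.

```java
/**
 * Hypothetical helper (not part of Lucene) mirroring the rule above:
 * the block tree terms dict requires max >= 2 * (min - 1).
 */
public class BlockSizeCheck {
    // Defaults quoted above for Lucene41PostingsFormat: min 25, max 48.
    static final int DEFAULT_MIN = 25;
    static final int DEFAULT_MAX = 48;

    /** Returns true if (min, max) satisfies max >= 2 * (min - 1). */
    static boolean isValid(int minBlockSize, int maxBlockSize) {
        return maxBlockSize >= 2 * (minBlockSize - 1);
    }

    public static void main(String[] args) {
        System.out.println(isValid(DEFAULT_MIN, DEFAULT_MAX)); // 48 >= 48, prints true
        System.out.println(isValid(200, 398));                 // 398 >= 398, prints true
        System.out.println(isValid(200, 300));                 // 300 < 398, prints false
    }
}
```

Note that the defaults themselves sit exactly on the boundary: 2*(25-1) = 48.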

Since you were using 8X the default before, maybe try min=200 and
max=398?  However, block tree should already be more RAM efficient than
3.x's terms index.  If you run CheckIndex with -verbose it will print
additional details about the block structure of your terms indices.
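The min=200 / max=398 suggestion is just arithmetic: scale the default
min by the old 8X factor, then take the smallest max the rule allows. A
minimal sketch of that calculation, assuming the same multiplier carries
over from TermIndexInterval (a heuristic, not a guarantee of equivalent
RAM use); the class ScaledBlockSizes is hypothetical.

```java
/**
 * Hypothetical sketch: scale the default min block size by the factor
 * previously applied to TermIndexInterval in 3.x, then pick the smallest
 * max that still satisfies max >= 2 * (min - 1).
 */
public class ScaledBlockSizes {
    static final int DEFAULT_MIN = 25; // default min block size quoted above

    /** Returns {min, max} for the given multiple of the default min. */
    static int[] scaled(int factor) {
        int min = DEFAULT_MIN * factor;
        int max = 2 * (min - 1); // smallest max allowed for this min
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] sizes = scaled(8);
        System.out.println("min=" + sizes[0] + " max=" + sizes[1]); // min=200 max=398
    }
}
```

The resulting pair would then be passed as the two int arguments to
Lucene41PostingsFormat(int, int); how that format gets wired into the
codec is not shown here.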

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> Hello all,
>
> We have over 3 billion unique terms in our indexes and with Solr 3.x we set
> the TermIndexInterval to about 8 times its default value in order to index
> without OOMs.  (
> http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
>
> We are now working with Solr 4 and running into memory issues and are
> wondering if we need to do something analogous for Solr 4.
>
> The javadoc for IndexWriterConfig (
> http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29
> )
> indicates that the lucene 4.1 postings format has some parameters which may
> be set:
> "..To configure its parameters (the minimum and maximum size for a block),
> you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int,
> int)
> <https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>
> "
>
> Is there documentation or discussion somewhere about how to determine
> appropriate parameters or some detail about what setting the maxBlockSize
> and minBlockSize does?
>
> Tom Burton-West
> http://www.hathitrust.org/blogs/large-scale-search
