Tom: I'll be very interested to see your final numbers. I did a worst-case test at one point and saw a 2/3 reduction, but.... that was deliberately "worst case": I used a bunch of string/text types, did some faceting on them, etc., IOW not real-world at all. So it'll be cool to see what you come up with.
The other benefit is that you have many, many fewer objects allocated on the
heap; I was seeing two orders of magnitude fewer. That's right, a 99%
reduction. Again, though, I was deliberately doing really bad stuff....

Best,
Erick

On Sat, Jan 10, 2015 at 4:58 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> Thanks Mike,
>
> We run our Solr 3.x indexing with 10GB/shard. I've been testing Solr 4
> with 4, 6, and 8GB heaps. As of Friday night, when the indexes were about
> half done (about 400GB on disk), only the 4GB run had issues. I'll find
> out on Monday whether the other runs had issues. If we can go from 10GB
> in Solr 3.x to 6GB with Solr 4.x, that will be a significant change.
>
> With TermIndexInterval we traded less memory use for an increased chance
> of disk seeks and more data to be read per seek (and if I remember right,
> that data was scanned sequentially rather than binary-searched). What is
> the trade-off when increasing the block size?
>
> Tom
>
> On Sat, Jan 10, 2015 at 4:46 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> The first int to Lucene41PostingsFormat is the min block size (default
>> 25) and the second is the max (default 48) for the block tree terms
>> dict.
>>
>> The max must be >= 2*(min-1).
>>
>> Since you were using 8X the default before, maybe try min=200 and
>> max=398? However, block tree should have been more RAM-efficient than
>> 3.x's terms index... if you run CheckIndex with -verbose it will print
>> additional details about the block structure of your terms indices...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West <tburt...@umich.edu>
>> wrote:
>> > Hello all,
>> >
>> > We have over 3 billion unique terms in our indexes, and with Solr 3.x
>> > we set the TermIndexInterval to about 8 times its default value in
>> > order to index without OOMs. (
>> > http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
>> >
>> > We are now working with Solr 4, running into memory issues, and
>> > wondering if we need to do something analogous for Solr 4.
>> >
>> > The javadoc for IndexWriterConfig (
>> > http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29
>> > ) indicates that the Lucene 4.1 postings format has some parameters
>> > which may be set:
>> > "...To configure its parameters (the minimum and maximum size for a
>> > block), you would instead use
>> > Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)
>> > <https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>"
>> >
>> > Is there documentation or discussion somewhere about how to determine
>> > appropriate parameters, or some detail about what setting the
>> > maxBlockSize and minBlockSize does?
>> >
>> > Tom Burton-West
>> > http://www.hathitrust.org/blogs/large-scale-search
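
For anyone wanting to try Mike's suggestion, here is a minimal sketch
(assuming Lucene 4.10.2; the class name, the index path, and the choice to
apply the larger blocks to every field are made up for illustration) of
wiring min=200 / max=398 into an IndexWriter through a custom codec:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.codecs.lucene410.Lucene410Codec;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BigBlockIndexer {
    public static void main(String[] args) throws Exception {
        // Larger terms-dictionary blocks: fewer terms-index entries held in
        // RAM, at the cost of scanning more terms within each on-disk block.
        // 200/398 satisfies the constraint max >= 2*(min-1).
        final PostingsFormat bigBlocks = new Lucene41PostingsFormat(200, 398);

        // Lucene410Codec lets the postings format be chosen per field; this
        // sketch simply applies the same format to every field (an
        // assumption, not something from the thread).
        Lucene410Codec codec = new Lucene410Codec() {
            @Override
            public PostingsFormat getPostingsFormatForField(String field) {
                return bigBlocks;
            }
        };

        IndexWriterConfig iwc = new IndexWriterConfig(
                Version.LUCENE_4_10_2,
                new StandardAnalyzer(Version.LUCENE_4_10_2));
        iwc.setCodec(codec);

        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/path/to/index")), iwc)) {
            // writer.addDocument(...) calls go here.  The block sizes affect
            // only how the terms dictionary is written; the resulting index
            // is still readable with the default codec.
        }
    }
}

Plugging the same idea into Solr would likely also need a codec-factory
hook on the Solr side; that part is not shown here.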