Super, thanks for bringing closure!

Mike McCandless
http://blog.mikemccandless.com

On Wed, Apr 18, 2012 at 5:33 PM, Ivan Brusic <i...@brusic.com> wrote:
> Just wanted to circle back and report on our progress.
>
> We finally applied the settings to our production environment and the
> improvements have been dramatic. Our indexing time has returned to 2.3
> levels.
>
> Thanks again,
>
> Ivan
>
> On Fri, Apr 6, 2012 at 11:36 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> On Thu, Apr 5, 2012 at 3:31 PM, Ivan Brusic <i...@brusic.com> wrote:
>>
>>> On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless
>>> <luc...@mikemccandless.com> wrote:
>>>> I'm assuming this is a "build once and never change" index...? Else,
>>>> it sounds like you should never run forceMerge...
>>>
>>> Correct. The forceMerge was merely to preserve the previous 2.3
>>> behavior of using optimize.
>>
>> OK. Avoid it, unless you can't...
>>
>>>> To preserve insertion order you just need to use one of the
>>>> Log*MergePolicy (which you are already doing). Merge factor doesn't
>>>> affect this...
>>>
>>> I was never sure why the merge factor was set to 2. My experiences in
>>> the past was to set a high merge factor when doing a batch index.
>>
>> Well, it's not entirely clear... you'd have to test in your env to be sure.
>>
>> My instinct is to use a large (maybe infinite) MF while indexing, and
>> then big MF while forceMerge'ing.
>>
>>>> For the fastest way to get to a single-segment index.... use
>>>> NoMergePolicy while indexing the documents, and set the largest RAM
>>>> buffer you can afford. This will create tons of segments in the index
>>>> dir, which is fine as long as you will not open a reader on it...
>>>> then:
>>>>
>>>> Open a new IW, with Log*MergePolicy, set a highish (maybe 30)
>>>> mergeFactor, and call forceMerge(1). You may need to cutover to
>>>> SerialMergeScheduler...
>>>
>>> NoMergePolicy? Never seen that class used before.
>>
>> It's like Log*MP with infinite mergeFactor...
>>
>>> RAM buffer size is
>>> not an issue. Is the limitation still 2048MB?
>>
>> Yes.
>>
>>> Is the fastest way also the best way? :) There will never be a read
>>> open on the index. Your second solution is similar to the existing
>>> code with the exception of the mergeFactor. Will setting the merge
>>> factor to a more reasonable number help with the merge speed?
>>
>> I think you'd have to test in your env.
>>
>> A non-infinite MF is good in that it gets some merges out of the way
>> before the end, ie, you can soak up some otherwise unused IO
>> resources/concurrency while you are indexing... making it less
>> work/time to forceMerge in the end.
>>
>>> What enforces the preservation of the insertion order? The
>>> MergePolicy?
>>
>> MergePolicy does.
>>
>> Though, in 4.0, it's also important you use only 1 thread for
>> indexing. Prior to 4.0, docIDs were assigned in arrival order,
>> across threads, but with 4.0, each thread gets a private segment, so
>> the docIDs are jumbled.
>>
>>> How does the MergeScheduler affect things?
>>
>> It shouldn't affect docID order.
>>
>>> Used Lucene
>>> on a few projects over the years and I never had to tweak the index
>>> creation.
>>
>> The defaults normally work well... but docID assignment is an impl
>> detail and is free to change across releases...
>>
>>> I guess I need to reread the tuning chapter in LIA, it's
>>> been a few years.
>>
>> ;)
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
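
A note on the insertion-order point in the thread: the replies say that a Log*MergePolicy preserves insertion order and that the merge factor does not affect this. One way to picture why is the toy model below. It is not Lucene code and calls no Lucene API. It assumes, for illustration only, that a segment is a list of documents named by arrival number, that docID order is the concatenation of segments in index order, and that a merged segment takes the slot of its first input. The class and method names (MergeOrderSketch, sample, merge, docOrder) are hypothetical. Under those assumptions, merging a run of adjacent segments leaves the overall order unchanged, whereas merging non-adjacent segments moves documents ahead of earlier arrivals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, NOT Lucene code: a segment is just the list of documents it holds,
// named by arrival number. docID order follows from position in the segment list.
final class MergeOrderSketch {

    // Four small segments holding documents 0..7 in arrival order.
    static List<List<Integer>> sample() {
        List<List<Integer>> segments = new ArrayList<>();
        segments.add(Arrays.asList(0, 1));
        segments.add(Arrays.asList(2, 3));
        segments.add(Arrays.asList(4, 5));
        segments.add(Arrays.asList(6, 7));
        return segments;
    }

    // Merge the segments at the given positions (must be non-empty and ascending).
    // Assumption of this sketch: the merged segment takes the slot of the first input.
    static List<List<Integer>> merge(List<List<Integer>> segments, int... positions) {
        List<Integer> merged = new ArrayList<>();
        for (int p : positions) {
            merged.addAll(segments.get(p));
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            if (i == positions[0]) {
                result.add(merged);              // merged segment replaces the first input
            } else if (Arrays.binarySearch(positions, i) < 0) {
                result.add(segments.get(i));     // untouched segment keeps its place
            }                                    // other inputs are dropped
        }
        return result;
    }

    // Overall document order: concatenate the segments in index order.
    static List<Integer> docOrder(List<List<Integer>> segments) {
        List<Integer> order = new ArrayList<>();
        for (List<Integer> segment : segments) {
            order.addAll(segment);
        }
        return order;
    }

    public static void main(String[] args) {
        // Adjacent merge (positions 1 and 2): arrival order is kept.
        System.out.println(docOrder(merge(sample(), 1, 2)));
        // Non-adjacent merge (positions 0 and 2): documents 4 and 5 jump ahead of 2 and 3.
        System.out.println(docOrder(merge(sample(), 0, 2)));
    }
}
```

In this model the number of adjacent segments merged at once plays no role in the resulting order, which matches the remark that merge factor does not affect insertion order. Whether a real index behaves this way depends on the Lucene version and configuration, so, as the thread says, test in your own environment.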