Right... It all blends together, I need an NLP analyzer for my emails On Wed, Jan 13, 2010 at 3:05 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote: > I think Jason meant "15-20GB segments"? > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > > ________________________________ > From: Jason Rutherglen <jason.rutherg...@gmail.com> > To: java-user@lucene.apache.org > Sent: Wed, January 13, 2010 5:54:38 PM > Subject: Re: Max Segmentation Size when Optimizing Index > > Yes... You could hack LogMergePolicy to do something else. > > I use optimise(numsegments:5) regularly on 80GB indexes, that if > optimized to 1 segment, would thrash the IO excessively. This works > fine because 15-20GB indexes are plenty large and fast. > > On Wed, Jan 13, 2010 at 2:44 PM, Trin Chavalittumrong <mrt...@gmail.com> > wrote: >> Seems like optimize() only cares about final number of segments rather than >> the size of the segment. Is it so? >> >> On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen < >> jason.rutherg...@gmail.com> wrote: >> >>> There's a different method in LogMergePolicy that performs the >>> optimize... Right, so normal merging uses the findMerges method, then >>> there's a findMergeOptimize (method names could be inaccurate). >>> >>> On Wed, Jan 13, 2010 at 2:29 PM, Trin Chavalittumrong <mrt...@gmail.com> >>> wrote: >>> > Do you mean MergePolicy is only used during index time and will be >>> ignored >>> > by by the Optimize() process? >>> > >>> > >>> > On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen < >>> > jason.rutherg...@gmail.com> wrote: >>> > >>> >> Oh ok, you're asking about optimizing... I think that's a different >>> >> algorithm inside LogMergePolicy. I think it ignores the maxMergeMB >>> >> param. >>> >> >>> >> On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong <mrt...@gmail.com >>> > >>> >> wrote: >>> >> > Thanks, Jason. >>> >> > >>> >> > Is my understanding correct that >>> >> LogByteSizeMergePolicy.setMaxMergeMB(100) >>> >> > will prevent >>> >> > merging of two segments that is larger than 100 Mb each at the >>> optimizing >>> >> > time? >>> >> > >>> >> > If so, why do think would I still see segment that is larger than 200 >>> MB? >>> >> > >>> >> > >>> >> > >>> >> > On Wed, Jan 13, 2010 at 1:43 PM, Jason Rutherglen < >>> >> > jason.rutherg...@gmail.com> wrote: >>> >> > >>> >> >> Hi Trin, >>> >> >> >>> >> >> There was recently a discussion about this, the max size is >>> >> >> for the before merge segments, rather than the resultant merged >>> >> >> segment (if that makes sense). It'd be great if we had a merge >>> >> >> policy that limited the resultant merged segment, though that'd >>> >> >> by a rough approximation at best. >>> >> >> >>> >> >> Jason >>> >> >> >>> >> >> On Wed, Jan 13, 2010 at 1:36 PM, Trin Chavalittumrong < >>> mrt...@gmail.com >>> >> > >>> >> >> wrote: >>> >> >> > Hi, >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > I am trying to optimize the index which would merge different >>> segment >>> >> >> > together. Let say the index folder is 1Gb in total, I need each >>> >> >> segmentation >>> >> >> > to be no larger than 200Mb. I tried to use *LogByteSizeMergePolicy >>> >> *and >>> >> >> > setMaxMergeMB(100) to ensure no segment after merging would be >>> 200Mb. >>> >> >> > However, I still see segment that are larger than 200Mb. I did call >>> >> >> > IndexWriter.optimize(20) to make sure there are enough number >>> >> >> segmentation >>> >> >> > to allow each segment to be under 200Mb. >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > Can someone let me know if I am using this right? Or any suggestion >>> on >>> >> >> how >>> >> >> > to tackle this would be helpful. >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > Thanks, >>> >> >> > >>> >> >> > Trin >>> >> >> > >>> >> >> >>> >> >> --------------------------------------------------------------------- >>> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >> >> >>> >> >> >>> >> > >>> >> >>> >> --------------------------------------------------------------------- >>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >> >>> >> >>> > >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org
--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org