Right... It all blends together, I need an NLP analyzer for my emails On Wed, Jan 13, 2010 at 3:05 PM, Otis Gospodnetic <[email protected]> wrote: > I think Jason meant "15-20GB segments"? > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > > ________________________________ > From: Jason Rutherglen <[email protected]> > To: [email protected] > Sent: Wed, January 13, 2010 5:54:38 PM > Subject: Re: Max Segmentation Size when Optimizing Index > > Yes... You could hack LogMergePolicy to do something else. > > I use optimise(numsegments:5) regularly on 80GB indexes, that if > optimized to 1 segment, would thrash the IO excessively. This works > fine because 15-20GB indexes are plenty large and fast. > > On Wed, Jan 13, 2010 at 2:44 PM, Trin Chavalittumrong <[email protected]> > wrote: >> Seems like optimize() only cares about final number of segments rather than >> the size of the segment. Is it so? >> >> On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen < >> [email protected]> wrote: >> >>> There's a different method in LogMergePolicy that performs the >>> optimize... Right, so normal merging uses the findMerges method, then >>> there's a findMergeOptimize (method names could be inaccurate). >>> >>> On Wed, Jan 13, 2010 at 2:29 PM, Trin Chavalittumrong <[email protected]> >>> wrote: >>> > Do you mean MergePolicy is only used during index time and will be >>> ignored >>> > by by the Optimize() process? >>> > >>> > >>> > On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen < >>> > [email protected]> wrote: >>> > >>> >> Oh ok, you're asking about optimizing... I think that's a different >>> >> algorithm inside LogMergePolicy. I think it ignores the maxMergeMB >>> >> param. >>> >> >>> >> On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong <[email protected] >>> > >>> >> wrote: >>> >> > Thanks, Jason. >>> >> > >>> >> > Is my understanding correct that >>> >> LogByteSizeMergePolicy.setMaxMergeMB(100) >>> >> > will prevent >>> >> > merging of two segments that is larger than 100 Mb each at the >>> optimizing >>> >> > time? >>> >> > >>> >> > If so, why do think would I still see segment that is larger than 200 >>> MB? >>> >> > >>> >> > >>> >> > >>> >> > On Wed, Jan 13, 2010 at 1:43 PM, Jason Rutherglen < >>> >> > [email protected]> wrote: >>> >> > >>> >> >> Hi Trin, >>> >> >> >>> >> >> There was recently a discussion about this, the max size is >>> >> >> for the before merge segments, rather than the resultant merged >>> >> >> segment (if that makes sense). It'd be great if we had a merge >>> >> >> policy that limited the resultant merged segment, though that'd >>> >> >> by a rough approximation at best. >>> >> >> >>> >> >> Jason >>> >> >> >>> >> >> On Wed, Jan 13, 2010 at 1:36 PM, Trin Chavalittumrong < >>> [email protected] >>> >> > >>> >> >> wrote: >>> >> >> > Hi, >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > I am trying to optimize the index which would merge different >>> segment >>> >> >> > together. Let say the index folder is 1Gb in total, I need each >>> >> >> segmentation >>> >> >> > to be no larger than 200Mb. I tried to use *LogByteSizeMergePolicy >>> >> *and >>> >> >> > setMaxMergeMB(100) to ensure no segment after merging would be >>> 200Mb. >>> >> >> > However, I still see segment that are larger than 200Mb. I did call >>> >> >> > IndexWriter.optimize(20) to make sure there are enough number >>> >> >> segmentation >>> >> >> > to allow each segment to be under 200Mb. >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > Can someone let me know if I am using this right? Or any suggestion >>> on >>> >> >> how >>> >> >> > to tackle this would be helpful. >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > Thanks, >>> >> >> > >>> >> >> > Trin >>> >> >> > >>> >> >> >>> >> >> --------------------------------------------------------------------- >>> >> >> To unsubscribe, e-mail: [email protected] >>> >> >> For additional commands, e-mail: [email protected] >>> >> >> >>> >> >> >>> >> > >>> >> >>> >> --------------------------------------------------------------------- >>> >> To unsubscribe, e-mail: [email protected] >>> >> For additional commands, e-mail: [email protected] >>> >> >>> >> >>> > >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [email protected] >>> For additional commands, e-mail: [email protected] >>> >>> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected]
--------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
