I did look at it, but I didn't find that it answers this particular need (ending up with segments no bigger than X). Perhaps by tweaking several parameters (e.g. maxLarge/SmallNumSegments + maxMergeSizeMB) I can achieve something, but it's not very clear what the right combination is.
Which relates to one of my points -- isn't it more intuitive for an app to set this one threshold (if it needs any thresholds) than to tweak all of those parameters? If so, then we only need two thresholds (size + mergeFactor), and we can reuse BalancedMP's findBalancedMerges logic (perhaps w/ some adaptations) to derive a merge plan.

Shai

On Mon, May 2, 2011 at 4:42 PM, Earwin Burrfoot <ear...@gmail.com> wrote:

> Have you checked BalancedSegmentMergePolicy? It has some more knobs :)
>
> On Mon, May 2, 2011 at 17:03, Shai Erera <ser...@gmail.com> wrote:
> > Hi
> >
> > Today, LogMP allows you to set different thresholds for segment sizes,
> > thereby allowing you to control the largest segment that will be
> > considered for merge + the largest segment your index will hold (=~
> > threshold * mergeFactor).
> >
> > So, if you want to end up w/ say 20GB segments, you can set
> > maxMergeMB(ForOptimize) to 2GB and mergeFactor=10.
> >
> > However, this often does not achieve your desired goal -- if the index
> > contains 5 and 7 GB segments, they will never be merged b/c they are
> > bigger than the threshold. I am willing to spend the CPU and IO resources
> > to end up w/ 20 GB segments, whether I'm merging 10 segments together or
> > only 2. After I reach a 20GB segment, it can rest peacefully, at least
> > until I increase the threshold.
> >
> > So I wonder, first, if this threshold (i.e., largest segment size you
> > would like to end up with) is more natural to set than the current
> > thresholds, from the application level? I.e., wouldn't it be a simpler
> > threshold to set instead of doing weird calculus that depends on
> > maxMergeMB(ForOptimize) and mergeFactor?
> >
> > Second, should this be an addition to LogMP, or a different
> > type of MP? One that adheres to only those two factors (perhaps the
> > segSize threshold should be allowed to be set differently for optimize
> > and regular merges). It can pick segments for merge such that it maximizes
> > the result segment size (i.e., don't necessarily merge in sequential
> > order), but not more than mergeFactor.
> >
> > I guess, if we think that maxResultSegmentSizeMB is more intuitive than
> > the current thresholds, application-wise, then this change should go
> > into LogMP. Otherwise, it feels like a different MP is needed, because
> > LogMP is already complicated and another threshold would confuse things.
> >
> > What do you think of this? Am I trying to optimize too much? :)
> >
> > Shai
>
> --
> Kirill Zakharenko/Кирилл Захаренко
> E-Mail/Jabber: ear...@gmail.com
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
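
P.S. To make the selection rule from the original message concrete, here is a rough sketch of what "maximize the result segment size, but merge no more than mergeFactor segments" could look like. It is only an illustration under assumptions: the class and the pickMerge helper are hypothetical and not part of Lucene, sizes are plain MB values rather than real segment infos, it does not reuse findBalancedMerges, and the search is greedy rather than optimal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not a Lucene class.
class MaxResultSizeSketch {

    // Picks up to mergeFactor segments (not necessarily adjacent) whose combined
    // size stays within maxResultSegmentSizeMB. Returns an empty list when no
    // merge of at least two segments fits under the cap.
    static List<Long> pickMerge(List<Long> segmentSizesMB,
                                long maxResultSegmentSizeMB,
                                int mergeFactor) {
        List<Long> sorted = new ArrayList<>(segmentSizesMB);
        sorted.sort(Comparator.reverseOrder()); // largest first

        List<Long> best = new ArrayList<>();
        long bestTotal = 0;

        // Try each segment as the starting point and fill greedily from there,
        // so one near-cap segment cannot block merging the smaller ones.
        for (int start = 0; start < sorted.size(); start++) {
            List<Long> picked = new ArrayList<>();
            long total = 0;
            for (int i = start; i < sorted.size(); i++) {
                if (picked.size() == mergeFactor) {
                    break;
                }
                long size = sorted.get(i);
                if (total + size <= maxResultSegmentSizeMB) {
                    picked.add(size);
                    total += size;
                }
            }
            // A merge needs at least two segments; keep the largest result.
            if (picked.size() >= 2 && total > bestTotal) {
                best = picked;
                bestTotal = total;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The 5 GB and 7 GB segments from the example, with a 20 GB cap.
        System.out.println(pickMerge(Arrays.asList(5120L, 7168L), 20480L, 10));
        // A segment already at the cap is left alone.
        System.out.println(pickMerge(Arrays.asList(20480L, 1024L), 20480L, 10));
    }
}
```

With a 20 GB cap and mergeFactor=10, the 5 GB and 7 GB segments are selected together, while a segment that has already reached 20 GB is never picked again.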