[ 
https://issues.apache.org/jira/browse/LUCENE-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150405#comment-13150405
 ] 

Michael McCandless commented on LUCENE-3569:
--------------------------------------------

For natural merges I think the existing MergePolicy makes sense: it's
embedded into IW (via IWC) and is invoked whenever there is a change to
the segments (e.g., a new segment is flushed).

But for forced merges (either forceMerge or expungeDeletes)... I don't
think we need a new MergePolicy-like class?  Can't this "outside
logic" simply invoke registerMerge() directly on the incoming IW?

So, e.g., in contrib/misc we'd add a new IndexUtils class (or
something similar) with a static method "expungeDeletes" that takes an
IW instance.  When the app calls that method, it inspects the IW's
segments, chooses its merges, and registers them.

Just like a MergePolicy, the method would have to check which merges
are already running/registered (IW.getMergingSegments) and "work
around" them.  E.g., if there are 7 segments with deletions and you
see that 4 of them are already merging or scheduled for merge, you
know you only have to merge the other 3.
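
To make the "work around" step concrete, here is a minimal sketch of
the selection logic only. It deliberately does not touch IndexWriter:
the class name, the method name, and the use of plain segment names and
delete counts are all hypothetical stand-ins for inspecting the IW's
segments and for the set that IW.getMergingSegments would give you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; this is not a Lucene class. */
public class ExpungeDeletesSketch {

    /**
     * Returns the segments that still need a merge: those that have
     * deletions and are not already part of a running/registered merge.
     *
     * @param delCounts segment name -> deleted doc count (assumed input,
     *                  standing in for the IW's segment infos)
     * @param merging   names of segments already merging (assumed input,
     *                  standing in for IW.getMergingSegments)
     */
    static List<String> segmentsToMerge(Map<String, Integer> delCounts,
                                        Set<String> merging) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : delCounts.entrySet()) {
            boolean hasDeletions = e.getValue() > 0;
            // Skip segments that some other merge is already handling.
            if (hasDeletions && !merging.contains(e.getKey())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 7 segments with deletions, plus one clean segment.
        Map<String, Integer> delCounts = new LinkedHashMap<>();
        for (int i = 0; i < 7; i++) {
            delCounts.put("_" + i, 10);
        }
        delCounts.put("_7", 0);

        // 4 of the 7 are already merging or scheduled for merge.
        Set<String> merging =
            new HashSet<>(Arrays.asList("_0", "_1", "_2", "_3"));

        // Only the other 3 remain to be merged.
        System.out.println(segmentsToMerge(delCounts, merging));
    }
}
```

In the real helper, the resulting list would then be grouped into
merges and registered on the IW; how exactly that registration is
exposed to code outside the index package is left open here.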

                
> Consolidate IndexWriter's optimize, maybeMerge and expungeDeletes under one 
> merge(MP) method
> --------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3569
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3569
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>
> Today, IndexWriter exposes 3 methods for 'cleaning up' / 'compacting' / 
> 'optimizing' your index:
> * optimize() -- merges as many segments as possible (down to 1 segment), and 
> is discouraged in many cases because of its performance implications.
> * maybeMerge() -- runs 'subtle' merges. Attempts to balance the index by not 
> leaving too many segments, yet not merging large segments if unneeded.
> * expungeDeletes() -- cleans up deleted documents from segments and, along 
> the way, merges those segments.
> * a default MP that can be set on IndexWriterConfig, for ongoing merges IW 
> performs (i.e. as a result of flushing a new segment).
> These methods are confusing on several levels:
> * Their names are misleading, see LUCENE-3454.
> * Why does expungeDeletes need to merge segments?
> * Ultimately, they do whatever the MergePolicy decides should be done. I.e., 
> one could write an MP that always merges all segments, and therefore calling 
> maybeMerge would not be so subtle anymore. On the other hand, one could write 
> an MP that never merges large segments (we in fact have several of those), 
> and therefore calling optimize(1) would not end up with one segment.
> So the proposal is to replace all these methods with a single one, 
> merge(MergePolicy) (more on the names later). MergePolicy will have only one 
> method, findSegmentsForMerge, and the caller will be responsible for 
> configuring it to perform the needed merges. We will provide ready-to-use MPs:
> * LightMergePolicy -- for setting on IWC and doing the ongoing merges IW 
> executes. This one will pick segments respecting various parameters such as 
> mergeFactor, segmentSizes etc.
> * HeavyMergePolicy -- for doing the optimize()-style merges.
> * ExpungeDeletesMergePolicy -- for expunging deletes (my proposal is to drop 
> segment merging from it, by default).
> Now about the names:
> * I think that it will be good, API-backcompat wise and in general, if we 
> name that method doMaintenance (as expungeDeletes does not have to merge 
> anything).
> * Instead of MergePolicy we call it MaintenancePolicy and similarly its 
> single method findSegmentsForMaintenance, or getMaintenanceSpecification.
> * I called the MPs Light and Heavy just for the text; I think better names 
> should be found, but nothing comes to mind now.
> It will allow us to use this on 3.x, by deprecating MP and all related 
> methods.


        
