[
https://issues.apache.org/jira/browse/LUCENE-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149881#comment-13149881
]
Hoss Man commented on LUCENE-3569:
----------------------------------
{quote}
I would add an instanceof check so that this returns IAE if you do this
IWconfig.setMaintenancePolicy(new OptimizingMaintenancePolicy()).
{quote}
Are you also going to hack IndexWriter.forceMerge to throw IAE if the user
specifies "1"? Other than the name, how is this different exactly?
It doesn't seem like there has actually been any fundamental objection to the
main idea of this issue, or to renaming "MergePolicy" to "MaintenancePolicy" --
the main contention here seems to be the idea of an
"OptimizingMaintenancePolicy" in light of the entire "optimize sounds cool
therefore is bad" meme. So can we agree that the specific example of
"OptimizingMaintenancePolicy" is a bad idea and that an instance like that
should probably have a name more explicit about its purpose like
"AlwaysForceMergeToASingleSegmentMaintenancePolicy" (or better still:
"NeverExceedNSegmentsMaintenancePolicy" with a single constructor that requires
an "int maxSegments" property)?
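
To make the naming suggestion concrete, here is a rough sketch of what a "NeverExceedNSegmentsMaintenancePolicy" could look like. Everything in it is an assumption made for illustration: the MaintenancePolicy interface, the findSegmentsForMaintenance signature, and the use of plain segment sizes instead of real segment metadata are all hypothetical and not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the proposed interface; not a real Lucene type. */
interface MaintenancePolicy {
    /** Returns the sizes of the segments to merge into one, or an empty list. */
    List<Long> findSegmentsForMaintenance(List<Long> segmentSizes);
}

/** Hypothetical policy whose name states exactly what it guarantees. */
final class NeverExceedNSegmentsMaintenancePolicy implements MaintenancePolicy {
    private final int maxSegments;

    /** The single constructor: callers must say how many segments they accept. */
    NeverExceedNSegmentsMaintenancePolicy(int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException(
                "maxSegments must be >= 1, got " + maxSegments);
        }
        this.maxSegments = maxSegments;
    }

    @Override
    public List<Long> findSegmentsForMaintenance(List<Long> segmentSizes) {
        int excess = segmentSizes.size() - maxSegments;
        if (excess <= 0) {
            return Collections.emptyList(); // already within the limit
        }
        List<Long> sorted = new ArrayList<>(segmentSizes);
        Collections.sort(sorted);
        // Merging k segments into one removes k - 1 segments, so k = excess + 1.
        // Picking the smallest ones is an arbitrary choice made for this sketch.
        return new ArrayList<>(sorted.subList(0, excess + 1));
    }
}
```

With maxSegments = 1 this degenerates to the old optimize() behaviour, but the caller has to ask for that explicitly instead of getting it from a cool-sounding name.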
Personally I like the idea of simplifying the methods, and having an
abstraction like MergePolicy handle the subtleties of "merge down to N
segments" vs "expunge deletes (and maybe merge)", but my two concerns (as
someone who doesn't understand the nuances of the impl under the IndexWriter
covers) are:
1) "Maintenance" of what? ... if it's just about the segments (but not just
merging, because maybe you have one that does delete expunging w/o merging)
then "SegmentManagementPolicy" might be better. It's not like this thing is
doing logical document maintenance, or cleaning up unused files on disk.
2) How exactly would the MP specified on the IWC interact with the MP passed
explicitly to the merge(MP) method? Does the merge(MP) method completely
ignore/override the configured MP? ... that seems like something that could be
incredibly error prone. Would it be better to use a pattern where *some* MPs
have public methods for "forceMerge(int)" and "expungeDeletes()" and encourage
client code that wants programmatic control of this type of thing to keep a ref
to the IWC->MP they are using and call methods on it directly?
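
A minimal sketch of that second pattern, to show why it avoids the "which MP wins?" question: there is only ever one policy, the one on the config, and client code drives it directly. All names here (MergePolicySketch, ControllableMergePolicy, WriterConfigSketch, nextRequest) are hypothetical placeholders, not real Lucene classes or methods, and requests are modelled as plain strings to keep it short. The sketch is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical: the only contract the writer itself depends on. */
interface MergePolicySketch {
    /** Describes the next piece of work the writer should do, or null if none. */
    String nextRequest();
}

/** Hypothetical policy that also exposes explicit controls to client code. */
final class ControllableMergePolicy implements MergePolicySketch {
    private final Queue<String> pending = new ArrayDeque<>();

    /** Only policies that support a forced merge expose this method. */
    public void forceMerge(int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException(
                "maxSegments must be >= 1, got " + maxSegments);
        }
        pending.add("forceMerge(" + maxSegments + ")");
    }

    /** Only policies that support expunging deletes expose this method. */
    public void expungeDeletes() {
        pending.add("expungeDeletes");
    }

    @Override
    public String nextRequest() {
        return pending.poll(); // null when nothing has been requested
    }
}

/** Hypothetical stand-in for IndexWriterConfig: holds exactly one policy. */
final class WriterConfigSketch {
    private final MergePolicySketch policy;

    WriterConfigSketch(MergePolicySketch policy) {
        this.policy = policy;
    }

    MergePolicySketch getMergePolicy() {
        return policy;
    }
}
```

Client code would keep its own reference to the ControllableMergePolicy it passed to the config and call forceMerge(1) or expungeDeletes() on it; the writer only ever sees the configured policy through getMergePolicy(), so there is no second MP to override or ignore.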
> Consolidate IndexWriter's optimize, maybeMerge and expungeDeletes under one
> merge(MP) method
> --------------------------------------------------------------------------------------------
>
> Key: LUCENE-3569
> URL: https://issues.apache.org/jira/browse/LUCENE-3569
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Shai Erera
>
> Today, IndexWriter exposes 3 methods for 'cleaning up' / 'compacting' /
> 'optimizing' your index:
> * optimize() -- merges as many segments as possible (down to 1 segment), and
> is discouraged in many cases because of its performance implications.
> * maybeMerge() -- runs 'subtle' merges. Attempts to balance the index by not
> leaving too many segments, yet not merging large segments if unneeded.
> * expungeDeletes() -- cleans up deleted documents from segments and on the go
> merges them.
> * a default MP that can be set on IndexWriterConfig, for ongoing merges IW
> performs (i.e. as a result of flushing a new segment).
> These methods are confusing on several levels:
> * Their names are misleading, see LUCENE-3454.
> * Why does expungeDeletes need to merge segments?
> * Ultimately, they do whatever the MergePolicy decides should be
> done. I.e., one could write an MP that always merges all segments, and
> therefore calling maybeMerge would not be so subtle anymore. On the other
> hand, one could write an MP that never merges large segments (we in fact have
> several of those), and therefore calling optimize(1) would not end up with
> one segment.
> So the proposal is to replace all these methods with a single one
> merge(MergePolicy) (more on the names later). MergePolicy will have only one
> method findSegmentsForMerge and the caller will be responsible for
> configuring it to perform the needed merges. We will provide ready-to-use MPs:
> * LightMergePolicy -- for setting on IWC and doing the ongoing merges IW
> executes. This one will pick segments respecting various parameters such as
> mergeFactor, segmentSizes etc.
> * HeavyMergePolicy -- for doing the optimize()-style merges.
> * ExpungeDeletesMergePolicy -- for expunging deletes (my proposal is to drop
> segment merging from it, by default).
> Now about the names:
> * I think that it will be good, API-backcompat wise and in general, if we
> name that method doMaintenance (as expungeDeletes does not have to merge
> anything).
> * Instead of MergePolicy we call it MaintenancePolicy and similarly its
> single method findSegmentsForMaintenance, or getMaintenanceSpecification.
> * I called the MPs Light and Heavy just for the text, I think a better name
> should be found, but nothing comes to mind right now.
> It will allow us to use this on 3.x, by deprecating MP and all related
> methods.
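
For readers skimming the issue description, the single-method policy it proposes might look roughly like the following. This is only a sketch under stated assumptions: SegmentSketch, SingleMethodMergePolicy, and the return type of findSegmentsForMerge are invented for illustration, and the two implementations borrow the placeholder names from the description without reflecting any actual patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical segment descriptor; real segment metadata is much richer. */
final class SegmentSketch {
    final String name;
    final int deletedDocs;

    SegmentSketch(String name, int deletedDocs) {
        this.name = name;
        this.deletedDocs = deletedDocs;
    }
}

/** Hypothetical policy with the one method the issue proposes. */
interface SingleMethodMergePolicy {
    /** Each inner list is one group of segments to rewrite into one new segment. */
    List<List<SegmentSketch>> findSegmentsForMerge(List<SegmentSketch> segments);
}

/** optimize()-style: every segment goes into a single group. */
final class HeavyMergePolicy implements SingleMethodMergePolicy {
    @Override
    public List<List<SegmentSketch>> findSegmentsForMerge(List<SegmentSketch> segments) {
        if (segments.size() < 2) {
            return Collections.emptyList(); // simplification: nothing to combine
        }
        List<List<SegmentSketch>> groups = new ArrayList<>();
        groups.add(new ArrayList<>(segments));
        return groups;
    }
}

/** Rewrites each segment that has deletions on its own, without combining any. */
final class ExpungeDeletesMergePolicy implements SingleMethodMergePolicy {
    @Override
    public List<List<SegmentSketch>> findSegmentsForMerge(List<SegmentSketch> segments) {
        List<List<SegmentSketch>> groups = new ArrayList<>();
        for (SegmentSketch segment : segments) {
            if (segment.deletedDocs > 0) {
                groups.add(Collections.singletonList(segment));
            }
        }
        return groups;
    }
}
```

The point of the sketch is that a caller's intent lives entirely in which policy instance is passed to the single merge/doMaintenance entry point, rather than in which IndexWriter method is called.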