[
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520881
]
Steven Parkes commented on LUCENE-847:
--------------------------------------
my feeling is we should not deprecate
setUseCompoundFile, setMergeFactor, setMaxMergeDocs
I understood that you didn't want to deprecate them in IndexWriter. I wasn't
sure that you meant that they should be added to the MergePolicy interface? If
you do, everything makes sense. Otherwise, it sounds like there's still a cast
in there and I'm not sure about that.
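To make the two options concrete, here is a minimal sketch of the "setter on
the interface" variant. Every type below (MergePolicySketch, LogPolicySketch,
WriterSketch) is a hypothetical stand-in, not a class from the patch; the
point is only that the writer can keep its convenience setter without a cast.

```java
// Hypothetical sketch; none of these types come from the LUCENE-847 patch.

// The setting lives on the policy interface itself.
interface MergePolicySketch {
    void setMergeFactor(int mergeFactor);
    int getMergeFactor();
}

final class LogPolicySketch implements MergePolicySketch {
    private int mergeFactor = 10; // arbitrary default for the sketch

    @Override public void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }
    @Override public int getMergeFactor() { return mergeFactor; }
}

// Stand-in for a writer that keeps its (non-deprecated) setter by delegating.
final class WriterSketch {
    private final MergePolicySketch policy;

    WriterSketch(MergePolicySketch policy) { this.policy = policy; }

    // No cast needed, because every policy exposes the setter.
    // If the setter existed only on LogPolicySketch, this line would have to be
    // ((LogPolicySketch) policy).setMergeFactor(mergeFactor), which throws
    // ClassCastException as soon as a different policy is plugged in.
    void setMergeFactor(int mergeFactor) { policy.setMergeFactor(mergeFactor); }

    int getMergeFactor() { return policy.getMergeFactor(); }
}
```

The cost of this variant is that every policy has to accept these settings,
even one for which a merge factor has no natural meaning.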
I think IndexWriter should enforce it? Ie no merge policy should be
allowed to leave segments in other dirs (= an inconsistent index) at
point of commit.
I think it's just about code location: since a merge policy might want to
factor the directories used into its algorithm, it needs the info and will
presumably sometimes do the copy itself. Presumably you could provide code in
MergePolicyBase so the policies could decide when to copy but wouldn't have to
write the copy loop. If you put the code in IndexWriter too, it sounds
duplicated, again presuming a policy might sometimes want to do it itself.
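A rough sketch of that split, assuming a base class that owns the copy loop
and leaves only the "when" to subclasses. SegmentSketch, MergePolicyBaseSketch
and EagerCopyPolicySketch are hypothetical names for illustration, and the
"copy" here only relabels the directory instead of moving files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment: a name plus the directory it lives in.
final class SegmentSketch {
    final String name;
    final String dir;

    SegmentSketch(String name, String dir) { this.name = name; this.dir = dir; }
}

// Hypothetical base class: owns the copy loop, leaves the timing to subclasses.
abstract class MergePolicyBaseSketch {
    // Subclasses decide whether now is a good moment to pull in external segments.
    protected abstract boolean shouldCopyNow(List<SegmentSketch> segments, String indexDir);

    // Shared loop: returns a new list in which every segment lives in indexDir.
    protected final List<SegmentSketch> copyExternalSegments(
            List<SegmentSketch> segments, String indexDir) {
        List<SegmentSketch> result = new ArrayList<>();
        for (SegmentSketch s : segments) {
            // A real implementation would copy files; the sketch only relabels.
            result.add(s.dir.equals(indexDir) ? s : new SegmentSketch(s.name, indexDir));
        }
        return result;
    }

    public final List<SegmentSketch> maybeCopy(List<SegmentSketch> segments, String indexDir) {
        return shouldCopyNow(segments, indexDir)
                ? copyExternalSegments(segments, indexDir)
                : segments;
    }
}

// Toy policy: copy as soon as any segment sits outside the index directory.
final class EagerCopyPolicySketch extends MergePolicyBaseSketch {
    @Override
    protected boolean shouldCopyNow(List<SegmentSketch> segments, String indexDir) {
        for (SegmentSketch s : segments) {
            if (!s.dir.equals(indexDir)) {
                return true;
            }
        }
        return false;
    }
}
```

A policy that wants to delay the copy (say, until a merge would touch those
segments anyway) would only override shouldCopyNow.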
I like that idea :) It fits well w/ the stateless API. Ie, merge
policy returns all possible merges and "someone above" takes care of
scheduling them.
So it returns a vector of specs?
That's essentially what the CMP as an above/below wrapper does. I can see that
above/below is strange enough to be less clever (I wasn't trying to be so much
clever as backwards compatible) and more insane.
Sane is good.
Hmm. This means each merge policy must know whether it's talking to
CMP or IndexWriter underneath? With the stateless approach this
wouldn't happen.
Well, I wouldn't so much say it has to know. All it cares about is what merge
returns. It doesn't have to know who returned it or why.
The only real difference between this and the "generate a vector of merges"
approach is that here the merge policy can take immediate advantage of merge
results in the serial case, whereas if you're generating a vector of merges,
it can't know those results.
Of course, I guess in that case, if IndexWriter gets a vector of merges, it can
always take the lowest and ignore the rest, calling the merge policy again
in case it wants to request a different set. Then you only have the excess
computation for merges you never really considered.
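As a rough illustration of that loop: the sketch below assumes a stateless
policy that returns every merge it would accept, and a serial driver that runs
only the first one and then asks again. All names (MergeSpecSketch,
StatelessMergePolicySketch, EqualSizeRunPolicySketch, SerialMergeDriverSketch)
are hypothetical, segments are reduced to plain sizes, and "the lowest" is
taken to mean the first spec in the returned list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical merge spec: a contiguous range [start, end) of segments.
final class MergeSpecSketch {
    final int start;
    final int end;

    MergeSpecSketch(int start, int end) { this.start = start; this.end = end; }
}

// Hypothetical stateless policy: looks only at the current segment sizes.
interface StatelessMergePolicySketch {
    List<MergeSpecSketch> findMerges(List<Integer> segmentSizes);
}

// Toy policy: for each run of equal-sized adjacent segments that is at least
// mergeFactor long, propose merging the first mergeFactor segments of the run.
final class EqualSizeRunPolicySketch implements StatelessMergePolicySketch {
    private final int mergeFactor;

    EqualSizeRunPolicySketch(int mergeFactor) { this.mergeFactor = mergeFactor; }

    @Override
    public List<MergeSpecSketch> findMerges(List<Integer> sizes) {
        List<MergeSpecSketch> out = new ArrayList<>();
        int runStart = 0;
        for (int i = 1; i <= sizes.size(); i++) {
            boolean runEnds = i == sizes.size() || !sizes.get(i).equals(sizes.get(runStart));
            if (runEnds) {
                if (i - runStart >= mergeFactor) {
                    out.add(new MergeSpecSketch(runStart, runStart + mergeFactor));
                }
                runStart = i;
            }
        }
        return out;
    }
}

// Serial driver: take the first spec, run it, discard the rest, ask again.
final class SerialMergeDriverSketch {
    static int runToQuiescence(StatelessMergePolicySketch policy, List<Integer> sizes) {
        int mergesRun = 0;
        while (true) {
            List<MergeSpecSketch> candidates = policy.findMerges(sizes);
            if (candidates.isEmpty()) {
                return mergesRun;
            }
            MergeSpecSketch first = candidates.get(0);
            int mergedSize = 0;
            for (int i = first.start; i < first.end; i++) {
                mergedSize += sizes.get(i);
            }
            // Replace the merged range with one segment of the combined size.
            sizes.subList(first.start, first.end).clear();
            sizes.add(first.start, mergedSize);
            mergesRun++;
        }
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>(Arrays.asList(1, 1, 1, 1));
        int merges = runToQuiescence(new EqualSizeRunPolicySketch(2), sizes);
        System.out.println(merges + " merges, segments now " + sizes);
    }
}
```

Because the policy is re-asked after every merge, it sees each result before
the next decision, at the cost of recomputing candidates that are thrown away.
A concurrent scheduler could instead run several of the returned specs at once.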
Oh I see... that's kind of sneaky (planning on using exceptions to
abort a merge requested by the policy).
There's always going to be the chance of an exception to a merge. I'm pretty
sure of that. But you're right, if the merge policy isn't in the control path,
it would never see them. They'll be there, but it's out of the path.
But since you're already doing the work
to allow a merge to run in the BG without blocking adding of docs,
flushing, etc, wouldn't this come nearly for free?
I haven't looked at this.
Well, eg flush() now synchronizes on IndexWriter
Yeah, and making it not do so is less than straightforward. I've looked at
this code a fair amount and experimented with different ideas, but hadn't
gotten all the way to a working model.
You can look at locking segmentInfos but there are many places that
segmentInfos is iterated over that would require locks if the lock on IW wasn't
sufficient to guarantee that the iteration was safe.
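To illustrate the concern, a minimal sketch assuming the segment list gets its
own lock instead of relying on the writer's monitor. SegmentListSketch and its
members are hypothetical; the point is that every place that iterates has to
be changed to either hold that lock or iterate over a snapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a writer whose segment list has its own lock.
final class SegmentListSketch {
    private final Object segmentLock = new Object(); // assumed finer-grained lock
    private final List<String> segmentInfos = new ArrayList<>();

    void addSegment(String name) {
        synchronized (segmentLock) {
            segmentInfos.add(name);
        }
    }

    // Copy taken under the lock; callers may iterate it without further locking.
    List<String> snapshot() {
        synchronized (segmentLock) {
            return new ArrayList<>(segmentInfos);
        }
    }

    // Example iteration site: safe because it walks a private copy.
    int countWithPrefix(String prefix) {
        int count = 0;
        for (String name : snapshot()) {
            if (name.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```

A snapshot can be stale by the time it is used, so this only helps iteration
sites that tolerate a slightly old view of the segments.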
I did look at that early on, so maybe my understanding was still too lacking
and it's more feasible than I was thinking ...
> Factor merge policy out of IndexWriter
> --------------------------------------
>
> Key: LUCENE-847
> URL: https://issues.apache.org/jira/browse/LUCENE-847
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Steven Parkes
> Assignee: Steven Parkes
> Attachments: concurrentMerge.patch, LUCENE-847.patch.txt,
> LUCENE-847.patch.txt, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable,
> making it possible for apps to choose a custom merge policy and for easier
> experimenting with merge policy variants.