[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

Steven Parkes (JIRA) Sat, 18 Aug 2007 11:30:52 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520881
 ]


Steven Parkes commented on LUCENE-847:
--------------------------------------

        my feeling is we should not deprecate
        setUseCompoundFile, setMergeFactor, setMaxMergeDocs

I understood that you didn't want to deprecate them in IndexWriter. I wasn't 
sure whether you meant that they should be added to the MergePolicy interface. If 
you do, everything makes sense. Otherwise, it sounds like there's still a cast 
in there, and I'm not sure about that.
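A minimal sketch of the "setters on the interface" reading, in which IndexWriter keeps its setters and forwards them without any cast. Every name here (TunableMergePolicy, DefaultTunablePolicy, WriterSketch) is hypothetical and stands in for the real classes. The defaults (10, Integer.MAX_VALUE, true) are assumptions for illustration only.

```java
import java.util.Objects;

/** Hypothetical merge policy interface that carries the tuning knobs itself. */
interface TunableMergePolicy {
    void setMergeFactor(int mergeFactor);
    int getMergeFactor();
    void setMaxMergeDocs(int maxMergeDocs);
    int getMaxMergeDocs();
    void setUseCompoundFile(boolean useCompoundFile);
    boolean getUseCompoundFile();
}

/** Hypothetical default policy; it only stores the settings (defaults are assumed). */
class DefaultTunablePolicy implements TunableMergePolicy {
    private int mergeFactor = 10;
    private int maxMergeDocs = Integer.MAX_VALUE;
    private boolean useCompoundFile = true;

    public void setMergeFactor(int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.mergeFactor = mergeFactor;
    }
    public int getMergeFactor() { return mergeFactor; }
    public void setMaxMergeDocs(int maxMergeDocs) { this.maxMergeDocs = maxMergeDocs; }
    public int getMaxMergeDocs() { return maxMergeDocs; }
    public void setUseCompoundFile(boolean useCompoundFile) { this.useCompoundFile = useCompoundFile; }
    public boolean getUseCompoundFile() { return useCompoundFile; }
}

/** Hypothetical stand-in for IndexWriter: its setters stay (not deprecated) and simply forward. */
class WriterSketch {
    private TunableMergePolicy policy;

    WriterSketch(TunableMergePolicy policy) { this.policy = Objects.requireNonNull(policy); }

    void setMergePolicy(TunableMergePolicy policy) { this.policy = Objects.requireNonNull(policy); }

    // No instanceof check or cast: every policy is required to accept these calls.
    void setMergeFactor(int mergeFactor) { policy.setMergeFactor(mergeFactor); }
    int getMergeFactor() { return policy.getMergeFactor(); }
    void setMaxMergeDocs(int maxMergeDocs) { policy.setMaxMergeDocs(maxMergeDocs); }
    void setUseCompoundFile(boolean useCompoundFile) { policy.setUseCompoundFile(useCompoundFile); }
}
```

The trade-off is that every policy must accept all three setters, even ones that are meaningless for its algorithm; the alternative is a cast in IndexWriter that fails for policies lacking them.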

        I think IndexWriter should enforce it?  Ie no merge policy should be
        allowed to leave segments in other dirs (= an inconsistent index) at
        point of commit.

I think it's just about code location: since a merge policy might want to 
factor the directories used into its algorithm, it needs the info and will 
presumably sometimes do the copy itself. You could provide code in 
MergePolicyBase so that policies could decide when to copy but wouldn't have to 
write the copy loop. If you put the code in IndexWriter too, it sounds 
duplicated, again presuming a policy might sometimes want to do it itself. 
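A minimal sketch of the split described above, assuming a base class owns the copy loop while each policy decides when to call it, and IndexWriter only checks the invariant at commit. SegmentRef, MergePolicyBaseSketch, CopyAllPolicy and beforeCommit are hypothetical names; directories are modelled as strings, and "copying" just records the new location.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical segment descriptor: a segment name plus the directory holding its files. */
final class SegmentRef {
    final String name;
    final String dir;
    SegmentRef(String name, String dir) { this.name = name; this.dir = dir; }
}

/** Hypothetical base class: owns the copy loop so subclasses only decide when to use it. */
abstract class MergePolicyBaseSketch {
    /** Returns the segments with every foreign one "copied" into targetDir. */
    protected static List<SegmentRef> copyForeignSegments(List<SegmentRef> segments, String targetDir) {
        List<SegmentRef> result = new ArrayList<>(segments.size());
        for (SegmentRef s : segments) {
            // A real implementation would copy the files; here only the recorded location changes.
            result.add(s.dir.equals(targetDir) ? s : new SegmentRef(s.name, targetDir));
        }
        return result;
    }

    /** The invariant IndexWriter could enforce at commit: no segment left in another directory. */
    static boolean allInDirectory(List<SegmentRef> segments, String dir) {
        for (SegmentRef s : segments) {
            if (!s.dir.equals(dir)) return false;
        }
        return true;
    }

    /** Each policy decides when (and whether) to call copyForeignSegments. */
    abstract List<SegmentRef> beforeCommit(List<SegmentRef> segments, String targetDir);
}

/** Example policy that simply copies every foreign segment before commit. */
class CopyAllPolicy extends MergePolicyBaseSketch {
    @Override
    List<SegmentRef> beforeCommit(List<SegmentRef> segments, String targetDir) {
        return copyForeignSegments(segments, targetDir);
    }
}
```

With the loop living in the base class, IndexWriter needs only the cheap allInDirectory-style check rather than a second copy of the loop.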

        I like that idea :)  It fits well w/ the stateless API.  Ie, merge
        policy returns all possible merges and "someone above" takes care of
        scheduling them.

So it returns a vector of specs?
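A minimal sketch of what "returns all possible merges" could look like for a stateless, log-style policy: it reads only its arguments and leaves scheduling to the caller. MergeSpec, StatelessLogPolicy and the integer-log levelling rule are assumptions for illustration, not the patch's actual classes or algorithm. Segments are represented only by their doc counts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical merge spec: merge segments [start, end) into one. */
final class MergeSpec {
    final int start;
    final int end;
    MergeSpec(int start, int end) { this.start = start; this.end = end; }
    @Override public String toString() { return "[" + start + ", " + end + ")"; }
}

/** Hypothetical stateless policy: no memory of earlier calls, only configuration. */
final class StatelessLogPolicy {
    private final int mergeFactor;

    StatelessLogPolicy(int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.mergeFactor = mergeFactor;
    }

    /** Integer log: how many times mergeFactor multiplies up to at most docCount. */
    private int level(int docCount) {
        int level = 0;
        long bound = mergeFactor;
        while (docCount >= bound) {
            level++;
            bound *= mergeFactor;
        }
        return level;
    }

    /** Returns every merge that is currently possible; scheduling them is someone else's job. */
    List<MergeSpec> findMerges(int[] docCounts) {
        List<MergeSpec> specs = new ArrayList<>();
        int runStart = 0;
        while (runStart < docCounts.length) {
            int lvl = level(docCounts[runStart]);
            int runEnd = runStart;
            while (runEnd < docCounts.length && level(docCounts[runEnd]) == lvl) runEnd++;
            // Within a run of same-level segments, propose one merge per full group of mergeFactor.
            for (int s = runStart; s + mergeFactor <= runEnd; s += mergeFactor) {
                specs.add(new MergeSpec(s, s + mergeFactor));
            }
            runStart = runEnd;
        }
        return specs;
    }
}
```

For example, with mergeFactor 3 and doc counts {1, 1, 1, 1, 1, 1, 5, 30}, it proposes [0, 3) and [3, 6) in one call, leaving the caller free to run them serially or concurrently.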

That's essentially what the CMP as an above/below wrapper does. I can see that 
above/below is strange enough to be less clever and more insane (I wasn't trying 
to be clever so much as backwards compatible).

Sane is good.

        Hmm.  This means each merge policy must know whether it's talking to
        CMP or IndexWriter underneath?  With the stateless approach this
        wouldn't happen.

Well, I wouldn't say it has to know. All it cares about is what merge 
returns. It doesn't have to know who returned it or why.

The only real difference between this and the "generate a vector of merges" 
approach is that, in the serial case, the merge policy can take immediate 
advantage of merge results, whereas if you're generating a vector of merges up 
front, it can't know them.

Of course, I guess in that case, if IndexWriter gets a vector of merges, it can 
always take the lowest and ignore the rest, calling the merge policy again in 
case it wants to request a different set. Then the only excess computation is 
for the merges you never really considered.
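A minimal sketch of that serial loop: take only the lowest proposed merge, perform it, then ask the policy again so it can react to the result. MergeFinder, EqualPairPolicy and SerialMerger are hypothetical names. "Lowest" is assumed here to mean the merge covering the fewest docs, and segments are modelled only as doc counts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy contract: propose merges as half-open index ranges {start, end}. */
interface MergeFinder {
    List<int[]> findMerges(List<Integer> docCounts);
}

/** Example policy: merge each non-overlapping adjacent pair of equal-sized segments. */
final class EqualPairPolicy implements MergeFinder {
    @Override
    public List<int[]> findMerges(List<Integer> docCounts) {
        List<int[]> merges = new ArrayList<>();
        int i = 0;
        while (i + 1 < docCounts.size()) {
            if (docCounts.get(i).equals(docCounts.get(i + 1))) {
                merges.add(new int[] {i, i + 2});
                i += 2;
            } else {
                i++;
            }
        }
        return merges;
    }
}

final class SerialMerger {
    private SerialMerger() {}

    /** Performs only the smallest proposed merge per round, then re-asks the policy. */
    static List<Integer> mergeSerially(List<Integer> segments, MergeFinder policy) {
        List<Integer> current = new ArrayList<>(segments);
        while (true) {
            List<int[]> merges = policy.findMerges(current);
            if (merges.isEmpty()) return current;

            // Pick the proposal with the fewest docs; the rest of the vector is discarded.
            int[] lowest = null;
            long lowestDocs = Long.MAX_VALUE;
            for (int[] m : merges) {
                long docs = 0;
                for (int j = m[0]; j < m[1]; j++) docs += current.get(j);
                if (docs < lowestDocs) { lowestDocs = docs; lowest = m; }
            }
            // Guarantees progress: each round must shrink the segment list.
            if (lowest[1] - lowest[0] < 2) throw new IllegalStateException("merge must cover >= 2 segments");

            // "Perform" the merge by replacing the range with one segment holding the summed docs.
            List<Integer> next = new ArrayList<>(current.subList(0, lowest[0]));
            next.add((int) lowestDocs);
            next.addAll(current.subList(lowest[1], current.size()));
            current = next;
        }
    }
}
```

Starting from four 1-doc segments, the first call proposes two pairs; after merging one, the policy is asked again and eventually proposes merging the two 2-doc segments, a merge that was not in the original vector.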

        Oh I see...  that's kind of sneaky (planning on using exceptions to
        abort a merge requested by the policy).

There's always going to be a chance of an exception during a merge. I'm pretty 
sure of that. But you're right: if the merge policy isn't in the control path, 
it would never see them. They'll still be there, but the policy is out of the path.
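A minimal sketch of where those exceptions would land when "someone above" the policy runs the merges: the scheduler catches them, and the policy that proposed the merges never sees them. MergeRunner and runAll are hypothetical names, and each merge is modelled as a plain Runnable on a java.util.concurrent thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical scheduler sitting above the policy: it runs merges and absorbs their failures. */
final class MergeRunner {
    private MergeRunner() {}

    /** Runs each merge on a pool and returns the failures; nothing is reported back to a policy. */
    static List<Throwable> runAll(List<Runnable> merges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable merge : merges) {
                futures.add(pool.submit(merge));
            }
            List<Throwable> failures = new ArrayList<>();
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    failures.add(e.getCause()); // the merge's own exception
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                    throw new IllegalStateException("interrupted while waiting for merges", e);
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether the scheduler should also hand failures back to the policy (so it can re-plan) is exactly the open question in the thread; this sketch shows the variant where it does not.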

        But since you're already doing the work
        to allow a merge to run in the BG without blocking adding of docs,
        flushing, etc, wouldn't this come nearly for free?

I haven't looked at this.

        Well, eg flush() now synchronizes on IndexWriter

Yeah, and making it not do so is less than straightforward. I've looked at the 
code a fair amount and experimented with different ideas, but hadn't gotten all 
the way to a working model.

You could look at locking segmentInfos, but there are many places where 
segmentInfos is iterated over that would require locks if the lock on IW weren't 
sufficient to guarantee that the iteration was safe.

I did look at that early on, though, so maybe my understanding was still 
lacking and it's more feasible than I thought ...
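A minimal sketch of one way to lock segmentInfos separately from IndexWriter: guard the list with its own lock and have iterating code work on a copy taken under that lock. SegmentInfosHolder and snapshot() are hypothetical names, and segments are modelled as name strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the segment list, guarded by its own lock instead of the writer's. */
final class SegmentInfosHolder {
    private final Object lock = new Object();
    private final List<String> segmentNames = new ArrayList<>();

    void add(String name) {
        synchronized (lock) {
            segmentNames.add(name);
        }
    }

    /** Copies the list under the lock so callers can iterate without holding it. */
    List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(segmentNames);
        }
    }
}
```

The catch is the one raised above: every place that currently iterates segmentInfos under the IW lock would have to switch to a snapshot (or take the new lock), and a snapshot can be stale by the time it is used.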

> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, 
> LUCENE-847.patch.txt, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, 
> making it possible for apps to choose a custom merge policy and for easier 
> experimenting with merge policy variants.


