[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733771#action_12733771
 ] 

Michael McCandless commented on LUCENE-1076:
--------------------------------------------

Well... one option might be "the newly merged segment always replaces the 
leftmost segment".  Another option could be to leave it undefined, i.e. IW makes 
no commitment as to where it will place the newly merged segment, so you should 
not rely on it.  Presumably apps that rely on Lucene's internal doc ID to "mean 
something" would not use a merge policy that selects non-contiguous segments.
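
To make the doc ID point concrete, here is a toy sketch in Java.  It is not 
Lucene code and does not use any Lucene API: a "segment" is just a list of 
add-order numbers, and the class and method names are made up for 
illustration.  It assumes the "replace the leftmost segment" placement and 
shows that a contiguous merge keeps doc IDs in add order while a 
non-contiguous one does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, not Lucene API: a segment is a list of add-order numbers,
// and an index is a list of segments read left to right.
class MergePlacementSketch {

    // Merges the segments at the given positions (ascending order assumed)
    // and places the result where the leftmost selected segment was.
    static List<List<Integer>> mergeIntoLeftmost(List<List<Integer>> index, int... positions) {
        List<Integer> merged = new ArrayList<>();
        for (int p : positions) {
            merged.addAll(index.get(p));
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            if (i == positions[0]) {
                result.add(merged);
            } else if (!isSelected(positions, i)) {
                result.add(index.get(i));
            }
        }
        return result;
    }

    static boolean isSelected(int[] positions, int i) {
        for (int p : positions) {
            if (p == i) return true;
        }
        return false;
    }

    // Doc IDs are implied by position: element k of the result is the
    // add-order number of the document that now has doc ID k.
    static List<Integer> addOrderByDocId(List<List<Integer>> index) {
        List<Integer> flat = new ArrayList<>();
        for (List<Integer> segment : index) {
            flat.addAll(segment);
        }
        return flat;
    }

    // True if doc ID order still matches the order documents were added.
    static boolean isTemporallyMonotonic(List<Integer> addOrder) {
        for (int k = 1; k < addOrder.size(); k++) {
            if (addOrder.get(k - 1) > addOrder.get(k)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> index = Arrays.asList(
            Arrays.asList(0, 1), Arrays.asList(2, 3),
            Arrays.asList(4, 5), Arrays.asList(6, 7));
        // Contiguous merge of segments 1 and 2.
        List<Integer> contiguous = addOrderByDocId(mergeIntoLeftmost(index, 1, 2));
        // Non-contiguous merge of segments 0 and 2.
        List<Integer> nonContiguous = addOrderByDocId(mergeIntoLeftmost(index, 0, 2));
        System.out.println(contiguous + " monotonic=" + isTemporallyMonotonic(contiguous));
        System.out.println(nonContiguous + " monotonic=" + isTemporallyMonotonic(nonContiguous));
    }
}
```

With the non-contiguous merge, the docs from segment 2 end up ahead of the 
docs from segment 1, which is exactly what an app relying on doc ID order 
could not tolerate.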

Unfortunately, with the current index format, there's a big cost to allowing 
non-contiguous segments to be merged: it means the doc stores will always be 
merged.  Whereas, today, if you build up a large new index, no merging is done 
for the doc stores.

If we someday allowed a single segment to reference multiple original doc 
stores (logically concatenating [possibly many] slices out of them), which 
would presumably be a perf hit when retrieving the stored doc or term vectors, 
then this cost would go away.
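
A rough sketch of what "logically concatenating slices" could look like, 
again in plain Java with made-up names (SlicedDocStoreSketch, Slice, locate 
are all hypothetical, and nothing here reflects the actual index format).  
The segment keeps a list of slices and resolves a segment-local doc ID to a 
store name plus an offset into that store.  The extra lookup per document is 
the kind of indirection that would presumably show up as the perf hit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: one segment whose stored docs live in
// slices of several original doc stores.
class SlicedDocStoreSketch {

    // A contiguous run of documents inside one original doc store.
    static final class Slice {
        final String storeName;
        final int start;   // offset of the first doc within the store
        final int length;  // number of docs in this slice

        Slice(String storeName, int start, int length) {
            this.storeName = storeName;
            this.start = start;
            this.length = length;
        }
    }

    private final List<Slice> slices = new ArrayList<>();
    // docBase.get(i) is the first segment-local doc ID covered by slice i.
    private final List<Integer> docBase = new ArrayList<>();
    private int maxDoc = 0;

    void addSlice(String storeName, int start, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("slice must contain at least one doc");
        }
        slices.add(new Slice(storeName, start, length));
        docBase.add(maxDoc);
        maxDoc += length;
    }

    // Resolves a segment-local doc ID to "storeName:offsetInStore".
    String locate(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        // Binary search for the last slice whose docBase is <= docId.
        int lo = 0;
        int hi = slices.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBase.get(mid) <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        Slice s = slices.get(lo);
        return s.storeName + ":" + (s.start + (docId - docBase.get(lo)));
    }

    public static void main(String[] args) {
        SlicedDocStoreSketch segment = new SlicedDocStoreSketch();
        segment.addSlice("_0", 10, 5); // doc IDs 0..4 come from store _0
        segment.addSlice("_3", 0, 3);  // doc IDs 5..7 come from store _3
        System.out.println(segment.locate(4));
        System.out.println(segment.locate(6));
    }
}
```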

> Allow MergePolicy to select non-contiguous merges
> -------------------------------------------------
>
>                 Key: LUCENE-1076
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1076
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1076.patch
>
>
> I started work on this but with LUCENE-1044 I won't make much progress
> on it for a while, so I want to checkpoint my current state/patch.
> For backwards compatibility we must leave the default MergePolicy as
> selecting contiguous merges.  This is necessary because some
> applications rely on "temporal monotonicity" of doc IDs, which means
> even though merges can re-number documents, the renumbering will
> always reflect the order in which the documents were added to the
> index.
> Still, for those apps that do not rely on this, we should offer a
> MergePolicy that is free to select the best merges regardless of
> whether they are contiguous.  This requires fixing IndexWriter to
> accept such a merge, and fixing LogMergePolicy to optionally allow
> it the freedom to do so.
