[ 
https://issues.apache.org/jira/browse/SOLR-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669525#comment-16669525
 ] 

Erick Erickson commented on SOLR-12259:
---------------------------------------

I'm having a design problem here. At the Solr level, we have 
MergePolicyFactory, while at the Lucene level we have MergePolicy. All well and 
good.

However, I need to mix them up a bit and it's *A Bad Idea* to have Lucene be 
aware of anything having to do with Solr.

So I'm coding up a new MergePolicy and associated Factory. I can assign the 
MergePolicy to an IndexWriter and call writer.forceMerge, and the writer calls 
findForcedMerges in my new policy. At that point I need to decide which, if 
any, of the segments passed to findForcedMerges need to be merged/rewritten. 
Some of the information I need is only available at the Solr level; in this 
case I need to compare the schema definitions against the info in the 
segments.

Ideally, I want to do something in findForcedMerges like:
{code:java}
for (each segment in segmentInfos) {      
   if (Solr thinks it should be rewritten)  { put it in the real merge list }
}
{code}
What I'm having trouble with is figuring out how to reach back out into Solr 
and evaluate the "if (Solr thinks it should be rewritten)" part without doing 
violence to Lucene.

It seems like a function pointer that I could set in my MergePolicy could do 
the trick, but that seems complicated. And even if I could, is that 
acceptable architecturally, since the code behind it lives in Solr? Granted, 
there's no dependency on Solr here; a Java function in anybody's calling code 
could set it. It would take a seginfo and return true or false. Oh for the old 
C days when a function pointer was just an int....
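
To make the function-pointer idea concrete, here is a rough sketch of how it 
might look, with java.util.function.Predicate playing the role of the pointer. 
Everything in it is an assumption for illustration: Segment and CallbackPolicy 
are made-up stand-ins, not Lucene or Solr classes, and hasDocValues is a 
hypothetical per-segment fact. A real version would extend MergePolicy and 
work with the segment infos passed to findForcedMerges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: none of these types are real Lucene or Solr classes.
class PredicateMergeSketch {

  /** Hypothetical stand-in for the per-segment info handed to findForcedMerges. */
  static final class Segment {
    final String name;
    final boolean hasDocValues; // hypothetical fact recorded in the segment

    Segment(String name, boolean hasDocValues) {
      this.name = name;
      this.hasDocValues = hasDocValues;
    }
  }

  /** "Lucene-side" piece: it only knows about a Predicate, not about Solr. */
  static final class CallbackPolicy {
    private final Predicate<Segment> shouldRewrite;

    CallbackPolicy(Predicate<Segment> shouldRewrite) {
      this.shouldRewrite = shouldRewrite;
    }

    /** Returns the names of the segments the callback wants rewritten. */
    List<String> selectForRewrite(List<Segment> segments) {
      List<String> selected = new ArrayList<>();
      for (Segment segment : segments) {
        if (shouldRewrite.test(segment)) {
          selected.add(segment.name);
        }
      }
      return selected;
    }
  }

  public static void main(String[] args) {
    // "Solr-side" knowledge, e.g. the schema now asks for docValues on a field.
    boolean schemaWantsDocValues = true;

    // The calling code supplies the decision; the policy never sees the schema.
    CallbackPolicy policy =
        new CallbackPolicy(seg -> schemaWantsDocValues && !seg.hasDocValues);

    List<Segment> segments = Arrays.asList(
        new Segment("_0", true),
        new Segment("_1", false),
        new Segment("_2", false));

    System.out.println(policy.selectForRewrite(segments)); // prints [_1, _2]
  }
}
```

The point of the sketch is that the policy depends only on a JDK interface, so 
any caller can set the callback. Whether that is acceptable architecturally is 
still the open question.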

Or is the right thing to do just to put any new MergePolicy in Solr, where it 
_can_ be aware of other "Solr stuff"?

Any comments [~mikemccand] [~rcmuir] [~jpountz] [~romseygeek] ?

> Robustly upgrade indexes
> ------------------------
>
>                 Key: SOLR-12259
>                 URL: https://issues.apache.org/jira/browse/SOLR-12259
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> The general problem statement is that the current upgrade path is trappy and 
> cumbersome.  It would be a great help "in the field" to make the upgrade 
> process less painful.
> Additionally one of the most common things users want to do is enable 
> docValues, but currently they often have to re-index.
> Issues:
> 1> if I upgrade from 5x to 6x and then 7x, there's no guarantee that when I go 
> to 7x all the segments have been rewritten in 6x format. Say I have a segment 
> at max size that has no deletions. It'll never be rewritten until it has 
> deleted docs, and currently perhaps not until 50% of its docs are deleted.
> 2> IndexUpgraderTool explicitly does a forceMerge to 1 segment, which is bad.
> 3> in a large distributed system, running IndexUpgraderTool on all the nodes 
> is cumbersome even if <2> is acceptable.
> 4> Users who realize specifying docValues on a field would be A Good Thing 
> have to re-index. We have UninvertDocValuesMergePolicyFactory. Wouldn't it be 
> nice to be able to have this done all at once without forceMerging to one 
> segment?
> Proposal:
> Somehow avoid the above. Currently LUCENE-7976 is a start in that direction. 
> It will make TMP respect max segment size, so we can avoid forceMerges that 
> result in one segment. What it does _not_ do is rewrite segments with zero 
> (or a small percentage of) deleted documents.
> So it doesn't seem like a huge stretch to be able to specify to TMP the 
> option to rewrite segments that have no deleted documents. Perhaps a new 
> parameter to optimize?
> This would likely require another change to TMP or whatever.
> So upgrading to a new Solr would look like:
> 1> install the new Solr
> 2> execute 
> "http://node:port/solr/collection_or_core/update?optimize=true&upgradeAllSegments=true"
> What's not clear to me is whether we'd require 
> UninvertDocValuesMergePolicyFactory to be specified and wrap TMP or not.
> Anyway, let's discuss. I'll create yet another LUCENE JIRA for TMP to rewrite 
> all segments, which I'll link.
> I'll also link several other JIRAs in here; they're coalescing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

