[
https://issues.apache.org/jira/browse/SOLR-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661538#comment-16661538
]
Erick Erickson commented on SOLR-12259:
---------------------------------------
I was thinking about this on the way back from Activate. One of the issues
we'll have is the fact that it'd be a mess to support arbitrary upgrade paths
for all the reasons in LUCENE-7976. Doing "whatever it can" is so fraught.
THIS IS A STRAW MAN PROPOSAL. Feel free to shoot holes in it.....
In essence, this could be thought of as using custom merge policies to "do the
right thing", where the "right thing" has varied (and will continue to vary
going forward).
Here's what I came up with as design goals:
> transform _all_ segments in a core.
> can upgrade collections/cores individually even if collections shared a
> configset
> extensible in future
> can deal with "safe" X+2 upgrades if there ever are any.
> should not require restarting Solr
> should not require special solrconfig.xml changes
> _may_ require enabling the new end-point. Possibly a new config API? Maybe
> require a config API call to enable/disable?
What I have in mind is a new request handler that
> locks the index for updates until done. I'm not horribly comfortable with
> this but it would circumvent a world of problems.
> applies a (possibly custom) merge policy that implements what's desired
> _on all segments without merging._ This is essentially a "singleton merge"
> on each segment regardless of its state. We could, of course, skip segments
> that don't require the transformation. This is somewhat along the lines of
> UninvertDocValuesMergePolicyFactory
> A prime candidate we _would_ supply would be an upgrade to docValues fields.
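
To make the "singleton merge" idea concrete, here is a toy sketch in Python. It does not use Lucene's MergePolicy API at all; `Segment`, `plan_singleton_merges` and the `needs_rewrite` predicate are made-up names for illustration only, and the fields on `Segment` are assumptions about what such a check might look at.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    """Toy stand-in for a segment; not a Lucene class."""
    name: str
    written_by_major: int   # assumed: major version that wrote the segment
    has_doc_values: bool    # assumed: target field already has docValues


def plan_singleton_merges(
    segments: List[Segment],
    needs_rewrite: Callable[[Segment], bool],
) -> List[List[Segment]]:
    """Return one single-segment "merge" per segment that needs rewriting.

    Segments are never combined with each other, so the segment count and
    sizes stay as they were; only the contents get rewritten.
    """
    return [[seg] for seg in segments if needs_rewrite(seg)]


segments = [
    Segment("_0", written_by_major=6, has_doc_values=False),
    Segment("_1", written_by_major=7, has_doc_values=True),
    Segment("_2", written_by_major=7, has_doc_values=False),
]

# Hypothetical "upgrade to docValues" transformation: skip segments that
# already have docValues for the field.
plan = plan_singleton_merges(segments, lambda seg: not seg.has_doc_values)
```

Each entry in `plan` holds exactly one segment, which is the point: the rewrite happens per segment and whatever merge policy the core normally runs is not involved.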
Note that the merge policy in effect at the client would not be changed at all.
Say I was running TMP. This would come in completely around the end and not
change that at all. We'd probably have to supply a new merge policy to be used
by this end-point.
The reason I don't want to merge segments is that it would get weird having to
make the different merge policies do the right thing; NoMergePolicy is
particularly problematic ;)
The simple case here would require a full rewrite of all segments for each
transformation, which is a drawback. OTOH, until we have examples of multiple
transformations we want to happen, maybe we can go with it for now.
Conceivably this could be used for special-purpose upgrades with limited scope
that could do the X->X+2 upgrade. I have no concrete examples of what would be
safe and I _certainly_ don't want to distribute any such thing as part of Solr.
Having the mechanism in place could allow users to make their own (at their own
risk). Or call Uwe...
Comments?
> Robustly upgrade indexes
> ------------------------
>
> Key: SOLR-12259
> URL: https://issues.apache.org/jira/browse/SOLR-12259
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
>
> The general problem statement is that the current upgrade path is trappy and
> cumbersome. It would be a great help "in the field" to make the upgrade
> process less painful.
> Additionally one of the most common things users want to do is enable
> docValues, but currently they often have to re-index.
> Issues:
> 1> if I upgrade from 5x to 6x and then 7x, there's no guarantee that when I go
> to 7x all the segments have been rewritten in 6x format. Say I have a segment
> at max size that has no deletions. It'll never be rewritten until it has
> deleted docs, and currently perhaps not until 50% of its docs are deleted.
> 2> IndexUpgraderTool explicitly does a forcemerge to 1 segment, which is bad.
> 3> in a large distributed system, running IndexUpgraderTool on all the nodes
> is cumbersome even if <2> is acceptable.
> 4> Users who realize specifying docValues on a field would be A Good Thing
> have to re-index. We have UninvertDocValuesMergePolicyFactory. Wouldn't it be
> nice to be able to have this done all at once without forceMerging to one
> segment?
> Proposal:
> Somehow avoid the above. Currently LUCENE-7976 is a start in that direction.
> It will make TMP respect the max segment size, so we can avoid forceMerges
> that result in one segment. What it does _not_ do is rewrite segments with
> zero (or a small percentage of) deleted documents.
> So it doesn't seem like a huge stretch to be able to specify to TMP the
> option to rewrite segments that have no deleted documents. Perhaps a new
> parameter to optimize?
> This would likely require another change to TMP or whatever.
> So upgrading to a new solr would look like
> 1> install the new Solr
> 2> execute
> "http://node:port/solr/collection_or_core/update?optimize=true&upgradeAllSegments=true"
> What's not clear to me is whether we'd require
> UninvertDocValuesMergePolicyFactory to be specified and wrap TMP or not.
> Anyway, let's discuss. I'll create yet another LUCENE JIRA for TMP to rewrite
> all segments that I'll link.
> I'll also link several other JIRAs in here, they're coalescing.
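
For illustration, the request in step 2 of the quoted description could be assembled as below. This is only a sketch: `upgradeAllSegments` is the parameter proposed in this issue, not an existing Solr option, and `build_upgrade_url`, the host, port and collection name are placeholders chosen for the example.

```python
from urllib.parse import urlencode


def build_upgrade_url(base_url: str, collection_or_core: str) -> str:
    """Build the update URL from the proposal (hypothetical helper)."""
    params = {
        "optimize": "true",
        # Proposed in this issue only; assumed here for illustration.
        "upgradeAllSegments": "true",
    }
    return f"{base_url}/solr/{collection_or_core}/update?{urlencode(params)}"


# Placeholder host, port and collection name.
url = build_upgrade_url("http://localhost:8983", "techproducts")
```

Nothing is sent here; the helper only builds the string, so it is safe to run anywhere.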