[
https://issues.apache.org/jira/browse/SOLR-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672469#comment-16672469
]
Erick Erickson commented on SOLR-12259:
---------------------------------------
Preliminary patch in case anyone's interested in looking at the approach. I
have to leave tomorrow and won't be back at this until next week.
> I consider this a PoC, meaning it demonstrates that the approach can work in
> a limited environment. If nobody shoots holes in the general idea it then
> needs a _lot_ of polish
> There are a ton of nocommits
> I'm thinking that what we really want here is a new endpoint rather than
> spoofing an update command. Before doing that I wanted to see whether the
> idea would work at all.
> I've hardcoded a few values that have to change, for example the hard 5G
> limit on merged segments.
The main bits are in DirectUpdateHandler2.rewriteSegments where the code:
> intercepts the update command that looks like:
> "....core/update?commit=true&rewriteWithPolicyFactory=org.apache.solr.index.RewriteWithDocValuesMergePolicyFactory"
> Substitutes the specified factory in liveConfig
> forceMerges while respecting max segment size
> sets the old factory back in liveConfig
{code:java}
private void rewriteSegments(IndexWriter writer, String policyFactory) throws IOException {
  LiveIndexWriterConfig liveConfig = writer.getConfig();
  MergePolicy oldPolicy = liveConfig.getMergePolicy();
  try {
    Class<?> factory = core.getResourceLoader().getClassLoader().loadClass(policyFactory);
    Constructor<?> constructor = factory.getDeclaredConstructor(
        SolrResourceLoader.class, MergePolicyFactoryArgs.class, IndexSchema.class);
    MergePolicyFactory newFactory = (MergePolicyFactory) constructor.newInstance(
        core.getResourceLoader(), new MergePolicyFactoryArgs(), core.getLatestSchema());
    liveConfig.setMergePolicy(newFactory.getMergePolicy());
    writer.forceMerge(Integer.MAX_VALUE, true); // nocommit MAX_VALUE? Wait? Really?
  } catch (IllegalAccessException | InstantiationException | ClassNotFoundException
      | NoSuchMethodException | InvocationTargetException e) {
    String msg = String.format(Locale.ROOT, "Could not instantiate %s MergePolicyFactory", policyFactory);
    log.error(msg);
    throw new RuntimeException(msg, e);
  } finally {
    liveConfig.setMergePolicy(oldPolicy);
  }
}
{code}
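For anyone who wants to poke at the patch, here is a minimal sketch of building the request URL shown above. The host, port and core name are placeholders, and the class and method names (RewriteSegmentsUrl, build) are hypothetical. Only the commit and rewriteWithPolicyFactory parameters come from the patch.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the patch: builds the update URL that
// the PoC intercepts in DirectUpdateHandler2.
final class RewriteSegmentsUrl {

  // coreUrl is assumed to look like "http://host:port/solr/corename" (placeholder).
  static String build(String coreUrl, String policyFactoryClass) {
    // URL-encode the class name so it is safe to use as a query parameter value.
    String encoded = URLEncoder.encode(policyFactoryClass, StandardCharsets.UTF_8);
    return coreUrl + "/update?commit=true&rewriteWithPolicyFactory=" + encoded;
  }

  public static void main(String[] args) {
    System.out.println(build(
        "http://localhost:8983/solr/mycore",
        "org.apache.solr.index.RewriteWithDocValuesMergePolicyFactory"));
  }
}
```

Since the plan is to move this to a dedicated endpoint, treat the URL shape as provisional.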
Here's what I guarantee:
> it compiles
> it runs TestTieredMergePolicy and the (modified)
> UninvertDocValuesMergePolicyTest that exercises this process
Here's what I don't guarantee:
> I didn't break a bunch of other tests
> precommit passes
> it works when active indexing is going on
> About a zillion other things that occurred to me, you'll see lots of
> questions in nocommits
The biggest question I have at this point is what happens if indexing and
background merging are both going on while this is running.
All comments welcome, especially around gotchas people already know about.
> Robustly upgrade indexes
> ------------------------
>
> Key: SOLR-12259
> URL: https://issues.apache.org/jira/browse/SOLR-12259
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-12259.patch
>
>
> The general problem statement is that the current upgrade path is trappy and
> cumbersome. It would be a great help "in the field" to make the upgrade
> process less painful.
> Additionally one of the most common things users want to do is enable
> docValues, but currently they often have to re-index.
> Issues:
> 1> if I upgrade from 5x to 6x and then 7x, there's no guarantee that when I go
> to 7x all the segments have been rewritten in 6x format. Say I have a segment
> at max size that has no deletions. It'll never be rewritten until it has
> deleted docs, and currently perhaps not until 50% of its docs are deleted.
> 2> IndexUpgraderTool explicitly does a forceMerge to 1 segment, which is bad.
> 3> in a large distributed system, running IndexUpgraderTool on all the nodes
> is cumbersome even if <2> is acceptable.
> 4> Users who realize specifying docValues on a field would be A Good Thing
> have to re-index. We have UninvertDocValuesMergePolicyFactory. Wouldn't it be
> nice to be able to have this done all at once without forceMerging to one
> segment?
> Proposal:
> Somehow avoid the above. Currently LUCENE-7976 is a start in that direction.
> It will make TMP respect max segment size so we can avoid forceMerges that
> result in one segment. What it does _not_ do is rewrite segments with zero
> (or a small percentage) deleted documents.
> So it doesn't seem like a huge stretch to be able to specify to TMP the
> option to rewrite segments that have no deleted documents. Perhaps a new
> parameter to optimize?
> This would likely require another change to TMP or whatever.
> So upgrading to a new Solr would look like:
> 1> install the new Solr
> 2> execute
> "http://node:port/solr/collection_or_core/update?optimize=true&upgradeAllSegments=true"
> What's not clear to me is whether we'd require
> UninvertDocValuesMergePolicyFactory to be specified and wrap TMP or not.
> Anyway, let's discuss. I'll create yet another LUCENE JIRA for TMP to rewrite
> all segments that I'll link.
> I'll also link several other JIRAs in here, they're coalescing.
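To make the "rewrite every segment without collapsing to one" idea from the description concrete, here is a toy sketch. It is not how TMP selects merges and uses no Lucene classes. The class RewritePlanSketch, its plan method and the greedy grouping are all assumptions for illustration. It shows the two properties the proposal asks for: every segment lands in exactly one rewrite batch (even a max-size segment with no deletions), and no batch exceeds the max segment size unless a single segment already does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: segments are represented only by their sizes.
final class RewritePlanSketch {

  // Greedily groups segment sizes into rewrite batches whose total stays
  // at or under maxSize. A segment that is already at or over maxSize
  // ends up in a batch of its own, so it is still rewritten.
  static List<List<Long>> plan(List<Long> segmentSizes, long maxSize) {
    List<List<Long>> batches = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentTotal = 0;
    for (long size : segmentSizes) {
      // Close the current batch if adding this segment would exceed the cap.
      if (!current.isEmpty() && currentTotal + size > maxSize) {
        batches.add(current);
        current = new ArrayList<>();
        currentTotal = 0;
      }
      current.add(size);
      currentTotal += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    // Sizes in GB for readability, with a 5G cap as in the PoC.
    System.out.println(plan(List.of(5L, 1L, 2L, 3L, 1L), 5L));
  }
}
```

With a 5G cap, segments of 5, 1, 2, 3 and 1 GB are grouped as [5], [1, 2] and [3, 1], so nothing is skipped and nothing is merged down to a single segment.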