[
https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008677#comment-14008677
]
Shawn Heisey commented on LUCENE-5705:
--------------------------------------
bq. Doing all merging in the end is somewhat dangerous; you should only do it
if you know you will do no searching on the index until the merging has
completed.
I actually can guarantee this when I do a full rebuild. For every one of my
shards, I have a build core and a live core. Full rebuilds happen on the build
cores, and I can be sure that no searching will take place until after that
import is done and the SolrJ program swaps it with the live core. If it's
possible in Solr to enable and disable merging on the fly for each core, then
that might be a viable path. I would need to change my post-import processes a
little bit -- I run a delete process against the new index, and the current
delete process does a query to check for document presence first. I'd have to
add a boolean option to the method so it would be able to do the deletes
blindly.
I don't even open a new searcher until the full import is done. I don't let
DIH do the commit, I handle that. I do have the realtime get handler enabled
(which does open a new searcher on every autoCommit), but that almost never
actually sees requests. On the build cores, I think I can safely say that it
would actually never happen.
bq. (Hmm: is Solr using multiple indexing threads in your case...?)
I doubt it. In 1.x and 3.x I did have the DIH threads option enabled, but in
4.x the option was removed. Even in my SolrJ program, there is only ever one
thread making requests to any core. Most requests end up on the hot shard,
which is why I *have* a hot shard.
> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
> Key: LUCENE-5705
> URL: https://issues.apache.org/jira/browse/LUCENE-5705
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 4.8
> Reporter: Shawn Heisey
> Assignee: Shawn Heisey
> Priority: Minor
> Fix For: 4.9
>
> Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch,
> infostream-s0build-shard.zip
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2. This
> causes problems for Solr's dataimport handler when very large imports are
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time,
> the add/update thread will stop for several minutes while the largest merge
> finishes. In the meantime, the dataimporter JDBC connection to the database
> will time out, and when the add/update thread resumes, the import will fail
> because the ResultSet throws an exception. Setting maxMergeCount to 6
> eliminates this issue for virtually any size import -- although it is
> theoretically possible to have that many simultaneous merge tiers, I've never
> seen it.
> As long as maxThreadCount is properly set (the default value of 1 is appropriate
> for most installations), I cannot think of a really good reason that the
> default for maxMergeCount should be so low. If someone does need to strictly
> control the number of threads that get created, they can reduce the number.
> Perhaps someone with more experience knows of a really good reason to make
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid
> bikeshedding. I don't think it should be Integer.MAX_VALUE.
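
For anyone who wants to apply the workaround from the description before the default changes, here is a minimal sketch of the relevant solrconfig.xml fragment for Solr 4.x. It assumes the stock ConcurrentMergeScheduler and uses the values mentioned above (maxMergeCount of 6, merge thread count of 1); the right numbers for a particular installation may differ.

```xml
<indexConfig>
  <!-- Assumed values taken from the issue description, not a recommendation. -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- How many merges may be queued or running before indexing stalls. -->
    <int name="maxMergeCount">6</int>
    <!-- How many merge threads may run at once. -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```

The core has to be reloaded (or Solr restarted) for a change to this section to take effect.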