[
https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008352#comment-14008352
]
Shawn Heisey commented on LUCENE-5705:
--------------------------------------
bq. Wait, this comment should also be in trunk?
It very likely is in trunk. I was just trying to be precise about where I
actually looked, in case trunk says something slightly different and others
happen to be looking too.
bq. Hmm that should not have been the case; if you turn on IW infoStream, CMS
tells you when it's pausing a large merge
I've never actually done this. I will turn on infoStream and start a new full
rebuild of the entire 96 million document index. The infoStream output should
be available in several hours, and I expect it to be very large.
I just know that when importing millions of records from a database, if you
don't increase maxMergeCount, the incoming thread will stall long enough for
JDBC to kill the connection. If smaller merges were really running first, then
it seems like we would never be over the threshold long enough for the
connection to die -- my smallest merges would probably complete in less than a
second, and the next size up would only take a few seconds. When I first
noticed the problem, I clocked one merge-caused indexing pause at over eight
minutes.
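
To make that reasoning concrete, here is a toy model in plain Java. It is only
a sketch and not Lucene's actual scheduling code. The class name
MergeStallModel and the method stallSeconds are hypothetical. It assumes that
only one merge runs at a time (maxThreadCount of 1) and that the incoming
thread stays blocked until the number of outstanding merges drops back to
maxMergeCount. The durations of 1, 5, and 480 seconds are illustrative
stand-ins for "less than a second", "a few seconds", and "over eight minutes".

```java
import java.util.Arrays;

// Toy model only: NOT Lucene's ConcurrentMergeScheduler logic.
final class MergeStallModel {

    /**
     * Returns how many seconds the indexing thread waits until the number of
     * outstanding merges drops to maxMergeCount, assuming merges run one at a
     * time in either smallest-first or largest-first order.
     */
    static long stallSeconds(long[] mergeSeconds, int maxMergeCount, boolean smallestFirst) {
        long[] sorted = mergeSeconds.clone();
        Arrays.sort(sorted); // ascending order; the caller's array is left untouched

        // Number of merges that must complete before indexing may resume.
        int mustFinish = Math.max(0, sorted.length - maxMergeCount);

        long stall = 0;
        for (int i = 0; i < mustFinish; i++) {
            stall += smallestFirst ? sorted[i] : sorted[sorted.length - 1 - i];
        }
        return stall;
    }

    public static void main(String[] args) {
        long[] merges = {1, 5, 480}; // three merge tiers outstanding at once

        // maxMergeCount = 2, smallest merge runs first: short pause.
        System.out.println(stallSeconds(merges, 2, true));  // prints 1

        // maxMergeCount = 2, largest merge runs first: pause of eight minutes.
        System.out.println(stallSeconds(merges, 2, false)); // prints 480

        // maxMergeCount = 6: three merges never exceed the limit, no pause.
        System.out.println(stallSeconds(merges, 6, true));  // prints 0
    }
}
```

Under these assumptions, a smallest-first order could never produce a pause
long enough to kill a JDBC connection, which is why an eight minute pause
suggests the largest merge is what the incoming thread ends up waiting on.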
> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
> Key: LUCENE-5705
> URL: https://issues.apache.org/jira/browse/LUCENE-5705
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 4.8
> Reporter: Shawn Heisey
> Assignee: Shawn Heisey
> Priority: Minor
> Fix For: 4.9
>
> Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2. This
> causes problems for Solr's dataimport handler when very large imports are
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time,
> the add/update thread will stop for several minutes while the largest merge
> finishes. In the meantime, the dataimporter JDBC connection to the database
> will time out, and when the add/update thread resumes, the import will fail
> because the ResultSet throws an exception. Setting maxMergeCount to 6
> eliminates this issue for virtually any size import -- although it is
> theoretically possible to have that many simultaneous merge tiers, I've never
> seen it.
> As long as maxThreadCount is properly set (the default value of 1 is appropriate
> for most installations), I cannot think of a really good reason that the
> default for maxMergeCount should be so low. If someone does need to strictly
> control the number of threads that get created, they can reduce the number.
> Perhaps someone with more experience knows of a really good reason to make
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid
> bikeshedding. I don't think it should be Integer.MAX_VALUE.