[
https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008402#comment-14008402
]
Shawn Heisey commented on LUCENE-5705:
--------------------------------------
bq. disabling merges while you import data will improve latency in that respect.
If I had a plain Lucene program, turning off merging would likely be a very
simple thing to do. With Solr, can that be changed without touching the
filesystem (solrconfig.xml), and without restarting Solr or reloading cores?
If it can, I could do an optimize as the last step of a full rebuild. No
merging during the rebuild, followed by an optimize at the end, would probably
be faster than what happens now. If I have to change the config and
restart/reload, then this is not something I can implement: anyone who has
access can currently kick off a rebuild simply by changing an entry in a MySQL
database table. The SolrJ program notices this and starts all the dataimport
handlers in the build cores. Managing filesystem changes from a Java program
across multiple machines is not something I want to try. If I switched to
SolrCloud, config changes would be relatively easy using the zkCli API, but
switching to SolrCloud would actually lead to a loss of functionality in my
index.
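For the plain-Lucene case, a minimal sketch of that approach (not what I
actually run, and constructor signatures vary a bit across Lucene versions)
would be to index with NoMergePolicy and then reopen the writer with the
default merge policy for the final merge:
{code:java}
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class RebuildWithoutMerges {
  public static void main(String[] args) throws Exception {
    // Hypothetical index location.
    Directory dir = FSDirectory.open(Paths.get("/path/to/index"));

    // Phase 1: bulk import with merging turned off completely.
    IndexWriterConfig importCfg = new IndexWriterConfig(new StandardAnalyzer());
    importCfg.setMergePolicy(NoMergePolicy.INSTANCE);
    try (IndexWriter writer = new IndexWriter(dir, importCfg)) {
      // ... addDocument() calls for the full rebuild go here ...
    }

    // Phase 2: reopen with the default merge policy and merge down to one
    // segment, which is roughly what Solr's optimize does.
    IndexWriterConfig optimizeCfg = new IndexWriterConfig(new StandardAnalyzer());
    try (IndexWriter writer = new IndexWriter(dir, optimizeCfg)) {
      writer.forceMerge(1);
    }
  }
}
{code}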
Once the index is built, my SolrJ program does a full optimize on one cold
shard per day, so it takes six days to cover the whole index. The hot shard is
optimized once an hour, which only takes about 30 seconds.
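The per-shard optimize itself is just the standard SolrJ call. A trimmed-down
sketch with 4.x-era SolrJ and a hypothetical core URL (the real program picks
a different cold shard each day):
{code:java}
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class OptimizeOneShard {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URL; the real program rotates through the cold shards.
    HttpSolrServer shard =
        new HttpSolrServer("http://localhost:8983/solr/shard_cold_1");
    try {
      // Sends an optimize (forced merge) request to that core's update handler.
      shard.optimize();
    } finally {
      shard.shutdown();
    }
  }
}
{code}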
> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
> Key: LUCENE-5705
> URL: https://issues.apache.org/jira/browse/LUCENE-5705
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 4.8
> Reporter: Shawn Heisey
> Assignee: Shawn Heisey
> Priority: Minor
> Fix For: 4.9
>
> Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2. This
> causes problems for Solr's dataimport handler when very large imports are
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time,
> the add/update thread will stop for several minutes while the largest merge
> finishes. In the meantime, the dataimporter JDBC connection to the database
> will time out, and when the add/update thread resumes, the import will fail
> because the ResultSet throws an exception. Setting maxMergeCount to 6
> eliminates this issue for virtually any size import -- although it is
> theoretically possible to have that many simultaneous merge tiers, I've never
> seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate
> for most installations), I cannot think of a really good reason that the
> default for maxMergeCount should be so low. If someone does need to strictly
> control the number of threads that get created, they can reduce the number.
> Perhaps someone with more experience knows of a really good reason to make
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid
> bikeshedding. I don't think it should be Integer.MAX_VALUE.
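For a plain Lucene program, raising the limit the way the description suggests
is a one-line change on the scheduler. A minimal sketch, assuming recent-style
IndexWriterConfig constructors (they differ slightly across Lucene versions):
{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriterConfig;

public class HigherMergeBacklog {
  public static void main(String[] args) {
    ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
    // Allow up to 6 merges to be pending before incoming add/update threads
    // are stalled, while still running only one merge thread at a time.
    cms.setMaxMergesAndThreads(6, 1);

    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    cfg.setMergeScheduler(cms);
    // ... pass cfg to new IndexWriter(...) as usual ...
  }
}
{code}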