[ 
https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008402#comment-14008402
 ] 

Shawn Heisey commented on LUCENE-5705:
--------------------------------------

bq. disabling merges while you import data will improve latency in that respect.

If I had a Lucene program, turning off merging would likely be very simple to 
do.  With Solr, is it possible to change that without modifying the filesystem 
(solrconfig.xml), and without restarting Solr or reloading cores?  If so, I 
could do an optimize as the last step of a full rebuild.  Skipping merges 
during the rebuild, followed by an optimize at the end, would probably be 
faster than what happens now.  If I have to change the config and 
restart/reload, then this is not something I can implement -- anyone who has 
access can currently kick off a rebuild simply by changing an entry in a MySQL 
database table.  The SolrJ program notices this and starts all the 
dataimport handlers in the build cores.  Managing filesystem changes from a 
Java program across multiple machines is not something I want to try.  If I 
switched to SolrCloud, config changes would be relatively easy using the zkCli 
API, but switching to SolrCloud would actually cost me functionality in my 
index.
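
(For reference, the "plain Lucene program" case mentioned above might look 
roughly like the following -- a sketch against the 4.x API, with the index 
path, analyzer choice, and document-adding loop all assumed:)

```java
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.index.NoMergeScheduler;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BulkLoadSketch {
  public static void main(String[] args) throws IOException {
    // Placeholder path -- substitute the real index directory.
    Directory dir = FSDirectory.open(new File("/path/to/index"));

    IndexWriterConfig iwc = new IndexWriterConfig(
        Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48));
    // Disable merging entirely for the duration of the import.
    iwc.setMergePolicy(NoMergePolicy.COMPOUND_FILES);
    iwc.setMergeScheduler(NoMergeScheduler.INSTANCE);

    IndexWriter writer = new IndexWriter(dir, iwc);

    // ... addDocument() calls for the full import go here ...

    // The "optimize at the end" step: merge down to a single segment.
    writer.forceMerge(1);
    writer.close();
  }
}
```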

Once the index is built, my SolrJ program does a full optimize on one cold 
shard per day, so it takes six days to cover the whole index.  The hot shard 
is optimized once an hour -- that only takes about 30 seconds.
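
(The per-shard optimize call in SolrJ 4.x boils down to something like this 
sketch -- the URL and core name are placeholders:)

```java
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Sketch: optimize a single cold shard.  The base URL and core name
// are placeholders, not the real deployment's values.
HttpSolrServer server =
    new HttpSolrServer("http://localhost:8983/solr/coldshard1");
server.optimize();   // force-merges the core down to one segment
server.shutdown();
```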


> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This 
> causes problems for Solr's dataimport handler when very large imports are 
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, 
> the add/update thread will stop for several minutes while the largest merge 
> finishes.  In the meantime, the dataimporter JDBC connection to the database 
> will time out, and when the add/update thread resumes, the import will fail 
> because the ResultSet throws an exception.  Setting maxMergeCount to 6 
> eliminates this issue for virtually any size import -- although it is 
> theoretically possible to have that many simultaneous merge tiers, I've never 
> seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate 
> for most installations), I cannot think of a really good reason that the 
> default for maxMergeCount should be so low.  If someone does need to strictly 
> control the number of threads that get created, they can reduce the number.  
> Perhaps someone with more experience knows of a really good reason to make 
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid 
> bikeshedding.  I don't think it should be Integer.MAX_VALUE.
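
(For anyone finding this issue later: the workaround described above -- 
maxMergeCount raised to 6 with the default single merge thread -- is set in 
the indexConfig section of solrconfig.xml, roughly like this sketch:)

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- Allow up to 6 queued/backlogged merges before stalling indexing. -->
    <int name="maxMergeCount">6</int>
    <!-- Default of 1 concurrent merge thread is right for spinning disks. -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```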



--
This message was sent by Atlassian JIRA
(v6.2#6252)
