[ https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008382#comment-14008382 ]

Shawn Heisey edited comment on LUCENE-5705 at 5/25/14 5:24 PM:
---------------------------------------------------------------

The infostream I'm currently generating does show merges completing out of
order, with preference given to small merges.

{noformat}
IW 4 [Sun May 25 09:43:57 MDT 2014; Lucene Merge Thread #11]: merge time 47224 msec for 563274 docs
IW 4 [Sun May 25 09:52:39 MDT 2014; Lucene Merge Thread #13]: merge time 8761 msec for 68640 docs
IW 4 [Sun May 25 09:53:44 MDT 2014; Lucene Merge Thread #12]: merge time 266527 msec for 4227876 docs
{noformat}
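
As a rough sketch of how to check this mechanically, the three lines above can be parsed in plain Java. The class and method names (MergeOrder, parse, finishedOutOfOrder) are made up for illustration, and the check assumes that merge thread numbers are handed out in the order the merges start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class MergeOrder {
    // Captures the merge thread number, elapsed milliseconds, and document count.
    private static final Pattern LINE = Pattern.compile(
        "Lucene Merge Thread #(\\d+)\\]: merge time (\\d+) msec for (\\d+) docs");

    /** Returns {threadNumber, msec, docs} for each matching line, in log order. */
    static List<long[]> parse(String[] lines) {
        List<long[]> merges = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                merges.add(new long[] {
                    Long.parseLong(m.group(1)),
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3))});
            }
        }
        return merges;
    }

    /**
     * True if some merge was logged as finished before a merge with a lower
     * thread number (assumption: lower thread number means earlier start).
     */
    static boolean finishedOutOfOrder(List<long[]> merges) {
        for (int i = 1; i < merges.size(); i++) {
            if (merges.get(i)[0] < merges.get(i - 1)[0]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] lines = {
            "IW 4 [Sun May 25 09:43:57 MDT 2014; Lucene Merge Thread #11]: merge time 47224 msec for 563274 docs",
            "IW 4 [Sun May 25 09:52:39 MDT 2014; Lucene Merge Thread #13]: merge time 8761 msec for 68640 docs",
            "IW 4 [Sun May 25 09:53:44 MDT 2014; Lucene Merge Thread #12]: merge time 266527 msec for 4227876 docs"
        };
        List<long[]> merges = parse(lines);
        for (long[] m : merges) {
            System.out.println("thread #" + m[0] + ": " + m[2] + " docs in " + m[1] + " msec");
        }
        System.out.println("finished out of order: " + finishedOutOfOrder(merges));
    }
}
```

Here thread #13 (68640 docs) is logged as finished before thread #12 (4227876 docs), which is the out-of-order completion described above.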

When I was having the problem I described (admittedly a long time ago, most
likely on Solr 1.4.0), I was using the old default merge policy,
LogByteSizeMergePolicy.  Would that setup have been using CMS
(ConcurrentMergeScheduler), or a different scheduler?  When no scheduler is
configured in Solr 4.x, does it choose CMS?  I would think that it does.

I have seen others hit this problem very recently on the mailing list and IRC.
I'm reasonably sure that at least one of them was on a 4.x release.  Bumping
up maxMergeCount fixed it for those people, just as it did for me.  Yet the
infostream evidence above suggests that nobody should still be having
problems like this, assuming that what they get by default is the
ConcurrentMergeScheduler.





> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This 
> causes problems for Solr's dataimport handler when very large imports are 
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, 
> the add/update thread will stop for several minutes while the largest merge 
> finishes.  In the meantime, the dataimporter JDBC connection to the database 
> will time out, and when the add/update thread resumes, the import will fail 
> because the ResultSet throws an exception.  Setting maxMergeCount to 6 
> eliminates this issue for virtually any size import -- although it is 
> theoretically possible to have that many simultaneous merge tiers, I've never 
> seen it.
> As long as maxThreadCount is properly set (the default value of 1 is appropriate 
> for most installations), I cannot think of a really good reason that the 
> default for maxMergeCount should be so low.  If someone does need to strictly 
> control the number of threads that get created, they can reduce the number.  
> Perhaps someone with more experience knows of a really good reason to make 
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid 
> bikeshedding.  I don't think it should be Integer.MAX_VALUE.
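
To make the stall described in the issue concrete, here is a toy model in plain Java. It is only a sketch under a stated assumption: the add/update thread is taken to stall whenever the number of simultaneously scheduled merges exceeds maxMergeCount. The class and method names (StallModel, indexingThreadStalls) are hypothetical, and this is not Lucene's actual scheduling code.

```java
// Hypothetical toy model, not part of Lucene or Solr.
final class StallModel {
    /**
     * Assumed stall rule for illustration: the indexing thread must wait
     * when more merges are scheduled at once than maxMergeCount allows.
     */
    static boolean indexingThreadStalls(int simultaneousMerges, int maxMergeCount) {
        return simultaneousMerges > maxMergeCount;
    }

    public static void main(String[] args) {
        // Three merge tiers at once with the default of 2 from the description.
        System.out.println(indexingThreadStalls(3, 2)); // prints true
        // The same three tiers with maxMergeCount raised to 6.
        System.out.println(indexingThreadStalls(3, 6)); // prints false
    }
}
```

Under this model, raising maxMergeCount to 6 only stalls indexing once seven merges are scheduled at the same time, which fits the observation that the higher value eliminates the problem for virtually any import size.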


