[ 
https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008433#comment-14008433
 ] 

Shawn Heisey commented on LUCENE-5705:
--------------------------------------

Here's the summary of merge results for one of the indexes, after the rebuild 
was done:

{noformat}
IW 4 [Sun May 25 08:34:02 MDT 2014; Lucene Merge Thread #0]: merge time 38437 msec for 413954 docs
IW 4 [Sun May 25 08:39:58 MDT 2014; Lucene Merge Thread #1]: merge time 34488 msec for 411844 docs
IW 4 [Sun May 25 08:46:12 MDT 2014; Lucene Merge Thread #2]: merge time 6705 msec for 61045 docs
IW 4 [Sun May 25 08:53:23 MDT 2014; Lucene Merge Thread #3]: merge time 54341 msec for 623054 docs
IW 4 [Sun May 25 08:59:38 MDT 2014; Lucene Merge Thread #4]: merge time 9369 msec for 88050 docs
IW 4 [Sun May 25 09:07:22 MDT 2014; Lucene Merge Thread #5]: merge time 53734 msec for 625095 docs
IW 4 [Sun May 25 09:12:40 MDT 2014; Lucene Merge Thread #6]: merge time 10407 msec for 95045 docs
IW 4 [Sun May 25 09:20:03 MDT 2014; Lucene Merge Thread #7]: merge time 47114 msec for 560845 docs
IW 4 [Sun May 25 09:24:39 MDT 2014; Lucene Merge Thread #8]: merge time 5368 msec for 46523 docs
IW 4 [Sun May 25 09:31:26 MDT 2014; Lucene Merge Thread #9]: merge time 51475 msec for 619516 docs
IW 4 [Sun May 25 09:36:28 MDT 2014; Lucene Merge Thread #10]: merge time 9420 msec for 88276 docs
IW 4 [Sun May 25 09:43:57 MDT 2014; Lucene Merge Thread #11]: merge time 47224 msec for 563274 docs
IW 4 [Sun May 25 09:52:39 MDT 2014; Lucene Merge Thread #13]: merge time 8761 msec for 68640 docs
IW 4 [Sun May 25 09:53:44 MDT 2014; Lucene Merge Thread #12]: merge time 266527 msec for 4227876 docs
IW 4 [Sun May 25 09:56:07 MDT 2014; Lucene Merge Thread #14]: merge time 38959 msec for 495135 docs
IW 4 [Sun May 25 10:06:31 MDT 2014; Lucene Merge Thread #15]: merge time 32033 msec for 410559 docs
IW 4 [Sun May 25 10:14:07 MDT 2014; Lucene Merge Thread #16]: merge time 7521 msec for 54797 docs
IW 4 [Sun May 25 10:21:12 MDT 2014; Lucene Merge Thread #17]: merge time 48044 msec for 576053 docs
IW 4 [Sun May 25 10:27:41 MDT 2014; Lucene Merge Thread #18]: merge time 6843 msec for 62448 docs
IW 4 [Sun May 25 10:34:33 MDT 2014; Lucene Merge Thread #19]: merge time 44991 msec for 619962 docs
IW 4 [Sun May 25 10:40:08 MDT 2014; Lucene Merge Thread #20]: merge time 11078 msec for 118848 docs
IW 4 [Sun May 25 10:46:40 MDT 2014; Lucene Merge Thread #21]: merge time 54392 msec for 643896 docs
IW 4 [Sun May 25 10:52:17 MDT 2014; Lucene Merge Thread #22]: merge time 7091 msec for 74945 docs
IW 4 [Sun May 25 11:00:10 MDT 2014; Lucene Merge Thread #23]: merge time 44073 msec for 584655 docs
IW 4 [Sun May 25 11:09:57 MDT 2014; Lucene Merge Thread #24]: merge time 5769 msec for 49129 docs
IW 4 [Sun May 25 11:21:50 MDT 2014; Lucene Merge Thread #25]: merge time 307003 msec for 4128767 docs
IW 4 [Sun May 25 11:22:31 MDT 2014; Lucene Merge Thread #25]: merge time 41087 msec for 463915 docs
IW 4 [Sun May 25 11:27:01 MDT 2014; Lucene Merge Thread #26]: merge time 12255 msec for 107006 docs
IW 4 [Sun May 25 11:39:36 MDT 2014; Lucene Merge Thread #27]: merge time 44532 msec for 618865 docs
IW 4 [Sun May 25 11:48:01 MDT 2014; Lucene Merge Thread #28]: merge time 8192 msec for 82499 docs
IW 4 [Sun May 25 12:00:37 MDT 2014; Lucene Merge Thread #29]: merge time 54516 msec for 775824 docs
IW 4 [Sun May 25 12:12:46 MDT 2014; Lucene Merge Thread #30]: merge time 9692 msec for 101961 docs
IW 4 [Sun May 25 12:19:33 MDT 2014; Lucene Merge Thread #31]: merge time 51258 msec for 732080 docs
IW 4 [Sun May 25 12:25:20 MDT 2014; Lucene Merge Thread #32]: merge time 11955 msec for 124069 docs
IW 4 [Sun May 25 12:34:20 MDT 2014; Lucene Merge Thread #33]: merge time 57059 msec for 743397 docs
IW 4 [Sun May 25 12:40:12 MDT 2014; Lucene Merge Thread #34]: merge time 7408 msec for 71889 docs
IW 4 [Sun May 25 12:48:40 MDT 2014; Lucene Merge Thread #35]: merge time 47083 msec for 628885 docs
IW 4 [Sun May 25 13:02:48 MDT 2014; Lucene Merge Thread #36]: merge time 282123 msec for 4761885 docs
IW 4 [Sun May 25 13:02:58 MDT 2014; Lucene Merge Thread #36]: merge time 9565 msec for 103121 docs
IW 4 [Sun May 25 13:11:26 MDT 2014; Lucene Merge Thread #37]: merge time 30681 msec for 426626 docs
IW 4 [Sun May 25 13:20:44 MDT 2014; Lucene Merge Thread #38]: merge time 30638 msec for 408589 docs
IW 4 [Sun May 25 13:28:14 MDT 2014; Lucene Merge Thread #39]: merge time 4735 msec for 42766 docs
IW 4 [Sun May 25 13:36:49 MDT 2014; Lucene Merge Thread #40]: merge time 51305 msec for 622337 docs
IW 4 [Sun May 25 13:45:10 MDT 2014; Lucene Merge Thread #41]: merge time 8094 msec for 79872 docs
IW 4 [Sun May 25 13:52:23 MDT 2014; Lucene Merge Thread #42]: merge time 48678 msec for 640757 docs
IW 4 [Sun May 25 13:59:05 MDT 2014; Lucene Merge Thread #43]: merge time 11398 msec for 92616 docs
{noformat}

One problem with doing no merging at all is the number of open files.  Assume 
that each of the merges listed above takes 35 segments down to one, so the 
index drops by 34 segments per merge.  Taken together, those merges account 
for a net difference of nearly fifteen hundred *segments*, with 72 segments 
remaining when indexing finishes.  That's a LOT of files.  If I were to turn 
on useCompoundFile, it would greatly reduce the file count and make the idea 
manageable ... but I'm curious about whether the compound file results in 
lower real-world performance.
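
The segment arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, the 35-segments-per-merge figure is the assumption already stated, and the merge count of 44 is a guess based on the distinct merge thread numbers (#0 through #43) in the log summary, not a number given anywhere in this comment.

```python
def segments_removed(merge_count, segments_per_merge=35):
    """Net drop in segment count when each merge folds N segments into one."""
    if segments_per_merge < 1:
        raise ValueError("segments_per_merge must be at least 1")
    # Each merge consumes N segments and produces one, so the net drop is N - 1.
    return merge_count * (segments_per_merge - 1)


# 44 merges is an assumption (distinct merge thread numbers in the log).
print(segments_removed(44))  # 1496
```

With those inputs the result lands just under fifteen hundred, which is consistent with the estimate above. A different merge count or segments-per-merge value changes the total proportionally.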

I will attach the INFOSTREAM file that I grepped to get the output above.
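
For anyone who wants totals rather than the raw grep output, a rough Python sketch like the following could summarize the merge lines. The regular expression is written against the line format shown in the output above and is an assumption about the infostream in general; the function name is hypothetical.

```python
import re

# Matches the "merge time ... msec for ... docs" lines shown in the log summary.
# \s+ also tolerates a line wrap between the time value and "msec".
MERGE_LINE = re.compile(
    r"Lucene Merge Thread #(\d+)\]: merge time (\d+)\s+msec for (\d+) docs"
)


def summarize_merges(text):
    """Return (merge count, total msec, total docs) for merge-time lines in text."""
    matches = MERGE_LINE.findall(text)
    total_msec = sum(int(msec) for _, msec, _ in matches)
    total_docs = sum(int(docs) for _, _, docs in matches)
    return len(matches), total_msec, total_docs


# Two lines copied from the summary, one wrapped the way email wraps it.
sample = (
    "IW 4 [Sun May 25 08:34:02 MDT 2014; Lucene Merge Thread #0]: merge time 38437 \n"
    "msec for 413954 docs\n"
    "IW 4 [Sun May 25 08:39:58 MDT 2014; Lucene Merge Thread #1]: "
    "merge time 34488 msec for 411844 docs\n"
)
print(summarize_merges(sample))  # (2, 72925, 825798)
```

Feeding the whole infostream file through `summarize_merges` would give the overall merge count and time in one pass, assuming every merge completion is logged in this same format.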


> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This 
> causes problems for Solr's dataimport handler when very large imports are 
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, 
> the add/update thread will stop for several minutes while the largest merge 
> finishes.  In the meantime, the dataimporter JDBC connection to the database 
> will time out, and when the add/update thread resumes, the import will fail 
> because the ResultSet throws an exception.  Setting maxMergeCount to 6 
> eliminates this issue for virtually any size import -- although it is 
> theoretically possible to have that many simultaneous merge tiers, I've never 
> seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate 
> for most installations), I cannot think of a really good reason that the 
> default for maxMergeCount should be so low.  If someone does need to strictly 
> control the number of threads that get created, they can reduce the number.  
> Perhaps someone with more experience knows of a really good reason to make 
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid 
> bikeshedding.  I don't think it should be Integer.MAX_VALUE.



