[ 
https://issues.apache.org/jira/browse/LUCENE-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546915#comment-13546915
 ] 

Shawn Heisey commented on LUCENE-4661:
--------------------------------------

I have a question about this - both for myself and for a message on the 
solr-user mailing list today.

If you are importing millions of records from MySQL (or another DB) with DIH, 
eventually you'll reach a point where you've got multiple merge levels 
happening at the same time, which will stop indexing of new data long enough 
that the JDBC connection to the DB will time out.

Is it enough in that situation to increase maxMergeCount, or do you also have 
to increase maxThreadCount?  I have changed both (my current config is below), 
but if I only need to increase maxMergeCount and thus get the benefit of this 
issue, that would be awesome:

{noformat}
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">4</int>
  </mergeScheduler>
{noformat}

                
> Reduce default maxMerge/ThreadCount for ConcurrentMergeScheduler
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4661
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4661
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.1, 5.0
>
>
> I think our current defaults (maxThreadCount=#cores/2,
> maxMergeCount=maxThreadCount+2) are too high ... I've frequently found
> merges falling behind and then slowing each other down when I index on
> a spinning-magnets drive.
> As a test, I indexed all of English Wikipedia with term-vectors (=
> heavy on merging), using 6 threads ... at the defaults
> (maxThreadCount=3, maxMergeCount=5, for my machine) it took 5288 sec
> to index & wait for merges & commit.  When I changed to
> maxThreadCount=1, maxMergeCount=2, indexing time sped up to 2902
> seconds (45% faster).  This is on a spinning-magnets disk ... basically,
> spinning-magnets disks don't handle concurrent IO well.
> Then I tested an OCZ Vertex 3 SSD: at the current defaults it took
> 1494 seconds and at maxThreadCount=1, maxMergeCount=2 it took 1795 sec
> (20% slower).  Net/net the SSD can handle merge concurrency just fine.
> I think we should change the defaults: spinning magnet drives are hurt
> by the current defaults more than SSDs are helped ... apps that know
> their IO system is fast can always increase the merge concurrency.
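
The numbers in the quoted description can be rechecked with a few lines of Java. This is only a sketch under stated assumptions: the class and method names are hypothetical (they are not Lucene API), the defaults formula is taken as written in the description (maxThreadCount=#cores/2 with integer division, maxMergeCount=maxThreadCount+2), and the 6-core figure is an assumption chosen because it reproduces the maxThreadCount=3, maxMergeCount=5 quoted for that machine.

```java
final class MergeDefaultsCheck {
    // Old defaults as stated in the issue description (assumption: integer division).
    static int oldMaxThreadCount(int cores) {
        return cores / 2;
    }

    static int oldMaxMergeCount(int cores) {
        return oldMaxThreadCount(cores) + 2;
    }

    // Change in elapsed time relative to the baseline, in percent.
    // Negative means the new run took less time, positive means more.
    static double percentChange(double baselineSec, double newSec) {
        return (newSec - baselineSec) / baselineSec * 100.0;
    }

    public static void main(String[] args) {
        int cores = 6; // assumption, not stated in the description

        System.out.println("old maxThreadCount = " + oldMaxThreadCount(cores));
        System.out.println("old maxMergeCount  = " + oldMaxMergeCount(cores));

        // Spinning disk: 5288 sec at the old defaults, 2902 sec at 1/2.
        System.out.println(String.format("spinning disk: %.1f%% change in time",
                percentChange(5288, 2902)));

        // SSD: 1494 sec at the old defaults, 1795 sec at 1/2.
        System.out.println(String.format("SSD: %.1f%% change in time",
                percentChange(1494, 1795)));
    }
}
```

Rounded to whole percentages, the two timing comparisons agree with the "45% faster" and "20% slower" figures in the description, where "faster" is read as the reduction in total elapsed time.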

