[ 
https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008395#comment-14008395
 ] 

Shawn Heisey commented on LUCENE-5705:
--------------------------------------

bq. but it sounds like you're doing a large initial import from JDBC? Maybe you 
only do a single import even?

I run imports on all of my shards at once.  There are 96 million docs total 
right now, growing at a rate of a few million per year.  One shard (which we 
call the incremental, but is better known as a "hot" shard) holds only the 
newest few hundred thousand docs.  The rest of the docs are split among the 
other six (cold) shards by taking a MySQL CRC32 of the database primary key, 
modulo 6.  In production, each of two servers holds three cold shards; the 
second server also holds the hot shard.  This means that for the long haul of a 
full rebuild, each server is doing three imports at the same time.  This is not 
running in cloud mode.
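
To illustrate the cold-shard routing described above, here is a minimal Java 
sketch.  It is not the actual setup: the class and method names are 
hypothetical, and it assumes that MySQL's CRC32() is applied to the decimal 
string form of the primary key and that java.util.zip.CRC32 yields the same 
standard CRC-32 value.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of Solr, SolrJ, or DIH.
final class ShardRouter {
    private ShardRouter() {}

    /** Standard CRC-32 of the key's string form, as an unsigned value in a long. */
    static long crc32(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Index (0 .. shardCount-1) of the cold shard that would hold this key. */
    static int shardFor(String primaryKey, int shardCount) {
        // getValue() is never negative, so the remainder is a valid index.
        return (int) (crc32(primaryKey) % shardCount);
    }
}
```

With six cold shards, ShardRouter.shardFor(key, 6) plays the role of 
CRC32(key) % 6 on the database side.  Because the hash depends only on the 
key, a given document lands on the same cold shard on every rebuild.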

DIH is *only* used for full rebuilds.  I have a SolrJ program that does normal 
updates and starts/monitors/finishes the rebuilds.

It is quite possible that with my current settings (4.7.2, explicitly 
configured with TieredMergePolicy, ConcurrentMergeScheduler, and an effective 
mergeFactor of 35), I would no longer run into this issue on my production 
hardware, which has fairly fast disks.  Grepping for the "merge time" lines so 
far shows only the one overlap, which I pasted above.

With the more frequent merging inherent in the default mergeFactor of 10, other 
users might have a bigger chance of running into a problem, especially with 
single or mirrored 7200RPM disks.

My dev hardware has slower disks (7200RPM RAID1) and houses all seven shards on 
one server.  Rebuilds take nearly twice as long there as they do on the 
production hardware, and on the dev hardware they are definitely I/O bound.  
The infostream that I am currently collecting comes from the production 
hardware.


> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This 
> causes problems for Solr's dataimport handler when very large imports are 
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, 
> the add/update thread will stop for several minutes while the largest merge 
> finishes.  In the meantime, the dataimporter JDBC connection to the database 
> will time out, and when the add/update thread resumes, the import will fail 
> because the ResultSet throws an exception.  Setting maxMergeCount to 6 
> eliminates this issue for virtually any size import -- although it is 
> theoretically possible to have that many simultaneous merge tiers, I've never 
> seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate 
> for most installations), I cannot think of a really good reason that the 
> default for maxMergeCount should be so low.  If someone does need to strictly 
> control the number of threads that get created, they can reduce the number.  
> Perhaps someone with more experience knows of a really good reason to make 
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid 
> bikeshedding.  I don't think it should be Integer.MAX_VALUE.


