Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

Michael McCandless Thu, 04 Aug 2011 10:27:07 -0700

Indeed, from the log fragment I can see the merges are just really
slow.  You had 6 merges run:


IW 0 [Wed Aug 03 22:43:24 CEST 2011; Lucene Merge Thread #0]: merged
segment size=1234.550 MB vs estimate=1300.063 MB
IW 0 [Thu Aug 04 00:15:54 CEST 2011; Lucene Merge Thread #4]: merged
segment size=740.168 MB vs estimate=780.602 MB
IW 0 [Thu Aug 04 00:29:49 CEST 2011; Lucene Merge Thread #1]: merged
segment size=1165.862 MB vs estimate=1224.516 MB
IW 0 [Thu Aug 04 00:39:36 CEST 2011; Lucene Merge Thread #5]: merged
segment size=899.690 MB vs estimate=943.422 MB
IW 0 [Thu Aug 04 00:39:52 CEST 2011; Lucene Merge Thread #3]: merged
segment size=1046.637 MB vs estimate=1097.111 MB
IW 0 [Thu Aug 04 01:07:04 CEST 2011; Lucene Merge Thread #2]: merged
segment size=1281.083 MB vs estimate=1340.087 MB

And the times are long:

IW 0 [Wed Aug 03 22:43:25 CEST 2011; Lucene Merge Thread #0]: merge
time 4194615 msec for 744793 docs
IW 0 [Thu Aug 04 00:15:55 CEST 2011; Lucene Merge Thread #4]: merge
time 6461433 msec for 1205717 docs
IW 0 [Thu Aug 04 00:29:50 CEST 2011; Lucene Merge Thread #1]: merge
time 9783566 msec for 1472419 docs
IW 0 [Thu Aug 04 00:39:38 CEST 2011; Lucene Merge Thread #5]: merge
time 7209832 msec for 1468231 docs
IW 0 [Thu Aug 04 00:39:53 CEST 2011; Lucene Merge Thread #3]: merge
time 8662995 msec for 1699997 docs
IW 0 [Thu Aug 04 01:07:04 CEST 2011; Lucene Merge Thread #2]: merge
time 11197195 msec for 1944231 docs

Though, for all but the first merge, the times include the "paused"
time, so it's not a real measure of how long the merge took.  Still,
4195 seconds to merge to a ~1300 MB merged segment is really quite
long, but I think one big reason here is you are allowing too many
merge threads at once.
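
To put those numbers in perspective, here is a small sketch that converts the logged sizes and times into throughput. The class and method names are made up for illustration. The figures are copied from the infoStream lines above and paired by merge thread number. Only merge #0 is a clean measurement, since the others include paused time; it works out to roughly 0.3 MB/s.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: turns infoStream merge figures
// into rough throughput numbers.
final class MergeThroughput {

    /** Megabytes of merged segment produced per second of wall-clock time. */
    public static double mbPerSec(double mergedMb, long mergeMsec) {
        return mergedMb / (mergeMsec / 1000.0);
    }

    /** Documents merged per second of wall-clock time. */
    public static double docsPerSec(long docs, long mergeMsec) {
        return docs / (mergeMsec / 1000.0);
    }

    public static void main(String[] args) {
        // Values copied from the log fragment, in the order the merges finished.
        int[] thread = {0, 4, 1, 5, 3, 2};
        double[] sizeMb = {1234.550, 740.168, 1165.862, 899.690, 1046.637, 1281.083};
        long[] msec = {4194615L, 6461433L, 9783566L, 7209832L, 8662995L, 11197195L};
        long[] docs = {744793L, 1205717L, 1472419L, 1468231L, 1699997L, 1944231L};

        for (int i = 0; i < thread.length; i++) {
            // All merges except #0 include time spent paused, so their
            // throughput understates the real merge speed.
            String note = (thread[i] == 0) ? "" : " (includes paused time)";
            System.out.printf(Locale.ROOT,
                    "Merge Thread #%d: %.3f MB/s, %.0f docs/s%s%n",
                    thread[i],
                    mbPerSec(sizeMb[i], msec[i]),
                    docsPerSec(docs[i], msec[i]),
                    note);
        }
    }
}
```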

I would set CMS.setMaxThreadCount(1) and CMS.setMaxMergeCount(2), and
I would lower the number of indexing threads to 2.  I think your IO
system is a big bottleneck here, not only because of merging and
flushing but also because presumably the source of the docs is on this
same single laptop spinning drive, right?
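
As a rough sketch, the scheduler settings suggested above could be applied like this. It is a fragment, not a complete program: it assumes `conf` is the IndexWriterConfig from the quoted setup below, and it uses only the setters that already appear in this thread. The thread count is lowered first because some Lucene versions reject a thread count larger than the merge count.

```java
// Sketch only: `conf` is assumed to be the IndexWriterConfig from the quoted setup.
ConcurrentMergeScheduler mergeScheduler = new ConcurrentMergeScheduler();
mergeScheduler.setMaxThreadCount(1); // run a single merge thread at a time
mergeScheduler.setMaxMergeCount(2);  // allow two merges in flight before indexing threads stall
conf.setMergeScheduler(mergeScheduler);
```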

Mike McCandless

http://blog.mikemccandless.com

On Wed, Aug 3, 2011 at 7:31 PM, Devon H. O'Dell <[email protected]> wrote:
> For what it's worth, I've seen this happen too (using the stock Lucene
> 3.3 Java APIs), but it requires me to index many millions of
> documents, and doesn't start being a really big problem until the
> indexes get to be closer to 250GB in size. When they reach around 1TB,
> it will take around an hour for the merge to complete (which is
> frustrating). Similar to Pierre-Henri, I see virtually no disk I/O
> when it happens and the system in question is one of the Amazon EC2
> "Huge" instances (so, something like 8 cores and 32GB RAM) and disk
> I/O during indexing pushes around 100MB/s.
>
> If it would be useful to see additional reports / information from
> this scenario, I'm sure I can get something put together.
>
> --dho
>
> 2011/8/3 Pierre-Henri Toussaint <[email protected]>:
>> OK, so the problem definitely comes from the slow merging.
>> I slightly increased the merge count and thread count to avoid the problem
>> described previously. But as expected, it just delayed it!
>>
>> Results: 75 minutes to index the 33GB xml file, and 150 minutes to finish
>> the merge after indexer.close.
>> See the uploaded log file
>> http://lucene.472066.n3.nabble.com/file/n3223874/slowmerge
>> containing: logs (timems:numberofdocsindexed/current_title) +
>> infoStream + random threaddump.
>> You can spot "indexer.close (no optimize)" (line 5721) for indexing
>> completion and the beginning of the merging nightmare.
>>
>> conf:
>> conf.setRAMBufferSizeMB(512);
>> ConcurrentMergeScheduler mergeScheduler = new ConcurrentMergeScheduler();
>> mergeScheduler.setMaxMergeCount(6);
>> mergeScheduler.setMaxThreadCount(4);
>> conf.setMergeScheduler(mergeScheduler);
>> writer = new ThreadedIndexWriter(directory, analyzer, true, 2, 5, conf);
>> Everything else is default; no optimize called.
>>
>> documents:
>> pageDocument.add(new Field("title", page.getTitle(), Field.Store.YES,
>> Field.Index.NO));
>> pageDocument.add(new Field("text", page.getText(), Field.Store.NO,
>> Field.Index.ANALYZED));
>> if (page.getContributorUserName() != null)
>> pageDocument.add(new Field("contributorUserName",
>> page.getContributorUserName(), Field.Store.NO, Field.Index.ANALYZED));
>>
>> infoStream info:
>> setInfoStream
>> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@2dafae45
>> dir=org.apache.lucene.store.NIOFSDirectory@/Users/ptoussaint/Documents/workspace/wikisearch/index2
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@39dd3812
>> index=
>> version=4.0-SNAPSHOT
>> matchVersion=LUCENE_40
>> analyzer=org.pache.soundcloud.wikisearch.Indexer$WikiAnalyzer
>> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
>> commit=null
>> openMode=CREATE_OR_APPEND
>> similarityProvider=org.apache.lucene.search.DefaultSimilarityProvider
>> termIndexInterval=32
>> mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
>> default WRITE_LOCK_TIMEOUT=1000
>> writeLockTimeout=1000
>> maxBufferedDeleteTerms=-1
>> ramBufferSizeMB=512.0
>> maxBufferedDocs=-1
>> mergedSegmentWarmer=null
>> codecProvider=org.apache.lucene.index.codecs.CoreCodecProvider@6a8c436b
>> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
>> maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0,
>> expungeDeletesPctAllowed=10.0, segmentsPerTier=10.0, useCompoundFile=true,
>> noCFSRatio=0.1
>> indexerThreadPool=org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool@1e9e5c73
>> readerPooling=false
>> readerTermsIndexDivisor=1
>> flushPolicy=org.apache.lucene.index.FlushByRamOrCountsPolicy@2ec791b9
>> perThreadHardLimitMB=1945
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Thread-locking-while-merging-ConcurrentMergeScheduler-issue-tp3222427p3223874.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>
