Hi,

I am debugging a bulk-indexing performance issue while upgrading from 4.5.0
to 6.6. I have commits disabled while indexing a total of 85 GB of data over
7 hours. At the end of it, I want some 30 or so big segments, but I am
getting 3000 segments.
I deleted the index and enabled infoStream logging; I have attached the log
from when the first segment is flushed. Here are a few questions:

1. When a segment is flushed, is it then permanent, or can more documents
be written to it (aside from the merge scenario)?
2. It seems that 330+ threads are writing in parallel. Will each of them
become one segment when written to disk? If so, should I decrease
concurrency?
3. One possibility is to delay flushing. The flush is triggered at
10000 MB, probably coming from <ramBufferSizeMB>10000</ramBufferSizeMB>;
however, the segment that is flushed is only 115 MB. Is this limit for the
combined size of all in-memory segments? If so, is it OK to increase it
further to use more of my 48 GB heap? (I have sketched my reading of these
knobs in Java right after this list.)
4. How can I decrease the concurrency? Maybe the solution is to use fewer
in-memory segments?
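
To make sure I am reading these knobs right, here is roughly how I picture
the Lucene-level settings behind my config (just a sketch using the standard
IndexWriterConfig setters; the path and analyzer are placeholders, and the
comments are only my current understanding, so please correct me if it is
wrong):

  import java.nio.file.Paths;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.FSDirectory;

  public class RamBufferSketch {
    public static void main(String[] args) throws Exception {
      IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());

      // <ramBufferSizeMB>10000</ramBufferSizeMB> -- I assume this is the
      // combined budget for all in-memory segments, and a flush is
      // triggered once the total crosses it (question 3 above).
      iwc.setRAMBufferSizeMB(10000);

      // Each indexing thread writes into its own in-memory segment, which
      // I believe is additionally capped by this per-thread hard limit
      // (the Lucene default, 1945 MB); I have not changed it anywhere.
      iwc.setRAMPerThreadHardLimitMB(1945);

      // Matches <useCompoundFile>false</useCompoundFile> in my config.
      iwc.setUseCompoundFile(false);

      // "/path/to/index" is a placeholder for my real index directory.
      try (IndexWriter writer = new IndexWriter(
          FSDirectory.open(Paths.get("/path/to/index")), iwc)) {
        // my 330+ client threads call writer.addDocument(...) here
      }
    }
  }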

In the previous run, there were 110k files in the index folder after I
stopped indexing. Before committing, I noticed that the file count kept
decreasing every few minutes, until it dropped to 27k or so. (I committed
after it stabilized.)


My indexConfig is this:

  <indexConfig>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <maxIndexingThreads>10</maxIndexingThreads>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>10000</ramBufferSizeMB>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">5</int>
      <int name="segmentsPerTier">3000</int>
      <int name="maxMergeAtOnceExplicit">10</int>
      <int name="floorSegmentMB">16</int>
      <!-- 200 GB, since we want few big segments during full indexing -->
      <double name="maxMergedSegmentMB">200000</double>
      <double name="forceMergeDeletesPctAllowed">1</double>
    </mergePolicyFactory>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxThreadCount">10</int>
      <int name="maxMergeCount">10</int>
    </mergeScheduler>
    <lockType>${solr.lock.type:native}</lockType>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream>true</infoStream>
    <applyAllDeletesOnFlush>false</applyAllDeletesOnFlush>
  </indexConfig>
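
For the merge settings, this is the Lucene-level equivalent as I understand
it (again only a sketch using the standard TieredMergePolicy and
ConcurrentMergeScheduler setters, with the values copied from the config
above):

  import org.apache.lucene.index.ConcurrentMergeScheduler;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.TieredMergePolicy;

  public class MergeSettingsSketch {
    // Apply the merge settings from my solrconfig to an IndexWriterConfig.
    static void applyMergeSettings(IndexWriterConfig iwc) {
      TieredMergePolicy tmp = new TieredMergePolicy();
      tmp.setMaxMergeAtOnce(5);
      tmp.setSegmentsPerTier(3000);        // tolerate many segments per tier
      tmp.setMaxMergeAtOnceExplicit(10);
      tmp.setFloorSegmentMB(16);
      tmp.setMaxMergedSegmentMB(200000);   // ~200 GB: I want few big segments
      tmp.setForceMergeDeletesPctAllowed(1);

      ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
      cms.setMaxMergesAndThreads(10, 10);  // maxMergeCount, maxThreadCount

      iwc.setMergePolicy(tmp);
      iwc.setMergeScheduler(cms);
    }
  }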


Thanks
Nawab