Hmm, your merge policy changes are dangerous: they will cause too many segments in the index, which makes it take longer to apply deletes.
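(The <mergePolicy> block quoted further down maps roughly onto the Lucene 4.10.x calls below. This is an illustrative sketch, not code from the thread; the class name is made up, and the defaults noted in the comments are TieredMergePolicy's own defaults as I recall them from the 4.10 javadocs.)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class MergePolicySketch {
  public static void main(String[] args) {
    // The quoted <mergePolicy> settings, expressed against the Lucene 4.10.x API.
    TieredMergePolicy tmp = new TieredMergePolicy();
    tmp.setMaxMergeAtOnce(8);       // default is 10
    tmp.setSegmentsPerTier(100.0);  // default is 10: allows ten times more segments per tier
    tmp.setMaxMergedSegmentMB(512); // default is 5120 (5 GB): merged segments stay much smaller

    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4,
        new StandardAnalyzer(Version.LUCENE_4_10_4));
    iwc.setMergePolicy(tmp);
    System.out.println(iwc.getMergePolicy());
  }
}

With segmentsPerTier at 100 and merged segments capped at 512 MB, the index carries many more (and smaller) segments than the defaults would allow, and applyDeletes has to resolve the buffered delete terms against every one of them.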
Can you revert that and re-test?

I'm not sure why DIH is using updateDocument instead of addDocument ... maybe
ask on the solr-user list?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 28, 2016 at 10:07 AM, Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:

> Currently I use concurrent DIH, but I will write some SolrJ for testing
> or even as a replacement for DIH.
> I don't know what DIH does behind the scenes when only documents are added.
>
> I haven't tried any newer release yet, but after reading LUCENE-6161 I
> really should. At least a version > 5.1, maybe before writing some SolrJ.
>
> Yes, IndexWriterConfig is changed from the defaults:
> <indexConfig>
>   <maxIndexingThreads>8</maxIndexingThreads>
>   <ramBufferSizeMB>1024</ramBufferSizeMB>
>   <maxBufferedDocs>-1</maxBufferedDocs>
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>     <int name="maxMergeAtOnce">8</int>
>     <int name="segmentsPerTier">100</int>
>     <int name="maxMergedSegmentMB">512</int>
>   </mergePolicy>
>   <mergeFactor>8</mergeFactor>
>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>   <lockType>${solr.lock.type:native}</lockType>
>   ...
> </indexConfig>
>
> An example unique id: "ftoxfordilej:ar.1770.x.x.13.x.x.u1"
> Somewhere between 20 and 50 characters in length.
>
> Thanks for your help,
> Bernd
>
> On 28.07.2016 at 15:35, Michael McCandless wrote:
> > Hmm, not good.
> >
> > If you are really only adding documents, you should be using
> > IndexWriter.addDocument, which won't buffer any deleted terms, and then
> > that applyDeletes call should be a no-op. It also makes flushes more
> > efficient since all of your indexing buffer goes to the added documents,
> > not buffered delete terms. Are you using updateDocument?
> >
> > Can you reproduce this slowness on a newer release? There have been
> > performance issues fixed in newer releases in this method, e.g.
> > https://issues.apache.org/jira/browse/LUCENE-6161
> >
> > Have you changed any IndexWriterConfig settings from the defaults?
> >
> > What are your unique id fields like? How many bytes in length?
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > On Thu, Jul 28, 2016 at 5:01 AM, Bernd Fehling
> > <bernd.fehl...@uni-bielefeld.de> wrote:
> >
> >> While trying to get higher indexing performance, it turned out that
> >> BufferedUpdateStreams is what breaks indexing performance:
> >> public synchronized ApplyDeletesResult applyDeletesAndUpdates(...)
> >>
> >> In IndexWriterConfig I have setRAMBufferSizeMB=1024, and the Lucene
> >> 4.10.4 API states:
> >> "Determines the amount of RAM that may be used for buffering added
> >> documents and deletions before they are flushed to the Directory.
> >> Generally for faster indexing performance it's best to flush by RAM
> >> usage instead of document count and use as large a RAM buffer as you can."
> >>
> >> Also setMaxBufferedDocs=-1 and setMaxBufferedDeleteTerms=-1.
> >>
> >> BD 0 [Wed Jul 27 13:42:03 GMT+01:00 2016; Thread-27890]: applyDeletes:
> >> infos=...
> >> BD 0 [Wed Jul 27 14:38:55 GMT+01:00 2016; Thread-27890]: applyDeletes
> >> took 3411845 msec
> >>
> >> That is about 56 minutes of no indexing, only applying deletes.
> >> What is it deleting?
> >>
> >> As the index gets bigger the time gets longer; currently 2.5 hours of
> >> waiting.
> >> I'm adding 96 million docs with unique ids, no duplicates, only adds,
> >> no deletes.
> >>
> >> Any suggestions on which config _really_ gives high-performance indexing?
> >>
> >> Best regards,
> >> Bernd
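To make the addDocument vs. updateDocument point concrete, here is a minimal add-only indexing sketch against the Lucene 4.10.x API. It is illustrative only: the class name, the "id" field name, and the index path are placeholders; the only values taken from the thread are the 1024 MB RAM buffer, the -1 (disabled) buffered-docs/delete-terms limits, and the example unique id.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class AddOnlyIndexingSketch {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4,
        new StandardAnalyzer(Version.LUCENE_4_10_4));
    iwc.setRAMBufferSizeMB(1024);                                  // flush by RAM usage ...
    iwc.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);  // ... not by doc count
    iwc.setMaxBufferedDeleteTerms(IndexWriterConfig.DISABLE_AUTO_FLUSH);

    Directory dir = FSDirectory.open(new File("/path/to/index"));  // placeholder path
    try (IndexWriter writer = new IndexWriter(dir, iwc)) {
      Document doc = new Document();
      doc.add(new StringField("id", "ftoxfordilej:ar.1770.x.x.13.x.x.u1", Field.Store.YES));

      // Pure add: no delete term is buffered, so the applyDeletes phase has
      // nothing to resolve.
      writer.addDocument(doc);

      // updateDocument, by contrast, buffers a delete-by-term for every call,
      // even when the id has never been indexed before; resolving those
      // buffered terms against every segment is what applyDeletes spends its
      // time on.
      // writer.updateDocument(new Term("id", "ftoxfordilej:ar.1770.x.x.13.x.x.u1"), doc);
    }
  }
}

If I recall correctly, Solr's update path only takes the plain addDocument branch when overwrite=false is passed on the update request; with the default overwrite=true it goes through updateDocument on the uniqueKey, which would explain the buffered delete terms seen here. That is probably the thing to confirm on solr-user.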