Mike,

>
> But... how long does step 2 take? Is it an option to not commit on
> every update? How many docs do you typically update?

I do not commit on every update; I call commit once every 10k documents.
Indexing 10k docs takes around 10 secs.

>
> If you are committing only so that an outside reader can reopen, you
> should consider just using an NRT reader instead (assuming the reader
> is in same JVM as IndexWriter).

My service is just an indexer, so I don't need a reader. The new segments
are pushed to a searcher box after each commit.

>
> Roughly how much more RAM consumption do you see when you force pooling?

pooling not forced -> memory after explicit GC: 50 MB
pooling forced     -> memory after explicit GC: 250 MB

Thank you for opening the JIRA issue.

Bogdan

>
> Mike
>
> On Fri, Mar 5, 2010 at 9:18 AM, Bogdan Ghidireac <bog...@ecstend.com> wrote:
>> Hi,
>>
>> I have an index with 100 million docs that takes around 20GB on disk and
>> has an update rate of a few hundred docs per minute. The new docs are
>> grouped in batches and indexed once every few minutes. My problem is
>> that the update performance has degraded too much over time as the index
>> has increased in size (distinct docs).
>>
>> My indexing flow looks like this:
>>
>> 0. create indexWriter (only once)
>> 1. get the open indexWriter
>> 2. for each doc call indexWriter.updateDocument(pkTerm, doc)
>> 3. indexWriter.commit
>> 4. indexWriter.waitForMerges
>> 5. wait for new docs and goto 1.
>>
>> I ran a profiler for several minutes and I noticed that most of the
>> time the indexer is busy applying the deletes. This takes so much time
>> because all terms are loaded for every commit (see the attached
>> profiler screenshot).
>>
>> The index writer has a pool of readers but they are not used unless
>> near real time is enabled. I changed my code to force the pool to be
>> used, but the only way I can do this is to request a reader that is
>> never used, via writer.getReader(). Of course, the memory consumption is
>> higher now because I have terms in memory, but steps 3+4 complete in
>> 1-2 secs compared to 8-10 secs.
>>
>> Is it possible to enable the readers pool at the IndexWriter
>> constructor level? My current method looks like a hack ...
>> I am using Lucene 2.9.2 on Linux.
>>
>> Regards,
>> Bogdan
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
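
For illustration only, the indexing flow quoted above (update each doc, then commit and wait for merges once every N docs) can be sketched as below. This is not the actual indexer code. `Writer` is a hypothetical interface standing in for the three IndexWriter calls named in the flow, so that the sketch runs without Lucene on the classpath; `indexBatch` and `commitEvery` are likewise made-up names.

```java
import java.util.Arrays;
import java.util.List;

public class BatchIndexerSketch {

    /** Hypothetical stand-in for the IndexWriter calls named in this thread. */
    public interface Writer {
        void updateDocument(String pkTerm, String doc);
        void commit();
        void waitForMerges();
    }

    /**
     * Updates every doc, committing (and waiting for merges) once per
     * commitEvery docs, plus once at the end for any remainder.
     * Returns the number of commits issued.
     */
    public static int indexBatch(Writer writer, List<String[]> docs, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (String[] doc : docs) {
            writer.updateDocument(doc[0], doc[1]); // step 2: doc[0] is the key, doc[1] the body
            pending++;
            if (pending == commitEvery) {
                writer.commit();        // step 3
                writer.waitForMerges(); // step 4
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush whatever is left over from the last partial group.
            writer.commit();
            writer.waitForMerges();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        final int[] updates = {0};
        // Counting fake: records update calls, does nothing on commit/merge.
        Writer counting = new Writer() {
            public void updateDocument(String pkTerm, String doc) { updates[0]++; }
            public void commit() { }
            public void waitForMerges() { }
        };
        List<String[]> docs = Arrays.asList(
                new String[] {"pk1", "a"},
                new String[] {"pk2", "b"},
                new String[] {"pk3", "c"});
        int commits = indexBatch(counting, docs, 2);
        System.out.println(updates[0] + " updates, " + commits + " commits");
    }
}
```

In the setup described in this thread, `commitEvery` would be 10000. The sketch says nothing about reader pooling; it only shows where the commit and merge-wait calls sit relative to the per-document updates.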